Big Data Challenges in Open Source Management

Distributors and creators of open source software projects must attach and maintain the relevant licenses, notices, or both with their projects so that users can consume them in a compliant way. In practice, however, the reality on the ground is very different for developers.

Because these projects are inherently “open,” their code often spreads, in part or as a whole, into multiple other open source projects. That creates potential compliance and security risks for consumers of these projects. The growing number of open source projects (already in the millions) and their scale (hundreds of containers holding millions of lines of code) have made open source ecosystems even more complex. As a result, finding the true ingredients and origins of an open source project is a very challenging task for consumers.

[Image: BigDataChallenge.png]

Unique Identifiers

Black Duck works hard to identify millions of open source projects and to provide accurate and complete data on them. These data enable us to manage compliance and security issues effectively for our customers. In this context, classifying millions of open source projects by features such as project names, vendors (or providers), and repositories (publicly available on the Web) is crucial to accurately identifying their security and legal compliance issues. Furthermore, software from one open source project often encroaches into other open source projects, making it even more difficult to identify each project uniquely. To that end, billions of features drawn from millions of open source projects are needed to identify them uniquely and automatically through computational techniques.
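To make the idea of feature-based identification concrete, here is a minimal sketch in Python. It is not Black Duck's actual method, which this post does not describe. It assumes the simplest possible feature (a SHA-256 hash of each file's contents) and compares feature sets with Jaccard similarity. The project names, file contents, and function names are all hypothetical.

```python
import hashlib

# Hypothetical file contents, used only for illustration.
ADD_SRC = "int add(int a, int b) { return a + b; }"
SUB_SRC = "int sub(int a, int b) { return a - b; }"
NOOP_SRC = "void noop(void) {}"
MAIN_SRC = "int main(void) { return 0; }"


def file_fingerprints(files):
    """Return one SHA-256 digest per file body; file paths are ignored."""
    return {hashlib.sha256(body.encode("utf-8")).hexdigest()
            for body in files.values()}


def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(unknown, catalog):
    """Return (project_name, score) for the catalog project most similar to `unknown`."""
    features = file_fingerprints(unknown)
    scored = ((name, jaccard(features, file_fingerprints(files)))
              for name, files in catalog.items())
    return max(scored, key=lambda pair: pair[1])


# A toy catalog of known projects (assumed data, not real projects).
catalog = {
    "libalpha": {"a.c": ADD_SRC, "b.c": SUB_SRC},
    "libbeta": {"x.c": NOOP_SRC, "a.c": ADD_SRC},
}

# A codebase that vendored libalpha's files under new paths.
unknown = {"vendor/a.c": ADD_SRC, "vendor/b.c": SUB_SRC, "main.c": MAIN_SRC}

print(best_match(unknown, catalog)[0])  # libalpha
```

Note that both toy projects share one file, which mirrors the encroachment problem described above: a single feature rarely identifies a project on its own, so many features have to be weighed together. Exact file hashes also miss modified copies, which is one reason a real system would need far richer features than this sketch uses.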

Big Data Challenges

Uniquely identifying millions of projects from billions of features is a challenging big data problem. Did you know that Black Duck applies state-of-the-art data mining solutions to derive the information in our knowledgebase? Black Duck Hub uses billions of features generated from millions of open source projects (collectively representing terabytes of data) to uniquely identify various projects, which ultimately helps mitigate compliance and security risks.

Visit Black Duck Research to learn more about our cutting-edge research and innovation projects. Watch a 3-minute demo of the Black Duck Hub.
