Big Data Challenges in Open Source Management

Distributors and creators of open source software projects must attach or maintain the relevant licenses, notices, or both alongside their projects so that users can consume them in a compliant way. In practice, however, developers face a very different reality.

Often, because of the inherently “open” nature of these projects, code spreads, in part or as a whole, into multiple open source projects. That creates potential compliance and security risks for consumers of these projects. The growing number of open source projects (already in the millions) and their scale (with hundreds of containers holding millions of lines of code) have made open source ecosystems even more complex. As a result, finding the true ingredients and origins of an open source project is a very challenging task for consumers.

Unique Identifiers

Black Duck works hard to identify millions of open source projects and to provide accurate and complete data on them. This data enables us to manage compliance and security issues effectively for our customers. In this context, classifying millions of open source projects by features such as project names, vendors (or providers) and repositories (that are publicly available on the Web) is crucial to accurately identifying security and legal compliance issues with open source projects. Furthermore, software from one open source project often encroaches into other open source projects, which makes it even harder to identify each project uniquely. To that end, billions of features are needed from millions of open source projects to identify them uniquely and automatically through computational techniques.
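To make the idea of feature-based identification concrete, here is a minimal sketch in Python. It is not Black Duck's actual method: it assumes, purely for illustration, that a project's features are SHA-1 digests of its file contents and that a scanned codebase is matched to the catalog entry with the highest Jaccard similarity. The project names (`libalpha`, `libbeta`), the file contents and the helper functions are all hypothetical.

```python
import hashlib


def file_fingerprints(files):
    """Return the set of SHA-1 digests of the file contents (paths are ignored)."""
    return {
        hashlib.sha1(content.encode("utf-8")).hexdigest()
        for content in files.values()
    }


def jaccard(a, b):
    """Jaccard similarity of two feature sets: |a & b| / |a | b|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(unknown, catalog):
    """Return (project name, score) for the catalog entry most similar to `unknown`."""
    scores = {name: jaccard(unknown, feats) for name, feats in catalog.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical catalog: two projects that share one file (b.c), mimicking
# code from one project encroaching into another.
catalog = {
    "libalpha": file_fingerprints({
        "a.c": "int a(){return 1;}",
        "b.c": "int b(){return 2;}",
    }),
    "libbeta": file_fingerprints({
        "x.c": "int x(){return 9;}",
        "b.c": "int b(){return 2;}",
    }),
}

# A scanned codebase that vendors libalpha's files next to its own code.
scanned = file_fingerprints({
    "vendor/a.c": "int a(){return 1;}",
    "vendor/b.c": "int b(){return 2;}",
    "main.c": "int main(){return 0;}",
})

name, score = best_match(scanned, catalog)
print(name, round(score, 2))
```

The shared file `b.c` shows why a single feature is not enough: it appears in both catalog entries, so only the combination of features separates the two projects. At the scale described here, exact set comparison against every known project would likely have to give way to indexing or approximate matching.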

Big Data Challenges

Identifying projects at this scale is a challenging big data problem. Did you know that Black Duck applies state-of-the-art data mining solutions to build the information in our knowledgebase? Black Duck Hub uses billions of features generated from millions of open source projects (collectively representing terabytes of data) to uniquely identify projects, which ultimately helps mitigate compliance and security risks.

Visit Black Duck Research to learn more about our cutting-edge research and innovation projects.
