Artificial Intelligence (AI) is revolutionizing the way we live, work, and think. In recent years, computing machines have become intelligent enough to recognize real-world objects and speech, learn to program, paint like an artist, or even dream like humans.
The security and reliability of software systems, which are enormously important to our modern economy, are also benefiting from advances in AI research. Although open source is no more or less secure than other software, the availability of its source code makes open source an easier target for the detection and exploitation of security vulnerabilities. Figure 1 below shows the number of vulnerabilities reported in the National Vulnerability Database (NVD) by year. Note that many more vulnerabilities never make it into NVD, a topic that I'll address in my FLIGHT 2017 presentation with Nathan Zhang, a data scientist on my team.
Figure 1: Distribution of vulnerabilities reported in NVD by year
The recent exploitation of a vulnerability (CVE-2017-5638) in Apache Struts reminds us of the severe consequences that enterprises (as well as individuals) face when they don't secure and manage the open source in their applications. As open source solutions expand into different industries and markets, the timely discovery and mitigation of publicly known vulnerabilities has become increasingly important. Unfortunately, the security experts who discover these vulnerabilities (with the intention of mitigating the risks) are finding it extremely difficult to keep up with the analysis. For instance, to determine threat levels and exploitability factors, security experts often must assess: (1) access and authentication complexity, (2) the confidentiality, integrity, and availability impacts of a vulnerability, and (3) numerical scores that quantify (1) and (2). NVD is one of several good sources for vulnerability assessment methodologies.
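To make the scoring step concrete, the widely used CVSS v3.1 standard (published by FIRST, and used by NVD) combines exactly these factors: exploitability metrics such as attack vector and complexity, and confidentiality/integrity/availability impacts. Below is a minimal sketch of the v3.1 base-score formula for the scope-unchanged case only; it illustrates the public standard, not Black Duck's internal methodology.

```python
import math

# CVSS v3.1 metric weights (scope-unchanged case only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}   # Attack Vector
AC = {"L": 0.77, "H": 0.44}                        # Attack Complexity
PR = {"N": 0.85, "L": 0.62, "H": 0.27}             # Privileges Required
UI = {"N": 0.85, "R": 0.62}                        # User Interaction
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}             # C/I/A impact

def roundup(x):
    """Round up to one decimal, as specified in CVSS v3.1 Appendix A."""
    int_input = round(x * 100000)
    if int_input % 10000 == 0:
        return int_input / 100000.0
    return (math.floor(int_input / 10000) + 1) / 10.0

def base_score(av, ac, pr, ui, c, i, a):
    """CVSS v3.1 base score for a scope-unchanged vulnerability."""
    iss = 1 - (1 - CIA[c]) * (1 - CIA[i]) * (1 - CIA[a])
    impact = 6.42 * iss
    exploitability = 8.22 * AV[av] * AC[ac] * PR[pr] * UI[ui]
    if impact <= 0:
        return 0.0
    return roundup(min(impact + exploitability, 10))

# A remotely exploitable flaw with no privileges or user interaction
# required and high C/I/A impact (AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H):
print(base_score("N", "L", "N", "N", "H", "H", "H"))  # 9.8 (Critical)
```

Doing this by hand for every new vulnerability is exactly the kind of repetitive, rule-bound work that automated analysis can help with.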
Artificial Intelligence in Vulnerability Analysis?
Overall, vulnerability analysis is a time-consuming task that unfortunately must be done in a time-sensitive manner, without compromising the essential steps needed to mitigate risks effectively. This situation is getting worse as the number of vulnerabilities being discovered grows (recall Figure 1). On a given day, our security experts at Black Duck may end up analyzing tens of vulnerabilities to make the consumers of affected open source solutions more secure. In this context, we are using AI solutions to help our security experts conduct vulnerability analysis at scale, quickly and accurately. If computing machines (powered by AI solutions) could do this analysis independently and automatically, it would be incredibly time- and cost-effective. While a worthy goal, we first need to understand where the challenges lie.
Training Computing Machines
An important part of AI-driven security solutions is training computing machines with real-world datasets. At Black Duck Research, we are fortunate to have the world's largest database of open source software, supplemented by important metadata such as publicly known vulnerabilities, licenses, vendor information, and so on. Our data scientists and security experts are using this data to build the next generation of open source security solutions. Training a machine essentially means providing a relevant and sufficient amount of data to your algorithms so that they can continue to learn as new open source solutions become available and new vulnerabilities are discovered.
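As a toy illustration of what "training" means here, consider teaching a machine to guess a vulnerability's severity from its text description. The sketch below trains a tiny naive Bayes classifier on a handful of invented example descriptions; real systems use far larger datasets and richer models, but the principle — count patterns in labeled data, then apply them to new data — is the same.

```python
import math
from collections import Counter, defaultdict

# Hypothetical training set: vulnerability descriptions with the
# severity labels a security expert might assign.
TRAINING = [
    ("remote attacker can execute arbitrary code via crafted request", "high"),
    ("buffer overflow allows remote code execution", "high"),
    ("authentication bypass grants remote attacker admin access", "high"),
    ("crafted packet causes denial of service crash", "low"),
    ("malformed input triggers denial of service", "low"),
    ("local user can read temporary file", "low"),
]

def train(examples):
    """Count word frequencies per class for a naive Bayes model."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in examples:
        words = text.split()
        word_counts[label].update(words)
        class_counts[label] += 1
        vocab.update(words)
    return word_counts, class_counts, vocab

def predict(text, word_counts, class_counts, vocab):
    """Pick the class with the highest log-probability (Laplace smoothing)."""
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

model = train(TRAINING)
print(predict("remote code execution via buffer overflow", *model))  # high
```

As new vulnerabilities are analyzed and labeled, they can simply be appended to the training set, which is how such a model "continues to learn" from evolving data.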
The Ever-Evolving Data in Open Source
This constantly evolving data poses several challenges that must be overcome before we can realize effective AI-driven security solutions. Many of these challenges stem from the fact that open source projects entail large volumes of structured and unstructured data that are difficult to find, manage, and analyze. We are applying various data mining, machine learning, and natural language processing techniques to solve some of the most challenging problems in open source security. The following are some examples of our AI-driven solutions:
- Automatically map publicly known vulnerabilities to open source projects (which may be known by different names across open source and security communities).
- Automatically conduct a preliminary analysis of vulnerabilities to determine their severity and importance so that full analysis can be prioritized. Our AI-driven solution evaluates these risks in the context of your applications and their business impact.
- Automatically find relationships between the various open source projects detected within your code. Our AI-driven solution helps you better understand your code dependencies so you can mitigate security and compliance risks at the file and directory level.
- Automatically analyze hundreds of legal documents (licenses, terms of service, privacy statements, and laws such as HIPAA and the DMCA) to determine compliance risks.
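The first item above — resolving the many names a project goes by — can be approximated with simple string similarity before heavier machine learning is applied. The sketch below uses Python's standard library; the alias table is invented for illustration and is far smaller than any real mapping.

```python
from difflib import SequenceMatcher

# Hypothetical alias table: canonical project -> names seen in
# advisories, package managers, and security feeds.
ALIASES = {
    "apache-struts": ["struts", "struts2", "apache struts"],
    "openssl": ["openssl", "open ssl"],
    "spring-framework": ["spring", "spring framework", "springframework"],
}

def normalize(name):
    """Lowercase and collapse punctuation so name variants compare cleanly."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())

def best_match(reported_name, aliases=ALIASES):
    """Return the canonical project whose alias is most similar."""
    target = normalize(reported_name)
    best, best_ratio = None, 0.0
    for canonical, names in aliases.items():
        for alias in names:
            ratio = SequenceMatcher(None, target, normalize(alias)).ratio()
            if ratio > best_ratio:
                best, best_ratio = canonical, ratio
    return best

print(best_match("Apache Struts 2"))  # apache-struts
```

In practice, plain edit-distance matching produces false positives, which is one reason this mapping problem calls for learned models rather than string similarity alone.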
To be clear, AI cannot yet fully automate open source security or open source risk management. Nonetheless, we've seen success in experimenting with and implementing various AI-driven solutions that are stepping stones toward a fully automated open source risk management solution.
Do you want to know more about our AI-driven approaches or get involved in our research projects? Contact us for more details.