PhD Proposal: Data-driven techniques for estimating vulnerability severity
Security vulnerabilities have been puzzling the community for decades. As highlighted by the 2017 WannaCry and NotPetya ransomware campaigns, which resulted in billions of dollars of losses, weaponized exploits against vulnerabilities remain one of the main tools for cybercrime. The upward trend in the number of vulnerabilities reported annually, together with the technical challenges that impede remediation, leads to large exposure windows for vulnerable populations. However, due to sustained efforts in application and operating system security to hinder exploitation, only a small fraction of vulnerabilities are exploited in real-world attacks. Existing standards for severity assessment err on the side of caution and overestimate the risk posed by vulnerabilities, which hampers remediation efforts that often rely on prioritization. For more precise assessments, severity metrics need to take into account observations from real-world exploits.
We frame vulnerability severity assessment as the task of estimating the likelihood of observing weaponized exploits. Within this framework, we investigate opportunities for early detection and prediction of exploits based on publicly available vulnerability information. Our analysis aims to answer three questions: What features can be used for the task? Why are they effective? When can such predictions be trusted?
To answer the first two questions, we begin by analyzing the security community on social media platforms and investigating its ability to capture information about which vulnerabilities are exploited in the wild. Next, we show how functionalities from proof-of-concept (PoC) exploits (program fragments released during vulnerability disclosure to aid reproducibility) can help discover exploit variants and indicate weaponization. These techniques allow us to provide timely predictions about the existence of exploits and to improve the precision of severity-based remediation recommendations.
To answer the third question, we analyze the risks associated with data-driven security inferences, specifically the reliance on untrustworthy data sources. We evaluate the resilience of predictors against causative, training-time attacks, when modeling realistic adversarial capabilities. Our analysis across three prediction tasks highlights possible strategies for future defenses.
In order to better understand the risk posed by published PoC exploits, as future work we propose a technique to gain a semantic understanding of functionalities implemented in exploits, by monitoring their runtime behavior and interactions with vulnerabilities.
Examining Committee:
Chair: Dr. Tudor Dumitras
Dept rep: Dr. Ashok Agrawala
Members: Dr. Joseph JaJa, Dr. Tom Goldstein, Dr. Jiyong Jang