Empirical Investigation of the Characteristics of Security Vulnerabilities Identified through Peer Code Review


People

Amiangshu Bosu, Jeffrey C. Carver
Department of Computer Science
University of Alabama
Tuscaloosa, AL USA
Munawar Hafiz
Department of Computer Science
Auburn University
Auburn, AL USA
Patrick Hilley
Department of Math and Computer Science
Providence College
Providence, RI USA
Derek Janni
Department of Mathematical Sciences
Lewis & Clark College
Portland, OR USA

Summary

To provide insight into the characteristics of security vulnerabilities, this study analyzed peer code review data from 10 popular Open Source Software (OSS) projects. Using a set of empirically built and validated keywords, we identified 11,043 code review requests posted in Gerrit for those 10 projects. Using a three-stage manual analysis process, we identified 413 potentially vulnerable code changes (VCCs). Some key results include: 1) the most experienced contributors authored the majority of the VCCs, 2) while less experienced authors wrote fewer VCCs, their code changes were 1.5 to 24 times more likely to be vulnerable, 3) employees of organizations sponsoring the OSS projects are more likely to write VCCs, 4) the likelihood of a VCC increases with the number of lines changed, and 5) modified files are more likely to contain vulnerabilities than new files. Based on the results, we recommend that projects: a) create or adapt secure coding guidelines, b) create a security review team, c) ensure detailed comments during review to help knowledge dissemination, d) make incremental rather than monolithic changes, and e) develop tools to support automated security reviews.


Gerrit repositories


Project Gerrit URL
1. Android https://android-review.googlesource.com/#/q/status:open,n,z
2. Chromium https://gerrit.chromium.org/gerrit/#/q/status:open,n,z
3. Gerrit https://gerrit-review.googlesource.com/#/q/status:open,n,z
4. ITK/VTK http://review.source.kitware.com/#/q/status:open,n,z
5. MediaWiki https://gerrit.wikimedia.org/r/#/q/status:open,n,z
6. OmapZoom http://review.omapzoom.org/#/q/status:open,n,z
7. OpenAFS http://gerrit.openafs.org/#q,status:open,n,z
8. oVirt http://gerrit.ovirt.org/#/q/status:open,n,z
9. Qt https://codereview.qt-project.org/#q,status:open,n,z
10. Typo3 https://review.typo3.org/#/q/status:open,n,z



Replication package


Database schema  |  Sample database mined from Gerrit  |  Sample Excel file after comments inspection  |  Sample Excel file after thorough inspection  |  Sample final dataset

Keywords Associated with Vulnerabilities


Vulnerability Type Keywords
Buffer Overflow buffer, overflow, stack
Format String format, string, printf, scanf
Integer Overflow integer, overflow, signedness, widthness, underflow
Cross Site Scripting cross site, CSS, XSS, htmlspecialchar (PHP only)
SQL Injection SQL, SQLI, injection
Race Condition / Deadlock race, racy, deadlock
Improper Access Control improper, unauthenticated, gain access, permission
Denial of Service / Crash denial service, DOS
Cross Site Request Forgery cross site, request forgery, CSRF, XSRF, forged
Common security, vulnerability, vulnerable, hole, exploit, attack, bypass, backdoor, crash
Common (added later) threat, expose, breach, violate, blacklist, overrun, insecure

Sample SQL Query for filtering

 
select 'typo3' as project,request_id,message from
(select message,request_id from review_comments 
UNION 
select message,request_id from inline_comments ) all_message
where  (message like '%buffer%' or message like '%integer%' 
or message like '%stack%' or message like '%printf%'
or message like '%scanf%' or (message like '%format%' and message like '%string%')
or message like '%signedness%' or message like '%widthness%'
or message like '%cross%site%' or message like '%CSS%'
or message like '%XSS%' or message like '%SQLI%'
or message like '%injection%' or message like '%CMDI%'
or message like '%denial%' 
or ((message like '%race%') and (message not like '%trace%' and message not like '%brace%'))
or message like '%racy%' or message like '%deadlock%'
or message like '%forgery%' or message like '%forged%'
or message like '%CSRF%' or message like '%XSRF%'
or ((message like '%gain%' or message like '%improper%' 
or message like '%unauthenticated%'  ) and (message like '%access%'))
or message like '%attack%' or message like '%security%'
or message like '%vulnerabl%' or message like '% hole%'
or message like '%exploit%'  or message like '%bypass%' 
or message like '%overflow%' or message like '%underflow%'
or message like '%crash%' or message like '%threat%'
or message like '%expose%' or message like '%breach%'
or message like '%violate%' or message like '%blacklist%'
or message like '%overrun%' or message like '%insecure%'
or message like '%backdoor%' or message like '%htmlspecialchar%' )
order by request_id
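
The query above guards against false positives such as 'trace' or 'brace' matching the 'race' keyword. A minimal Python sketch of the same idea, using word boundaries instead of explicit exclusions (an illustrative helper with a hypothetical subset of the keywords, not part of the replication package):

```python
import re

# Illustrative subset of the vulnerability keywords from the table above.
# \b word boundaries keep 'race' from matching inside 'trace' or 'brace',
# mirroring the NOT LIKE exclusions in the SQL query.
KEYWORDS = ["buffer", "overflow", "race", "deadlock", "injection",
            "security", "exploit", "crash", "insecure", "CSRF", "XSS"]
PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b", re.IGNORECASE)

def looks_vulnerable(message):
    """Return True if a review comment mentions any security keyword."""
    return PATTERN.search(message) is not None
```

For example, `looks_vulnerable("possible race condition here")` matches, while `looks_vulnerable("add a closing brace here")` does not.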

Experience calculation

We calculated an author's experience as the number of code review requests he or she had posted prior to posting a vulnerable code change. For example,
in the Typo3 project, the author with ID 7036 posted the code review request with ID 6664 on 11/14/2011 13:15:43. Therefore, the author's
experience at that time was calculated using the following query:
------------------------------------------------------------------------------------------------------------------------------------------
SELECT count(request_id) as author_experience FROM request_detail
    WHERE (author=7036)
    and uploaded < STR_TO_DATE('11/14/2011 13:15:43','%m/%d/%Y %H:%i:%s');
------------------------------------------------------------------------------------------------------------------------------------------

Because the projects differed in the total number of review requests posted, we calculated the
percentile ranks of the authors and reviewers to enable comparison across the projects. Continuing with the previous example, the above query
returned an experience of 95 for author 7036 on 11/14/2011 (i.e., the author had posted 95 code review requests prior to posting request 6664). We
calculated the percentile ranks of the authors using the following pseudocode.
-----------------------------------------------------------------------------------------------------------------------------------------
AUTHOR_EXPERIENCE <- 95

NUMBER_OF_AUTHORS <- SELECT count(distinct(owner_id)) AS number_author
                     FROM request_detail
                     WHERE uploaded < STR_TO_DATE('11/14/2011 13:15:43','%m/%d/%Y %H:%i:%s');

AUTHOR_RANK <- SELECT count(owner_id) AS author_rank
               FROM (SELECT COUNT(request_id) num_posts, owner_id
                     FROM request_detail
                     WHERE uploaded < STR_TO_DATE('11/14/2011 13:15:43','%m/%d/%Y %H:%i:%s')
                     GROUP BY owner_id) rank_calc
               WHERE num_posts < AUTHOR_EXPERIENCE;

AUTHOR_PERCENTILE_RANK <- FLOOR((AUTHOR_RANK / NUMBER_OF_AUTHORS) * 100);
-----------------------------------------------------------------------------------------------------------------------------------------
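
The pseudocode above can be sketched as a small Python function (an illustrative helper, not part of the replication package; the function name and arguments are our own):

```python
import math

def percentile_rank(author_experience, experiences):
    """Percentile rank of an author among all authors active at the time
    a review request was posted, mirroring the pseudocode above.

    author_experience -- number of prior review requests by this author
    experiences       -- prior-request counts for every active author
    """
    number_of_authors = len(experiences)
    # Authors with strictly fewer prior requests rank below this author.
    author_rank = sum(1 for e in experiences if e < author_experience)
    return math.floor(author_rank / number_of_authors * 100)
```

For instance, an author with 5 prior requests among authors with counts [1, 2, 3, 4, 5] outranks 4 of the 5 authors, giving a percentile rank of 80.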



Publications


  1. Bosu, A., "Characteristics of the Vulnerable Code Changes Identified through Peer Code Review", Proceedings of the 36th International Conference on Software Engineering (ACM SRC track), 2014 [To appear], Hyderabad, India
  2. Bosu, A., Carver, J., Hafiz, M., Hilley, P., and Janni, D., "When are OSS developers more likely to introduce vulnerable code changes? A case study", Proceedings of the 10th International Conference on Open Source Systems (OSS), 2014 [To appear], San Jose, Costa Rica
  3. Bosu, A., Carver, J., Hafiz, M., Hilley, P., and Janni, D., "Identifying the Characteristics of Vulnerable Code Changes: An Empirical Study", In submission to the 22nd ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE 2014)


Additional Results/Charts


The figure on the right shows the percentile rank distribution of the VCC authors. More than 50% of the vulnerable code changes were introduced by authors in the top 20 percentiles, and around 30% of the VCC authors were in the top 10 percentiles.
The figure on the left shows the percentile rank distribution of the VCC reviewers. More than 75% of the vulnerable code changes were identified by reviewers in the top 20 percentiles. These reviewers were also very experienced authors: around 70% of them were among the top 20% of authors. Moreover, around 60% of the VCC reviewers were among the top 10% of reviewers, and they were also among the top 50% of authors.
The figure on the right shows the percentile rank distribution of the files/patch-sets containing VCCs. Around 50% of the vulnerable files/patch-sets were among the top 20% largest files/patch-sets, and around 30% were in the top 10% based on size.


Acknowledgements


This research is partially supported by the NC State Science of Security Lablet and NSF-1156563.