Locus: Locating Bugs from Software Changes

Abstract

Various information retrieval (IR) based techniques have been proposed recently to locate bugs automatically at the file level. However, their usefulness is often compromised by the coarse granularity of files and the lack of contextual information. To address this, we propose to locate bugs using software changes, which offer finer granularity than files and provide important contextual clues for bug fixing. We observed that bug-inducing changes can facilitate the bug-fixing process. For example, they help triage the bug-fixing task to the developers who committed the bug-inducing changes, or enable developers to fix bugs by reverting these changes. Our study further identifies that change logs and the naturally small granularity of changes can help boost the performance of IR-based bug localization. Motivated by these observations, we propose an IR-based approach, Locus, to locate bugs from software changes, and evaluate it on six large open source projects. The results show that Locus significantly outperforms existing techniques at source-file-level localization: MAP and MRR improve by 20.1% and 20.5% on average, respectively. In addition, Locus locates the inducing changes within the top 5 for 41.0% of the bugs. The results also show that Locus can significantly reduce the number of lines that need to be scanned to locate a bug compared with existing techniques.
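
For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how MRR, MAP, and a top-N hit rate are commonly computed for ranked bug localization results. It is not taken from the Locus implementation. The function names and the input shapes (a ranked list per bug and a set of truly buggy items per bug) are assumptions made for illustration, and average precision is normalized here by the number of relevant items, which is one common convention.

```python
from typing import Dict, List, Set


def reciprocal_rank(ranked: List[str], relevant: Set[str]) -> float:
    # 1 / rank of the first relevant item, or 0.0 if none is ranked.
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / position
    return 0.0


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    # Mean of the precision values at each rank holding a relevant item.
    # Assumption: normalize by the number of relevant items.
    hits = 0
    precision_sum = 0.0
    for position, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant) if relevant else 0.0


def top_n_hit(ranked: List[str], relevant: Set[str], n: int) -> bool:
    # True if at least one relevant item appears in the first n positions.
    return any(item in relevant for item in ranked[:n])


def evaluate(rankings: Dict[str, List[str]],
             oracle: Dict[str, Set[str]],
             n: int = 5) -> Dict[str, float]:
    # Average each metric over the bugs that have both a ranking and an oracle.
    bugs = [bug for bug in rankings if bug in oracle]
    if not bugs:
        return {"MRR": 0.0, "MAP": 0.0, "TopN": 0.0}
    count = len(bugs)
    return {
        "MRR": sum(reciprocal_rank(rankings[b], oracle[b]) for b in bugs) / count,
        "MAP": sum(average_precision(rankings[b], oracle[b]) for b in bugs) / count,
        "TopN": sum(top_n_hit(rankings[b], oracle[b], n) for b in bugs) / count,
    }
```

The same functions apply whether the ranked items are source files or changes. Only the oracle differs: fixed files in the first case, inducing changes in the second.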

Dataset

  1. ZXing: ZXing ("zebra crossing") is an open-source, multi-format 1D/2D barcode image processing library implemented in Java, with ports to other languages. [Repository] [Dataset] [SourceFile]
  2. AspectJ: AspectJ is an aspect-oriented programming (AOP) extension created at PARC for the Java programming language. [Repository] [Dataset] [SourceFile]
  3. SWT 3.1: SWT is an open source widget toolkit for Java designed to provide efficient, portable access to the user-interface facilities of the operating systems on which it is implemented. [Repository] [Dataset] [SourceFile]
  4. JDT Core 4.5: This is the core part of Eclipse's Java development tools. It contains the non-UI support for compiling and working with Java code. [Repository] [Dataset] [SourceFile]
  5. PDE UI 4.4: The Plug-in Development Environment (PDE) provides tools to create, develop, test, debug, build and deploy Eclipse plug-ins, fragments, features, update sites and RCP products. [Repository] [Dataset] [SourceFile]
  6. Tomcat 8.0: The Apache Tomcat® software is an open source implementation of the Java Servlet, JavaServer Pages, Java Expression Language and Java WebSocket technologies. [Repository] [Dataset] [SourceFile]
For each project, the zip file contains the following six files:
bugRepository.xml: This file contains the information for all bugs. The format is consistent with the BugLocator benchmark.
bugLink.txt: This file contains the links between bugs and the fixing changes. The format is <Bug Id>\t<Commit Id>.
changeOracles.txt: This file contains the links between bugs and the inducing changes. The inducing changes are automatically identified by the SZZ algorithm. The format is <Bug Id>\t<Commit Id>\t<Commit Id>....
sourceFilesIndex.txt: This file contains the list of the source files. The format is <File Id>\t<File Location>.
fileFixSus.txt: This file contains the suspiciousness score of each source file with respect to a bug. The format is <Bug Id>\t<File Id>:<Suspicious>\t<File Id>:<Suspicious>....
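
The tab-separated formats described above can be read with a few lines of code. The sketch below is illustrative only and is not part of the Locus tooling. The function names are hypothetical, the sample ids and paths are made up, and it assumes that scores parse as floating-point numbers and that a higher score means a more suspicious file.

```python
from typing import Dict, List, Tuple


def parse_id_lists(text: str) -> Dict[str, List[str]]:
    # Reads <Bug Id>, then one or more <Commit Id> fields, tab-separated.
    # Covers both bugLink.txt and changeOracles.txt; repeated bug ids merge.
    links: Dict[str, List[str]] = {}
    for line in text.splitlines():
        fields = [f for f in line.strip().split("\t") if f]
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        links.setdefault(fields[0], []).extend(fields[1:])
    return links


def parse_file_index(text: str) -> Dict[str, str]:
    # Reads sourceFilesIndex.txt: <File Id>, a tab, then <File Location>.
    index: Dict[str, str] = {}
    for line in text.splitlines():
        file_id, sep, location = line.strip().partition("\t")
        if sep:
            index[file_id] = location
    return index


def parse_suspiciousness(text: str) -> Dict[str, List[Tuple[str, float]]]:
    # Reads fileFixSus.txt: <Bug Id>, then <File Id>:<Suspicious> entries.
    # Returns, per bug, (file id, score) pairs sorted by descending score.
    scores: Dict[str, List[Tuple[str, float]]] = {}
    for line in text.splitlines():
        fields = [f for f in line.strip().split("\t") if f]
        if len(fields) < 2:
            continue
        entries = []
        for entry in fields[1:]:
            file_id, _, value = entry.rpartition(":")
            entries.append((file_id, float(value)))
        entries.sort(key=lambda pair: pair[1], reverse=True)
        scores[fields[0]] = entries
    return scores


if __name__ == "__main__":
    # Made-up sample content; real files come from a project's zip archive.
    sample_index = "0\tsrc/A.java\n1\tsrc/B.java\n"
    sample_scores = "101\t0:0.25\t1:0.75\n"
    index = parse_file_index(sample_index)
    for bug_id, ranked in parse_suspiciousness(sample_scores).items():
        best_id, best_score = ranked[0]
        print(bug_id, index[best_id], best_score)
```

To use it on the real data, read each file's contents with `open(path, encoding="utf-8").read()` and pass the resulting string to the matching function.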

Implementation

Please refer to the Locus project. If you have any questions about reproducing the results, or are interested in collaborating, you are welcome to send me an email. Thanks.

Citation

 @inproceedings{wen2016locus,
	title={Locus: locating bugs from software changes},
	author={Wen, Ming and Wu, Rongxin and Cheung, Shing-Chi},
	booktitle={Proceedings of the 31st IEEE/ACM International Conference on Automated Software Engineering},
	pages={262--273},
	year={2016},
	organization={ACM}
 }
This page was updated on by Ming.