The dataset contains bug localization data from three open source projects:
- Angular.js (Project Id: 460078)
- Coreclr (Project Id: 30092893)
- Kubernetes (Project Id: 20580498)
Each dataset consists of the following folders and files:
This folder contains files that hold issues from the project in their original form. Each file contains multiple issues stored in XML format. Each issue contains its Id, IssueNumber, AssigneeId, CreatedBy UserId, Title, and Body.
This folder contains multiple folders, each representing an issue used for bug localization. Each folder contains 4 files:
This file contains the bug report text, stemmed and camel-case split.
This file contains the list of all files in the system. For each file, the relative path and an arbitrary unique id (its index) are provided. The file is in XML format.
This file contains the list of all files relevant to the issue. For each file, the relative path and the index are provided. The file is in XML format.
This file contains the source code for the entire system. The source code belongs to the version of the project where the fix was applied. Each source code file is represented as a single line in Source.txt. The text is stemmed and camel-case split. The file is in the following format:
FileIndex1##word11,word12,word13,...,word1n
FileIndex2##word21,word22,word23,...,word2m
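The line format above can be read with a few lines of Python. This is only a minimal sketch: it assumes that `##` appears exactly once per line as the separator between the file index and the words, that words are comma-separated, and that the file is UTF-8 encoded. The function names `parse_source_line`, `parse_source`, and `load_source` are illustrative and are not part of the dataset tooling.

```python
from typing import Dict, Iterable, List, Tuple


def parse_source_line(line: str) -> Tuple[str, List[str]]:
    """Split one 'FileIndex##word1,word2,...' line into (index, words)."""
    index, sep, words = line.rstrip("\r\n").partition("##")
    if not sep:
        # Assumption: every non-blank line carries the '##' separator.
        raise ValueError(f"missing '##' separator: {line!r}")
    # Drop empty tokens that a trailing or doubled comma would produce.
    return index, [w for w in words.split(",") if w]


def parse_source(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each file index to its list of preprocessed words."""
    result: Dict[str, List[str]] = {}
    for line in lines:
        if line.strip():  # skip blank lines
            index, words = parse_source_line(line)
            result[index] = words
    return result


def load_source(path: str) -> Dict[str, List[str]]:
    """Read a Source.txt file from disk (UTF-8 is an assumption)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return parse_source(handle)
```

The file index is kept as a string so that it can be matched directly against the index values in the accompanying file lists without assuming they are numeric.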
This file contains all commits made in the project. Each commit entry contains the commit's Sha, the date and time the commit was made, and the UserId of the committer. The file is in XML format.
This file contains the list of source code files modified in each commit. It contains each source file's relative path and its status. The file is in XML format.
This file contains the issue numbers and the Sha of the commit associated with each issue. The file is in XML format.
This file contains the name, color, and URL of every tag. Each tag is assigned an arbitrary, unique id. The file is in XML format.
This file contains the issue numbers and the tags associated with each issue. The file is in the following format:
IssueNumber1##Tag11,Tag12,Tag13,...,Tag1n
IssueNumber2##Tag21,Tag22,Tag23,...,Tag2m
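One use of this mapping is to look up all issues that carry a given tag. The Python sketch below parses lines in the format shown above and inverts the result into a tag-to-issues index. It assumes that tags contain neither commas nor `##`, and it treats tags as opaque strings, since the format does not say whether they are tag names or tag ids. The names `parse_issue_tags` and `issues_by_tag` are hypothetical helpers, not part of the dataset tooling.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def parse_issue_tags(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Parse 'IssueNumber##Tag1,Tag2,...' lines into {issue: [tags]}."""
    issue_tags: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        issue, sep, tags = line.partition("##")
        if not sep:
            continue  # assumption: lines without '##' are not data rows
        issue_tags[issue] = [t for t in tags.split(",") if t]
    return issue_tags


def issues_by_tag(issue_tags: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert {issue: [tags]} into {tag: [issues]}."""
    index: Dict[str, List[str]] = defaultdict(list)
    for issue, tags in issue_tags.items():
        for tag in tags:
            index[tag].append(issue)
    return dict(index)
```

For example, `issues_by_tag(parse_issue_tags(open(path, encoding="utf-8")))` would give, for each tag, the list of issue numbers labeled with it, where `path` points at the tag mapping file.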
The downloader.exe tool downloads all files required for creating the bug localization dataset. It contains the following main sections:
This section logs the user in to a GitHub account. A GitHub account is required to download project information from GitHub.
This section sets the GitHub repository and the directory where the repository is to be downloaded.
This section provides functions to download tags, bug reports, source code, and commits, and to create the dataset. The Bug Localization button opens a form, Localization Form, to run IR methods for bug localization.
The downloader.exe is in the exe folder.
The code implementation is in the src folder.
The dataset for the projects can be downloaded from: http://seel.cse.lsu.edu/data/ase192.zip