
skhatiwada01/ASE192


OSS Bug Localization Dataset

The dataset contains bug localization data from three open source projects:

  1. Angular.js (Project Id: 460078)
  2. CoreCLR (Project Id: 30092893)
  3. Kubernetes (Project Id: 20580498)

Dataset Structure

Each dataset consists of the following folders and files:

Issues

This folder contains files that hold the project's issues in their original form. Each file stores multiple issues in XML format. Each issue records its Id, IssueNumber, AssigneeId, CreatedBy UserId, Title, and Body.

Localization

This folder contains one subfolder per issue used for bug localization. Each subfolder contains four files:

BugReport.txt

This file contains the text of the bug report after stemming and camel-case splitting.
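The README does not name the exact splitting rules or the stemmer used. As a rough sketch under that caveat, camel-case splitting of identifier-like tokens might look like the following; the function names `camel_split` and `tokenize` are hypothetical, and stemming (for example with a Porter stemmer) is deliberately left out because the stemmer used for the dataset is not stated.

```python
import re

# Alternatives are tried in order at each position:
#   1. an uppercase run that is followed by Capital+lowercase ("HTTP" in "HTTPResponse")
#   2. an optional capital followed by lowercase letters ("get", "Response")
#   3. any remaining uppercase run ("XML" at the end of a token)
#   4. a run of digits
_CAMEL_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")


def camel_split(token: str) -> list[str]:
    """Split one identifier such as 'getHTTPResponseCode' into lowercase parts."""
    return [part.lower() for part in _CAMEL_RE.findall(token)]


def tokenize(text: str) -> list[str]:
    """Extract alphanumeric tokens from free text and camel-case split each one.

    Punctuation and underscores act as separators, so 'snake_case' yields
    'snake' and 'case'. No stemming is applied here.
    """
    words: list[str] = []
    for token in re.findall(r"[A-Za-z0-9]+", text):
        words.extend(camel_split(token))
    return words


print(tokenize("NullPointer in $scope.digestLoop"))
```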

FileList.txt

This file lists every file in the system. For each file, it gives the file's relative path and an arbitrary unique ID, called the index. The file is in XML format.

RelevantList.txt

This file lists the file(s) relevant to the issue. For each file, it gives the relative path and the index. The file is in XML format.
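Because RelevantList.txt and FileList.txt share the same index, a ranked list of file indices produced by a localization method can be scored against the relevant set. The sketch below assumes both have already been read into Python as strings (the XML element names are not documented here, so parsing is omitted) and computes two common IR metrics, a top-k hit and average precision; the function names are hypothetical.

```python
def top_k_hit(ranked: list[str], relevant: set[str], k: int) -> bool:
    """True if at least one relevant file index appears in the first k results."""
    return any(idx in relevant for idx in ranked[:k])


def average_precision(ranked: list[str], relevant: set[str]) -> float:
    """Sum of precision@rank at each relevant hit, divided by the number of relevant files.

    Relevant files that never appear in the ranking contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(ranked, start=1):
        if idx in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


ranked = ["7", "3", "9", "1"]   # hypothetical ranking, best match first
relevant = {"3", "1"}           # hypothetical indices from RelevantList.txt
print(top_k_hit(ranked, relevant, 1), average_precision(ranked, relevant))
```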

Source.txt

This file contains the source code of the entire system, taken from the version of the project in which the fix was applied. Each source code file is represented as a single line in Source.txt. The text is stemmed and camel-case split. The file has the following format:

FileIndex1##word11,word12,word13,...,word1n

FileIndex2##word21,word22,word23,...,word2m
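A minimal Python reader for this format might look like the sketch below. It assumes UTF-8 encoding and that `##` appears only as the separator between the index and the words; blank lines, like those between the entries in the example above, are skipped whether or not they occur in the real file. `parse_source` and `read_source` are hypothetical names.

```python
def parse_source(lines) -> dict[str, list[str]]:
    """Map each file index to its list of preprocessed words.

    Each non-blank line is expected to look like 'FileIndex##word1,word2,...'.
    """
    corpus: dict[str, list[str]] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # tolerate blank lines between entries
        index, sep, words = line.partition("##")
        if not sep:
            raise ValueError(f"missing '##' separator: {line[:40]!r}")
        corpus[index.strip()] = [w for w in words.split(",") if w]
    return corpus


def read_source(path: str) -> dict[str, list[str]]:
    """Read a Source.txt file from disk (UTF-8 assumed)."""
    with open(path, encoding="utf-8") as fh:
        return parse_source(fh)
```

For example, `read_source("Localization/<issue>/Source.txt")` would return a dictionary keyed by the same index used in FileList.txt.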

AllCommits.txt

This file contains all commits made in the project. Each commit entry contains the commit's SHA, the date and time the commit was made, and the UserId of the committer. The file is in XML format.

CommitChanges.txt

This file lists the source code files modified in each commit, giving each file's relative path and its status. The file is in XML format.

IssueCommits.txt

This file contains the issue numbers and the SHAs of the commits associated with each issue. The file is in XML format.

Tags.txt

This file contains the name, color, and URL of every tag. Each tag is assigned an arbitrary unique ID. The file is in XML format.

IssueTags.txt

This file contains the issue numbers and the tags associated with each issue. The file has the following format:

IssueNumber1##Tag11,Tag12,Tag13,...,Tag1n

IssueNumber2##Tag21,Tag22,Tag23,...,Tag2m
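IssueTags.txt follows the same `##` layout, so it can be read the same way and inverted to find all issues carrying a given tag (for instance, to select bug reports). This is a sketch with hypothetical function names; it keeps tag values as plain strings, since the README does not say whether they are tag names or the IDs from Tags.txt, and it would mis-split any tag that itself contains a comma.

```python
from collections import defaultdict


def parse_issue_tags(lines) -> dict[str, list[str]]:
    """Parse 'IssueNumber##Tag1,Tag2,...' lines into {issue_number: [tag, ...]}."""
    issue_tags: dict[str, list[str]] = {}
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        issue, sep, tags = line.partition("##")
        if not sep:
            raise ValueError(f"missing '##' separator: {line[:40]!r}")
        issue_tags[issue.strip()] = [t.strip() for t in tags.split(",") if t.strip()]
    return issue_tags


def issues_by_tag(issue_tags: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert the mapping so that each tag points to the set of issues carrying it."""
    index: defaultdict[str, set[str]] = defaultdict(set)
    for issue, tags in issue_tags.items():
        for tag in tags:
            index[tag].add(issue)
    return dict(index)
```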

Dataset Creator

downloader.exe downloads all the files required to create the bug localization dataset. It contains the following main sections:

Github Login

This section logs the user in to a GitHub account. A GitHub account is required to download project information from GitHub.

Repository Info

This section sets the GitHub repository and the directory into which the repository is downloaded.

Action

This section provides functions to download tags, bug reports, source code, and commits, and to create the dataset. The Bug Localization button opens the Localization Form, which runs IR (information retrieval) methods for bug localization.
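The README does not say which IR methods the Localization Form implements. As one illustration of the general idea, the sketch below ranks files with a classic vector space model: TF-IDF weights and cosine similarity between the words of a bug report (as in BugReport.txt) and each file's words (as in Source.txt). The function names and the smoothed IDF formula are assumptions, not the tool's code.

```python
import math
from collections import Counter


def tfidf_vectors(corpus: dict[str, list[str]]):
    """Build sparse TF-IDF vectors (word -> weight) for every file, plus the IDF table."""
    n_docs = len(corpus)
    df: Counter = Counter()
    for words in corpus.values():
        df.update(set(words))  # document frequency: count each word once per file
    # Smoothed IDF (assumption): the +1 keeps words that occur in every file from scoring zero.
    idf = {w: math.log(n_docs / d) + 1.0 for w, d in df.items()}
    vectors = {}
    for idx, words in corpus.items():
        tf = Counter(words)
        vectors[idx] = {w: c * idf[w] for w, c in tf.items()}
    return vectors, idf


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(wt * b.get(w, 0.0) for w, wt in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_files(report_words: list[str], corpus: dict[str, list[str]]) -> list[str]:
    """Return file indices ordered from most to least similar to the bug report."""
    vectors, idf = tfidf_vectors(corpus)
    tf = Counter(report_words)
    # Report words that never occur in the code base carry no weight and are dropped.
    query = {w: c * idf[w] for w, c in tf.items() if w in idf}
    scores = {idx: cosine(query, vec) for idx, vec in vectors.items()}
    # Ties are broken by index so the ordering is deterministic.
    return sorted(scores, key=lambda idx: (-scores[idx], idx))
```

The returned list of file indices, best match first, is the form a ranked result needs in order to be compared with the indices in RelevantList.txt.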

Contents

downloader.exe is in the exe folder.

The source code of the tool is in the src folder.

The dataset for the projects can be downloaded from: http://seel.cse.lsu.edu/data/ase192.zip
