
MSR 2025
Mon 28 - Tue 29 April 2025 Ottawa, Ontario, Canada
co-located with ICSE 2025

Accepted Papers

A Dataset of Contributor Activities in the NumFocus Open-Source Community
Data and Tool Showcase Track
Pre-print
A Dataset of Software Bill of Materials for Evaluating SBOM Consumption Tools
Data and Tool Showcase Track
CARDS: A collection of package, revision, and miscellaneous dependency graphs
Data and Tool Showcase Track
Pre-print
CodeFix-Bench: A Large-scale Benchmark for Learning to Localize Code Changes from Issue Reports
Data and Tool Showcase Track
CoDocBench: A Dataset for Code-Documentation Alignment in Software Maintenance
Data and Tool Showcase Track
CoMRAT: Commit Message Rationale Analysis Tool
Data and Tool Showcase Track
Media Attached
CoPhi - Mining C/C++ Packages for Conan Ecosystem Analysis
Data and Tool Showcase Track
Pre-print
CoUpJava: A Dataset of Code Upgrade Histories in Open-Source Java Repositories
Data and Tool Showcase Track
DataTD: A Dataset of Java Projects Including Test Doubles
Data and Tool Showcase Track
DPy: Code Smells Detection Tool for Python
Data and Tool Showcase Track
Pre-print
Drawing Pandas: A Benchmark for LLMs in Generating Plotting Code
Data and Tool Showcase Track
Pre-print
E2EGit: A Dataset of End-to-End Web Tests in Open Source Projects
Data and Tool Showcase Track
EvoChain: A Framework for Tracking and Visualizing Smart Contract Evolution
Data and Tool Showcase Track
FormalSpecCpp: A Dataset of C++ Formal Specifications Created Using LLMs
Data and Tool Showcase Track
GHALogs: Large-scale dataset of GitHub Actions runs
Data and Tool Showcase Track
GitProjectHealth: an Extensible Framework for Git Social Platform Mining
Data and Tool Showcase Track
HaPy-Bug - Human Annotated Python Bug Resolution Dataset
Data and Tool Showcase Track
HyperAST: Incrementally Mining Large Source Code Repositories
Data and Tool Showcase Track
Pre-print
ICVul: A Well-labeled C/C++ Vulnerability Dataset with Comprehensive Metadata and VCCs
Data and Tool Showcase Track
JPerfEvo: A Tool for Tracking Method-Level Performance Changes in Java Projects
Data and Tool Showcase Track
Jupyter Notebook Activity Dataset
Data and Tool Showcase Track
MaLAware: Automating the Comprehension of Malicious Software Behaviours using Large Language Models (LLMs)
Data and Tool Showcase Track
MARIN: A Research-Centric Interface for Querying Software Artifacts on Maven Repositories
Data and Tool Showcase Track
Pre-print
Mining Bug Repositories for Multi-Fault Programs
Data and Tool Showcase Track
Myriad People. Open Source Software for New Media Arts
Data and Tool Showcase Track
OpenMent: A Dataset of Mentor-Mentee Interactions in Google Summer of Code
Data and Tool Showcase Track
OSPtrack: A Labeled Dataset Targeting Simulated Execution of Open-Source Software
Data and Tool Showcase Track
OSS License Identification at Scale: A Comprehensive Dataset Using World of Code
Data and Tool Showcase Track
pyMethods2Test: A Dataset of Python Tests Mapped to Focal Methods
Data and Tool Showcase Track
Pre-print
RefExpo: Unveiling Software Project Structures through Advanced Dependency Graph Extraction
Data and Tool Showcase Track
RepoChat: An LLM-Powered Chatbot for GitHub Repository Question-Answering
Data and Tool Showcase Track
SCRUBD: Smart Contracts Reentrancy and Unhandled Exceptions Vulnerability Dataset
Data and Tool Showcase Track
SnipGen: A Mining Repository Framework for Evaluating LLMs for Code
Data and Tool Showcase Track
SPRINT: An Assistant for Issue Report Management
Data and Tool Showcase Track
TerraDS: A Dataset for Terraform HCL Programs
Data and Tool Showcase Track
TestMigrationsInPy: A Dataset of Test Migrations from Unittest to Pytest
Data and Tool Showcase Track
Pre-print
Under the Blueprints: Parsing Unreal Engine’s Visual Scripting at Scale
Data and Tool Showcase Track
Wild SBOMs: a Large-scale Dataset of Software Bills of Materials from Public Code
Data and Tool Showcase Track

Call for Papers

The MSR Data and Tool Showcase Track aims to actively promote and recognize the creation of reusable datasets and tools that are designed and built not only for a specific research project but for the MSR community as a whole. These datasets and tools should enable other practitioners and researchers to jumpstart their research efforts and allow earlier work to be reproduced. Data and Tool Showcase papers can describe datasets or tools built by the authors for use by other practitioners or researchers, and/or describe how tools built by others were used to obtain specific research results.

MSR’25 Data and Tool Showcase Track will accept two types of submissions:

  1. Data showcase submissions are expected to include:

    • a description of the data source,
    • a description of the methodology used to gather the data (including provenance and the tool used to create/generate/gather the data, if any),
    • a description of the storage mechanism, including a schema if applicable,
    • if the data has been used by the authors or others, a description of how this was done, including references to previously published papers,
    • a description of the originality of the dataset (that is, even if the dataset has been used in a published paper, its complete description must be unpublished) and similar existing datasets (if any),
    • ideas for future research questions that could be answered using the dataset,
    • ideas for further improvements that could be made to the dataset, and
    • any limitations and/or challenges in creating or using the dataset.
  2. Reusable Tool showcase submissions are expected to include:

    • a description of the tool, including its background, motivation, novelty, overall architecture, detailed design, and preliminary evaluation, as well as a link to download or access the tool,
    • a description of the tool’s design and how to use it in practice,
    • clear installation instructions and example datasets that allow the reviewers to run the tool (see the sketch after this list),
    • if the tool has been used by the authors or others, a description of how the tool was used, including references to previously published papers,
    • ideas for future reusability of the tool, and
    • any limitations of using the tool.
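
For illustration (a hypothetical sketch, not a requirement on any specific tool), the kind of minimal “run it on the bundled example data” snippet that helps reviewers get started could look like the following Python fragment. The package name repominer, its CommitAnalyzer API, and the example file path are invented placeholders for whatever the submitted tool actually ships.

    # Environment setup, typically documented in the README:
    #   python -m venv .venv && source .venv/bin/activate
    #   pip install -r requirements.txt
    from pathlib import Path

    from repominer import CommitAnalyzer  # hypothetical package and class

    # Small example dataset shipped with the tool so reviewers can run it directly.
    example_data = Path("examples/sample_repo_commits.csv")

    analyzer = CommitAnalyzer(example_data)
    report = analyzer.run()                # analyze the bundled sample data
    report.to_json("example_report.json")  # artifact the reviewers can inspect
    print(report.summary())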

The dataset or tool should be made available at the time of submission of the paper for review, but will be considered confidential until publication of the paper. The dataset or tool should include detailed instructions about how to set up the environment (e.g., requirements.txt) and how to use the dataset or tool (e.g., how to import the data or how to access the data once it has been imported, or how to use the tool with a running example). At a minimum, upon publication of the paper, the authors should archive the data or tool in a persistent repository that can provide a digital object identifier (DOI), such as zenodo.org, figshare.com, Archive.org, or an institutional repository. In addition, the DOI-based citation of the dataset or tool should be included in the camera-ready version of the paper. GitHub provides an easy way to make source code citable (with third-party tools and with a CITATION file).
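
As a concrete, hypothetical illustration of the “how to access the data” instructions mentioned above, a dataset README might include a short snippet along the following lines (Python standard library only); the archive file name, table, and column names are assumptions standing in for the dataset’s actual schema.

    import sqlite3

    # Open the database file shipped in the archive (name assumed for illustration).
    conn = sqlite3.connect("dataset.sqlite3")
    conn.row_factory = sqlite3.Row

    # Example query over an assumed commits(project, sha, author, message) table:
    # the ten projects with the most commits.
    query = """
        SELECT project, COUNT(*) AS n_commits
        FROM commits
        GROUP BY project
        ORDER BY n_commits DESC
        LIMIT 10
    """
    for row in conn.execute(query):
        print(row["project"], row["n_commits"])

    conn.close()

A requirements.txt (or equivalent environment file) pinning the versions needed to run such snippets, together with the running example itself, makes the archived dataset or tool much easier for reviewers and later users to pick up.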

Data and Tool Showcase submissions are not:

  • empirical studies, or
  • datasets that are based on poorly explained or untrustworthy heuristics for data collection, or results of trivial application of generic tools.

If custom tools have been used to create the dataset, we expect the paper to be accompanied by the source code of the tools, along with clear documentation on how to run the tools to recreate the dataset. The tools should be open source, accompanied by an appropriate license; the source code should be citable, i.e., refer to a specific release and have a DOI. If you cannot provide the source code or the source code clause is not applicable (e.g., because the dataset consists of qualitative data), please provide a short explanation of why this is not possible.

Evaluation Criteria

The Review Criteria for the Data/Tool Showcase submissions are as follows:

  • value, usefulness, and reusability of the datasets or tools.
  • quality of the presentation.
  • clarity of the relation to related work and its relevance to mining software repositories.
  • availability of the datasets or tools.

Important Dates

  • Abstract Deadline: November 29, 2024
  • Paper Deadline: December 2, 2024
  • Author Notification: January 12, 2025
  • Camera Ready Deadline: February 5, 2025

Awards

The best dataset/tool paper(s) will receive a Distinguished Paper Award.

Submission

Submit your paper (maximum 4 pages, plus 1 additional page of references) via the HotCRP submission site: https://msr2025-data-tool.hotcrp.com/.

Submitted papers will undergo single-anonymous peer review. We opt for single-anonymous peer review (i.e., authors’ names are listed on the manuscript, as opposed to the double-anonymous peer review of the main track) because of the requirement above to describe how the data or tool has been used in previous studies, including bibliographic references to those studies. Such references are likely to disclose the authors’ identity.

To make research datasets and research software accessible and citable, we further encourage authors to follow the FAIR principles, i.e., data should be Findable, Accessible, Interoperable, and Reusable.

Submissions must conform to the IEEE Conference Proceedings Formatting Guidelines (title in 24pt font and full text in 10pt type; LaTeX users must use \documentclass[10pt,conference]{IEEEtran} without including the compsoc or compsocconf options).

Papers submitted for consideration should not have been published elsewhere and should not be under review or submitted for review elsewhere for the duration of consideration. ACM plagiarism policies and procedures shall be followed for cases of double submission. The submission must also comply with the IEEE Policy on Authorship. Please read the ACM Policy on Plagiarism, Misrepresentation, and Falsification and the IEEE - Introduction to the Guidelines for Handling Plagiarism Complaints before submitting.

Upon notification of acceptance, all authors of accepted papers will be asked to complete a copyright form and will receive further instructions for preparing their camera-ready versions. At least one author of each paper is expected to register and present the results at the MSR 2025 conference. All accepted contributions will be published in the conference’s electronic proceedings.