HAv2 PR and Problem Statement

The document outlines the process for reviewing a GitHub pull request (PR) and creating a proper problem statement. It emphasizes the importance of distinguishing between the original issue and the changes made in the PR, ensuring the problem statement accurately describes what was broken before the fix. Additionally, it provides guidelines for categorizing issues and evaluating their specificity to aid in effective problem statement creation.

Uploaded by

dineshbin334

Let's start with the first key step: reviewing the pull request and the initial
problem statement. Your first task is to check whether there's a linked issue in
the GitHub PR or the problem description. If there is, that makes things a lot
easier.

In many cases, you can reuse that issue directly. But if there's only a pull
request or a commit message, you'll likely need to rewrite the summary into a
proper problem statement. Once you know whether you're reusing or rewriting, make
sure to read the original problem statement carefully.

Then read the golden patch summary in the GitHub pull request. This summary usually
describes what was changed in the code. After that, ask yourself: does the problem
statement describe what was broken before the fix? If it does, you're probably good
to go.

If it only explains what the PR did, then it needs to be rewritten. This is
important because the goal is to describe the original issue, not the solution. The
rewritten version should focus on what went wrong, or what was missing, before the
patch, not just what was added or changed.

Here is each step in detail:

Step 1: Review the GitHub PR and Initial Problem Statement

In this step, you will do the following:

Read the initial problem statement and the golden patch in the GitHub PR (the code
changes that resolved the issue).
Use the golden patch summary to find the edited files that are part of the golden
patch.
The diff consists only of the changes in the commit that affect the files in the
golden patch.
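To illustrate how the golden patch relates to the full commit diff, here is a minimal Python sketch, assuming the commit is available as plain `git diff` output. It splits the diff into per-file sections using the `diff --git a/... b/...` headers and keeps only the sections for files in the golden patch. The function names and the sample paths (`src/app/upload.py`, `tests/test_upload.py`) are hypothetical and not part of any project tooling.

```python
import re

# git starts each file's section with "diff --git a/<old path> b/<new path>".
# This simple pattern assumes paths without spaces.
DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)$")


def split_diff_by_file(diff_text: str) -> dict[str, str]:
    """Map each changed file (its post-change "b/" path) to its section of the diff."""
    sections: dict[str, str] = {}
    current = None
    for line in diff_text.splitlines(keepends=True):
        header = DIFF_HEADER.match(line.rstrip("\r\n"))
        if header:
            current = header.group(2)
            sections[current] = ""
        if current is not None:
            sections[current] += line
    return sections


def golden_patch_diff(diff_text: str, golden_files: set[str]) -> str:
    """Keep only the diff sections for files that belong to the golden patch."""
    sections = split_diff_by_file(diff_text)
    return "".join(text for path, text in sections.items() if path in golden_files)


# Hypothetical commit touching one source file and one test file.
sample_diff = """\
diff --git a/src/app/upload.py b/src/app/upload.py
--- a/src/app/upload.py
+++ b/src/app/upload.py
@@ -1 +1 @@
-MAX_UPLOAD_MB = 1024
+MAX_UPLOAD_MB = 2048
diff --git a/tests/test_upload.py b/tests/test_upload.py
--- a/tests/test_upload.py
+++ b/tests/test_upload.py
@@ -0,0 +1 @@
+def test_large_upload(): ...
"""

print(list(split_diff_by_file(sample_diff)))  # ['src/app/upload.py', 'tests/test_upload.py']
print(golden_patch_diff(sample_diff, {"src/app/upload.py"}))
```

Splitting on the `diff --git` headers rather than on `---`/`+++` lines avoids mistaking a removed or added line that happens to start with those characters for a file boundary.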


Step 3: Categorize the Issue on Several Axes

You will categorize the original issue on the following dimensions:

Good vs. Bad Issue
Specificity Rating (This is only for reviewers now)
Issue Type
Knowledge Tag

⚠️ A task is considered a bad issue if it meets any of the following criteria: ⚠️

The initial problem statement contains text in a foreign language.

If you select any category of bad issue, you will have to justify it and end the
task.

Specificity Rating (This is only for reviewers now, as of May 3)

Update for May 15 - Added Explicit Criteria to Define Specificity
Contributors should rate the specificity of the original problem statement (if
available) or PR description (if no issues are available). The specificity rating
is a measure of how much detail the problem statement/PR contains and how clear the
desired solution is. The desired solution is defined by what
behaviors/functionalities are tested for by the unit test suite.
Imagine that you are an experienced software engineer who has been instructed to
create a PR that successfully resolves the above GitHub issue. You have full access
to the codebase and can see the issue description above. However, you are not able
to ask for clarification and would need to work exclusively from this information.
What rating would you give this issue?
5/16 Update: A well-specified issue or requirement should contain all of the
information needed to pass the unit tests associated with the test patch of the PR.
Specificity ratings apply to both the problem statement and the requirement. When
evaluating the specificity of the requirement, you should consider the problem
statement to be included as well (i.e., rate the problem statement and the
requirement together, as one combined body of information).
The rubric for specificity is below.

Criterion: Issue Specificity

Description: Imagine that you are an experienced software engineer who has been
instructed to create a PR that successfully resolves the above GitHub issue. You have
full access to the codebase and can see the issue description as it is above, but you
are not able to ask for clarification and would need to work exclusively from this
information. Would you have enough information to create a solution that passes all
the unit tests in the test patch?

1: It is almost impossible to understand what is being asked without further
information. The statement does not define what is needed to pass the unit tests, and
it would be impossible for an agent to create a valid solution that passes the unit
tests without additional information.

2: The issue is vague and difficult to understand, but the agent should have some idea
of what needs to be implemented to solve the problem. The statement misses critical
information needed to pass at least one unit test, and this information could not
reasonably be assumed from the codebase.

3: The issue is vague and there is room for ambiguity. It is unclear what a successful
solution would look like. The statement provides a high-level overview of the behavior
needed to pass, and an agent might reasonably infer a valid solution to the unit tests,
but enough details are vague that the agent may address the problem without passing
the unit tests.

4: There are some blanks to fill in, but there is a sensible interpretation of what is
required for a solution. The statement does not explicitly define all configurations
and behaviors needed to pass the unit tests, but the proper implementation can be
inferred from the codebase.

5: The issue is well-specified and it is clear what is required for a successful
solution. The statement explicitly includes specifications on expected behavior, and if
this expected behavior is implemented, it will satisfy all unit tests.

Issue Classification

Each category lists its subcategories, followed by the tags that describe them
(Subcategory: Tags).

Bug
- Critical: System crash, data loss, security breach, blocking functionality
- Major: Significant impact on core functionality but with workarounds
- Minor: Non-critical issues with minimal impact on users
- Regression: Previously working feature now broken
- Performance: Slowness, memory leaks, resource consumption
- Security: Vulnerabilities, authentication issues, data exposure
- Data: Data corruption, validation errors, inconsistency
- UI/UX: Visual glitches, usability problems, layout issues
- Compatibility: Browser/device/version compatibility issues
- Edge Case: Rare conditions or unusual input handling
- Integration: Issues with external systems, APIs, or third-party services

Feature
- Core: Essential application functionality
- UI/UX: User interface and experience improvements
- API: API development, endpoints, or documentation
- Integration: External system connections or third-party services
- Performance: Speed and resource optimization
- Accessibility: A11y improvements for diverse users
- Localization: i18n and l10n support for multiple regions/languages
- Analytics: Metrics, reporting, dashboards, and insights
- Security: Authentication, authorization, and data protection
- Customization: User preferences, themes, and configuration options
- Mobile: Mobile-specific functionality or responsive design

Enhancement
- Performance: Speed, memory usage, resource optimization
- Code Quality: Readability, maintainability, documentation
- Refactoring: Restructuring without changing behavior
- Testing: Test coverage, quality, and automation improvements
- UI/UX: User interface and experience refinements
- Accessibility: A11y compliance and improvements
- Technical Debt: Addressing accumulated implementation shortcuts
- DevOps: CI/CD, deployment, monitoring improvements
- Security: Hardening existing features, compliance
- Scalability: Handling increased load, users, or data volume
- Documentation: Improving user guides, API docs, or code comments
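To keep labels consistent when recording a classification, the table can be encoded as a mapping from category to allowed subcategories. The Python sketch below is one way to do that; `ISSUE_TAXONOMY`, `is_valid_classification`, and `categories_for` are hypothetical names, not part of any annotation tool. It also makes visible that subcategories such as Performance, Security, and UI/UX appear under more than one category, so the category has to be chosen first.

```python
# Category -> allowed subcategories, transcribed from the Issue Classification table.
ISSUE_TAXONOMY: dict[str, frozenset[str]] = {
    "Bug": frozenset({
        "Critical", "Major", "Minor", "Regression", "Performance", "Security",
        "Data", "UI/UX", "Compatibility", "Edge Case", "Integration",
    }),
    "Feature": frozenset({
        "Core", "UI/UX", "API", "Integration", "Performance", "Accessibility",
        "Localization", "Analytics", "Security", "Customization", "Mobile",
    }),
    "Enhancement": frozenset({
        "Performance", "Code Quality", "Refactoring", "Testing", "UI/UX",
        "Accessibility", "Technical Debt", "DevOps", "Security", "Scalability",
        "Documentation",
    }),
}


def is_valid_classification(category: str, subcategory: str) -> bool:
    """Return True if the subcategory is listed under the given category."""
    return subcategory in ISSUE_TAXONOMY.get(category, frozenset())


def categories_for(subcategory: str) -> list[str]:
    """List (alphabetically) every category that offers this subcategory."""
    return sorted(cat for cat, subs in ISSUE_TAXONOMY.items() if subcategory in subs)


print(is_valid_classification("Bug", "Regression"))      # True
print(is_valid_classification("Feature", "Regression"))  # False
print(categories_for("Performance"))                     # ['Bug', 'Enhancement', 'Feature']
```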

Knowledge Category

- Core Development: Frontend, Backend, Full Stack, Web, Mobile, Desktop, Database, API
- Infrastructure & Operations: Infrastructure, DevOps, Cloud, Networking
- Security: Security, Authentication/Authorization
- Data & AI: Data Science, ML/AI
- Design & UX: UI/UX, Accessibility
- Quality & Testing: QA/Testing, Performance
- Specialized Domains: Game Development, AR/VR, IoT/Embedded, Blockchain
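Unlike the issue subcategories, every knowledge tag in this list appears under exactly one group, so the tag alone identifies its group. A minimal Python sketch of that lookup follows; the names `KNOWLEDGE_CATEGORIES`, `TAG_TO_GROUP`, and `group_of` are hypothetical.

```python
# Group -> knowledge tags, transcribed from the Knowledge Category list.
KNOWLEDGE_CATEGORIES: dict[str, tuple[str, ...]] = {
    "Core Development": ("Frontend", "Backend", "Full Stack", "Web", "Mobile",
                         "Desktop", "Database", "API"),
    "Infrastructure & Operations": ("Infrastructure", "DevOps", "Cloud", "Networking"),
    "Security": ("Security", "Authentication/Authorization"),
    "Data & AI": ("Data Science", "ML/AI"),
    "Design & UX": ("UI/UX", "Accessibility"),
    "Quality & Testing": ("QA/Testing", "Performance"),
    "Specialized Domains": ("Game Development", "AR/VR", "IoT/Embedded", "Blockchain"),
}

# Reverse index: tag -> group. Building it would silently drop a tag listed twice,
# so the assertion below confirms every tag is unique across groups.
TAG_TO_GROUP: dict[str, str] = {
    tag: group for group, tags in KNOWLEDGE_CATEGORIES.items() for tag in tags
}
assert len(TAG_TO_GROUP) == sum(len(tags) for tags in KNOWLEDGE_CATEGORIES.values())


def group_of(tag: str) -> str:
    """Return the group a knowledge tag belongs to, or raise ValueError if unknown."""
    try:
        return TAG_TO_GROUP[tag]
    except KeyError:
        raise ValueError(f"unknown knowledge tag: {tag!r}") from None


print(group_of("DevOps"))       # Infrastructure & Operations
print(group_of("Performance"))  # Quality & Testing
```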

Step 4: Problem Statement Creation

In this step, you will do the following:

Determine whether the initial problem statement describes the actual issue or only
summarizes the changes in the codebase (golden patch).

In some cases, you will be able to copy a linked issue as the problem statement.
This is encouraged.

Example
PR Link: https://github.com/deepset-ai/haystack/pull/8729
PR Description: fixes Document Splitter always returns 1 document for
split_type="passage" in pdfs #8491
Linked Issue: https://github.com/deepset-ai/haystack/issues/8491

Good Problem Statement

# Document Splitter always returns 1 document for split_type="passage" in pdfs

**Describe the bug**

When using Document Splitter with pdf and split_type="passage", the result is
always one document. This is using pypdf.

**Expected behavior**
The understanding I have is that it splits based on at least two line breaks \n\n

**Additional context**
When I tested using plain text it seems to be splitting correctly

**To Reproduce**

    dir = '...'
    files = [
        {"filename": "rules.pdf", "meta": {"split_by" : "passage", "split_length":1,
            "split_overlap":0, "split_threshold":0}},
        {"filename": "rules.txt", "meta": {"split_by" : "passage", "split_length":1,
            "split_overlap":0, "split_threshold":0}}
    ]
    for file in files:
        # set the filepath
        file_path = Path(dir) / file["filename"]
        router_res = file_type_router.run(sources=[file_path])
        txt_docs = []
        if 'text/plain' in router_res:
            txt_docs = text_file_converter.run(sources=router_res['text/plain'])
        elif 'application/pdf' in router_res:
            txt_docs = pdf_converter.run(sources=router_res['application/pdf'])
        elif 'text/markdown' in router_res:
            txt_docs = markdown_converter.run(sources=router_res['text/markdown'])
        document_splitter = DocumentSplitter(
            split_by=file['meta']['split_by'],
            split_length=file['meta']['split_length'],
            split_overlap=file['meta']['split_overlap'],
            split_threshold=file['meta']['split_threshold']
        )
        splitter_res = document_splitter.run([txt_docs['documents'][0]])
        print(len(splitter_res['documents']))

**System:**
 - OS: Mac OS 14.6.1
 - GPU/CPU: CPU
 - Haystack version (commit or version number): 2.6.0
 - DocumentStore: Chromadb
 - Splitter: DocumentSplitter

A good problem statement explains what was wrong before the fix, while a golden
patch description only describes the changes made in the PR. If the problem
statement only summarizes the fix instead of identifying the issue, it must be
rewritten.

A good problem statement should match the specificity of the original PR. If the
original PR had a specificity score of 2, your problem statement should also have a
specificity of 2.
DO NOT provide more context than the original description provided. That context
will be provided in the requirements section.

Example

Original PR
Fixed a bug when a user double clicks the Share button

❌Bad Example
**Title: Double Click Bug**

**Description**
When a user double clicks the share button, an error occurs causing the share
screen to render twice, crashing the entire app

✅Good Example
**Title: Double Click Bug**

**Description**
When a user double clicks the share button a bug occurs.

How to Evaluate the Problem Statement: UPDATE MAY 8

Determine if there are linked issues
In some problems, there will be one or more linked issue URLs listed. When these
are listed, you can review the issue statements and select the best issue
statement.
If the best linked issue statement is accurate and free from misleading information
about the golden patch that could misguide the agent, copy and paste it directly
into the problem statement box.
If the linked issue statement contains factual errors or inaccuracies, you must fix
them before copying and pasting it, keeping the level of detail the same without
adding more specificity.

Analyze the initial problem statement:

Does it describe what was broken before the fix?
Or does it only describe the code change?
If the problem statement describes what was changed in the PR rather than what was
wrong with the code before the fix, it must be rewritten.
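This check needs human judgment, but a rough first pass can be automated. The Python sketch below flags statements whose lines mostly open like change summaries ("This PR adds...", "- Add test"). The opener word list and the function names are assumptions for illustration only; a flagged statement should still be read in full before deciding to rewrite it.

```python
import re

# Openers that usually signal a change summary rather than a description of what was
# broken. This word list is an illustrative assumption, not a project rule.
PATCH_STYLE_OPENER = re.compile(
    r"(this (pr|pull request|commit)\b|(add|fix|remove|update|bump|refactor)(s|es|d|ed)?\b)",
    re.IGNORECASE,
)


def fragments(statement: str) -> list[str]:
    """Split a statement into lines and ' - ' bullet items, dropping bullet marks and quotes."""
    pieces = re.split(r"\n|\s-\s", statement)
    cleaned = (piece.strip(" -*\t\"") for piece in pieces)
    return [piece for piece in cleaned if piece]


def looks_like_patch_description(statement: str) -> bool:
    """Heuristic: True when at least half of the fragments open like a change summary."""
    parts = fragments(statement)
    if not parts:
        return False
    hits = sum(1 for part in parts if PATCH_STYLE_OPENER.match(part))
    return hits * 2 >= len(parts)


print(looks_like_patch_description("This PR adds support for the Google Analytics v4 API."))  # True
print(looks_like_patch_description(" - Add TryIt to index.ts - Add test"))                    # True
print(looks_like_patch_description(
    "We are currently using an outdated version of the Google Analytics API, "
    "and we should upgrade to the latest version (v4)."
))  # False
```

A keyword heuristic like this will produce false positives (a real issue titled "Update to v2 breaks login" would be flagged), which is why it is only a prompt to look more closely.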

Examine the golden patch (code fix):

Does the problem statement align with what was fixed?
Or does it need a rewrite to describe the issue?

Problem Statement Formatting

Problem statements should be formatted like real GitHub issues, with a title and
clearly formatted section headings.

To identify good formatting, you can use the following sources:

Copy and paste from the GitHub repository's issue template, which can typically be
found in the .github/ISSUE_TEMPLATE folder
Use the template formatting provided by the synthetic prompt
The below formatting provides a high level template of what a good issue should
look like, though there is considerable flexibility:

# Title (e.g., Fix bug where 1GB+ uploads cause a system crash)

Field Name 1 OR ## Field Name 1

Text

Field Name 2 OR ## Field Name 2

Text

etc.
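If you assemble a rewritten statement programmatically, a small helper can enforce the `#` title and `##` field layout of this template. The following is a minimal Python sketch; the function name `render_problem_statement` and the example field are hypothetical, and a repository's own issue template should take precedence over this layout.

```python
def render_problem_statement(title: str, fields: dict[str, str]) -> str:
    """Format a title and ordered fields as a GitHub-style issue.

    Uses '# ' for the title and '## ' for each field heading, with a blank line
    between every block. Dicts preserve insertion order, so fields appear in the
    order they were given.
    """
    parts = [f"# {title.strip()}"]
    for name, text in fields.items():
        parts.append(f"## {name.strip()}")
        parts.append(text.strip())
    return "\n\n".join(parts) + "\n"


print(render_problem_statement(
    "Double Click Bug",
    {"Description": "When a user double clicks the share button a bug occurs."},
))
```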

Examples of Good vs. Bad Problem Statements

✅ Good Problem Statement (Describes the Issue):

"We are currently using an outdated version of the Google Analytics API, and we
should upgrade to the latest version (v4)."

❌ Bad Problem Statement (Golden Patch Description, Needs Rewriting):

"This PR adds support for the Google Analytics v4 API."
Another example of a golden patch description that needs rewriting:
" - Add TryIt to index.ts - Add test"

How to Rewrite the Problem Statement

Using the rewritten model-generated problem statements as a starting point, rewrite
the initial message in the form of a problem statement describing an issue to be
solved.

The rewritten problem statement must:

Be faithful to the original commit/PR/issue without adding or removing information.

Maintain the same level of specificity as the original (e.g., if the original lacks
detail, the rewrite should not introduce additional details).
Identify what was broken or missing before the fix.
Avoid simply restating what was changed in the PR.
Focus on why the issue mattered: what impact did it have?
Determine what requirement is necessary for the problem to be considered resolved.
Write a clear and concise issue description.
Do not mention how the problem was fixed.
Ensure the issue statement is neutral and factual—do not assume intent.
Include any specific requirements necessary for the fix.
If the initial problem statement was written as a golden patch description, it must
be rewritten into a clear issue statement that includes a requirement.
Original Golden Patch Description (Needs Rewriting):

❌ "- Add TryIt to index.ts - Add test"

Finalized Problem Statement Rewrite (Corrected Issue Description with Requirement):

✅ "The try function in Radash cannot be imported directly due to JavaScript’s
reserved keyword conflict. This forces users to manually alias it, adding an
unnecessary step. The only alternative is importing the entire library, which is
not always ideal. Since ‘try’ is a known reserved word, the library should handle
this internally to avoid requiring manual aliasing."
