HAv2 PR and Problem Statement

The document outlines the process for reviewing a GitHub pull request (PR) and creating a proper problem statement. It emphasizes the importance of distinguishing between the original issue and the changes made in the PR, ensuring the problem statement accurately describes what was broken before the fix. Additionally, it provides guidelines for categorizing issues and evaluating their specificity to aid in effective problem statement creation.

Uploaded by

dineshbin334

Let's start with the first key step: reviewing the pull request and the initial
problem statement. Your first task is to check whether there's a linked issue in
the GitHub PR or the problem description. If there is, that makes things a lot
easier.

In many cases, you can reuse that issue directly. But if there's only a pull
request or a commit message, you'll likely need to rewrite the summary into a
proper problem statement. Once you know whether you're reusing or rewriting, make
sure to read the original problem statement carefully.
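
The linked-issue check can be partly automated. The sketch below is only an illustration: it assumes the PR text is available as a plain string, and the function name `find_linked_issues` is hypothetical. It looks for GitHub's closing keywords followed by an issue number, and for full issue URLs. A reference that is not written in one of these two forms will be missed, so a manual read of the PR is still needed.

```python
import re

# GitHub closing keywords: close/closes/closed, fix/fixes/fixed,
# resolve/resolves/resolved, followed by "#<number>".
CLOSING_KEYWORDS = r"(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)"
KEYWORD_REF_RE = re.compile(rf"\b{CLOSING_KEYWORDS}\b:?\s+#(\d+)", re.IGNORECASE)

# Full issue URLs such as https://github.com/<owner>/<repo>/issues/<number>.
ISSUE_URL_RE = re.compile(r"https://github\.com/[\w.-]+/[\w.-]+/issues/(\d+)")


def find_linked_issues(pr_text: str) -> list[int]:
    """Return issue numbers referenced in pr_text, deduplicated."""
    numbers: list[int] = []
    for pattern in (KEYWORD_REF_RE, ISSUE_URL_RE):
        for match in pattern.finditer(pr_text):
            number = int(match.group(1))
            if number not in numbers:
                numbers.append(number)
    return numbers
```

An empty result suggests there is only a PR or commit message to work from, which is the case where a rewrite is most likely needed.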

Then read the golden patch summary in the GitHub pull request. This summary usually
describes what was changed in the code. After that, ask yourself, does the problem
statement describe what was broken before the fix? If it does, you're probably good
to go.

If it only explains what the PR did, then it needs to be rewritten. This is
important because the goal is to describe the original issue, not the solution. The
rewritten version should focus on what went wrong, or what was missing before the
patch, not just what was added or changed.

In detail:

Step 1: Review the GitHub PR and Initial Problem Statement

In this step, you will do the following:

Read the initial problem statement and the golden patch in the GitHub PR (the code
changes that resolved the issue).
Use the golden patch summary to find the edited files that are part of the golden
patch.
The diff shows only the changes in the commit that relate to the files in the golden
patch.
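
As a rough sketch of that step, the snippet below lists the files named in a unified diff and narrows the diff to a chosen set of golden patch files. It assumes the diff is in `git diff` format with `diff --git a/... b/...` headers. The helper names `changed_files` and `restrict_diff` are hypothetical.

```python
import re

# Header line that starts each per-file section of a git-style unified diff.
DIFF_HEADER_RE = re.compile(r"^diff --git a/(.+?) b/(.+)$", re.MULTILINE)


def changed_files(diff_text: str) -> list[str]:
    """Return the post-change path from every 'diff --git' header, in order."""
    return [match.group(2) for match in DIFF_HEADER_RE.finditer(diff_text)]


def restrict_diff(diff_text: str, golden_files: set[str]) -> str:
    """Keep only the per-file sections whose path is in golden_files."""
    # Split just before each header so every section starts with its header.
    sections = re.split(r"(?m)^(?=diff --git )", diff_text)
    kept = []
    for section in sections:
        header = DIFF_HEADER_RE.match(section)
        if header and header.group(2) in golden_files:
            kept.append(section)
    return "".join(kept)
```

Calling `changed_files` first gives the full file list. Passing the golden patch subset to `restrict_diff` then leaves only the changes that need to be read.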

Example:

Step 3: Categorize the Issue on Several Axes

You will categorize the original issue on the following dimensions:


Good vs. Bad Issue
Specificity Rating (This is only for reviewers now)
Issue Type
Knowledge Tag

⚠️ A task is considered a bad issue if it meets one of the following criteria: ⚠️

The initial problem statement contains foreign languages.

If you select any category of bad issue, you will have to justify it and end the
task.

Specificity Rating (This is only for reviewers now May 3)


Update for May 15 - Added Explicit Criteria to Define Specificity
Contributors should rate the specificity of the original problem statement (if
available) or PR description (if no issues are available). The specificity rating
is a measure of how much detail the problem statement/PR contains and how clear the
desired solution is. The desired solution is defined by what
behaviors/functionalities are tested for by the unit test suite.
Imagine that you are an experienced software engineer who has been instructed to
create a PR that successfully resolves the above GitHub issue. You have full access
to the codebase and can see the issue description above. However, you are not able
to ask for clarification and would need to work exclusively from this information.
What rating would you give this issue?
5/16 Update - A well specified issue or requirement should contain all of the
information that is needed to pass the unit tests associated with the test patch of
the PR.
Specificity ratings apply to both the problem statement and the requirement. When
evaluating the specificity of the requirement, you should consider the problem
statement to be included as well (i.e., the specificity is the superset of the
problem statement and requirement).
The rubric for specificity is below:

Criteria: Issue Specificity

Description: Imagine that you are an experienced software engineer who has been
instructed to create a PR that successfully resolves the above GitHub issue. You
have full access to the codebase, and can see the issue description as it is above.
But you are not able to ask for clarification and would need to work exclusively
from this information. Would you have enough information to create a solution that
passes all the unit tests in the test patch?

Rating 1: It is almost impossible to understand what is being asked without further
information. The statement does not define what is needed to pass the unit tests
and it would be impossible for an agent to create a valid solution that passes unit
tests without additional information.

Rating 2: The issue is vague and difficult to understand but the agent should have
some idea of what needs to be implemented to solve the problem. The statement
misses critical information needed to pass at least one unit test. This information
could not be reasonably assumed from the code base.

Rating 3: The issue is vague and there is room for ambiguity. It is unclear what a
successful solution would look like. The statement provides a high level overview
of the behavior needed to pass and an agent might reasonably infer a valid solution
to the unit tests, but enough details are vague that the agent may address the
problem without passing the unit tests.

Rating 4: There are some blanks to fill in, but there is a sensible interpretation
of what is required for a solution. The statement does not explicitly define all
configurations and behaviors needed to pass the unit test but the proper
implementation can be inferred from the code base.

Rating 5: The issue is well-specified and it is clear what is required for a
successful solution. The statement explicitly includes specifications on expected
behavior and if this expected behavior is implemented, it will satisfy all unit
tests.

Issue Classification

Each category lists its subcategories as "Subcategory: Tags".

Category: Bug

Critical: System crash, data loss, security breach, blocking functionality
Major: Significant impact on core functionality but with workarounds
Minor: Non-critical issues with minimal impact on users
Regression: Previously working feature now broken
Performance: Slowness, memory leaks, resource consumption
Security: Vulnerabilities, authentication issues, data exposure
Data: Data corruption, validation errors, inconsistency
UI/UX: Visual glitches, usability problems, layout issues
Compatibility: Browser/device/version compatibility issues
Edge Case: Rare conditions or unusual input handling
Integration: Issues with external systems, APIs, or third-party services

Category: Feature

Core: Essential application functionality
UI/UX: User interface and experience improvements
API: API development, endpoints, or documentation
Integration: External system connections or third-party services
Performance: Speed and resource optimization
Accessibility: A11y improvements for diverse users
Localization: i18n and l10n support for multiple regions/languages
Analytics: Metrics, reporting, dashboards, and insights
Security: Authentication, authorization, and data protection
Customization: User preferences, themes, and configuration options
Mobile: Mobile-specific functionality or responsive design

Category: Enhancement

Performance: Speed, memory usage, resource optimization
Code Quality: Readability, maintainability, documentation
Refactoring: Restructuring without changing behavior
Testing: Test coverage, quality, and automation improvements
UI/UX: User interface and experience refinements
Accessibility: A11y compliance and improvements
Technical Debt: Addressing accumulated implementation shortcuts
DevOps: CI/CD, deployment, monitoring improvements
Security: Hardening existing features, compliance
Scalability: Handling increased load, users, or data volume
Documentation: Improving user guides, API docs, or code comments

Knowledge Category

Core Development: Frontend, Backend, Full Stack, Web, Mobile, Desktop, Database, API
Infrastructure & Operations: Infrastructure, DevOps, Cloud, Networking
Security: Security, Authentication/Authorization
Data & AI: Data Science, ML/AI
Design & UX: UI/UX, Accessibility
Quality & Testing: QA/Testing, Performance
Specialized Domains: Game Development, AR/VR, IoT/Embedded, Blockchain
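
The four categorization axes from Step 3 can be captured in a small record that rejects inconsistent entries. This is a minimal sketch, not part of any official tooling: the class name `IssueCategorization` and its field names are hypothetical. The checks mirror only the rules stated above, namely a 1 to 5 specificity scale, the three issue categories, and a required justification for bad issues.

```python
from dataclasses import dataclass

# Top-level categories from the Issue Classification section.
ISSUE_TYPES = {"Bug", "Feature", "Enhancement"}


@dataclass(frozen=True)
class IssueCategorization:
    is_good_issue: bool
    specificity: int  # 1 (almost impossible to understand) to 5 (well-specified)
    issue_type: str  # "Bug", "Feature", or "Enhancement"
    subcategory: str  # e.g. "Regression" or "Edge Case"
    knowledge_tag: str  # e.g. "Backend"
    bad_issue_justification: str = ""

    def __post_init__(self) -> None:
        if not 1 <= self.specificity <= 5:
            raise ValueError("specificity must be between 1 and 5")
        if self.issue_type not in ISSUE_TYPES:
            raise ValueError(f"unknown issue type: {self.issue_type!r}")
        # Selecting a bad-issue category requires a justification.
        if not self.is_good_issue and not self.bad_issue_justification.strip():
            raise ValueError("a bad issue must be justified")
```

For example, `IssueCategorization(True, 2, "Bug", "Edge Case", "Backend")` is accepted, while a specificity of 6 or an unjustified bad issue raises `ValueError`.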

Step 4: Problem Statement Creation

In this step, you will do the following:

Determine whether the initial problem statement describes the actual issue or if it
only summarizes the changes in the codebase (golden patch).

In some cases, you will be able to copy a linked issue as the problem statement.
This is encouraged.

Example
PR Link: https://github.com/deepset-ai/haystack/pull/8729
PR Description: fixes Document Splitter always returns 1 document for
split_type="passage" in pdfs #8491
Linked Issue: https://github.com/deepset-ai/haystack/issues/8491

Good Problem Statement


# Document Splitter always returns 1 document for split_type="passage" in pdfs

**Describe the bug**


When using Document Splitter with pdf and split_type="passage", the result is
always one document. This is using pypdf.

**Expected behavior**
The understanding I have is that it splits based on at least two line breaks \n\n

**Additional context**
When I tested using plain text it seems to be splitting correctly

**To Reproduce**

dir = '...'
files = [
    {"filename": "rules.pdf", "meta": {"split_by" : "passage", "split_length":1,
    "split_overlap":0, "split_threshold":0}},
    {"filename": "rules.txt", "meta": {"split_by" : "passage", "split_length":1,
    "split_overlap":0, "split_threshold":0}}
]
for file in files:
    # set the filepath
    file_path = Path(dir) / file["filename"]
    router_res = file_type_router.run(sources=[file_path])
    txt_docs = []
    if 'text/plain' in router_res:
        txt_docs = text_file_converter.run(sources=router_res['text/plain'])
    elif 'application/pdf' in router_res:
        txt_docs = pdf_converter.run(sources=router_res['application/pdf'])
    elif 'text/markdown' in router_res:
        txt_docs = markdown_converter.run(sources=router_res['text/markdown'])
    document_splitter = DocumentSplitter(
        split_by=file['meta']['split_by'],
        split_length=file['meta']['split_length'],
        split_overlap=file['meta']['split_overlap'],
        split_threshold=file['meta']['split_threshold']
    )
    splitter_res = document_splitter.run([txt_docs['documents'][0]])
    print(len(splitter_res['documents']))

**System:**

OS: Mac OS 14.6.1
GPU/CPU: CPU
Haystack version (commit or version number): 2.6.0
DocumentStore: Chromadb
Splitter: DocumentSplitter

A good problem statement explains what was wrong before the fix, while a golden
patch description only describes the changes made in the PR. If the problem
statement only summarizes the fix instead of identifying the issue, it must be
rewritten.

A good problem statement should try to match the specificity of the original PR. If
the original PR had a specificity score of 2, your problem statement should
also match a specificity of 2.
DO NOT provide more context than the original description has provided. That
context will be provided in the requirements section.

Example

Original PR
Fixed a bug when a user double clicks the Share button

❌Bad Example
**Title: Double Click Bug**

**Description**
When a user double clicks the share button, an error occurs causing the share
screen to render twice, crashing the entire app

✅Good Example
**Title: Double Click Bug**

**Description**
When a user double clicks the share button a bug occurs.

How to Evaluate the Problem Statement: UPDATE MAY 8


Determine if there are linked issues
In some problems, there will be one or more linked issue URLs listed. When these
are listed, you can review the issue statements and select the best issue
statement.
If the best linked issue statement is accurate and free from misleading information
about the golden patch that could misguide the agent, copy and paste it directly
into the problem statement box.
If the linked issue statement contains factual errors or inaccuracies, you must fix
them before copying and pasting—ensure the level of detail remains the same without
adding more specificity.

Analyze the initial problem statement:


Does it describe what was broken before the fix?
Or does it only describe the code change?
If the problem statement describes what was changed in the PR rather than what was
wrong with the code before the fix, it must be rewritten.
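
A crude first-pass check for this question is sketched below. It is an assumption-laden heuristic, not a rule from these guidelines: it flags text that opens with "This PR/commit/patch" or with a change verb such as "Add" or "Fix", which is how patch descriptions are often worded. The function name `looks_like_patch_description` and the verb list are hypothetical, and a human still makes the final call.

```python
import re

# Openers typical of a change summary: "This PR ...", or a leading change verb
# (optionally after a list bullet), e.g. "- Add ...", "Fixed ...".
PATCH_STYLE_RE = re.compile(
    r"^\s*(?:[-*]\s*)?"
    r"(?:this\s+(?:pr|pull\s+request|commit|patch)\b"
    r"|(?:add|fix|update|remove|refactor|implement)(?:s|es|d|ed)?\b)",
    re.IGNORECASE,
)


def looks_like_patch_description(text: str) -> bool:
    """Return True when text opens like a summary of code changes."""
    return bool(PATCH_STYLE_RE.match(text))
```

The check is intentionally loose. It will also flag issue titles written in the imperative, so a flagged statement is a prompt to read more closely, not an automatic rewrite.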

Examine the golden patch (code fix):


Does the problem statement align with what was fixed?
Or does it need a rewrite to describe the issue?

Problem Statement Formatting

Problem statements should be formatted like real GitHub issues, with a title and
section headings that are clearly marked as headings.

To identify good formatting, you can use the following sources:


Copy and paste from the GitHub repository’s issue template, which can typically be
found in the .github/ISSUE_TEMPLATE folder
Use the template formatting provided by the synthetic prompt
The below formatting provides a high level template of what a good issue should
look like, though there is considerable flexibility:

Title # E.g., Fix bug where 1GB+ uploads cause a system crash

**Field Name 1** OR ## Field Name 1

Text

**Field Name 2** OR ## Field Name 2

Text

etc.
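
The template above can be filled in programmatically. The sketch below is one possible rendering, assuming the `# Title` and `**Field Name**` styles used in the Document Splitter example. The function name `render_problem_statement` is hypothetical.

```python
def render_problem_statement(title: str, fields: dict[str, str]) -> str:
    """Render a title and named fields in GitHub-issue style markdown."""
    lines = [f"# {title}", ""]
    for name, text in fields.items():
        # Each field gets a bold heading followed by its text.
        lines += [f"**{name}**", "", text.strip(), ""]
    return "\n".join(lines).rstrip() + "\n"


statement = render_problem_statement(
    "Double Click Bug",
    {"Description": "When a user double clicks the share button a bug occurs."},
)
print(statement)
```

Fields are emitted in the order given, so the caller controls whether, for instance, expected behavior comes before reproduction steps.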

Examples of Good vs. Bad Problem Statements

✅ Good Problem Statement (Describes the Issue):


"We are currently using an outdated version of the Google Analytics API, and we
should upgrade to the latest version (v4)."

❌ Bad Problem Statement (Golden Patch Description, Needs Rewriting):


"This PR adds support for the Google Analytics v4 API."
Another example of a golden patch description that needs rewriting:
" - Add TryIt to index.ts - Add test"

How to Rewrite the Problem Statement

Using the rewritten model-generated problem statements as a starting point, rewrite
the initial message in the form of a problem statement describing an issue to be
solved.

The rewritten problem statement must:

Be faithful to the original commit/PR/issue without adding or removing information.
Maintain the same level of specificity as the original (e.g., if the original lacks
detail, the rewrite should not introduce additional details).
Identify what was broken or missing before the fix.
Avoid simply restating what was changed in the PR.
Focus on why the issue mattered—what impact did it have?
Determine what requirement is necessary for the problem to be considered resolved.
Write a clear and concise issue description.
Do not mention how the problem was fixed.
Ensure the issue statement is neutral and factual—do not assume intent.
Include any specific requirements necessary for the fix.
If the initial problem statement was written as a golden patch description, it must
be rewritten into a clear issue statement that includes a requirement.
Original Golden Patch Description (Needs Rewriting):

❌ "- Add TryIt to index.ts - Add test"

Finalized Problem Statement Rewrite (Corrected Issue Description with Requirement):


✅ "The try function in Radash cannot be imported directly due to JavaScript’s
reserved keyword conflict. This forces users to manually alias it, adding an
unnecessary step. The only alternative is importing the entire library, which is
not always ideal. Since ‘try’ is a known reserved word, the library should handle
this internally to avoid requiring manual aliasing."
