Ballerina Capuchina: How to Write Complex Questions
What do we consider a good question?
A good question isn’t necessarily one that's difficult for a human developer—it’s one that
exposes real challenges for current language models. The goal isn't to maximize complexity, but
to target areas where even the best LLMs tend to struggle.
Complex questions in coding environments:
Think about common LLM weaknesses when working with production code—like losing context
across files, failing to link related components in different modules, or misunderstanding
config-to-r
💥 Hard Question Types for LLMs on GitHub Repos
Introduction
Below are real-world question types that LLMs often fail to solve well—and why.
1. Cross-File “Glue” Questions
Prompt Example:
“Can you trace how a request handled in api/order.js ultimately triggers the
email-sending logic in mailer/sendInvoice.ts, and explain how data flows between
these modules—including any intermediate services, function calls, or shared utilities
involved in the process?”
Why it fails: The model must reason across multiple files and track a call chain.
With limited context, it may miss or misinterpret the link between modules.
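To make the "glue" concrete, here is a minimal Python sketch of the kind of cross-module call chain such a question targets. All names (handle_order_request, process_order, send_invoice) are hypothetical and condensed into one file; in a real repository each function would live in a separate module, which is exactly what makes the tracing hard for a model.

```python
# Hypothetical cross-module call chain, condensed into one file.
# In a real repo these would live in api/, services/, and mailer/ respectively.

def send_invoice(order):
    # Terminal step (think mailer/sendInvoice): format and "send" the invoice.
    return f"invoice emailed for order {order['id']}"

def process_order(order):
    # Intermediate service: validates the order, then hands off to the mailer.
    if not order.get("items"):
        raise ValueError("empty order")
    return send_invoice(order)

def handle_order_request(payload):
    # Entry point (think api/order): the handler a glue question starts from.
    order = {"id": payload["order_id"], "items": payload.get("items", [])}
    return process_order(order)

print(handle_order_request({"order_id": 7, "items": ["book"]}))
# → invoice emailed for order 7
```

A good glue question forces the model to reconstruct this entire handler → service → mailer chain from files it must find and connect itself.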
2. Cyclic Imports or Dependency Graph Fixes
Prompt Example:
“How would you refactor the Go packages to break the cyclic dependency between
model/user.go and state/session.go, particularly around the NewUser() and
InitSession() functions, and what structural changes would preserve their behavior
while decoupling their imports?”
Why it fails: Requires global insight into package structure. Many LLMs fail to
suggest viable restructuring (e.g., introducing a new shared module), even though that’s
a known pattern among experienced devs.
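The "introduce a new shared module" refactor mentioned above can be sketched in Python, where circular imports cause the same kind of breakage as Go's cyclic package dependencies. The names (UserRef, new_user, init_session) are hypothetical; in a real codebase each section below would be its own module, with both former cycle participants importing only the new shared one.

```python
# Breaking a cycle: instead of model/user importing state/session and vice
# versa, both sides depend on a new neutral module that owns the shared type.
import itertools
from dataclasses import dataclass

# --- shared/types: the new module both sides can safely import ---
@dataclass
class UserRef:
    user_id: int
    name: str

# --- model/user: creates users; no longer imports state/session ---
_ids = itertools.count(1)

def new_user(name):
    return UserRef(user_id=next(_ids), name=name)

# --- state/session: depends only on shared/types, not on model/user ---
def init_session(user: UserRef):
    return {"session_for": user.user_id, "user": user.name}

u = new_user("ada")
print(init_session(u))
# → {'session_for': 1, 'user': 'ada'}
```

The behavior of both sides is preserved; only the import direction changes, which is the structural insight a strong answer needs to articulate.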
3. Dynamic Analyses (AST, PropTypes, Runtime Checks)
Prompt Example:
“What could cause the React prop-types checker to flag only during CI runs, and how
might differences in environment, build configuration (e.g., .babelrc,
webpack.config.js), or module resolution impact the behavior of prop-types validation
in lib/rules/prop-types.js when analyzing components/Card.jsx during
production versus local development?”
Why it fails: Requires deep knowledge of AST traversal + runtime execution + test
orchestration. LLMs struggle to combine these layers without concrete traces.
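The "fails only in CI" pattern can be illustrated with a small Python sketch. This is not the real prop-types checker; it is a hypothetical stand-in (validate_props, render_card, a BUILD_ENV variable) showing how an environment-gated check can stay silent locally yet fail in a production build, which is the layered behavior such questions probe.

```python
import os

# Hypothetical sketch: a strict validation step gated on the build
# environment, mimicking a check that fires in CI but not locally.

def validate_props(props, required=("title",)):
    missing = [k for k in required if k not in props]
    if missing:
        raise TypeError(f"missing required props: {missing}")
    return True

def render_card(props, env=None):
    # CI would set BUILD_ENV=production; local dev leaves it unset.
    env = env or os.environ.get("BUILD_ENV", "development")
    if env == "production":
        validate_props(props)  # strict path only taken in CI-like builds
    return f"<Card {props}>"

render_card({}, env="development")       # silent, like a local run
try:
    render_card({}, env="production")    # raises, like the CI failure
except TypeError as e:
    print("CI-only failure:", e)
```

Answering the real question well means connecting this kind of environment gating to build configuration and module resolution, not just reading one file.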
Step-by-step guide: how to discover a complex question.
This section offers a friendly, practical guide to help you
begin understanding the context and structure of the
repository you're working with.
🧐 1. Understand the Lay of the Land: Build a Mental Model
👀 2. Dig Deeper: Map Cross-File Relationships
🧠 3. Brainstorm and Formulate Your Question
1. Understand the Lay of the Land: Build a Mental Model
Before you can ask a good question, you need to know what you're looking at.
● Get a Quick Overview: Use Cursor to get a high-level summary of the repository.
What are the main directories? What are the key components, and how do they
interact?
● Scan the Directory Tree: Identify the major areas of the codebase, such as the API,
UI, data handling, tests, and build processes.
● Identify Key Structures: Look for public APIs, class hierarchies, shared utilities,
and points where dependencies are injected.
● Trace a Workflow: Follow at least one complete data or control flow. For example,
trace a user request from the handler to the service that interacts with the
database.
● Review Recent Changes: Look at your given PR. These often reveal hidden
connections and dependencies between different parts of the code. This can help
you determine which collection of files would work best for your question.
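The "trace a workflow" step above can be sketched as a minimal Python example. The names (handle_signup, create_user, db_insert) are hypothetical and the three layers are condensed into one file; in a real repository you would follow this same handler → service → data-access path across separate modules.

```python
# Hypothetical request flow: handler -> service -> database layer,
# condensed into one file for illustration.

DB = {}  # in-memory stand-in for the database layer

def db_insert(table, row):
    # Data-access layer: persists a row and returns its id.
    DB.setdefault(table, []).append(row)
    return len(DB[table])

def create_user(name):
    # Service layer: enforces business rules before touching the DB.
    if not name:
        raise ValueError("name required")
    row_id = db_insert("users", {"name": name})
    return {"id": row_id, "name": name}

def handle_signup(request):
    # Handler: parses the incoming payload and calls the service.
    return create_user(request.get("name", ""))

print(handle_signup({"name": "ada"}))
# → {'id': 1, 'name': 'ada'}
```

Tracing even one such flow end to end gives you the vocabulary of files and functions a strong cross-file question needs.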
2. Dig Deeper: Map Cross-File Relationships
Understand how different parts of the codebase connect with each other. For example,
look for:
● Function Calls: Where does a function in one file call a function in another?
● Shared Information: Is there shared state or configuration, like environment
variables or global settings?
● Interfaces and Implementations: Where are interfaces defined, and where are they
implemented?
● Data Models: Find where data models are defined and then see how they are
used in other places like migrations, serializers, or tests.
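The "interfaces and implementations" relationship above can be sketched in Python with the abc module. The names (Serializer, JsonSerializer, CsvSerializer) are hypothetical; the point is that the interface is often defined in one file while its implementations live elsewhere, and a question spanning that split is naturally cross-file.

```python
import json
from abc import ABC, abstractmethod

# Hypothetical interface, e.g. defined in core/serializer.
class Serializer(ABC):
    @abstractmethod
    def dump(self, obj) -> str: ...

# Hypothetical implementations, e.g. living in formats/json_out and formats/csv_out.
class JsonSerializer(Serializer):
    def dump(self, obj) -> str:
        return json.dumps(obj)

class CsvSerializer(Serializer):
    def dump(self, obj) -> str:
        return ",".join(str(v) for v in obj.values())

for s in (JsonSerializer(), CsvSerializer()):
    print(type(s).__name__, s.dump({"a": 1}))
```

Mapping which files implement which interfaces is exactly the kind of relationship a model with limited context tends to miss.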
3. Brainstorm and Formulate Your Question
Now that you have a good understanding of the code, you can start thinking about what
to ask.
● Think Like a Hacker or Tester:
○ What happens if a function returns an unexpected value?
○ What if two different commits change the same configuration in conflicting
ways?
○ Could an old, outdated dependency cause a problem with a new feature?
○ Come up with three to five realistic scenarios that involve multiple files.
● Craft a Strong Question: A good question will:
○ Set the Scene: "Suppose the UserService.create() function returns None..."
○ Point to the Right Places / Reference Multiple Files: "...explain how that
affects order_processor.py and notification_mailer.go."
○ Be Answerable: Ensure the question can be answered using only the
provided code and its history.
● Discuss the interaction between functions across multiple files: It will be easier
to stump the model if your question references an interaction between a class,
function, or data structure that spans different files.
Example approach to coming up with good questions
Example Foundational Questions to Get Started
If you're just starting to explore a repository, here are some good initial questions to ask:
● What is the main purpose of this repository?
● What is its primary function?
● What are the most important directories?
● Can you give me a detailed explanation of how the top three key functionalities in
lib/util are used, with examples?
🚨🚨🚨 Once you have come up with the question you will use in your
task, remember to always start a new chat before asking it.
Assess your question before starting
GENERAL INSTRUCTIONS
📌 Focus on different criteria related to the repository and the code functionalities in the
source code files. Check the Question Styles and Diversity section for the different types
of questions you may ask in different tasks.
📌 Try to ask hard questions that will stump the model, that is, questions the model
cannot fully answer.
📌 Harder prompts rely on data from multiple parts of the codebase and require the
agent to synthesize knowledge from multiple files, especially how those files
interact. The questions should be about the code in the repository. For example, if you’re
working on the Pandas repository, you should ask about the implementation of Pandas
as if you were a developer working on the Pandas implementation, not something like
“how do I create a DataFrame in Pandas”.
📌 The questions need to be realistic (NO FANCY FORMATTING such as markdown).
● ❌ Avoid backticks and markdown, such as ###, in the questions.
● ❌ Don’t just copy the question examples as templates. Be creative.
● ❌ Don’t be formal. Use casual language.
● ❌ Avoid writing questions that look like GitHub issue descriptions: Think about
what a developer may ask a coding agent when using a repository.
If your task has an issue description with replication code, you can use that as
INSPIRATION for the files and modules you may want to mention in your question.
● Be precise: Try to match the level of precision you’d normally provide when
prompting an LLM, but do not leak the PR solution if you are using the PR as
inspiration!
● For debugging-type questions, describe or mention the relevant existing code or
functionality if needed: The questions should include the relevant information for
the Agent to address the problem.
● Reference relevant files: Reference other files using the @file_name convention
in Cursor, which provides that file as context.
Quick Checklist for a Good Question
Use this checklist to make sure your question is solid:
● ✅ Does my question involve an interaction between at least two or three
different files?
● ✅ Does answering the question require combining information from those
different files?
● ✅ Is the scenario I'm presenting realistic and based on how the repository
actually works?
● ✅ Is it completely clear what the person answering the question needs to
provide?