Cache list_models across all SDK languages to prevent rate limiting under concurrency by Copilot · Pull Request #300 · github/copilot-sdk · GitHub

Conversation

Contributor
Copilot AI commented Jan 30, 2026

Concurrent send_and_wait calls trigger redundant list_models API requests, causing 429 errors at ~100 concurrent sessions. This fix implements caching across all SDK languages (Python, Node.js, .NET, and Go) to ensure consistent behavior.

Changes

  • Cache models list per client connection: list_models() now caches results after first successful call across all SDKs
  • Prevent race conditions: Language-appropriate locking mechanisms ensure only one request fetches when cache is empty; subsequent concurrent calls wait and receive cached result
  • Cache invalidation: Clear on stop() and force_stop()/forceStop()/ForceStop() to ensure fresh data after reconnect
  • Return defensive copies: Prevent callers from mutating cached data

Implementation by Language

Python

async def list_models(self) -> list["ModelInfo"]:
    async with self._models_cache_lock:
        if self._models_cache is not None:
            return list(self._models_cache)  # Return copy
        
        # Fetch while holding lock - other coroutines wait here
        response = await self._client.request("models.list", {})
        models = [ModelInfo.from_dict(m) for m in response.get("models", [])]
        self._models_cache = models
        
        return list(models)

Node.js

  • Uses Promise-based locking
  • Returns array copies via spread operator

.NET

  • Uses SemaphoreSlim for async locking
  • Returns new List<ModelInfo> copies

Go

  • Uses sync.Mutex for thread-safe access
  • Returns copies via Go's copy() function

Models are now fetched once per client connection regardless of concurrent session count across all SDK implementations.
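For illustration, a minimal usage sketch of the new behavior (a hypothetical driver script; it assumes only the client API already shown in the issue reproduction below):

import asyncio

from copilot import CopilotClient


async def main():
    client = CopilotClient()
    await client.start()

    # All ten concurrent calls share one models.list request: the first
    # caller fetches and populates the cache, the rest wait on the lock
    # and receive copies of the cached list.
    results = await asyncio.gather(*[client.list_models() for _ in range(10)])
    assert all(len(models) == len(results[0]) for models in results)

    await client.stop()  # clears the cache so a reconnect fetches fresh data


asyncio.run(main())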

Original prompt

This section details the original issue you should resolve

<issue_title>[BUG] Concurrent send_and_wait calls with model specified in session triggers Failed to list models: 429 rate limiting</issue_title>
<issue_description>## Summary

When creating multiple concurrent sessions with a specific model specified in the session config, the SDK calls list_models on each send_and_wait() call rather than caching the result. This causes 429 rate limiting errors at moderate concurrency levels (~100 sessions).

Environment

  • SDK: GitHub Copilot Python SDK (copilot package)
  • Python: 3.11+
  • OS: Windows 11

Steps to Reproduce

import asyncio
import os

from copilot import CopilotClient

MODEL = "claude-opus-4.5"
NUM_SESSIONS = 100


async def create_session(client, index):
    # Step 1: Create session
    try:
        session = await client.create_session({"model": MODEL})
    except Exception as e:
        return False, f"[create_session] {e}"
    
    # Step 2: Send and wait
    try:
        await session.send_and_wait({"prompt": "Say hello"}, timeout=120)
    except Exception as e:
        return False, f"[send_and_wait] {e}"
    
    # Step 3: Cleanup
    try:
        await session.destroy()
    except Exception:
        pass
    
    return True, None


async def main():
    client = CopilotClient()
    await client.start()
    
    print(f"Creating {NUM_SESSIONS} sessions with model={MODEL}...")
    
    results = await asyncio.gather(*[
        create_session(client, i) for i in range(NUM_SESSIONS)
    ])
    
    await client.stop()
    
    success = sum(1 for ok, _ in results if ok)
    failed = [err for ok, err in results if not ok]
    
    print(f"\nResults: {success}/{NUM_SESSIONS} succeeded")
    if failed:
        print(f"Errors ({len(failed)}):")
        for err in set(failed):
            print(f"  [{failed.count(err)}x] {err[:80]}")


if __name__ == "__main__":
    asyncio.run(main())

Output

Creating 100 sessions with model=claude-opus-4.5...

Results: 66/100 succeeded
Errors (34):
  [34x] [send_and_wait] Session error: Execution failed: Error: Failed to list models: 429

Potential Expected Behavior

  • Use a cached list of models after the first session.send_and_wait, or bind the list of models when client.create_session({"model": MODEL}) is called
    </issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 4 commits January 30, 2026 17:21
Copilot AI changed the title from "[WIP] Fix rate limiting issue with concurrent send_and_wait calls" to "Cache list_models to prevent rate limiting under concurrency" Jan 30, 2026
Copilot AI requested a review from friggeri January 30, 2026 17:29
@friggeri
Collaborator

@copilot shouldn't you do this across all languages?

@github-actions

Cross-SDK Consistency Review

This PR adds list_models() caching to the Python SDK to prevent rate limiting under concurrent load (issue #299). This is an excellent fix! 🎉

However, I've identified a cross-SDK consistency issue:

Current Status Across SDKs

| SDK | Has listModels method? | Implements caching? | Vulnerable to rate limiting? |
|---|---|---|---|
| Python | ✅ Yes | Yes (after this PR) | Fixed |
| Node.js | ✅ Yes (nodejs/src/client.ts:644) | ❌ No | ⚠️ Yes |
| Go | ✅ Yes (go/client.go:1010) | ❌ No | ⚠️ Yes |
| .NET | ✅ Yes (dotnet/src/Client.cs:544) | ❌ No | ⚠️ Yes |

Analysis

All three other SDKs have the same pattern as the original Python code:

  • Make a fresh API call to models.list on every invocation
  • No caching mechanism
  • No protection against concurrent calls

This means Node.js, Go, and .NET users will encounter the same 429 rate limiting errors when running concurrent sessions with model specifications, just as described in issue #299.

Recommendation

To maintain feature parity and provide a consistent developer experience, consider applying the same caching strategy to the other SDKs:

  1. Node.js: Add caching to listModels() in nodejs/src/client.ts
  2. Go: Add caching to ListModels() in go/client.go
  3. .NET: Add caching to ListModelsAsync() in dotnet/src/Client.cs

Each implementation should follow language-appropriate patterns (e.g., Node.js promises, Go mutexes, .NET locks) but maintain the same semantic behavior (sketched below):

  • Cache results after first successful call
  • Clear cache on disconnect/stop
  • Return defensive copies to prevent mutation
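As a rough Python sketch of the invalidation half of that contract (the stop() body here is illustrative, not taken from this PR; the other SDKs would express the same idea with their own locking primitives):

async def stop(self) -> None:
    ...  # existing shutdown / disconnect logic
    async with self._models_cache_lock:
        # Drop the cached list so the next list_models() after a
        # reconnect fetches fresh data from the backend.
        self._models_cache = None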

This PR

The Python implementation looks solid and can serve as a reference for the other SDKs. No changes needed here—this is ready to merge! The follow-up work for other SDKs can be tracked separately.

AI generated by SDK Consistency Review Agent

Contributor Author
Copilot AI commented Jan 30, 2026

@copilot shouldn't you do this across all languages?

Good point! I've now implemented the same caching mechanism across all SDK languages:

  • Node.js: Added modelsCache with promise-based locking (commit 72fb8c1)
  • .NET: Added _modelsCache with SemaphoreSlim for async locking (commit 72fb8c1)
  • Go: Added modelsCache with sync.Mutex for thread-safe access (commit 72fb8c1)

All implementations follow the same pattern as the Python SDK:

  • Cache results after first successful call
  • Use language-appropriate locking to prevent race conditions
  • Return defensive copies to prevent cache mutation
  • Clear cache on disconnect (both stop() and forceStop()/ForceStop())

This ensures consistent behavior across all SDK languages and prevents rate limiting for concurrent sessions in any environment.

Copilot AI changed the title from "Cache list_models to prevent rate limiting under concurrency" to "Cache list_models across all SDK languages to prevent rate limiting under concurrency" Jan 30, 2026
@friggeri friggeri marked this pull request as ready for review January 30, 2026 22:43
@friggeri friggeri requested a review from a team as a code owner January 30, 2026 22:43
Copilot AI review requested due to automatic review settings January 30, 2026 22:43
@github-actions

✅ Cross-SDK Consistency Review

I've reviewed this PR for consistency across all four SDK implementations (Python, Node.js, Go, and .NET). The caching feature is well-implemented with excellent consistency! 🎉

✅ What's Consistent

Implementation Pattern - All SDKs follow the same design:

  • ✅ Cache variable initialized to null/nil
  • ✅ Language-appropriate locking mechanisms (asyncio.Lock, Promise-based, sync.Mutex, SemaphoreSlim)
  • ✅ Cache-check inside lock to prevent race conditions
  • ✅ Defensive copies returned to prevent mutation
  • ✅ Cache cleared on both stop() and force_stop()/forceStop()/ForceStop()
  • ✅ Consistent documentation/comments across all implementations

API Naming - Correctly follows language conventions:

  • Python: list_models() (snake_case) ✅
  • Node.js: listModels() (camelCase) ✅
  • Go: ListModels() (PascalCase for exported) ✅
  • .NET: ListModelsAsync() (PascalCase) ✅

⚠️ Minor Gap: Test Coverage

Only Python has comprehensive cache testing (test_should_cache_models_list added in this PR). The other three SDKs only have basic listModels tests without cache-specific verification.

Suggestion: Consider adding equivalent cache tests to Node.js, Go, and .NET to verify:

  1. Multiple calls return different objects (defensive copies)
  2. Cached content matches original
  3. Cache clears after disconnect/reconnect

Example test structure (from Python implementation):

# First call - fetches from backend
models1 = await client.list_models()

# Second call - returns from cache (different object)
models2 = await client.list_models()
assert models2 is not models1  # Defensive copy check
assert len(models2) == len(models1)  # Content match

# After stop/restart - cache cleared
await client.stop()
await client.start()
models3 = await client.list_models()
assert models3 is not models1  # Cache was cleared

This would strengthen confidence that the caching behavior works identically across all SDKs. However, this is not blocking - the implementations are solid and the basic listModels tests in all SDKs do provide baseline coverage.

🎯 Verdict

Excellent work on maintaining cross-SDK consistency! The cache invalidation on disconnect, defensive copying, proper locking, and consistent documentation are all well-executed across all four languages. The test gap is minor and can be addressed in a follow-up if desired.

AI generated by SDK Consistency Review Agent

@github-actions

Cross-SDK Consistency Review ✅

This PR implements list_models() caching across all four SDK languages to prevent rate limiting under concurrent sessions. The implementation maintains excellent cross-SDK consistency!

✅ Implementation Consistency

Core behavior is identical across all SDKs:

  • Cache models after first successful list_models() call
  • Use language-appropriate async/thread-safe locking to prevent race conditions
  • Clear cache on both stop() and force_stop() methods
  • Return defensive copies to prevent cache mutation
  • Consistent documentation explaining caching behavior

Language-appropriate patterns:

  • Python: asyncio.Lock() for async locking ✅
  • Node.js: Promise-based locking pattern ✅
  • .NET: SemaphoreSlim for async locking ✅
  • Go: sync.Mutex for thread-safe access ✅

Naming conventions properly follow language idioms:

  • Python: list_models() (snake_case)
  • Node.js: listModels() (camelCase)
  • .NET: ListModelsAsync() (PascalCase)
  • Go: ListModels() (PascalCase for exported)

⚠️ Test Coverage Gap

Only Python has a cache-specific test (test_should_cache_models_list in python/e2e/test_client.py).

Suggestion: Consider adding equivalent tests to Node.js, Go, and .NET that verify:

  1. Models are cached after the first call
  2. Subsequent calls return cached data (different object, same content)
  3. Cache is cleared after stop() and reconnect

This would ensure the caching behavior is tested consistently across all SDK implementations, matching the excellent consistency in the implementation itself.


Overall: This is a great example of maintaining feature parity across multi-language SDKs. The only improvement would be matching test coverage. Nice work! 🎉

AI generated by SDK Consistency Review Agent

Contributor
Copilot AI left a comment


Pull request overview

Implements per-client caching for list_models across the Python, Node.js, .NET, and Go SDKs to reduce redundant models.list calls under concurrency and mitigate 429 rate limiting.

Changes:

  • Added per-client model list caches with language-appropriate synchronization to prevent concurrent duplicate fetches.
  • Added cache invalidation on disconnect (stop/force_stop equivalents) to ensure fresh data after reconnect.
  • Added a Python E2E test intended to validate caching behavior.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.

Summary per file:

| File | Description |
|---|---|
| python/copilot/client.py | Adds an async-locked model list cache and clears it on stop/force_stop. |
| python/e2e/test_client.py | Adds an E2E test intended to validate list_models caching. |
| nodejs/src/client.ts | Adds cached listModels() behavior and clears cache on stop/forceStop. |
| dotnet/src/Client.cs | Adds a SemaphoreSlim-guarded cache for ListModelsAsync() and clears it on disconnect cleanup. |
| go/client.go | Adds a mutex-guarded cache for ListModels() and clears it on Stop/ForceStop. |
Comments suppressed due to low confidence (1)

nodejs/src/client.ts:322

  • stop() clears modelsCache without coordinating with an in-flight listModels() call. If listModels() is awaiting the RPC when stop() runs, it can still set this.modelsCache = models afterwards, leaving a non-null cache even though the client disconnected. Consider invalidating via the same locking/single-flight mechanism (or a connection “generation” token check) so in-flight fetches can’t repopulate the cache after disconnect.
        // Clear models cache
        this.modelsCache = null;



Comment on lines +738 to +741
# Check cache (already inside lock)
if self._models_cache is not None:
    return list(self._models_cache)  # Return a copy to prevent cache mutation

Copilot AI Jan 30, 2026


Returning list(self._models_cache) only protects against callers mutating the list container; ModelInfo is a mutable dataclass, so callers can still mutate cached objects and affect subsequent list_models() results. If you want true defensive copies, consider copying/cloning the ModelInfo instances (or making them immutable); otherwise clarify in the comment/docstring that this is a shallow copy of the list.
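For instance, cloning the instances could look like this (a sketch only; it assumes ModelInfo is a flat dataclass, as noted above):

from dataclasses import replace

async def list_models(self) -> list["ModelInfo"]:
    async with self._models_cache_lock:
        if self._models_cache is not None:
            # Copy each cached ModelInfo as well as the list container,
            # so callers cannot mutate the cached instances in place.
            return [replace(m) for m in self._models_cache]
        ...  # cache miss: fetch, populate the cache, and return as before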

Comment on lines +153 to +163
# First call should fetch from backend
models1 = await client.list_models()
assert isinstance(models1, list)

# Second call should return from cache (different list object but same content)
models2 = await client.list_models()
assert models2 is not models1, "Should return a copy, not the same object"
assert len(models2) == len(models1), "Cached results should have same content"
if len(models1) > 0:
    assert models1[0].id == models2[0].id, "Cached models should match"

Copilot AI Jan 30, 2026


This test doesn’t currently validate that caching is happening: even without caching, two list_models() calls will return different list objects with similar contents, so the assertions would still pass. Consider instrumenting/spying on the underlying RPC request (e.g., wrap client._client.request to count models.list calls) and/or running concurrent list_models() calls to assert only one backend request occurs, plus explicitly asserting the cache is cleared after stop().
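A sketch of such a test (hypothetical; it wraps the internal client._client.request attribute mentioned above, so it relies on that private detail staying stable):

import asyncio


async def test_list_models_is_single_flight(client):
    real_request = client._client.request
    calls = {"models.list": 0}

    async def counting_request(method, params):
        if method == "models.list":
            calls["models.list"] += 1
        return await real_request(method, params)

    client._client.request = counting_request
    try:
        # Ten concurrent callers should trigger exactly one backend fetch.
        await asyncio.gather(*[client.list_models() for _ in range(10)])
        assert calls["models.list"] == 1
    finally:
        client._client.request = real_request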

Comment on lines +661 to +686
// Use promise-based locking to prevent race condition with concurrent calls
await this.modelsCacheLock;

let resolveLock: () => void;
this.modelsCacheLock = new Promise((resolve) => {
  resolveLock = resolve;
});

try {
  // Check cache (already inside lock)
  if (this.modelsCache !== null) {
    return [...this.modelsCache]; // Return a copy to prevent cache mutation
  }

  // Cache miss - fetch from backend while holding lock
  const result = await this.connection.sendRequest("models.list", {});
  const response = result as { models: ModelInfo[] };
  const models = response.models;

  // Update cache before releasing lock
  this.modelsCache = models;

  return [...models]; // Return a copy to prevent cache mutation
} finally {
  resolveLock!();
}
Copilot AI Jan 30, 2026


The promise-based “lock” here isn’t actually mutual exclusion: if multiple callers enter while modelsCacheLock is already resolved, they can all pass await this.modelsCacheLock and then race to overwrite this.modelsCacheLock, leading to multiple concurrent models.list requests (the exact rate-limit issue this change is trying to fix). Consider using an atomic chaining approach (e.g., chaining onto the previous lock promise) or a dedicated async mutex / single-flight inFlightModelsPromise that all callers await.

Suggested change
-// Use promise-based locking to prevent race condition with concurrent calls
-await this.modelsCacheLock;
-let resolveLock: () => void;
-this.modelsCacheLock = new Promise((resolve) => {
-  resolveLock = resolve;
-});
-try {
-  // Check cache (already inside lock)
-  if (this.modelsCache !== null) {
-    return [...this.modelsCache]; // Return a copy to prevent cache mutation
-  }
-  // Cache miss - fetch from backend while holding lock
-  const result = await this.connection.sendRequest("models.list", {});
-  const response = result as { models: ModelInfo[] };
-  const models = response.models;
-  // Update cache before releasing lock
-  this.modelsCache = models;
-  return [...models]; // Return a copy to prevent cache mutation
-} finally {
-  resolveLock!();
-}
+// Chain onto the existing lock promise to ensure mutual exclusion and single-flight behavior.
+// Any previous rejection is caught so it does not poison future lock acquisitions.
+const lockPromise = this.modelsCacheLock = this.modelsCacheLock
+  .catch(() => {})
+  .then(async () => {
+    // Inside the critical section: check and populate the cache if needed.
+    if (this.modelsCache === null) {
+      const result = await this.connection!.sendRequest("models.list", {});
+      const response = result as { models: ModelInfo[] };
+      const models = response.models;
+      this.modelsCache = models;
+    }
+  });
+// Wait for the critical section (including any in-flight fetch) to complete.
+await lockPromise;
+// At this point the cache should be populated unless the request failed,
+// in which case the error will have been thrown by await lockPromise above.
+if (this.modelsCache === null) {
+  throw new Error("Failed to load models");
+}
+// Return a copy to prevent callers from mutating the cache.
+return [...this.modelsCache];

catch (Exception ex) { errors?.Add(ex); }

// Clear models cache
_modelsCache = null;
Copilot AI Jan 30, 2026


CleanupConnectionAsync clears _modelsCache without taking _modelsCacheLock. If ListModelsAsync is in-flight, it can repopulate _modelsCache after the disconnect cleanup runs, which breaks the “cache cleared on disconnect” guarantee. Consider acquiring _modelsCacheLock (or otherwise synchronizing) when clearing the cache during cleanup.

Suggested change
-_modelsCache = null;
+lock (_modelsCacheLock)
+{
+    _modelsCache = null;
+}

Comment on lines +565 to +568
if (_modelsCache is not null)
{
    return new List<ModelInfo>(_modelsCache); // Return a copy to prevent cache mutation
}
Copilot AI Jan 30, 2026


new List<ModelInfo>(_modelsCache) only copies the list container; ModelInfo is a mutable reference type (settable properties), so callers can still mutate the cached objects and affect subsequent results. If the goal is to prevent cached data mutation, consider cloning the ModelInfo objects (deep-ish copy) or making the model types immutable; otherwise, please adjust the comment/remarks to clarify this is only a shallow collection copy.

@friggeri friggeri merged commit a552ae4 into main Jan 31, 2026
31 of 33 checks passed
@friggeri friggeri deleted the copilot/fix-rate-limiting-issue branch January 31, 2026 00:22


Development

Successfully merging this pull request may close these issues.

[BUG] Concurrent send_and_wait calls with model specified in session triggers Failed to list models: 429 rate limiting
