Dynamic shared quota

Dynamic shared quota distributes the available on-demand capacity among all the queries that Google Cloud services are processing. Because capacity is shared dynamically, you don't need to submit quota increase requests (QIRs).

Supported Google model versions

The following Google model versions support dynamic shared quota:

Gemini 1.5 Flash (gemini-1.5-flash-002)
Gemini 1.5 Pro (gemini-1.5-pro-002)

Other supported models

For information about Claude models that support dynamic shared quota, see Use the Claude models from Anthropic.

Example of how dynamic shared quota works

Google Cloud looks at the available capacity in a specific region, such as North America, and at how many projects are sending requests. Consider two projects, A and B, that each send 25 queries per minute (QPM) to a service that can support 100 QPM. If project A increases its rate to 75 QPM, the combined load is 100 QPM, which the service can still serve, so dynamic shared quota supports the increase. If project A then increases its rate to 100 QPM, the combined demand of 125 QPM exceeds capacity, so dynamic shared quota throttles project A down to 75 QPM in order to continue to serve project B at 25 QPM.
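
The numbers in this example can be reproduced with a small simulation. The sketch below uses a max-min fair ("water-filling") allocation, which is an assumption chosen only because it yields the same figures as the example. The actual algorithm behind dynamic shared quota is not described here, and the function name `allocate` is hypothetical.

```python
def allocate(capacity, demands):
    """Split `capacity` (QPM) across projects using max-min fairness.

    Illustrative only: this policy is an assumption that happens to match
    the example above, not a description of how Google Cloud allocates.
    """
    allocation = {name: 0.0 for name in demands}
    remaining = float(capacity)
    # Projects whose demand has not been fully met yet.
    unsatisfied = {name: float(d) for name, d in demands.items() if d > 0}

    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        # Projects that need no more than an equal share are served in full.
        satisfied = [n for n, need in unsatisfied.items() if need <= share]
        if not satisfied:
            # Everyone wants more than an equal share: split what is left.
            for n in unsatisfied:
                allocation[n] += share
            remaining = 0.0
            break
        for n in satisfied:
            allocation[n] += unsatisfied[n]
            remaining -= unsatisfied[n]
            del unsatisfied[n]
    return allocation


# The three situations from the example, with a 100 QPM service.
print(allocate(100, {"A": 25, "B": 25}))   # both served in full
print(allocate(100, {"A": 75, "B": 25}))   # A's increase is supported
print(allocate(100, {"A": 100, "B": 25}))  # A is throttled to 75 QPM
```

Under this policy, a project that asks for less than an equal share of capacity always gets its full demand, which is why project B keeps its 25 QPM when project A asks for more than the service can supply.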

To troubleshoot errors that might occur with the use of dynamic shared quota, see Troubleshoot quota errors.
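
When a project is throttled, some of its requests are rejected, and a common client-side response is to retry with exponential backoff. The sketch below assumes that throttled requests surface as a retryable quota error (commonly an HTTP 429 response). `QuotaExceededError`, `call_with_backoff`, and `send_request` are hypothetical names used for illustration and are not part of any Google Cloud client library.

```python
import random
import time


class QuotaExceededError(Exception):
    """Hypothetical stand-in for a throttled (for example, HTTP 429) response."""


def call_with_backoff(send_request, max_attempts=5, base_delay=1.0,
                      max_delay=32.0, sleep=time.sleep):
    """Call `send_request`, retrying on quota errors with jittered backoff.

    `send_request` is any zero-argument callable supplied by the caller.
    """
    for attempt in range(max_attempts):
        try:
            return send_request()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the caller handle the error.
            # Double the delay ceiling each attempt, capped at max_delay,
            # and pick a random wait below it so clients do not retry in sync.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter exists so that the wait can be replaced in tests. In normal use, wrap the real request in a function that raises `QuotaExceededError` when the service reports a quota error.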

Considerations

Consideration	Solution
Control cost and prevent budget overruns.	Configure a self-imposed quota called a consumer quota override. For more information, see Creating a consumer quota override.
Prioritize traffic.	Use Provisioned Throughput.
Monitor your usage.	View the following metrics: `publisher/online_serving/token_count` and `publisher/online_serving/tokens`. For more information, see the `aiplatform` section in the Cloud Monitoring documentation.
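
To look up the usage metrics in the table with a Cloud Monitoring filter, the short metric paths need a service prefix. The sketch below assumes the prefix is `aiplatform.googleapis.com`, inferred from the `aiplatform` section named in the table. Verify the full metric types in the Cloud Monitoring documentation before relying on them. The helper name `metric_filter` is hypothetical.

```python
# Assumed service prefix for the metrics listed in the table above.
METRIC_PREFIX = "aiplatform.googleapis.com"

USAGE_METRICS = [
    "publisher/online_serving/token_count",
    "publisher/online_serving/tokens",
]


def metric_filter(metric_path):
    """Build a Cloud Monitoring filter string that selects one metric type."""
    return f'metric.type = "{METRIC_PREFIX}/{metric_path}"'


for path in USAGE_METRICS:
    print(metric_filter(path))
```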

What's next

To learn more about Gemini models that support dynamic shared quota, see Gemini models.
To learn more about Generative AI quotas and limits, see Generative AI on Vertex AI rate limits.
To learn more about quotas and limits for Vertex AI, see Vertex AI quotas and limits.
To learn more about Google Cloud quotas and limits, see Understand quota values and system limits.