# Rate Limits

Understand the rate limits and quotas for the CatLove AI API.

## Overview
Rate limits help ensure fair usage and protect the service from abuse. Limits are applied per API key and vary based on your subscription tier.
We measure limits in three ways:
- RPM - Requests per minute
- TPM - Tokens per minute
- RPD - Requests per day
## Rate Limits by Tier
| Tier | RPM | TPM | RPD |
|---|---|---|---|
| Free | 20 | 10,000 | 200 |
| Basic | 60 | 60,000 | 1,000 |
| Pro | 500 | 500,000 | 10,000 |
| Enterprise | Custom | Custom | Custom |
Need higher limits? Upgrade your plan or contact us for enterprise pricing.
## Rate Limit Headers
API responses include headers with rate limit information:

```
x-ratelimit-limit-requests: 60
x-ratelimit-limit-tokens: 60000
x-ratelimit-remaining-requests: 59
x-ratelimit-remaining-tokens: 59900
x-ratelimit-reset-requests: 1s
x-ratelimit-reset-tokens: 100ms
```

- `x-ratelimit-limit-*` - Your current limit
- `x-ratelimit-remaining-*` - Remaining quota
- `x-ratelimit-reset-*` - Time until reset
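You can read these headers off each response to decide when to slow down. A minimal sketch (the helper name and the warning thresholds are illustrative):

```python
def remaining_quota(headers: dict) -> tuple[int, int]:
    """Extract remaining request and token quota from response headers."""
    return (
        int(headers.get("x-ratelimit-remaining-requests", 0)),
        int(headers.get("x-ratelimit-remaining-tokens", 0)),
    )


# Example values taken from the headers shown above
headers = {
    "x-ratelimit-remaining-requests": "59",
    "x-ratelimit-remaining-tokens": "59900",
}

requests_left, tokens_left = remaining_quota(headers)
if requests_left < 5 or tokens_left < 1000:
    print("Approaching the rate limit; consider backing off")
```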
## Handling 429 Errors
When you exceed the rate limit, the API returns a `429` status code. Here's how to handle it:

```python
import time

from openai import OpenAI, RateLimitError

# Assumes the client is configured with your CatLove AI API key and base URL
client = OpenAI()


def make_request_with_backoff():
    while True:
        try:
            return client.chat.completions.create(...)
        except RateLimitError as e:
            # Honor the Retry-After header, defaulting to 60s if it is absent
            retry_after = int(e.response.headers.get("retry-after", 60))
            print(f"Rate limited. Waiting {retry_after}s...")
            time.sleep(retry_after)
```

## Best Practices
- Implement backoff - Use exponential backoff for retries
- Batch requests - Combine multiple operations when possible
- Cache responses - Cache results to reduce API calls
- Monitor usage - Track your usage in the dashboard
- Use streaming - Streaming responses don't count against RPM until complete
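The fixed-wait handler above can be generalized to exponential backoff with full jitter, as the first practice suggests. A sketch with illustrative parameter values:

```python
import random
import time


def with_backoff(fn, retriable=(Exception,), retries: int = 5,
                 base: float = 1.0, cap: float = 60.0):
    """Call fn(), retrying on retriable errors with exponential backoff and full jitter."""
    for attempt in range(retries):
        try:
            return fn()
        except retriable:
            if attempt == retries - 1:
                raise
            # Full jitter: sleep a random fraction of the capped exponential delay
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

With the `openai` client you would pass `retriable=(RateLimitError,)` and wrap the API call in a lambda or partial.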
## Token vs Request Limits
Both request and token limits apply. A single request with many tokens can exhaust your TPM limit even if you have remaining RPM quota.
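To make this concrete, here is a small sketch using the Basic-tier numbers from the table above (the per-request token sizes are a hypothetical workload):

```python
# Basic-tier limits from the table above
RPM_LIMIT = 60
TPM_LIMIT = 60_000


def max_requests_per_minute(tokens_per_request: int) -> int:
    """Return how many requests of this size fit in one minute under both limits."""
    by_tokens = TPM_LIMIT // tokens_per_request
    return min(RPM_LIMIT, by_tokens)


# Small requests are bounded by RPM; large requests by TPM
print(max_requests_per_minute(500))    # 60  (RPM-bound)
print(max_requests_per_minute(5_000))  # 12  (TPM-bound)
```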