Rate Limits

Understand the rate limits and quotas for the CatLove AI API.

Overview

Rate limits help ensure fair usage and protect the service from abuse. Limits are applied per API key and vary based on your subscription tier.

We measure limits in three ways:

  • RPM - Requests per minute
  • TPM - Tokens per minute
  • RPD - Requests per day

Rate Limits by Tier

Tier         RPM     TPM       RPD
Free         20      10,000    200
Basic        60      60,000    1,000
Pro          500     500,000   10,000
Enterprise   Custom  Custom    Custom

Need higher limits? Upgrade your plan or contact us for enterprise pricing.

Rate Limit Headers

API responses include headers with rate limit information:

x-ratelimit-limit-requests: 60
x-ratelimit-limit-tokens: 60000
x-ratelimit-remaining-requests: 59
x-ratelimit-remaining-tokens: 59900
x-ratelimit-reset-requests: 1s
x-ratelimit-reset-tokens: 100ms

  • x-ratelimit-limit-* - Your current limit
  • x-ratelimit-remaining-* - Remaining quota
  • x-ratelimit-reset-* - Time until reset
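
As a quick way to inspect these headers from Python, here is a minimal sketch using the openai package's with_raw_response wrapper (v1+), assuming an OpenAI-compatible client configured for the CatLove AI API; the model name catlove-1 and the prompt are placeholders.

from openai import OpenAI

# An OpenAI-compatible client; set base_url and api_key for your CatLove AI account
client = OpenAI()

# with_raw_response exposes the HTTP headers alongside the parsed completion
raw = client.chat.completions.with_raw_response.create(
    model='catlove-1',  # placeholder model name
    messages=[{'role': 'user', 'content': 'Hello!'}],
)

print('Requests remaining:', raw.headers.get('x-ratelimit-remaining-requests'))
print('Tokens remaining:', raw.headers.get('x-ratelimit-remaining-tokens'))

completion = raw.parse()  # the usual ChatCompletion object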

Handling 429 Errors

When you exceed the rate limit, the API returns a 429 status code. Here's how to handle it:

import time

from openai import OpenAI, RateLimitError

# An OpenAI-compatible client; set base_url and api_key for your CatLove AI account
client = OpenAI()

def make_request_with_backoff(**kwargs):
    while True:
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError as e:
            # Wait for the duration suggested by the retry-after header,
            # defaulting to 60s if the header is missing
            retry_after = int(e.response.headers.get('retry-after', 60))
            print(f"Rate limited. Waiting {retry_after}s...")
            time.sleep(retry_after)
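
This sketch simply waits for the server-suggested delay from the retry-after header. For bursty workloads you may want to combine it with exponential backoff and jitter, as described under Best Practices below.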

Best Practices

  • Implement backoff - Use exponential backoff for retries (see the sketch after this list)
  • Batch requests - Combine multiple operations when possible
  • Cache responses - Cache results to reduce API calls
  • Monitor usage - Track your usage in the dashboard
  • Use streaming - Streaming responses don't count against RPM until complete
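
As a concrete illustration of the first point, here is a minimal exponential-backoff sketch with jitter. It assumes the same OpenAI-compatible client as in the 429 example above; the five-attempt cap and 60-second ceiling are illustrative choices, not API requirements.

import random
import time

from openai import OpenAI, RateLimitError

# An OpenAI-compatible client; set base_url and api_key for your CatLove AI account
client = OpenAI()

def create_with_exponential_backoff(max_retries=5, **kwargs):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # give up after the final attempt
            # 1s, 2s, 4s, ... capped at 60s, plus up to 1s of random jitter
            delay = min(2 ** attempt, 60) + random.random()
            print(f"Rate limited. Retrying in {delay:.1f}s...")
            time.sleep(delay)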

Token vs Request Limits

Both request and token limits apply. A single request with many tokens can exhaust your TPM limit even if you have remaining RPM quota.
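
For example, on the Basic tier (60 RPM, 60,000 TPM), two requests that each consume around 35,000 tokens will exhaust the token budget within a minute, even though only 2 of the 60 allowed requests have been made.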