Rate limits

Per-key rate limits by plan, response headers, and how to back off cleanly.

Updated 2026-06-14

Rate limits combine a per-minute burst limit, applied per API key, with a per-month quota. Hitting the per-minute limit returns 429 rate_limit_exceeded; exhausting the monthly quota on a write or AI request returns 429 monthly_quota_exceeded. Both responses carry a Retry-After header.

Limits by plan

| Plan | Requests per minute | Requests per month | Notes |
|---|---|---|---|
| Free | 60 | 1,000 | Read API only |
| Starter | 300 | 10,000 | Read + write |
| Pro | 1,500 read / 600 write | 100,000 | Full API |
| Enterprise | Custom | Custom | Negotiated per contract |

Per-minute caps are scoped by tier: read and write traffic each get their own window. Free and Starter apply the same per-minute figure to both tiers; Pro sets them separately (1,500 read, 600 write).

The per-minute limit is enforced over a sliding 60-second window. The monthly quota resets at 00:00 UTC on the 1st of each calendar month, not on your billing renewal date.
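To illustrate the sliding-window idea, here is a minimal client-side sketch that admits a request only if fewer than limit requests went out in the trailing 60 seconds, plus a helper that computes the monthly reset instant. It assumes the server counts requests the same way, which this page does not spell out; SlidingWindowLimiter and nextMonthlyReset are hypothetical names, not part of @faqapp/core.

```typescript
// Client-side sliding-window limiter (hypothetical helper, not part of @faqapp/core).
// Assumes the server counts requests over a trailing 60-second window.
class SlidingWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  // Send times (ms) of admitted requests still inside the window, oldest first.
  private readonly sentAt: number[] = [];

  constructor(limit: number, windowMs: number = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Admits and records the request if it fits in the trailing window;
  // otherwise returns false and records nothing.
  tryAcquire(nowMs: number = Date.now()): boolean {
    // Drop timestamps that have slid out of the window.
    while (this.sentAt.length > 0 && this.sentAt[0] <= nowMs - this.windowMs) {
      this.sentAt.shift();
    }
    if (this.sentAt.length >= this.limit) {
      return false;
    }
    this.sentAt.push(nowMs);
    return true;
  }
}

// Monthly quota reset: 00:00 UTC on the 1st of the month after `now`.
// Date.UTC rolls month index 12 over to January of the next year.
function nextMonthlyReset(now: Date): Date {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth() + 1, 1));
}
```

On Starter, for example, new SlidingWindowLimiter(300) would keep a single process under the 300-per-minute figure. It cannot see requests made by other processes using the same key, so treat it as a first line of defense rather than a guarantee.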

If your organization has multiple keys, each key gets its own per-minute burst limit, but all of them share the organization's monthly quota.

Response headers

Responses include the following headers, on success and on failure, except where a row notes otherwise:

| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Total requests allowed in the current per-minute window |
| X-RateLimit-Remaining | Requests left before the per-minute limit triggers |
| X-RateLimit-Reset | Unix timestamp (seconds) when the per-minute window resets |
| X-Monthly-Quota-Limit | Monthly request quota for the plan |
| X-Monthly-Quota-Used | Requests counted against the monthly quota so far this period |
| X-Monthly-Quota-Reset | Unix timestamp (seconds) when the monthly quota resets (00:00 UTC on the 1st of next month) |
| X-Monthly-Quota-Exceeded | true on read responses served after the monthly quota is exhausted. The request is still served, but you are over quota; upgrade or wait for the reset. |
| Retry-After | Seconds to wait before retrying. Present only on 429 responses. |
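A minimal sketch for reading these headers into a typed object. It accepts anything with a get(name) method, which the Fetch API's Headers object provides (Headers.get matches names case-insensitively). The RateLimitInfo shape and the function names are illustrative, not an SDK type; missing or non-numeric values come back as null.

```typescript
// Anything with a get(name) method, e.g. the Fetch API's Headers object.
interface HeaderSource {
  get(name: string): string | null;
}

// Illustrative shape; not an SDK type.
interface RateLimitInfo {
  limit: number | null; // X-RateLimit-Limit
  remaining: number | null; // X-RateLimit-Remaining
  resetAt: Date | null; // X-RateLimit-Reset (Unix seconds)
  monthlyLimit: number | null; // X-Monthly-Quota-Limit
  monthlyUsed: number | null; // X-Monthly-Quota-Used
  monthlyResetAt: Date | null; // X-Monthly-Quota-Reset (Unix seconds)
  monthlyExceeded: boolean; // X-Monthly-Quota-Exceeded: true
  retryAfterSeconds: number | null; // Retry-After (429 responses only)
}

// Parses a numeric header; missing or non-numeric values become null.
// Retry-After is documented here as seconds, so an HTTP-date value would parse as null.
function numberHeader(h: HeaderSource, name: string): number | null {
  const raw = h.get(name);
  if (raw === null || raw.trim() === "") return null;
  const n = Number(raw);
  return Number.isFinite(n) ? n : null;
}

// Parses a Unix-seconds header into a Date.
function dateHeader(h: HeaderSource, name: string): Date | null {
  const seconds = numberHeader(h, name);
  return seconds === null ? null : new Date(seconds * 1000);
}

function parseRateLimitHeaders(h: HeaderSource): RateLimitInfo {
  return {
    limit: numberHeader(h, "X-RateLimit-Limit"),
    remaining: numberHeader(h, "X-RateLimit-Remaining"),
    resetAt: dateHeader(h, "X-RateLimit-Reset"),
    monthlyLimit: numberHeader(h, "X-Monthly-Quota-Limit"),
    monthlyUsed: numberHeader(h, "X-Monthly-Quota-Used"),
    monthlyResetAt: dateHeader(h, "X-Monthly-Quota-Reset"),
    monthlyExceeded: (h.get("X-Monthly-Quota-Exceeded") ?? "").toLowerCase() === "true",
    retryAfterSeconds: numberHeader(h, "Retry-After"),
  };
}
```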

429 response shape

Exceeding the per-minute burst limit returns rate_limit_exceeded:

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please try again later.",
    "retryAfter": 42
  }
}

Exhausting the monthly quota on a write or AI request returns monthly_quota_exceeded:

{
  "error": {
    "code": "monthly_quota_exceeded",
    "message": "Monthly API request quota exceeded. STARTER plan allows 10000 requests/month. Resets at 2026-07-01T00:00:00.000Z.",
    "limit": 10000,
    "used": 10000,
    "resetAt": "2026-07-01T00:00:00.000Z",
    "upgradeUrl": "https://thefaq.app/pricing"
  }
}

Branch your retry logic on error.code. For rate_limit_exceeded, wait retryAfter seconds and try again. For monthly_quota_exceeded, retrying soon won't help: upgrade, or wait until resetAt, which can be days away rather than seconds. Read requests are never blocked by the monthly quota; they continue to be served with the X-Monthly-Quota-Exceeded: true header instead.
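The branching described above might look like this in TypeScript. The interfaces mirror the two documented 429 bodies; decideOn429 and the RetryDecision shape are hypothetical names used for illustration.

```typescript
// Mirrors the documented rate_limit_exceeded body.
interface BurstLimitError {
  code: "rate_limit_exceeded";
  message: string;
  retryAfter: number; // seconds
}

// Mirrors the documented monthly_quota_exceeded body.
interface MonthlyQuotaError {
  code: "monthly_quota_exceeded";
  message: string;
  limit: number;
  used: number;
  resetAt: string; // ISO 8601 timestamp
  upgradeUrl: string;
}

interface RateLimitErrorBody {
  error: BurstLimitError | MonthlyQuotaError;
}

// Hypothetical decision type: retry shortly, or stop until the quota resets.
type RetryDecision =
  | { action: "retry"; afterMs: number }
  | { action: "stop"; until: Date; upgradeUrl: string };

function decideOn429(body: RateLimitErrorBody): RetryDecision {
  const err = body.error;
  if (err.code === "rate_limit_exceeded") {
    // Burst limit: the wait is measured in seconds.
    return { action: "retry", afterMs: err.retryAfter * 1000 };
  }
  // Monthly quota: don't loop; nothing frees up until resetAt.
  return { action: "stop", until: new Date(err.resetAt), upgradeUrl: err.upgradeUrl };
}
```

A caller would parse the 429 body as JSON, pass it to decideOn429, and either schedule one retry or surface the quota error (with upgradeUrl) to a person instead of retrying in a loop.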

Staying within limits

Cache reads. Question lists and search results change far less often than they are read. The @faqapp/core SDK has built-in caching with a configurable TTL. If you call the API from your own stack, cache responses keyed on the request URL plus the API key.
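For the roll-your-own case, a minimal in-memory TTL cache keyed on API key plus URL could look like the following. This is a sketch, not the @faqapp/core cache: ReadCache and the fetcher signature are hypothetical, entries live in a single process's memory, expired entries are only replaced on the next miss, and concurrent misses for the same key each trigger their own fetch.

```typescript
// Hypothetical fetcher: performs the real GET for a URL with a given API key.
type Fetcher<T> = (url: string, apiKey: string) => Promise<T>;

interface CacheEntry<T> {
  value: T;
  expiresAt: number; // ms timestamp
}

// Minimal per-process TTL cache for read responses (sketch, not the SDK cache).
class ReadCache<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  async get(url: string, apiKey: string, fetcher: Fetcher<T>): Promise<T> {
    // Key on API key + URL so responses for different keys never mix.
    const cacheKey = `${apiKey}\u0000${url}`;
    const hit = this.entries.get(cacheKey);
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value; // fresh hit: no request, no rate-limit cost
    }
    const value = await fetcher(url, apiKey);
    this.entries.set(cacheKey, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```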

Paginate with cursors instead of fetching everything. Use the cursor from meta.pagination to fetch pages on demand. Don't fan out parallel requests for every page; that exhausts the burst limit quickly.
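A sketch of on-demand cursor pagination using an async generator. It assumes each page returns its items in data and the next page's cursor in meta.pagination.cursor, with a null or missing cursor on the last page; check the actual response schema. FetchPage is a hypothetical function you would implement over your HTTP client or the SDK.

```typescript
// Assumed page shape: items in data, next-page cursor in meta.pagination.cursor
// (null or missing on the last page). Verify against the real response schema.
interface Page<T> {
  data: T[];
  meta: { pagination: { cursor?: string | null } };
}

// Hypothetical page fetcher; receives undefined for the first page.
type FetchPage<T> = (cursor: string | undefined) => Promise<Page<T>>;

// Requests one page at a time, and only when the consumer needs more items.
async function* paginate<T>(fetchPage: FetchPage<T>): AsyncGenerator<T> {
  let cursor: string | undefined = undefined;
  do {
    const page: Page<T> = await fetchPage(cursor);
    yield* page.data;
    // Treat null, missing, or empty cursors as "no more pages".
    cursor = page.meta.pagination.cursor || undefined;
  } while (cursor !== undefined);
}
```

Because pages are requested only as the for await loop consumes items, pages are never fetched in parallel, and breaking out of the loop early stops further requests.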

Batch writes when you can. If you have 50 questions to import, send them as a single create-many request rather than 50 sequential POSTs.
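A sketch of splitting an import into create-many calls. The create-many request's exact endpoint and maximum batch size are not given on this page, so CreateMany and maxBatchSize are hypothetical placeholders to fill in from the API reference.

```typescript
// Splits items into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical create-many call; substitute the real endpoint or SDK method.
type CreateMany<T> = (batch: T[]) => Promise<void>;

// Sends batches one after another: one request per batch, not one per item.
// Returns the number of requests made.
async function importInBatches<T>(
  items: readonly T[],
  createMany: CreateMany<T>,
  maxBatchSize: number,
): Promise<number> {
  let requests = 0;
  for (const batch of chunk(items, maxBatchSize)) {
    await createMany(batch);
    requests++;
  }
  return requests;
}
```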

Watch X-RateLimit-Remaining. When it drops below roughly 20% of X-RateLimit-Limit, slow down. Most rate-limiting incidents are visible 10–30 seconds before they hit.
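One possible way to slow down, sketched below: once X-RateLimit-Remaining falls under 20% of X-RateLimit-Limit, spread the remaining requests evenly until X-RateLimit-Reset. This pacing policy and the throttleDelayMs name are assumptions for illustration, not behavior the API or the SDK prescribes.

```typescript
// Hypothetical pacing policy: below `threshold` of the per-minute budget,
// spread the remaining requests evenly until X-RateLimit-Reset.
// Returns how long to wait before the next request, in milliseconds.
function throttleDelayMs(
  limit: number, // X-RateLimit-Limit
  remaining: number, // X-RateLimit-Remaining
  resetUnixSeconds: number, // X-RateLimit-Reset
  nowMs: number = Date.now(),
  threshold: number = 0.2,
): number {
  if (limit <= 0 || remaining / limit >= threshold) {
    return 0; // enough headroom: send immediately
  }
  const msUntilReset = Math.max(0, resetUnixSeconds * 1000 - nowMs);
  if (remaining <= 0) {
    return msUntilReset; // budget spent: wait for the window to reset
  }
  return Math.ceil(msUntilReset / remaining);
}
```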

Back off exponentially on repeated 429s. Start with the Retry-After value, double the wait on each further consecutive 429, and cap it at about 60 seconds. The SDK does this automatically.
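The schedule above (Retry-After, then doubling on each further consecutive 429, capped at 60 seconds) can be sketched as follows. HttpResult, withBackoff, and the injected sleep function are hypothetical; the loop returns immediately on monthly_quota_exceeded, since retrying cannot help until resetAt. The 1-second floor for a zero or missing Retry-After is an added assumption.

```typescript
const MAX_BACKOFF_SECONDS = 60;

// Delay after the attempt-th consecutive 429 (1-based): Retry-After, then
// doubled on each further 429, capped at 60 seconds. The 1-second floor for a
// zero or missing Retry-After is an assumption, to avoid a tight retry loop.
function backoffSeconds(retryAfterSeconds: number | null, attempt: number): number {
  const base = Math.max(retryAfterSeconds ?? 1, 1);
  return Math.min(base * 2 ** (attempt - 1), MAX_BACKOFF_SECONDS);
}

// Hypothetical summary of one HTTP attempt.
interface HttpResult {
  status: number;
  code: string | null; // error.code from a 429 body, if any
  retryAfterSeconds: number | null; // Retry-After header, if any
}

// Retries rate_limit_exceeded responses with backoff; returns anything else as-is.
async function withBackoff(
  send: () => Promise<HttpResult>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts: number = 5,
): Promise<HttpResult> {
  for (let attempt = 1; ; attempt++) {
    const res = await send();
    const retryable = res.status === 429 && res.code !== "monthly_quota_exceeded";
    if (!retryable || attempt >= maxAttempts) {
      return res;
    }
    await sleep(backoffSeconds(res.retryAfterSeconds, attempt) * 1000);
  }
}
```

In Node or a browser, sleep can be (ms) => new Promise((resolve) => setTimeout(resolve, ms)). Many clients also add random jitter to each delay so that retries from many processes don't land in lockstep.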

Separate read and write keys. Each key has its own per-minute window, so a burst of read traffic on one key won't block writes sent with another. Both keys still count against your organization's shared monthly quota.
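A minimal sketch of routing requests to the right key. It assumes GET and HEAD requests belong to the read tier and every other method to the write tier, which may not match how the API classifies every endpoint (AI requests, for instance); ApiKeys and keyForMethod are hypothetical names.

```typescript
// Hypothetical key pair; load the values from your own secret store.
interface ApiKeys {
  readKey: string;
  writeKey: string;
}

// Assumption: GET/HEAD are reads, everything else is a write. A burst of reads
// then only consumes the read key's per-minute window.
function keyForMethod(method: string, keys: ApiKeys): string {
  const m = method.toUpperCase();
  return m === "GET" || m === "HEAD" ? keys.readKey : keys.writeKey;
}
```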