Rate limits
Per-key rate limits by plan, response headers, and how to back off cleanly.
Updated 2026-05-20
Rate limits apply per API key and combine a per-minute burst limit with a per-month quota. Hitting either returns 429 rate_limit_exceeded with a Retry-After header.
Limits by plan
| Plan | Per minute | Per month | Notes |
|---|---|---|---|
| Free | 60 | 1,000 | Read API only |
| Starter | 300 | 10,000 | Read + write |
| Pro | 1,200 | 100,000 | Full API |
| Enterprise | Custom | Custom | Negotiated per contract |
The per-minute limit is counted over a sliding 60-second window. The monthly quota resets on your billing renewal date (see the renewal field on Plan).
Multiple keys on one org each get their own burst limit but share the monthly quota.
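To see how the burst limit and the monthly quota interact, here is a small arithmetic sketch in TypeScript. The numbers come from the table above; the names PlanLimits, PLAN_LIMITS, minutesToExhaustQuota, and sustainableDailyRequests are illustrative, and the 30-day cycle is an assumption, since the real cycle follows your renewal date.

```typescript
// Per-plan limits copied from the table above (Enterprise is custom, so omitted).
interface PlanLimits {
  perMinute: number;
  perMonth: number;
}

const PLAN_LIMITS: { Free: PlanLimits; Starter: PlanLimits; Pro: PlanLimits } = {
  Free: { perMinute: 60, perMonth: 1000 },
  Starter: { perMinute: 300, perMonth: 10000 },
  Pro: { perMinute: 1200, perMonth: 100000 },
};

// Minutes of traffic at the full burst rate before the monthly quota is used up.
function minutesToExhaustQuota(plan: PlanLimits): number {
  return plan.perMonth / plan.perMinute;
}

// Average requests per day that fit in the monthly quota.
// daysInCycle = 30 is an assumption, not a documented value.
function sustainableDailyRequests(plan: PlanLimits, daysInCycle: number = 30): number {
  return Math.floor(plan.perMonth / daysInCycle);
}
```

On these numbers, a Starter key running at its full burst rate would use the whole monthly quota in roughly 33 minutes. Because keys in one org share the monthly quota, adding keys raises burst capacity but not the monthly budget.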
Response headers
Every response, success or failure, includes:
| Header | Meaning |
|---|---|
| x-ratelimit-limit | Total requests allowed in the current per-minute window |
| x-ratelimit-remaining | Requests left before the per-minute limit triggers |
| x-ratelimit-reset | Unix timestamp (seconds) when the per-minute window resets |
| x-faq-soft-cap | true on read responses after the monthly quota is hit. You’re still served, but rate-limited harder. |
| Retry-After | Seconds to wait before retrying. Present only on 429 responses. |
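One way to read these headers in client code is sketched below in TypeScript. It takes a plain name-to-value map so it works with any HTTP client. RateLimitInfo, toInt, and parseRateLimitHeaders are hypothetical names for illustration, not part of the @faqapp/core SDK; only the header names come from the table.

```typescript
// Illustrative sketch: these names are hypothetical, not SDK exports.
interface RateLimitInfo {
  limit: number | null;             // x-ratelimit-limit
  remaining: number | null;         // x-ratelimit-remaining
  resetAt: Date | null;             // x-ratelimit-reset (Unix seconds) as a Date
  softCapped: boolean;              // x-faq-soft-cap is "true"
  retryAfterSeconds: number | null; // Retry-After, present only on 429 responses
}

// Parse a header value as a non-negative integer; null if absent or malformed.
function toInt(value: string | undefined): number | null {
  if (value === undefined || value.trim() === "") return null;
  const n = Number(value.trim());
  return Number.isInteger(n) && n >= 0 ? n : null;
}

function parseRateLimitHeaders(headers: { [name: string]: string }): RateLimitInfo {
  // HTTP header names are case-insensitive, so normalize keys to lowercase first.
  const h: { [name: string]: string | undefined } = {};
  for (const name of Object.keys(headers)) {
    h[name.toLowerCase()] = headers[name];
  }
  const resetSeconds = toInt(h["x-ratelimit-reset"]);
  const softCap = h["x-faq-soft-cap"];
  return {
    limit: toInt(h["x-ratelimit-limit"]),
    remaining: toInt(h["x-ratelimit-remaining"]),
    resetAt: resetSeconds === null ? null : new Date(resetSeconds * 1000),
    softCapped: softCap !== undefined && softCap.trim().toLowerCase() === "true",
    retryAfterSeconds: toInt(h["retry-after"]),
  };
}
```

Missing or malformed values come back as null rather than throwing, so a caller can treat "no information" differently from "zero remaining".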
429 response shape
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Retry after 42 seconds.",
    "details": { "retryAfter": 42, "window": "per_minute" },
    "traceId": "trc_8X3FpQk"
  }
}
details.window is either per_minute or per_month, so your retry logic can tell the two cases apart. A per_month limit means upgrading or waiting for the quota to reset, which can be days away, not seconds.
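A minimal sketch of branching on details.window, in TypeScript. The body type mirrors the 429 sample above; RateLimitErrorBody, RetryDecision, and decideRetry are illustrative names, and treating a per_month limit as "do not retry automatically" is a design assumption rather than documented behavior.

```typescript
// Mirrors the 429 response shape shown above.
interface RateLimitErrorBody {
  error: {
    code: string;
    message: string;
    details: { retryAfter: number; window: "per_minute" | "per_month" };
    traceId: string;
  };
}

type RetryDecision =
  | { action: "retry"; waitMs: number }
  | { action: "stop"; reason: string };

function decideRetry(body: RateLimitErrorBody): RetryDecision {
  const code = body.error.code;
  const details = body.error.details;
  if (code !== "rate_limit_exceeded") {
    return { action: "stop", reason: "unexpected error code: " + code };
  }
  if (details.window === "per_month") {
    // Waiting seconds will not help; surface this to a human or an alert instead.
    return { action: "stop", reason: "monthly quota exhausted; upgrade or wait for renewal" };
  }
  // Burst limit: retryAfter is in seconds, timers usually want milliseconds.
  return { action: "retry", waitMs: details.retryAfter * 1000 };
}
```

Returning a decision object instead of sleeping inside the function keeps the policy easy to test and lets the caller choose how to wait or report.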
Staying within limits
Cache reads. Question lists and search results change far less often than they are read. The @faqapp/core SDK has built-in caching with a configurable TTL. In your own stack, key cached responses on the URL and API key together.
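If you are not using the SDK's cache, a small in-memory TTL cache might look like the TypeScript sketch below. TtlCache is a hypothetical helper, not the SDK's implementation, and the injectable clock exists only to make it testable.

```typescript
// Hypothetical helper for illustration; not the @faqapp/core cache.
interface CacheEntry<T> {
  value: T;
  expiresAt: number; // epoch milliseconds
}

class TtlCache<T> {
  private entries = new Map<string, CacheEntry<T>>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Key on API key plus URL so responses fetched with different keys never mix.
  static key(url: string, apiKey: string): string {
    return apiKey + "\n" + url;
  }

  get(key: string): T | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: T): void {
    this.entries.set(key, { value: value, expiresAt: this.now() + this.ttlMs });
  }
}
```

A caller would check get() before issuing a read and call set() with the response afterwards. This sketch never evicts unexpired entries, so a long-running process with many distinct URLs would want a size bound as well.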
Paginate with cursors, not by fetching everything. Use cursor from meta.pagination to walk through pages on demand. Don’t fan out parallel requests for every page; that hits the burst limit fast.
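The sequential walk described here can be sketched as an async generator in TypeScript. The source only names meta.pagination and cursor; the data field, the use of a null cursor to mark the last page, and the names Page, FetchPage, and walkPages are all assumptions made for illustration.

```typescript
// Assumed response envelope; only `meta.pagination` and `cursor` are named in the docs.
interface Page<T> {
  data: T[]; // field name `data` is an assumption
  meta: { pagination: { cursor: string | null } }; // null cursor assumed to mean "last page"
}

// The caller supplies the actual HTTP call; null requests the first page.
type FetchPage<T> = (cursor: string | null) => Promise<Page<T>>;

// Yields one page of items at a time and fetches the next page only when
// the consumer asks for it, so requests are never issued in parallel.
async function* walkPages<T>(fetchPage: FetchPage<T>): AsyncGenerator<T[], void, undefined> {
  let cursor: string | null = null;
  do {
    const page: Page<T> = await fetchPage(cursor);
    yield page.data;
    cursor = page.meta.pagination.cursor;
  } while (cursor !== null);
}
```

Consumed with for await, the loop can break as soon as it has what it needs, and no further pages are requested after that point.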
Batch writes when you can. If you have 50 questions to import, send them as a single create-many request rather than 50 sequential POSTs.
Watch x-ratelimit-remaining. When it drops below ~20% of x-ratelimit-limit, slow down. Most rate-limiting incidents are visible 10–30 seconds before they hit.
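One possible way to "slow down" is to spread the remaining requests evenly over the time left in the window. The TypeScript sketch below does that; pacingDelayMs is a hypothetical helper, the 0.2 default reflects the ~20% guideline above, and even pacing is just one policy among several.

```typescript
// Returns how long to pause before the next request, in milliseconds.
// limit and remaining come from x-ratelimit-limit and x-ratelimit-remaining;
// resetEpochSeconds comes from x-ratelimit-reset. Illustrative helper only.
function pacingDelayMs(
  limit: number,
  remaining: number,
  resetEpochSeconds: number,
  nowMs: number,
  threshold: number = 0.2
): number {
  if (limit <= 0) return 0; // no usable limit information
  if (remaining / limit >= threshold) return 0; // plenty of headroom, no pacing
  const msUntilReset = Math.max(0, resetEpochSeconds * 1000 - nowMs);
  if (remaining <= 0) return msUntilReset; // nothing left: wait for the reset
  // Below the threshold: space the remaining requests evenly until the reset.
  return Math.ceil(msUntilReset / remaining);
}
```

For example, with a limit of 300, 30 requests remaining, and 15 seconds until the reset, this returns 500, which means at most two requests per second until the window turns over.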
Back off exponentially on repeated 429s. Start with the Retry-After value; on the second 429 double it; cap at ~60s. The SDK does this automatically.
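For clients that do not use the SDK, the schedule above can be sketched in TypeScript as follows. backoffDelayMs is an illustrative name; the one-second floor and the rule of never waiting less than the server's Retry-After, even when that exceeds the cap, are assumptions layered on top of the guidance here.

```typescript
// attempt is 1 for the first 429 in a row, 2 for the second, and so on.
// Returns the delay before the next retry, in milliseconds.
function backoffDelayMs(
  retryAfterSeconds: number,
  attempt: number,
  capSeconds: number = 60
): number {
  // Assumed floor of 1 second so a missing or zero Retry-After still backs off.
  const base = Math.max(1, retryAfterSeconds);
  // Double on each consecutive 429: base, 2x, 4x, ...
  const doubled = base * Math.pow(2, Math.max(0, attempt - 1));
  // Cap the growth, but never wait less than the server asked for.
  const seconds = Math.max(base, Math.min(doubled, capSeconds));
  return seconds * 1000;
}
```

With a Retry-After of 5 seconds, consecutive 429s give waits of 5 s, 10 s, 20 s, 40 s, and then 60 s at the cap. Reset the attempt counter after any successful response.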
Separate read and write keys. Burst spikes on read traffic shouldn’t block your write workload. Two keys, two separate per-minute windows.