Rate limits

Per-key rate limits by plan, response headers, and how to back off cleanly.

Updated 2026-05-20

Rate limits apply per API key and combine a per-minute burst limit with a per-month quota. Hitting either returns 429 rate_limit_exceeded with a Retry-After header.

Limits by plan

| Plan | Per minute | Per month | Notes |
| --- | --- | --- | --- |
| Free | 60 | 1,000 | Read API only |
| Starter | 300 | 10,000 | Read + write |
| Pro | 1,200 | 100,000 | Full API |
| Enterprise | Custom | Custom | Negotiated per contract |

The per-minute limit uses a sliding 60-second window. The per-month quota resets on your billing renewal date (see the renewal field on Plan).

Multiple keys on one org each get their own burst limit but share the monthly quota.

Response headers

Every response, success or failure, includes:

| Header | Meaning |
| --- | --- |
| x-ratelimit-limit | Total requests allowed in the current per-minute window |
| x-ratelimit-remaining | Requests left before the per-minute limit triggers |
| x-ratelimit-reset | Unix timestamp (seconds) when the per-minute window resets |
| x-faq-soft-cap | true on read responses after the monthly quota is hit. You’re still served, but rate-limited harder. |
| Retry-After | Seconds to wait before retrying. Present only on 429 responses. |

429 response shape

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Retry after 42 seconds.",
    "details": { "retryAfter": 42, "window": "per_minute" },
    "traceId": "trc_8X3FpQk"
  }
}

details.window is either per_minute or per_month, so your retry logic can tell the two cases apart. A per_month limit means upgrading or waiting days, not seconds.
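As a rough illustration of branching on details.window, the sketch below takes a parsed 429 body in the shape shown above and decides whether an automatic retry makes sense. The names RateLimitError, RetryDecision, and decideRetry are hypothetical and not part of any SDK.

```typescript
// Shape of the 429 body shown above.
interface RateLimitError {
  error: {
    code: string;
    message: string;
    details: { retryAfter: number; window: "per_minute" | "per_month" };
    traceId: string;
  };
}

type RetryDecision =
  | { retry: true; waitMs: number }
  | { retry: false; reason: string };

// Hypothetical helper: decide what to do with a parsed 429 body.
function decideRetry(body: RateLimitError): RetryDecision {
  const { window, retryAfter } = body.error.details;
  if (window === "per_month") {
    // Waiting a few seconds will not help; surface this to a human instead.
    return { retry: false, reason: "monthly quota exhausted" };
  }
  // retryAfter is in seconds; convert to milliseconds for timers.
  return { retry: true, waitMs: retryAfter * 1000 };
}
```

A caller would typically sleep for waitMs before retrying, and log or alert when retry is false.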

Staying within limits

Cache reads. Question lists and search results change far less often than they are read. The @faqapp/core SDK has built-in caching with a configurable TTL. In your own stack, key cached responses on the URL + auth-key pair.
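For a stack that does not use the SDK, a cache keyed on the URL + auth-key pair might look roughly like this. It is a minimal in-memory sketch, not the SDK's implementation; the TtlCache class and its injectable clock are assumptions made for illustration.

```typescript
// Minimal in-memory TTL cache keyed by URL + API key (illustrative only).
class TtlCache<T> {
  private entries = new Map<string, { value: T; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so expiry can be tested without real waiting.
  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  private static key(url: string, apiKey: string): string {
    // A newline cannot appear in a URL or a header value, so it is a safe separator.
    return `${apiKey}\n${url}`;
  }

  get(url: string, apiKey: string): T | undefined {
    const k = TtlCache.key(url, apiKey);
    const hit = this.entries.get(k);
    if (hit === undefined) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(k); // expired: drop it and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(url: string, apiKey: string, value: T): void {
    const expiresAt = this.now() + this.ttlMs;
    this.entries.set(TtlCache.key(url, apiKey), { value, expiresAt });
  }
}
```

Including the key in the cache key keeps responses for different keys apart, which matters if keys carry different permissions.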

Paginate with cursors, not by fetching everything. Use cursor from meta.pagination to walk through pages on demand. Don’t fan out parallel requests for every page; that hits the burst limit fast.
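A sequential cursor walk can be sketched as an async generator, which fetches the next page only when the consumer asks for it. Only meta.pagination.cursor comes from the text above; the data field, a null cursor on the last page, and the caller-supplied fetchPage function are assumptions.

```typescript
// Assumed page shape: `data` and a null cursor on the last page are guesses.
interface Page<T> {
  data: T[];
  meta: { pagination: { cursor: string | null } };
}

// fetchPage is supplied by the caller and requests exactly one page.
// Passing null asks for the first page.
async function* walkPages<T>(
  fetchPage: (cursor: string | null) => Promise<Page<T>>,
): AsyncGenerator<T[]> {
  let cursor: string | null = null;
  do {
    // One request at a time: the next page is fetched only after this one is consumed.
    const page: Page<T> = await fetchPage(cursor);
    yield page.data;
    cursor = page.meta.pagination.cursor;
  } while (cursor !== null);
}
```

Consuming it with for await (const items of walkPages(fetchPage)) lets the caller break out early, so later pages are never requested.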

Batch writes when you can. If you have 50 questions to import, send them as a single create-many request rather than 50 sequential POSTs.

Watch x-ratelimit-remaining. When it drops below ~20% of x-ratelimit-limit, slow down. Most rate-limiting incidents are visible 10–30 seconds before they hit.
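One way to act on these headers is to parse them after each response and check the remaining fraction against the ~20% threshold. The sketch below is an assumption-laden illustration: readRateLimit, shouldSlowDown, and pacingDelayMs are hypothetical helpers, and spreading the remaining requests evenly until the reset time is just one possible pacing policy.

```typescript
interface RateLimitState {
  limit: number;     // x-ratelimit-limit
  remaining: number; // x-ratelimit-remaining
  resetAt: number;   // x-ratelimit-reset, Unix seconds
}

function parseHeader(value: string | null): number {
  return value === null || value.trim() === "" ? NaN : Number(value);
}

// getHeader abstracts the HTTP client, e.g. (n) => response.headers.get(n) with fetch.
function readRateLimit(getHeader: (name: string) => string | null): RateLimitState | null {
  const limit = parseHeader(getHeader("x-ratelimit-limit"));
  const remaining = parseHeader(getHeader("x-ratelimit-remaining"));
  const resetAt = parseHeader(getHeader("x-ratelimit-reset"));
  const allFinite = [limit, remaining, resetAt].every((n) => Number.isFinite(n));
  return allFinite ? { limit, remaining, resetAt } : null;
}

// True once fewer than `threshold` (default 20%) of the window's requests remain.
function shouldSlowDown(state: RateLimitState, threshold = 0.2): boolean {
  return state.limit > 0 && state.remaining / state.limit < threshold;
}

// Delay that spreads the remaining requests evenly over the time left in the window.
function pacingDelayMs(state: RateLimitState, nowMs: number): number {
  const msLeft = Math.max(0, state.resetAt * 1000 - nowMs);
  return msLeft / Math.max(1, state.remaining);
}
```

A client could call shouldSlowDown after every response and, when it returns true, sleep for pacingDelayMs before sending the next request.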

Back off exponentially on repeated 429s. Start with the Retry-After value, double the wait on each consecutive 429, and cap it at ~60s. The SDK does this automatically.
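If you are not using the SDK, the delay schedule described above can be approximated with a small pure function. This is a sketch rather than the SDK's actual algorithm; backoffDelayMs is a hypothetical name, and never waiting less than the server's own Retry-After (even when it exceeds the cap) is an added assumption.

```typescript
// attempt is 1 for the first consecutive 429, 2 for the second, and so on.
function backoffDelayMs(
  retryAfterSeconds: number,
  attempt: number,
  capMs = 60000,
): number {
  const baseMs = retryAfterSeconds * 1000;
  // Double the base delay on each consecutive 429.
  const doubledMs = baseMs * 2 ** (attempt - 1);
  // Cap the growth, but never go below what the server asked for.
  return Math.max(baseMs, Math.min(doubledMs, capMs));
}
```

With a Retry-After of 5 seconds, consecutive 429s would wait 5s, 10s, 20s, 40s, and then 60s from the fifth attempt onward. Adding a small random jitter on top is a common refinement when many clients share a key.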

Separate read and write keys. Burst spikes on read traffic shouldn’t block your write workload. Two keys, two separate per-minute windows.