Rate Limits

Understand API rate limits by plan tier, monitor usage via response headers, and implement best practices for staying within limits.

Overview

TheFAQApp enforces rate limits per API key to ensure fair usage and platform stability. Limits depend on the plan tier and are measured in two dimensions: a monthly request quota and a per-minute burst limit. Both apply simultaneously, so exceeding either threshold triggers a 429 Too Many Requests response.

Limits by Plan

Plan	Requests/month	Burst (per minute)	Notes
FREE	1,000	20	Read-only access
STARTER	10,000	60	Read + write access
PRO	100,000	200	Full access including admin operations
ENTERPRISE	Custom	Custom	Tailored to the organization's needs

Monthly quotas reset on the first day of each billing cycle. Burst limits are measured over a 60-second sliding window. If an organization has multiple API keys, each key has its own burst limit, but all keys share the organization's monthly quota.
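
One way to stay under a per-key burst limit is to gate outgoing requests on the client with a sliding-window counter. The following is a minimal sketch under assumptions: `SlidingWindowLimiter` is a hypothetical helper, not part of any SDK, and the server's own accounting may differ from this client-side estimate.

```typescript
// Hypothetical client-side guard; it approximates, but does not replace,
// the limit the server enforces.
class SlidingWindowLimiter {
  private timestamps: number[] = [];
  private readonly maxRequests: number;
  private readonly windowMs: number;

  constructor(maxRequests: number, windowMs: number = 60_000) {
    this.maxRequests = maxRequests; // e.g. 60 for the STARTER burst limit
    this.windowMs = windowMs;
  }

  // Returns true and records the request if it fits in the current window.
  tryAcquire(nowMs: number = Date.now()): boolean {
    const cutoff = nowMs - this.windowMs;
    // Drop timestamps that have slid out of the window.
    while (this.timestamps.length > 0) {
      const oldest = this.timestamps[0];
      if (oldest === undefined || oldest > cutoff) break;
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.maxRequests) {
      return false;
    }
    this.timestamps.push(nowMs);
    return true;
  }
}
```

A caller would create one limiter per API key, since each key has its own burst limit, and delay or queue a request whenever `tryAcquire()` returns false.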

Response Headers

Every API response includes headers that let you monitor usage in real time:

Header	Description
`X-RateLimit-Limit`	Total requests allowed in the current window
`X-RateLimit-Remaining`	Requests remaining before the limit is reached
`X-RateLimit-Reset`	Unix timestamp (seconds) when the current window resets
`Retry-After`	Seconds to wait before retrying (only present on 429 responses)
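
The headers above can be read into a typed structure for logging or throttling decisions. This is a sketch, not SDK code: `HeaderSource`, `RateLimitInfo`, and `parseRateLimitHeaders` are hypothetical names, and `HeaderSource` is simply the part of the standard Fetch API `Headers` interface that the function needs.

```typescript
// Minimal shape satisfied by the Fetch API `Headers` object.
interface HeaderSource {
  get(name: string): string | null;
}

interface RateLimitInfo {
  limit: number;
  remaining: number;
  resetAt: Date; // converted from the Unix timestamp in seconds
  retryAfterSeconds: number | null; // only set on 429 responses
}

// Reads a numeric header, returning null when it is missing or malformed.
function readNumber(headers: HeaderSource, name: string): number | null {
  const raw = headers.get(name);
  if (raw === null || raw.trim() === "") return null;
  const value = Number(raw);
  return Number.isFinite(value) ? value : null;
}

// Returns null if any of the three X-RateLimit-* headers is unusable.
function parseRateLimitHeaders(headers: HeaderSource): RateLimitInfo | null {
  const limit = readNumber(headers, "X-RateLimit-Limit");
  const remaining = readNumber(headers, "X-RateLimit-Remaining");
  const reset = readNumber(headers, "X-RateLimit-Reset");
  if (limit === null || remaining === null || reset === null) return null;
  return {
    limit,
    remaining,
    resetAt: new Date(reset * 1000),
    retryAfterSeconds: readNumber(headers, "Retry-After"),
  };
}
```

With `fetch`, the call would look like `parseRateLimitHeaders(response.headers)`.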

When Limits Are Exceeded

Exceeding the rate limit returns a 429 Too Many Requests status with the standard error envelope:

{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please try again later.",
    "retryAfter": 42
  }
}

The `retryAfter` value in the body matches the `Retry-After` header. Both indicate the number of seconds to wait before sending another request.
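
A client can check a parsed response body against this envelope before trusting `retryAfter`. The sketch below assumes only the fields shown in the example above; `isRateLimitError` and `retryDelayMs` are hypothetical helper names, and the one-second fallback is an arbitrary choice.

```typescript
// Shape of the 429 error envelope shown above.
interface RateLimitErrorBody {
  error: {
    code: "rate_limit_exceeded";
    message: string;
    retryAfter: number; // seconds
  };
}

// Type guard: true only when the body has the expected code and field types.
function isRateLimitError(body: unknown): body is RateLimitErrorBody {
  if (typeof body !== "object" || body === null) return false;
  const error = (body as { error?: unknown }).error;
  if (typeof error !== "object" || error === null) return false;
  const fields = error as { code?: unknown; message?: unknown; retryAfter?: unknown };
  return (
    fields.code === "rate_limit_exceeded" &&
    typeof fields.message === "string" &&
    typeof fields.retryAfter === "number"
  );
}

// Converts the body's retryAfter to milliseconds, or uses a fallback
// when the body is not a rate-limit error.
function retryDelayMs(body: unknown, fallbackMs: number = 1000): number {
  return isRateLimitError(body) ? body.error.retryAfter * 1000 : fallbackMs;
}
```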

Best Practices for Staying Within Limits

Cache Responses Locally

The most effective way to reduce API calls is caching. The TypeScript SDK includes built-in caching with configurable TTL. For custom integrations, cache question lists and search results on the server side, since content changes infrequently relative to read traffic.
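
For a custom integration, a small in-memory cache with a time-to-live is often enough. The sketch below is illustrative only: `TtlCache` is a hypothetical class and does not reflect how the SDK's built-in cache works.

```typescript
// Hypothetical in-memory TTL cache for a custom integration.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Returns the cached value, or undefined if it is missing or expired.
  get(key: string, nowMs: number = Date.now()): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (entry.expiresAt <= nowMs) {
      this.entries.delete(key); // evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  // Stores a value that expires ttlMs after nowMs.
  set(key: string, value: V, nowMs: number = Date.now()): void {
    this.entries.set(key, { value, expiresAt: nowMs + this.ttlMs });
  }
}
```

A typical pattern is to key entries by the request path and query string, call the API only when `get` returns `undefined`, and store the result with `set`.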

Use Pagination Efficiently

Avoid fetching entire datasets in a single request. Use the `page` and `limit` parameters to retrieve only the records needed for the current view. For background sync jobs, paginate through results sequentially rather than making parallel requests for every page.
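
A background sync job that walks pages one at a time might look like the sketch below. Several details are assumptions: `fetchPage` is a caller-supplied hypothetical function that performs the actual request, pages are assumed to start at 1, and a page shorter than `limit` is assumed to be the last one.

```typescript
// Caller-supplied function that requests one page (hypothetical signature).
type FetchPage<T> = (page: number, limit: number) => Promise<T[]>;

// Fetches pages one after another; only one request is in flight at a time.
async function fetchAllSequentially<T>(
  fetchPage: FetchPage<T>,
  limit: number = 50,
  maxPages: number = 1000, // safety cap against endless loops
): Promise<T[]> {
  const all: T[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const items = await fetchPage(page, limit);
    all.push(...items);
    // Assumption: a short page means there is nothing left to fetch.
    if (items.length < limit) break;
  }
  return all;
}
```

Because each request waits for the previous one, the job consumes burst capacity steadily instead of in one spike.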

Batch Write Operations

When creating or updating multiple questions, group them into fewer requests where possible. The bulk endpoints (available on PRO plans) let you create or update up to 50 questions in a single call.
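
Splitting a larger set of questions into groups of at most 50 can be done with a generic chunking helper. The sketch below covers only the grouping step; `chunk` and `MAX_BULK_SIZE` are hypothetical names, and the request that sends each batch is left out because the bulk endpoint's path and payload are not described here.

```typescript
// Maximum number of questions per bulk call, per the limit stated above.
const MAX_BULK_SIZE = 50;

// Splits items into consecutive batches of at most `size` elements.
function chunk<T>(items: readonly T[], size: number = MAX_BULK_SIZE): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

With this helper, 120 questions become three requests instead of 120.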

Monitor Headers Proactively

Check `X-RateLimit-Remaining` in each response and slow down when the value drops below a comfortable threshold. This prevents hitting hard limits and avoids disruptive 429 errors during peak traffic.
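
One possible slowdown policy is to spread the remaining requests evenly over the time left in the window once the remaining count reaches a threshold. The sketch below is one such policy, not a documented recommendation; `throttleDelayMs` and its `threshold` parameter are hypothetical.

```typescript
// Returns how long to pause before the next request, in milliseconds.
// `remaining` comes from X-RateLimit-Remaining and `resetEpochSeconds`
// from X-RateLimit-Reset.
function throttleDelayMs(
  remaining: number,
  resetEpochSeconds: number,
  threshold: number,
  nowMs: number = Date.now(),
): number {
  // Plenty of headroom: no delay needed.
  if (remaining > threshold) return 0;
  const msUntilReset = Math.max(0, resetEpochSeconds * 1000 - nowMs);
  // Spread the remaining requests over the rest of the window.
  // With 0 remaining, this waits for the full time until reset.
  return Math.ceil(msUntilReset / (Math.max(remaining, 0) + 1));
}
```

For example, with 4 requests remaining and 10 seconds until reset, the function suggests a 2-second pause between requests.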

Implement Exponential Backoff

When a 429 response occurs, wait for the `Retry-After` duration before retrying. If retries keep failing, increase the wait time exponentially (e.g., 1s → 2s → 4s → 8s) up to a maximum cap. The SDK handles this automatically with configurable retry policies.
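
For integrations that do not use the SDK, the retry logic can be sketched as follows. This is not the SDK's implementation: `RetryOptions`, `backoffDelayMs`, and `withRetry` are hypothetical names, and `attemptFn` stands in for whatever function performs the request. The sketch waits for whichever is longer, the exponential delay or the server's `Retry-After` hint.

```typescript
interface RetryOptions {
  maxRetries: number;
  baseDelayMs: number; // first delay, e.g. 1000
  maxDelayMs: number; // cap for the exponential part
}

interface AttemptResult<T> {
  status: number;
  retryAfterSeconds: number | null; // from Retry-After, if present
  value?: T;
}

// attempt is 0-based: 0 -> base, 1 -> 2x base, 2 -> 4x base, up to the cap.
function backoffDelayMs(
  attempt: number,
  retryAfterSeconds: number | null,
  opts: RetryOptions,
): number {
  const exponential = Math.min(opts.baseDelayMs * Math.pow(2, attempt), opts.maxDelayMs);
  const serverHint = retryAfterSeconds !== null ? retryAfterSeconds * 1000 : 0;
  // Never retry earlier than the server asked.
  return Math.max(exponential, serverHint);
}

// Re-runs attemptFn while it reports a 429, up to maxRetries extra attempts.
async function withRetry<T>(
  attemptFn: () => Promise<AttemptResult<T>>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<AttemptResult<T>> {
  let result = await attemptFn();
  for (let attempt = 0; result.status === 429 && attempt < opts.maxRetries; attempt++) {
    await sleep(backoffDelayMs(attempt, result.retryAfterSeconds, opts));
    result = await attemptFn();
  }
  return result;
}
```

The `sleep` parameter is injectable so the retry loop can be tested without real delays. Adding random jitter to each delay is a common refinement when many clients retry at once.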

Separate Read and Write Keys

Using distinct API keys for read-heavy and write-heavy operations provides clearer usage tracking and avoids a burst of read traffic inadvertently blocking write operations.
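
In a custom client, key selection can be driven by the HTTP method. The sketch below assumes that GET and HEAD requests are the read operations; `ApiKeys` and `keyForMethod` are hypothetical names, and how the key is attached to a request is not shown.

```typescript
// Hypothetical pair of keys issued for the same organization.
interface ApiKeys {
  read: string;
  write: string;
}

// Picks the read key for GET/HEAD and the write key for everything else.
function keyForMethod(method: string, keys: ApiKeys): string {
  const normalized = method.toUpperCase();
  return normalized === "GET" || normalized === "HEAD" ? keys.read : keys.write;
}
```

Each key has its own burst limit, so a spike in reads leaves the write key's burst capacity untouched. Both keys still draw from the shared monthly quota.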
