Rate Limits
Understand API rate limits by plan tier, monitor usage via response headers, and implement best practices for staying within limits.
Overview
TheFAQApp enforces rate limits per API key to ensure fair usage and platform stability. Limits are based on the plan tier and measured in two dimensions: monthly request quotas and per-minute burst limits. Both apply simultaneously — exceeding either threshold triggers a 429 Too Many Requests response.
Limits by Plan
| Plan | Requests/month | Burst (per minute) | Notes |
|---|---|---|---|
| FREE | 1,000 | 20 | Read-only access |
| STARTER | 10,000 | 60 | Read + write access |
| PRO | 100,000 | 200 | Full access including admin operations |
| ENTERPRISE | Custom | Custom | Tailored to the organization's needs |
Monthly quotas reset on the first day of each billing cycle. Burst limits use a sliding window that resets every 60 seconds. If an organization has multiple API keys, each key has its own burst limit, but all keys share the monthly quota.
Response Headers
Every API response includes headers that let you monitor usage in real time:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Total requests allowed in the current window |
| X-RateLimit-Remaining | Requests remaining before the limit is reached |
| X-RateLimit-Reset | Unix timestamp (seconds) when the current window resets |
| Retry-After | Seconds to wait before retrying (only present on 429 responses) |
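For custom integrations, the headers above can be read into a small typed object. The following is a minimal sketch rather than SDK code: `HeaderSource`, `RateLimitInfo`, and `parseRateLimitHeaders` are hypothetical names introduced for illustration, and only the header names come from the table.

```typescript
// Minimal structural type (an assumption for this sketch) so the function
// accepts fetch's Headers object or any object with a compatible get().
interface HeaderSource {
  get(name: string): string | null;
}

// Hypothetical shape for the parsed values; null means the header was
// absent or not numeric.
interface RateLimitInfo {
  limit: number | null;
  remaining: number | null;
  resetAt: Date | null;
  retryAfterSeconds: number | null;
}

function toNumber(value: string | null): number | null {
  if (value === null || value.trim() === "") return null;
  const n = Number(value);
  return Number.isFinite(n) ? n : null;
}

function parseRateLimitHeaders(headers: HeaderSource): RateLimitInfo {
  // X-RateLimit-Reset is a Unix timestamp in seconds; Date expects milliseconds.
  const resetSeconds = toNumber(headers.get("X-RateLimit-Reset"));
  return {
    limit: toNumber(headers.get("X-RateLimit-Limit")),
    remaining: toNumber(headers.get("X-RateLimit-Remaining")),
    resetAt: resetSeconds === null ? null : new Date(resetSeconds * 1000),
    // Retry-After is only present on 429 responses, so this is usually null.
    retryAfterSeconds: toNumber(headers.get("Retry-After")),
  };
}
```

With the standard fetch API, this could be called as `parseRateLimitHeaders(response.headers)` after each request.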
When Limits Are Exceeded
Exceeding the rate limit returns a 429 Too Many Requests status with the standard error envelope:
{
  "error": {
    "code": "rate_limit_exceeded",
    "message": "Rate limit exceeded. Please try again later.",
    "retryAfter": 42
  }
}
The retryAfter value in the body matches the Retry-After header. Both indicate the number of seconds to wait before sending another request.
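A client can check for this envelope before deciding how long to wait. The sketch below assumes only the body shape shown above; `RateLimitErrorBody`, `isRateLimitError`, and `retryDelaySeconds` are hypothetical helper names, not part of the SDK.

```typescript
// Shape of the 429 error envelope shown above.
interface RateLimitErrorBody {
  error: { code: string; message: string; retryAfter: number };
}

// Type guard: true only when the parsed JSON matches the envelope.
function isRateLimitError(body: unknown): body is RateLimitErrorBody {
  if (typeof body !== "object" || body === null) return false;
  const err = (body as { error?: unknown }).error;
  if (typeof err !== "object" || err === null) return false;
  const fields = err as Record<string, unknown>;
  return (
    fields.code === "rate_limit_exceeded" &&
    typeof fields.message === "string" &&
    typeof fields.retryAfter === "number"
  );
}

// Returns the number of seconds to wait, or null when the response is not
// a rate-limit error in the expected format.
function retryDelaySeconds(status: number, body: unknown): number | null {
  if (status !== 429 || !isRateLimitError(body)) return null;
  return Math.max(0, body.error.retryAfter);
}
```

Because the body value matches the Retry-After header, either source can drive the wait; reading the body is convenient when the JSON has already been parsed for error handling.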
Best Practices for Staying Within Limits
Cache Responses Locally
The most effective way to reduce API calls is caching. The TypeScript SDK includes built-in caching with configurable TTL. For custom integrations, cache question lists and search results on the server side — content changes infrequently relative to read traffic.
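For a custom integration without the SDK, server-side caching can be as simple as an in-memory map with expiry times. The class below is an illustrative sketch, not the SDK's built-in cache; `TtlCache` and its method names are assumptions made for this example.

```typescript
// A small in-memory cache with a fixed time-to-live per entry.
// The clock is injectable so expiry can be tested without waiting.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined if missing or expired.
  get(key: string): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key);
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Calls load() only on a cache miss, so repeated reads within the TTL
  // cost no API requests. Values of undefined are never treated as hits.
  async getOrLoad(key: string, load: () => Promise<V>): Promise<V> {
    const cached = this.get(key);
    if (cached !== undefined) return cached;
    const fresh = await load();
    this.set(key, fresh);
    return fresh;
  }
}
```

A question list might be cached with something like `cache.getOrLoad("questions:page=1", () => loadQuestions(1))`, where `loadQuestions` stands in for whatever function performs the actual API call. A longer TTL saves more requests at the cost of staler content.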
Use Pagination Efficiently
Avoid fetching entire datasets in a single request. Use page and limit parameters to retrieve only the records needed for the current view. For background sync jobs, paginate through results sequentially rather than making parallel requests for every page.
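Sequential pagination for a background sync job might look like the sketch below. It makes two assumptions that are not stated in this document: pages are numbered from 1, and a page with fewer than `limit` items marks the end of the results. `FetchPage` and `fetchAllSequentially` are hypothetical names; the caller supplies the function that performs the actual request with the `page` and `limit` parameters.

```typescript
// Caller-supplied function that requests one page of results.
type FetchPage<T> = (page: number, limit: number) => Promise<T[]>;

// Walks pages one at a time, awaiting each before requesting the next,
// so the job never issues a burst of parallel requests.
async function fetchAllSequentially<T>(
  fetchPage: FetchPage<T>,
  limit: number,
  maxPages = 1000 // safety cap in case the end condition is never met
): Promise<T[]> {
  const all: T[] = [];
  for (let page = 1; page <= maxPages; page++) {
    const items = await fetchPage(page, limit);
    all.push(...items);
    // Assumption: a short (or empty) page means there is nothing further.
    if (items.length < limit) break;
  }
  return all;
}
```

If the API reports a total count or a next-page indicator, that would be a more reliable stop condition than the short-page heuristic used here.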
Batch Write Operations
When creating or updating multiple questions, group them into fewer requests where possible. The bulk endpoints (available on Pro plans) let you create or update up to 50 questions in a single call.
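Since the bulk endpoints accept up to 50 questions per call, a larger set has to be split before sending. The sketch below covers only the splitting step; the bulk endpoint's path and payload format are not shown in this section, so no request code is included. `chunk` and `MAX_BULK_SIZE` are names chosen for illustration.

```typescript
// Maximum number of questions per bulk call, as stated above.
const MAX_BULK_SIZE = 50;

// Splits items into consecutive batches of at most `size` elements.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

With this approach, 120 questions become three bulk requests (50, 50, and 20 items) instead of 120 individual ones.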
Monitor Headers Proactively
Check X-RateLimit-Remaining in each response and slow down when the value drops below a threshold you choose, for example 10% of X-RateLimit-Limit. Throttling early keeps you clear of the hard limit and avoids disruptive 429 errors during peak traffic.
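One way to slow down is to spread the remaining requests evenly over the time left in the window. The function below is a sketch of that idea using the X-RateLimit-Remaining and X-RateLimit-Reset values; `delayBeforeNextRequestMs` and its `threshold` parameter are hypothetical, and the pacing strategy is one option among several.

```typescript
// Returns how long to pause before the next request.
//   remaining:         value of X-RateLimit-Remaining
//   resetEpochSeconds: value of X-RateLimit-Reset (Unix seconds)
//   nowMs:             current time in milliseconds (e.g. Date.now())
//   threshold:         start pacing once remaining is at or below this
function delayBeforeNextRequestMs(
  remaining: number,
  resetEpochSeconds: number,
  nowMs: number,
  threshold: number
): number {
  // Plenty of headroom: no delay needed.
  if (remaining > threshold) return 0;
  const msUntilReset = Math.max(0, resetEpochSeconds * 1000 - nowMs);
  // Nothing left: wait for the window to reset.
  if (remaining <= 0) return msUntilReset;
  // Otherwise spread the remaining requests across the rest of the window.
  return Math.ceil(msUntilReset / remaining);
}
```

For instance, with 5 requests remaining and 30 seconds until reset, the function suggests waiting 6 seconds between requests.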
Implement Exponential Backoff
When a 429 response occurs, wait for the Retry-After duration before retrying. For subsequent failures, increase the wait time exponentially (e.g., 1s → 2s → 4s → 8s) with a maximum cap. The SDK handles this automatically with configurable retry policies.
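For integrations that do not use the SDK, the retry logic described above can be sketched as follows. This is not the SDK's implementation: `SendResult`, `backoffDelayMs`, and `withBackoff` are hypothetical names, and the 1-second base and 30-second cap are assumed defaults. The server's Retry-After value is treated as a lower bound on the wait.

```typescript
// Minimal result shape (an assumption) returned by the caller's request function.
interface SendResult<T> {
  status: number;
  retryAfterSeconds: number | null;
  value?: T;
}

// Delay before retry number `attempt` (0-based): 1s, 2s, 4s, 8s, ... up to
// capMs, but never shorter than the server's Retry-After.
function backoffDelayMs(
  attempt: number,
  retryAfterSeconds: number | null,
  baseMs = 1000,
  capMs = 30000
): number {
  const exponential = Math.min(capMs, baseMs * 2 ** attempt);
  const serverHint = retryAfterSeconds === null ? 0 : retryAfterSeconds * 1000;
  return Math.max(serverHint, exponential);
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Re-sends while the response is 429, up to maxRetries additional attempts.
// Returns the last result, which may still be a 429 if retries run out.
async function withBackoff<T>(
  send: () => Promise<SendResult<T>>,
  maxRetries = 4,
  sleep: (ms: number) => Promise<void> = defaultSleep
): Promise<SendResult<T>> {
  let result = await send();
  for (let attempt = 0; attempt < maxRetries && result.status === 429; attempt++) {
    await sleep(backoffDelayMs(attempt, result.retryAfterSeconds));
    result = await send();
  }
  return result;
}
```

Adding a small random jitter to each delay is a common refinement when many clients might retry at the same moment; it is left out here to keep the sketch deterministic.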
Separate Read and Write Keys
Using distinct API keys for read-heavy and write-heavy operations provides clearer usage tracking and avoids a burst of read traffic inadvertently blocking write operations.