API Documentation
Papalily lets you extract structured data from any website using a real browser and AI. Send a URL and a plain-English description, and get back clean JSON.
https://api.papalily.com · Average response time: 8-15 seconds
Introduction
The Papalily API is a REST API that accepts JSON and returns JSON. It uses a real Chromium browser to render JavaScript-heavy sites (React, Vue, Angular, Next.js, etc.) before extracting data with Gemini AI.
Unlike traditional scrapers that break when a site's HTML structure changes, Papalily uses AI to understand the page semantically, so your prompts keep working even after site redesigns.
Authentication
All API requests (except /health) require an API key passed in the x-api-key request header.
Get your API key from RapidAPI. Free tier includes 100 requests/month.
Quick Start
Make your first request in under 60 seconds: Try the live demo →
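A minimal first request, sketched in Python using only the standard library (the API key value is a placeholder; get yours from RapidAPI):

```python
import json
import urllib.request

API_KEY = "your-api-key"  # placeholder; use your key from RapidAPI

def build_scrape_request(url: str, prompt: str) -> urllib.request.Request:
    """Build a POST /scrape request with the documented x-api-key header."""
    body = json.dumps({"url": url, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        "https://api.papalily.com/scrape",
        data=body,
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )

req = build_scrape_request(
    "https://example.com/products",
    "Get all product names and their USD prices",
)
# result = json.load(urllib.request.urlopen(req))  # sends the request
```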
POST /scrape Try it →
The main endpoint. Renders the target URL in a real browser and extracts the requested data using AI. Average response time: 8-15 seconds, depending on page complexity.
Request Body
| Parameter | Type | Description |
|---|---|---|
| url | string | Required. The URL to scrape. Must be a valid http/https URL. |
| prompt | string | Required. Plain-English description of what data to extract. |
| wait_ms | number | Extra milliseconds to wait after page load. Default: 2000. Max: 10000. |
| screenshot | boolean | Include page screenshot in AI analysis. Default: true. Improves accuracy. |
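The defaults and limits in the table above can be expressed as a small client-side validator (illustrative only; the server enforces these rules itself):

```python
def validate_scrape_body(body: dict) -> dict:
    """Apply the documented defaults and limits to a /scrape request body."""
    if "url" not in body or "prompt" not in body:
        raise ValueError("url and prompt are required")
    out = dict(body)
    out.setdefault("wait_ms", 2000)                   # default: 2000
    out["wait_ms"] = min(int(out["wait_ms"]), 10000)  # max: 10000
    out.setdefault("screenshot", True)                # default: true
    return out
```

For example, passing only url and prompt fills in wait_ms=2000 and screenshot=True, while an oversized wait_ms is clamped to 10000.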
Response
| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the request succeeded. |
| request_id | string | Unique ID for this request (UUID). Use with GET /status. |
| url | string | Final URL after any redirects. |
| title | string | Page title. |
| data | object | The extracted data as structured JSON. |
| meta.duration_ms | number | Total processing time in milliseconds. |
| meta.cached | boolean | true if result was served from cache. |
| meta.scraped_at | string | ISO 8601 timestamp. |
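A sketch of reading those fields from a parsed response (the sample values below are invented for illustration):

```python
def summarize_response(resp: dict) -> str:
    """Summarize a /scrape response using the fields documented above."""
    if not resp.get("success"):
        raise RuntimeError("scrape failed")
    meta = resp["meta"]
    source = "cache" if meta["cached"] else "live"
    return f"{resp['title']} ({source}, {meta['duration_ms']} ms)"

sample = {
    "success": True,
    "request_id": "123e4567-e89b-12d3-a456-426614174000",
    "url": "https://example.com/",
    "title": "Example Domain",
    "data": {"items": []},
    "meta": {"duration_ms": 9400, "cached": False,
             "scraped_at": "2024-01-01T00:00:00Z"},
}
```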
POST /batch Try it →
Scrape up to 5 URLs simultaneously in a single request. All jobs run in parallel, which is dramatically faster than making individual /scrape calls.
Request Body
| Parameter | Type | Description |
|---|---|---|
| items | array | Required. Array of 1โ5 objects, each with url and prompt. |
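A sketch of assembling a /batch body with the 1-5 item limit checked client-side:

```python
def build_batch_body(items: list) -> dict:
    """Wrap scrape jobs for POST /batch; the API accepts 1-5 items per request."""
    if not 1 <= len(items) <= 5:
        raise ValueError("batch accepts 1-5 items")
    for item in items:
        if "url" not in item or "prompt" not in item:
            raise ValueError("each item needs url and prompt")
    return {"items": items}
```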
Response
GET /usage
Returns usage statistics for your API key. Requires authentication.
GET /status/:requestId
Look up a past request by its ID. Every /scrape and /batch response includes a request_id you can use here.
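A status lookup can be sketched the same way (standard library only; the key value is a placeholder):

```python
import urllib.request

API_KEY = "your-api-key"  # placeholder; use your key from RapidAPI

def build_status_request(request_id: str) -> urllib.request.Request:
    """Build a GET /status/:requestId request from a prior response's request_id."""
    return urllib.request.Request(
        f"https://api.papalily.com/status/{request_id}",
        headers={"x-api-key": API_KEY},
    )
```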
GET /health
Health check endpoint. Returns API status. No authentication required.
Writing Good Prompts
The quality of extracted data depends on how clearly you describe what you want. Here are some tips:
- Be specific: "Get all product names and their USD prices" beats "Get products"
- Mention structure: "Return as an array of objects with name and price fields"
- Specify limits: "Get the top 10 results" or "Get all items on the page"
- Use domain language: "Get the article headline, author, and publication date"
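Putting those tips together, one prompt that is specific, structured, and bounded (example wording only, not a required format):

```python
# Combines the tips above: specific fields, explicit structure, an explicit limit.
good_prompt = (
    "Get the top 10 articles as an array of objects "
    "with headline, author, and publication_date fields"
)
```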
Caching
Papalily caches scrape results in memory for 10 minutes. If you request the same URL + prompt combination within the TTL window, you'll get an instant response from cache.
- Cache key: url::prompt (both lowercased)
- TTL: 10 minutes from first scrape
- Cache hit is indicated by meta.cached: true in the response
- Cached responses are instant (no browser launch or AI processing)
- Cache is per-server and resets on restart
Rate Limits
| Plan | Requests/month | Requests/minute | Batch size |
|---|---|---|---|
| Free | 100 | 5 | 5 URLs |
| Pro | Unlimited | 30 | 5 URLs |
| Enterprise | Unlimited | Custom | 5 URLs |
When rate limited, the API returns HTTP 429 Too Many Requests. Implement exponential backoff in your client.
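A sketch of the exponential backoff the docs recommend (this delay schedule is an assumption, not an API requirement):

```python
def backoff_delays(retries: int = 5, base: float = 1.0, cap: float = 30.0) -> list:
    """Doubling delays (seconds) capped at `cap` for successive 429 retries."""
    return [min(base * 2 ** i, cap) for i in range(retries)]
```

Sleep for each delay in turn before retrying; adding random jitter on top is a common refinement.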
Error Codes
| HTTP Status | Code | Description |
|---|---|---|
| 400 | Bad Request | Missing or invalid url or prompt |
| 401 | Unauthorized | Missing x-api-key header |
| 403 | Forbidden | Invalid API key |
| 404 | Not Found | Request ID not found (GET /status) |
| 429 | Too Many Requests | Rate limit or monthly quota exceeded |
| 500 | Server Error | Browser failed to load page or AI extraction failed |
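A sketch of client-side handling based on the table above (which statuses to retry is a judgment call, not stated API policy):

```python
def should_retry(status: int) -> bool:
    """429 (rate limit/quota) and 500 (transient page or AI failure) may succeed
    on retry; 400/401/403/404 indicate a request problem a retry won't fix."""
    return status in (429, 500)
```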