API v2.0

API Documentation

Papalily lets you extract structured data from any website using a real browser and AI. Send a URL and a plain-English description, and get back clean JSON.

💡 Base URL: https://api.papalily.com · Average response time: 8–15 seconds

Introduction

The Papalily API is a REST API that accepts JSON and returns JSON. It uses a real Chromium browser to render JavaScript-heavy sites (React, Vue, Angular, Next.js, etc.) before extracting data with Gemini AI.

Unlike traditional scrapers that break when a site's HTML structure changes, Papalily uses AI to understand the page semantically, so your prompts keep working even after site redesigns.

Authentication

All API requests (except /health) require an API key passed in the x-api-key request header.

Authentication header
curl https://api.papalily.com/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  ...
⚠️ Never expose your API key in client-side code. Always make requests from your server.

Get your API key from RapidAPI. The free tier includes 100 requests per month.
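A minimal server-side sketch in Python of keeping the key out of client code: the key is read from an environment variable and attached as the x-api-key header. The variable name PAPALILY_API_KEY and the function auth_headers are hypothetical choices for illustration, not part of the API.

```python
import os


def auth_headers() -> dict:
    """Return headers for Papalily requests, read from server-side config.

    PAPALILY_API_KEY is a hypothetical variable name; use whatever your
    deployment's secret store provides.
    """
    api_key = os.environ.get("PAPALILY_API_KEY")
    if not api_key:
        # Fail fast instead of sending a request that would get a 401.
        raise RuntimeError("PAPALILY_API_KEY is not set")
    return {"x-api-key": api_key, "Content-Type": "application/json"}
```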

Quick Start

Make your first request in under 60 seconds:

curl -X POST https://api.papalily.com/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://news.ycombinator.com","prompt":"Top 5 post titles"}'
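The same Quick Start request, sketched in Python with only the standard library. build_scrape_request and scrape are hypothetical helper names, and the 60-second timeout is an assumption chosen to leave headroom over the documented 8–15 second average.

```python
import json
import urllib.request

API_BASE = "https://api.papalily.com"


def build_scrape_request(api_key: str, url: str, prompt: str) -> urllib.request.Request:
    """Build, but do not send, the POST /scrape request from the curl example."""
    body = json.dumps({"url": url, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/scrape",
        data=body,
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def scrape(api_key: str, url: str, prompt: str, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body (performs a network call)."""
    request = build_scrape_request(api_key, url, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Note that urllib.request.urlopen raises urllib.error.HTTPError for 4xx and 5xx responses, so statuses such as 401 or 429 surface as exceptions rather than as a returned body.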

POST /scrape

The main endpoint. Renders the target URL in a real browser and extracts the requested data using AI. Average response time: 8–15 seconds, depending on page complexity.

Request Body

Parameter | Type | Description
url | string | Required. The URL to scrape. Must be a valid http/https URL.
prompt | string | Required. Plain-English description of the data to extract.
wait_ms | number | Extra milliseconds to wait after the page loads. Default: 2000. Max: 10000.
screenshot | boolean | Include a screenshot of the page in the AI analysis. Default: true. Improves accuracy.
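Checking the documented constraints before sending avoids a round trip that would end in a 400. This is a sketch, assuming the limits in the table above are the only ones the server enforces; the helper name build_scrape_body and the lower bound of 0 for wait_ms are assumptions. Optional fields are left out unless set, so the server defaults apply.

```python
from urllib.parse import urlparse

MAX_WAIT_MS = 10_000  # documented maximum for wait_ms


def build_scrape_body(url, prompt, wait_ms=None, screenshot=None):
    """Validate inputs against the documented limits and build a /scrape body."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError("url must be an absolute http or https URL")
    if not isinstance(prompt, str) or not prompt.strip():
        raise ValueError("prompt must be a non-empty string")

    body = {"url": url, "prompt": prompt}
    if wait_ms is not None:
        # 0 as the lower bound is an assumption; only the maximum is documented.
        if not 0 <= wait_ms <= MAX_WAIT_MS:
            raise ValueError(f"wait_ms must be between 0 and {MAX_WAIT_MS}")
        body["wait_ms"] = wait_ms
    if screenshot is not None:
        body["screenshot"] = bool(screenshot)
    # Omitted optional fields fall back to the server defaults (2000 ms, true).
    return body
```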

Response

Field | Type | Description
success | boolean | Whether the request succeeded.
request_id | string | Unique ID for this request (UUID). Use with GET /status.
url | string | Final URL after any redirects.
title | string | Page title.
data | object | The extracted data as structured JSON.
meta.duration_ms | number | Total processing time in milliseconds.
meta.cached | boolean | true if the result was served from cache.
meta.scraped_at | string | ISO 8601 timestamp.
Example response
{
  "success": true,
  "request_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "url": "https://news.ycombinator.com",
  "title": "Hacker News",
  "data": {
    "posts": [
      { "title": "Show HN: I built an AI scraper", "url": "https://..." }
    ]
  },
  "meta": {
    "duration_ms": 12650,
    "cached": false,
    "scraped_at": "2026-03-05T11:00:00.000Z"
  }
}
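A sketch of pulling the useful fields out of a /scrape response. The summarize_scrape_response helper is hypothetical, and the error field it reads on failure is an assumption borrowed from the batch response format, since the single-request failure body is not documented here.

```python
import json


def summarize_scrape_response(raw: str) -> dict:
    """Extract the commonly used fields from a POST /scrape response body."""
    response = json.loads(raw)
    if not response.get("success"):
        # The "error" field is an assumption borrowed from the batch format.
        raise RuntimeError(f"scrape failed: {response.get('error', 'unknown error')}")
    meta = response.get("meta", {})
    return {
        "request_id": response["request_id"],  # keep for GET /status lookups
        "final_url": response["url"],           # URL after any redirects
        "data": response["data"],
        "cached": meta.get("cached", False),
        "seconds": meta.get("duration_ms", 0) / 1000,
    }
```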

POST /batch

Scrape up to 5 URLs simultaneously in a single request. All jobs run in parallel, which is much faster than making the same number of /scrape calls one after another.

Request Body

Parameter | Type | Description
items | array | Required. Array of 1–5 objects, each with url and prompt.
Batch request example
curl -X POST https://api.papalily.com/batch \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "items": [
      { "url": "https://site1.com", "prompt": "Get product prices" },
      { "url": "https://site2.com", "prompt": "Get article titles" },
      { "url": "https://site3.com", "prompt": "Get contact info" }
    ]
  }'
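Since /batch accepts at most 5 items per request, a longer list has to be split into several requests. A minimal sketch with a hypothetical batch_payloads generator that yields one request body per chunk:

```python
from typing import Iterator, List

MAX_BATCH_ITEMS = 5  # documented per-request limit for POST /batch


def batch_payloads(items: List[dict]) -> Iterator[dict]:
    """Yield /batch request bodies with at most MAX_BATCH_ITEMS items each."""
    for start in range(0, len(items), MAX_BATCH_ITEMS):
        yield {"items": items[start:start + MAX_BATCH_ITEMS]}
```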

Response

Batch response
{
  "batch_id": "a1b2c3d4-...",
  "count": 3,
  "results": [
    { "success": true, "url": "https://site1.com", "data": { ... }, "meta": { "duration_ms": 9800 } },
    { "success": true, "url": "https://site2.com", "data": { ... }, "meta": { "duration_ms": 11200 } },
    { "success": false, "url": "https://site3.com", "error": "Timeout" }
  ]
}
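A sketch of separating successes from failures and rebuilding the items to retry. Failed results in the example do not echo the prompt, so this hypothetical helper matches them back to the submitted items by url; that assumes each URL appears once per batch and that the returned url matches the submitted one (it may not if the page redirects).

```python
def split_batch_results(response: dict):
    """Partition /batch results into (succeeded, failed) lists."""
    succeeded, failed = [], []
    for result in response.get("results", []):
        (succeeded if result.get("success") else failed).append(result)
    return succeeded, failed


def items_to_retry(sent_items: list, response: dict) -> list:
    """Return the original {url, prompt} items whose results failed.

    Assumes each url appears once per batch and comes back unchanged.
    """
    _, failed = split_batch_results(response)
    failed_urls = {result["url"] for result in failed}
    return [item for item in sent_items if item["url"] in failed_urls]
```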

GET /usage

Returns usage statistics for your API key. Requires authentication.

Usage request
curl https://api.papalily.com/usage \
  -H "x-api-key: YOUR_API_KEY"

# Response
{
  "plan": "free",
  "requests_used": 42,
  "requests_limit": 100,
  "requests_remaining": 58,
  "reset_date": "2026-04-01"
}
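A sketch of checking remaining quota before sending work, using the fields from the example response. can_send is a hypothetical helper; treating a missing requests_remaining as unlimited, and counting each batch item as one request, are assumptions this page does not confirm.

```python
def can_send(usage: dict, needed: int = 1) -> bool:
    """Return True if the GET /usage body shows at least `needed` requests left."""
    remaining = usage.get("requests_remaining")
    if remaining is None:
        # Assumption: plans without a numeric remaining count are unlimited.
        return True
    return remaining >= needed
```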

GET /status/:requestId

Look up a past request by its ID. Every /scrape and /batch response includes a request_id you can use here.

Status request
curl https://api.papalily.com/status/f47ac10b-58cc-4372-a567-0e02b2c3d479 \
  -H "x-api-key: YOUR_API_KEY"

# Response
{
  "id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "url": "https://news.ycombinator.com",
  "prompt": "Top 5 post titles",
  "duration_ms": 12650,
  "success": true,
  "error": null,
  "created_at": "2026-03-05 11:00:00"
}

GET /health

Health check endpoint. Returns API status. No authentication required.

# Request
curl https://api.papalily.com/health

# Response
{ "status": "ok", "ts": "2026-03-05T11:00:00.000Z" }

Writing Good Prompts

The quality of extracted data depends on how clearly you describe what you want. Here are some tips:

  • Be specific: "Get all product names and their USD prices" beats "Get products"
  • Mention structure: "Return as an array of objects with name and price fields"
  • Specify limits: "Get the top 10 results" or "Get all items on the page"
  • Use domain language: "Get the article headline, author, and publication date"
✅ Good prompt: "Get all job listings with title, company, location, and salary range as an array"
⚠️ Vague prompt: "Get jobs" (may return incomplete or inconsistently structured data)

Caching

Papalily caches scrape results in memory for 10 minutes. If you request the same URL + prompt combination within the TTL window, you'll get an instant response from cache.

  • Cache key: url::prompt (both lowercased)
  • TTL: 10 minutes from first scrape
  • Cache hit is indicated by meta.cached: true in the response
  • Cached responses are instant: no browser launch or AI processing
  • Cache is per-server and resets on restart
💡 Cached responses still count toward your usage quota but are served in milliseconds instead of seconds.
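The cache key rule above can be reproduced client-side, for example to predict which requests will hit the cache. A sketch, assuming the rule is applied exactly as stated (lowercase both parts, join with ::) with no trimming or URL normalization; cache_key is a hypothetical name.

```python
def cache_key(url: str, prompt: str) -> str:
    """Reproduce the documented cache key: url::prompt, both lowercased."""
    # No trimming or URL normalization, since none is documented.
    return f"{url.lower()}::{prompt.lower()}"
```

One consequence of lowercasing the URL: two URLs that differ only in letter case (e.g. /Item/ABC vs /item/abc) share a cache entry, even on sites whose paths are case-sensitive. Conversely, a prompt with an extra trailing space produces a different key.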

Rate Limits

Plan | Requests/month | Requests/minute | Batch size
Free | 100 | 5 | 5 URLs
Pro | Unlimited | 30 | 5 URLs
Enterprise | Unlimited | Custom | 5 URLs

When rate limited, the API returns HTTP 429 Too Many Requests. Implement exponential backoff in your client.
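A sketch of exponential backoff with full jitter for 429 responses. The base delay, cap, and retry count are assumptions, not documented values. with_backoff is a hypothetical helper that takes any zero-argument function performing the request and raising urllib.error.HTTPError on failure, as urllib.request.urlopen does; sleep is injectable so the logic can be tested without waiting.

```python
import random
import time
import urllib.error


def backoff_delays(max_retries=5, base=1.0, cap=30.0, rng=random.random):
    """Yield full-jitter delays: a random value in [0, min(cap, base * 2**n))."""
    for attempt in range(max_retries):
        yield rng() * min(cap, base * 2 ** attempt)


def with_backoff(send, max_retries=5, sleep=time.sleep):
    """Call send(), retrying on HTTP 429 with exponentially growing delays."""
    for delay in backoff_delays(max_retries):
        try:
            return send()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # other errors are not rate-limit related
            sleep(delay)
    return send()  # final attempt; any error propagates to the caller
```

Because 429 is also returned when the monthly quota is exhausted (see Error Codes), retries are capped rather than unbounded: once the quota is spent, retrying cannot succeed.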

Error Codes

HTTP Status | Code | Description
400 | Bad Request | Missing or invalid url or prompt
401 | Unauthorized | Missing x-api-key header
403 | Forbidden | Invalid API key
404 | Not Found | Request ID not found (GET /status)
429 | Too Many Requests | Rate limit or monthly quota exceeded
500 | Server Error | Browser failed to load page or AI extraction failed
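A sketch that turns the table into a lookup so client code can decide whether to retry. Treating 500 as retryable is an assumption: the table attributes it to page-load or extraction failures, some of which may be transient. The ERRORS mapping and the classify name are illustrative, not part of the API.

```python
ERRORS = {
    400: ("Missing or invalid url or prompt", False),
    401: ("Missing x-api-key header", False),
    403: ("Invalid API key", False),
    404: ("Request ID not found (GET /status)", False),
    429: ("Rate limit or monthly quota exceeded", True),
    # Assumption: page-load or extraction failures may be transient.
    500: ("Browser failed to load page or AI extraction failed", True),
}


def classify(status: int):
    """Return (description, retryable) for an HTTP status from the API."""
    return ERRORS.get(status, (f"Unexpected HTTP status {status}", False))
```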

Code Examples

E-commerce: Product Listings

{
  "url": "https://shop.example.com/laptops",
  "prompt": "Get all laptop listings with name, price, rating, and review count"
}

News: Article Data

{
  "url": "https://techcrunch.com",
  "prompt": "Get the 10 most recent article titles, authors, dates, and URLs"
}

Batch: Multiple Sites at Once

{
  "items": [
    { "url": "https://amazon.com/s?k=laptops", "prompt": "Top 5 products with price" },
    { "url": "https://bestbuy.com/laptops", "prompt": "Top 5 products with price" },
    { "url": "https://newegg.com/laptops", "prompt": "Top 5 products with price" }
  ]
}