API Documentation
Papalily lets you extract structured data from any website using a real browser and AI. Send a URL and a plain-English description — get back clean JSON. Handles React, Vue, Next.js, Angular, and any JavaScript-rendered site.
💡 Base URL: https://api.papalily.com • Get your API key at RapidAPI
Introduction
The Papalily API is a REST API that accepts JSON and returns JSON. It uses a real Chromium browser to render pages, then uses Gemini AI vision to extract data from both the page screenshot and text.
Key features in v1.3.0:
- Interactive browser control — new `/interact` endpoint executes JS, clicks, form fills, and pagination on live pages
- Natural language task planner — describe what you want in plain English; AI plans and executes the steps automatically
- Persistent sessions — keep a browser alive across multiple API calls for complex multi-step workflows
- Zero-AI CSS extraction — use `css_schema` for fast, cost-free structured extraction when the page structure is known
- Vision-first extraction — AI reads the full-page screenshot to find pricing cards, grids, tables, and visual layouts
- Auto-translation — non-English results automatically translated to English
- Smart caching — repeated requests return instantly and don’t count against your quota
Authentication
All requests require your RapidAPI key in the X-RapidAPI-Key header. Subscribe at RapidAPI — the free plan includes 50 requests, no credit card needed.
X-RapidAPI-Key: YOUR_RAPIDAPI_KEY
X-RapidAPI-Host: papalily.p.rapidapi.com
Content-Type: application/json
⚠️ Never expose your API key in client-side JavaScript. Always make requests from your server or backend.
Quick Start
Make your first request in under 60 seconds:
curl -X POST https://papalily.p.rapidapi.com/scrape \
-H "X-RapidAPI-Key: YOUR_RAPIDAPI_KEY" \
-H "X-RapidAPI-Host: papalily.p.rapidapi.com" \
-H "Content-Type: application/json" \
-d '{"url":"https://news.ycombinator.com","prompt":"Top 5 post titles and URLs"}'
const res = await fetch('https://papalily.p.rapidapi.com/scrape', {
  method: 'POST',
  headers: {
    'X-RapidAPI-Key': 'YOUR_RAPIDAPI_KEY',
    'X-RapidAPI-Host': 'papalily.p.rapidapi.com',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://news.ycombinator.com',
    prompt: 'Top 5 post titles and URLs'
  })
});
const { data } = await res.json();
console.log(data);
import requests

response = requests.post(
    'https://papalily.p.rapidapi.com/scrape',
    headers={
        'X-RapidAPI-Key': 'YOUR_RAPIDAPI_KEY',
        'X-RapidAPI-Host': 'papalily.p.rapidapi.com',
    },
    json={
        'url': 'https://news.ycombinator.com',
        'prompt': 'Top 5 post titles and URLs',
    },
)
data = response.json()['data']
print(data)
POST /scrape
The core endpoint. Renders the URL in a real Chromium browser, takes a full-page screenshot, and uses Gemini AI vision to extract exactly what you ask for. Average response time: 5–10 seconds.
Request Body
| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | ✓ | The URL to scrape. Must be a valid http/https URL. |
| `prompt` | string | ✓ | Plain-English description of what to extract. Be specific for best results. |
| `wait_ms` | number | | Extra milliseconds to wait after load. Default: auto (adaptive). Max: 10000. Only needed for unusually slow pages. |
| `screenshot` | boolean | | Include the screenshot in AI analysis. Default: true. Strongly recommended for visual layouts. |
| `no_cache` | boolean | | Bypass the cache and force a fresh scrape. Default: false. Cache hits don't count against quota. |
| `proxy_url` | string | | Route the browser through your proxy. Format: `http://user:pass@host:port`. See Proxy & Geo-Targeting. |
Response
{
  "success": true,
  "request_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "url": "https://news.ycombinator.com",
  "title": "Hacker News",
  "data": {
    "posts": [
      { "title": "Show HN: I built an AI scraper", "url": "https://..." },
      { "title": "The future of web data", "url": "https://..." }
    ]
  },
  "meta": {
    "duration_ms": 7240,
    "scraped_at": "2026-03-09T12:00:00.000Z",
    "cached": false
  }
}
Proxy & Geo-Targeting
By default, requests originate from our server in Asia. For geo-specific content (e.g. US pricing, region-locked pages), pass your own proxy URL in proxy_url. The browser will route through that IP.
{
  "url": "https://www.shopify.com/pricing",
  "prompt": "Get all pricing plans with USD prices",
  "proxy_url": "http://username:password@us-proxy.example.com:8080",
  "no_cache": true
}
| Proxy format | Example |
|---|---|
| HTTP with auth | `http://user:pass@host:port` |
| HTTP no auth | `http://host:port` |
| HTTPS | `https://user:pass@host:port` |
| SOCKS5 | `socks5://user:pass@host:port` |
🌐 Without a proxy, results may show local currency/language based on server location. The API automatically translates non-English content to English regardless.
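As a concrete sketch, the geo-targeted request above can be sent from Node.js like any other `/scrape` call. The proxy host and credentials below are placeholders, not a real provider:

```javascript
// Sketch: route a /scrape request through your own US proxy.
// The proxy URL is a placeholder; substitute your provider's details.
const body = {
  url: 'https://www.shopify.com/pricing',
  prompt: 'Get all pricing plans with USD prices',
  proxy_url: 'http://username:password@us-proxy.example.com:8080',
  no_cache: true // skip the cache so the geo-targeted result is fresh
};

async function scrapeViaProxy(apiKey) {
  const res = await fetch('https://papalily.p.rapidapi.com/scrape', {
    method: 'POST',
    headers: {
      'X-RapidAPI-Key': apiKey,
      'X-RapidAPI-Host': 'papalily.p.rapidapi.com',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(body)
  });
  return res.json();
}
```

Setting `no_cache: true` matters here: a cached result from a previous non-proxied request would otherwise be returned with the wrong region's content.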
POST /interact
Execute interactive steps on a real browser page — type into forms, click buttons, paginate, and extract data, all in one request. Accepts either an explicit steps array or a plain-English task string.
✨ New in v1.3.0. Use task mode to describe your goal in plain English — the AI plans the steps automatically.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| `url` | string | ✓ | Starting URL for the browser |
| `task` | string | ✓ (or `steps`) | Plain-English goal — AI plans and executes steps automatically |
| `steps` | array | ✓ (or `task`) | Explicit step array (see step types below) |
| `proxy_url` | string | | Proxy for geo-targeted requests |
Step types
| Step | Fields | AI cost | Description |
|---|---|---|---|
| `js` | `js`, `wait_for?`, `wait_ms?` | None | Execute raw JavaScript — click, type, submit, scroll, anything |
| `navigate` | `navigate`, `wait_for?`, `wait_ms?` | None | Navigate to a new URL |
| `wait` | `wait_for?`, `wait_ms?` | None | Wait for an element or a fixed delay |
| `css_schema` | `css_schema.base`, `css_schema.fields[]` | None | Extract structured data via CSS selectors — fast and free |
| `extract` | `extract` (prompt string) | Yes | AI vision extraction — use when structure is unknown or complex |
| `screenshot` | `screenshot: true` | None | Capture the current page as JPEG (base64) |
Example: task mode (natural language)
{
  "url": "https://news.ycombinator.com",
  "task": "Get the top 10 post titles with their URLs and point scores"
}
Example: steps mode (explicit control)
{
  "url": "https://news.ycombinator.com",
  "steps": [
    {
      "css_schema": {
        "base": "tr.athing",
        "fields": [
          { "name": "title", "selector": ".titleline a", "type": "text" },
          { "name": "url", "selector": ".titleline a", "type": "href" }
        ]
      }
    },
    { "js": "document.querySelector('a.morelink').click()", "wait_for": "tr.athing", "wait_ms": 1000 },
    { "extract": "All post titles and URLs from page 2" }
  ]
}
Response
{
  "success": true,
  "url": "https://news.ycombinator.com/",
  "steps_executed": 3,
  "steps_total": 3,
  "ai_steps": 1,
  "results": [
    { "step": 1, "type": "css_schema", "data": [ /* array of items */ ] },
    { "step": 2, "type": "js", "url": "https://news.ycombinator.com/?p=2" },
    { "step": 3, "type": "extract", "data": [ /* array of items */ ] }
  ],
  "plan": { "task": "...", "steps": [ /* generated plan */ ], "cached": false, "planning_ms": 2322 },
  "meta": { "duration_ms": 8354, "mode": "task" }
}
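When consuming this response, note that only `css_schema` and `extract` steps carry a `data` payload; `js` and `navigate` steps report status only. A small helper (the `collectData` name is ours, not part of the API) can flatten the data-bearing steps into one array:

```javascript
// Sketch: merge the data-bearing steps of an /interact response
// (css_schema and extract) into a single flat array of items.
function collectData(response) {
  return (response.results || [])
    .filter(step => Array.isArray(step.data)) // js/navigate steps have no data
    .flatMap(step => step.data);
}
```

This assumes the documented response shape above; steps whose `data` is absent or not an array are skipped.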
Persistent Sessions
Sessions keep a real browser alive between API calls — ideal for multi-step workflows like logging in, paginating through results, or comparing data across pages. Available on Pro plan and above.
⏱️ Sessions expire after 10 minutes of inactivity. Always call `DELETE /session/:id` when done to free resources immediately.
Session lifecycle
| Endpoint | Method | Description |
|---|---|---|
| `/session/start` | POST | Open a browser, navigate to the URL, get a `session_id` |
| `/session/:id/step` | POST | Execute one step or a task on the live page |
| `/session/:id/state` | GET | Get the current URL, title, and screenshot |
| `/session/:id` | DELETE | Close the session and free browser resources |
Session limits by plan
| Plan | Concurrent sessions |
|---|---|
| Basic (Free) | Not available |
| Pro | 3 |
| Ultra | 10 |
| Mega | 20 |
Example: multi-page workflow
Node.js — session workflow
const BASE = 'https://papalily.p.rapidapi.com';
const HEADERS = {
  'X-RapidAPI-Key': 'YOUR_KEY',
  'X-RapidAPI-Host': 'papalily.p.rapidapi.com',
  'Content-Type': 'application/json'
};

// Small helpers used below
const post = (url, body) =>
  fetch(url, { method: 'POST', headers: HEADERS, body: JSON.stringify(body) }).then(r => r.json());
const del = (url) => fetch(url, { method: 'DELETE', headers: HEADERS });

// 1. Start session
const { session_id } = await post(`${BASE}/session/start`, { url: 'https://news.ycombinator.com' });

// 2. Extract page 1 (zero-AI, CSS schema)
const page1 = await post(`${BASE}/session/${session_id}/step`, {
  css_schema: { base: 'tr.athing', fields: [{ name: 'title', selector: '.titleline a', type: 'text' }] }
});

// 3. Click next page
await post(`${BASE}/session/${session_id}/step`, {
  js: "document.querySelector('a.morelink').click()",
  wait_for: 'tr.athing', wait_ms: 1000
});

// 4. Extract page 2
const page2 = await post(`${BASE}/session/${session_id}/step`, {
  css_schema: { base: 'tr.athing', fields: [{ name: 'title', selector: '.titleline a', type: 'text' }] }
});

// 5. Close session
await del(`${BASE}/session/${session_id}`);
GET /usage
Returns your current plan, requests used, and remaining quota for the billing period.
# Request
curl https://papalily.p.rapidapi.com/usage \
-H "X-RapidAPI-Key: YOUR_RAPIDAPI_KEY" \
-H "X-RapidAPI-Host: papalily.p.rapidapi.com"
# Response
{
  "success": true,
  "plan": "pro",
  "requests_used": 47,
  "requests_limit": 1000,
  "requests_remaining": 953
}
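A common pattern is to check `/usage` before a large batch job and stop early when quota is running low. A minimal sketch, assuming the response shape above (the `quotaLow` helper and 10% threshold are our choices, not part of the API):

```javascript
// Sketch: returns true when remaining quota is below `threshold`
// (a fraction of the monthly limit). Assumes the documented /usage fields.
function quotaLow(usage, threshold = 0.1) {
  if (!usage.requests_limit) return true; // be conservative on missing data
  return usage.requests_remaining / usage.requests_limit < threshold;
}
```

For example, with the response above, `quotaLow` reports plenty of headroom (953 of 1,000 requests remaining).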
GET /health
Health check and cache statistics. No authentication required.
{ "status": "ok", "ts": "2026-03-09T12:00:00.000Z", "cache": { "size": 12, "maxSize": 500 } }
Writing Good Prompts
- Be specific: “Get all product names and their USD prices” beats “Get products”
- Mention structure: “Return as an array of objects with name and price fields”
- Specify scope: “Get the top 10 results” or “Get ALL items on the page”
- Use domain language: “Get the article headline, author, and publication date”
- For pricing pages: “Get all pricing plans with plan name, monthly price, annual price, and included features”
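The tips above compose naturally into a single request body. The following is one illustrative phrasing against a hypothetical store URL, not the only valid wording:

```javascript
// Illustrative /scrape body applying the prompt-writing tips:
// specific fields, explicit structure, explicit scope, domain language.
const goodRequest = {
  url: 'https://shop.example.com/laptops', // hypothetical store
  prompt:
    'Get ALL laptop listings on the page as an array of objects ' +
    'with name, usd_price, rating, and review_count fields'
};
```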
Caching
Papalily caches successful results for 10 minutes. Repeated requests with the same URL + prompt return instantly — and don’t count against your quota.
Cached responses include "cached": true in meta.
| Behaviour | Detail |
|---|---|
| Cache TTL | 10 minutes per URL + prompt pair |
| Max entries | 500 (LRU eviction when full) |
| Failed responses | Never cached — errors always retry fresh |
| Force refresh | Pass `"no_cache": true` |
| Quota impact | Cache hits do not count against monthly quota |
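Because cache hits are free, it can be useful to log whether a given call actually consumed quota. A sketch against the documented `meta.cached` flag (the `consumedQuota` helper name is ours):

```javascript
// Sketch: a response consumed quota only if it succeeded and was not a cache hit.
// Failed responses are never cached, and cache hits are free.
function consumedQuota(response) {
  return response.success === true && response.meta?.cached !== true;
}
```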
Rate Limits & Plans
| Plan | Requests / month | Price |
|---|---|---|
| Basic | 50 | Free |
| Pro | 1,000 | $20 / month |
| Ultra | 20,000 | $100 / month |
| Mega | 100,000 | $300 / month |
All plans subject to a concurrent request limit of 3 simultaneous scrapes. Cache hits are free and don’t count toward monthly quota.
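To stay under the 3-concurrent-scrape cap when batching many URLs, a dependency-free limiter like the sketch below can gate your calls (the `pLimit` name and implementation are ours, not part of the API):

```javascript
// Minimal concurrency limiter: at most `limit` tasks run at once,
// the rest queue and start as earlier tasks finish.
function pLimit(limit) {
  let active = 0;
  const queue = [];
  const next = () => {
    if (active >= limit || queue.length === 0) return;
    active++;
    const { fn, resolve, reject } = queue.shift();
    fn().then(resolve, reject).finally(() => {
      active--;
      next();
    });
  };
  return fn =>
    new Promise((resolve, reject) => {
      queue.push({ fn, resolve, reject });
      next();
    });
}

// Usage: const limit = pLimit(3);
// await Promise.all(urls.map(u => limit(() => scrape(u))));
```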
Error Codes
| HTTP status | Description |
|---|---|
| 400 | Missing or invalid `url` or `prompt` |
| 401 | Missing API key header |
| 403 | Invalid or unauthorised API key |
| 410 | Endpoint removed — see the changelog for a migration guide |
| 429 | Monthly quota exceeded or rate limit hit. Upgrade at RapidAPI |
| 500 | Browser rendering or AI extraction failed. Retry with `no_cache: true` |
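Per the 500 row above, transient failures are worth one retry with `no_cache: true`. A sketch where `doScrape` is any function you supply (hypothetical here) that performs the HTTP call and returns the parsed JSON:

```javascript
// Sketch: retry a failed scrape once, bypassing the cache on the retry
// as the error table suggests. `doScrape` is your own request function.
async function scrapeWithRetry(doScrape, body, retries = 1) {
  let res = await doScrape(body);
  while (!res.success && retries-- > 0) {
    res = await doScrape({ ...body, no_cache: true });
  }
  return res;
}
```

In production you would likely also check the HTTP status (retry only on 500, never on 4xx) and add a small backoff between attempts.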
Code Examples
Real-world scenarios using /scrape and the new /interact endpoint.
📦 POST /scrape — Static Extraction
E-commerce: Product Listings
{
  "url": "https://shop.example.com/laptops",
  "prompt": "Get all laptop listings with name, price, rating, and review count"
}
SaaS: Competitor Pricing
{
  "url": "https://www.shopify.com/pricing",
  "prompt": "Get all pricing plans with plan name, monthly price, annual price, and top 3 features"
}
News: Latest Headlines
{
  "url": "https://techcrunch.com",
  "prompt": "Get the 10 most recent article titles, authors, publish dates, and URLs"
}
Multiple URLs in parallel
const targets = [
  { url: 'https://news.ycombinator.com', prompt: 'Top 5 post titles and scores' },
  { url: 'https://github.com/trending', prompt: 'Top 5 trending repos with stars' },
  { url: 'https://lobste.rs', prompt: 'Top 5 post titles and tags' },
];

const results = await Promise.all(targets.map(item =>
  fetch('https://papalily.p.rapidapi.com/scrape', {
    method: 'POST',
    headers: { 'X-RapidAPI-Key': 'YOUR_KEY', 'X-RapidAPI-Host': 'papalily.p.rapidapi.com', 'Content-Type': 'application/json' },
    body: JSON.stringify(item),
  }).then(r => r.json())
));
⚡ POST /interact — Browser Automation
🛍️ E-commerce: Scrape All Pages of Search Results
Click “Next page” and extract products across multiple pages in one request.
{
  "url": "https://shop.example.com/search?q=laptop",
  "steps": [
    {
      "css_schema": {
        "base": ".product-card",
        "fields": [
          { "name": "name", "selector": ".product-title", "type": "text" },
          { "name": "price", "selector": ".price", "type": "text" },
          { "name": "rating", "selector": ".star-rating", "type": "text" },
          { "name": "url", "selector": "a", "type": "href" }
        ]
      }
    },
    { "js": "document.querySelector('.pagination-next').click()", "wait_for": ".product-card", "wait_ms": 1500 },
    {
      "css_schema": {
        "base": ".product-card",
        "fields": [
          { "name": "name", "selector": ".product-title", "type": "text" },
          { "name": "price", "selector": ".price", "type": "text" }
        ]
      }
    }
  ]
}
🔍 Site Search: Submit a Query and Extract Results
Type into a search box, submit the form, and extract results — zero selectors required with task mode.
{
  "url": "https://news.ycombinator.com",
  "task": "Search for 'AI' using the search box and return the top 10 results with title and URL"
}
📊 Finance: Stock Price + Key Metrics
Navigate to a ticker page and extract current market data, with no separate finance API or credentials needed.
{
  "url": "https://finance.yahoo.com/quote/AAPL",
  "steps": [
    { "wait_for": "[data-symbol]", "wait_ms": 2000 },
    { "extract": "Current stock price, change, percent change, market cap, P/E ratio, 52-week high and low" }
  ]
}
💼 Job Board: Search and Filter Listings
Search for a role, apply a filter, and extract all matching job postings.
{
  "url": "https://jobs.example.com",
  "steps": [
    { "js": "document.querySelector('input[placeholder*=Search]').value = 'frontend engineer'" },
    { "js": "document.querySelector('button[type=submit]').click()", "wait_for": ".job-card", "wait_ms": 2000 },
    { "js": "document.querySelector('[data-filter=remote]').click()", "wait_ms": 1000 },
    {
      "css_schema": {
        "base": ".job-card",
        "fields": [
          { "name": "title", "selector": ".job-title", "type": "text" },
          { "name": "company", "selector": ".company", "type": "text" },
          { "name": "location", "selector": ".location", "type": "text" },
          { "name": "salary", "selector": ".salary", "type": "text" },
          { "name": "url", "selector": "a.job-link", "type": "href" }
        ]
      }
    }
  ]
}
📰 News Aggregator: Multi-Source Headlines in One Call
Use task mode to collect top stories with their scores and comment counts in one clean payload.
{
  "url": "https://news.ycombinator.com",
  "task": "Get the top 20 post titles, their scores, comment counts, and URLs"
}
🏠 Real Estate: Property Listings with Infinite Scroll
Scroll to load more listings before extracting — handles infinite scroll sites cleanly.
{
  "url": "https://www.zillow.com/homes/for_sale/New-York_rb/",
  "steps": [
    { "wait_for": ".property-card", "wait_ms": 2000 },
    { "js": "window.scrollTo(0, document.body.scrollHeight)", "wait_ms": 2000 },
    { "js": "window.scrollTo(0, document.body.scrollHeight)", "wait_ms": 2000 },
    { "extract": "All property listings with address, price, bedrooms, bathrooms, square footage, and listing URL" }
  ]
}
📦 GitHub: Trending Repos with Details
Pull trending repos by language — zero AI cost using CSS schema.
{
  "url": "https://github.com/trending/javascript?since=daily",
  "steps": [
    {
      "css_schema": {
        "base": "article.Box-row",
        "fields": [
          { "name": "repo", "selector": "h2 a", "type": "text" },
          { "name": "url", "selector": "h2 a", "type": "href" },
          { "name": "description", "selector": "p", "type": "text" },
          { "name": "stars", "selector": "a[href*=stargazers]", "type": "text" },
          { "name": "stars_today", "selector": "span.d-inline-block.float-sm-right", "type": "text" }
        ]
      }
    }
  ]
}
📋 Wikipedia: Structured Table Extraction
Pull data from wiki tables as clean JSON — no AI needed.
{
  "url": "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)",
  "steps": [
    {
      "css_schema": {
        "base": ".wikitable tbody tr",
        "fields": [
          { "name": "rank", "selector": "td:nth-child(1)", "type": "text" },
          { "name": "country", "selector": "td:nth-child(2)", "type": "text" },
          { "name": "gdp_usd", "selector": "td:nth-child(3)", "type": "text" }
        ]
      }
    }
  ]
}
🤖 AI-Powered Review Summariser
Click “Load more reviews”, then ask AI to extract and summarise sentiment.
{
  "url": "https://shop.example.com/product/123/reviews",
  "steps": [
    { "js": "document.querySelector('.load-more-reviews').click()", "wait_for": ".review-item", "wait_ms": 1500 },
    { "js": "document.querySelector('.load-more-reviews').click()", "wait_ms": 1500 },
    { "extract": "All reviews with star rating, reviewer name, date, and review text. Also include an overall sentiment summary." }
  ]
}
🔐 Persistent Session: Multi-Step Authenticated Workflow
Log in, navigate to a protected page, and extract data — all within one session. (Pro plan and above)
Node.js — session workflow
const BASE = 'https://papalily.p.rapidapi.com';
const H = { 'X-RapidAPI-Key': 'YOUR_KEY', 'X-RapidAPI-Host': 'papalily.p.rapidapi.com', 'Content-Type': 'application/json' };
const api = (path, body) => fetch(`${BASE}${path}`, { method: body ? 'POST' : 'GET', headers: H, body: body && JSON.stringify(body) }).then(r => r.json());
// 1. Start session and navigate to login page
const { session_id } = await api('/session/start', { url: 'https://app.example.com/login' });
// 2. Fill username
await api(`/session/${session_id}/step`, { js: "document.querySelector('#email').value = 'user@example.com'" });
// 3. Fill password and submit
await api(`/session/${session_id}/step`, {
  js: "document.querySelector('#password').value = 'secret'; document.querySelector('form').submit()",
  wait_for: '.dashboard', wait_ms: 3000
});
// 4. Navigate to reports page
await api(`/session/${session_id}/step`, { navigate: 'https://app.example.com/reports', wait_for: '.report-table' });
// 5. Extract report data
const data = await api(`/session/${session_id}/step`, { extract: 'All report rows with date, metric name, and value' });
// 6. Close session
await fetch(`${BASE}/session/${session_id}`, { method: 'DELETE', headers: H });
console.log(data.data);
💰 Price Monitor: Track a Product Over Time
A minimal cron-friendly snippet to monitor a product price and alert on change.
async function checkPrice(url) {
  const res = await fetch('https://papalily.p.rapidapi.com/scrape', {
    method: 'POST',
    headers: { 'X-RapidAPI-Key': 'YOUR_KEY', 'X-RapidAPI-Host': 'papalily.p.rapidapi.com', 'Content-Type': 'application/json' },
    body: JSON.stringify({ url, prompt: 'Current product name and price', no_cache: true }),
  }).then(r => r.json());

  const current = res.data.price;
  const previous = await getLastPrice(url); // your own DB
  if (current !== previous) {
    await sendAlert(`Price changed: ${previous} → ${current}`);
    await savePrice(url, current);
  }
}
// Run every 15 minutes with setInterval or a cron job
setInterval(() => checkPrice('https://shop.example.com/product/456'), 15 * 60 * 1000);
🚀 See what’s changed
View the full version history, breaking changes, and roadmap on the changelog.