
API Documentation

Papalily lets you extract structured data from any website using a real browser and AI. Send a URL and a plain-English description — get back clean JSON. Handles React, Vue, Next.js, Angular, and any JavaScript-rendered site.

💡
Base URL: https://api.papalily.com  •  Get your API key at RapidAPI

Introduction

The Papalily API is a REST API that accepts JSON and returns JSON. It uses a real Chromium browser to render pages, then uses Gemini AI vision to extract data from both the page screenshot and text.

Key features in v1.3.0:

  • Interactive browser control — new /interact endpoint executes JS, clicks, form fills, and pagination on live pages
  • Natural language task planner — describe what you want in plain English; AI plans and executes the steps automatically
  • Persistent sessions — keep a browser alive across multiple API calls for complex multi-step workflows
  • Zero-AI CSS extraction — use css_schema for fast, cost-free structured extraction when page structure is known
  • Vision-first extraction — AI reads the full-page screenshot to find pricing cards, grids, tables, and visual layouts
  • Auto-translation — non-English results automatically translated to English
  • Smart caching — repeated requests return instantly and don’t count against your quota

Authentication

All requests require your RapidAPI key in the X-RapidAPI-Key header. Subscribe at RapidAPI — the free plan includes 50 requests, no credit card needed.

Required headers
X-RapidAPI-Key: YOUR_RAPIDAPI_KEY
X-RapidAPI-Host: papalily.p.rapidapi.com
Content-Type: application/json
⚠️
Never expose your API key in client-side JavaScript. Always make requests from your server or backend.
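One safe pattern is a thin server-side wrapper: the key lives in an environment variable and the browser only ever talks to your own backend. A minimal sketch (the `RAPIDAPI_KEY` variable name and the `scrape` helper are illustrative, not part of the API):

```javascript
// Build the required headers from a server-side secret. Throws early if
// the key is missing so a misconfigured deploy fails loudly.
function papalilyHeaders(apiKey) {
  if (!apiKey) throw new Error('RAPIDAPI_KEY is not set');
  return {
    'X-RapidAPI-Key': apiKey,
    'X-RapidAPI-Host': 'papalily.p.rapidapi.com',
    'Content-Type': 'application/json',
  };
}

// Server-side call to POST /scrape; the key never reaches the client.
async function scrape(url, prompt) {
  const res = await fetch('https://papalily.p.rapidapi.com/scrape', {
    method: 'POST',
    headers: papalilyHeaders(process.env.RAPIDAPI_KEY),
    body: JSON.stringify({ url, prompt }),
  });
  return res.json();
}
```

Your frontend then calls your own endpoint, which calls `scrape` on its behalf.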

Quick Start

Make your first request in under 60 seconds:

curl -X POST https://papalily.p.rapidapi.com/scrape \
  -H "X-RapidAPI-Key: YOUR_RAPIDAPI_KEY" \
  -H "X-RapidAPI-Host: papalily.p.rapidapi.com" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://news.ycombinator.com","prompt":"Top 5 post titles and URLs"}'

POST /scrape

The core endpoint. Renders the URL in a real Chromium browser, takes a full-page screenshot, and uses Gemini AI vision to extract exactly what you ask for. Average response time: 5–10 seconds.

Request Body

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | ✓ | The URL to scrape. Must be a valid http/https URL. |
| prompt | string | ✓ | Plain-English description of what to extract. Be specific for best results. |
| wait_ms | number | — | Extra ms to wait after load. Default: auto (adaptive). Max: 10000. Only needed for unusually slow pages. |
| screenshot | boolean | — | Include screenshot in AI analysis. Default: true. Strongly recommended for visual layouts. |
| no_cache | boolean | — | Bypass cache and force a fresh scrape. Default: false. Cache hits don’t count against quota. |
| proxy_url | string | — | Route the browser through your proxy. Format: http://user:pass@host:port. See Proxy & Geo. |

Response

Example response
{
  "success": true,
  "request_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "url": "https://news.ycombinator.com",
  "title": "Hacker News",
  "data": {
    "posts": [
      { "title": "Show HN: I built an AI scraper", "url": "https://..." },
      { "title": "The future of web data", "url": "https://..." }
    ]
  },
  "meta": {
    "duration_ms": 7240,
    "scraped_at": "2026-03-09T12:00:00.000Z",
    "cached": false
  }
}

Proxy & Geo-Targeting

By default, requests originate from our server in Asia. For geo-specific content (e.g. US pricing, region-locked pages), pass your own proxy URL in proxy_url. The browser will route through that IP.

Scrape with US proxy
{
  "url": "https://www.shopify.com/pricing",
  "prompt": "Get all pricing plans with USD prices",
  "proxy_url": "http://username:password@us-proxy.example.com:8080",
  "no_cache": true
}
| Proxy format | Example |
|---|---|
| HTTP with auth | http://user:pass@host:port |
| HTTP no auth | http://host:port |
| HTTPS | https://user:pass@host:port |
| SOCKS5 | socks5://user:pass@host:port |
🌐
Without a proxy, results may show local currency/language based on server location. The API automatically translates non-English content to English regardless.

POST /interact

Execute interactive steps on a real browser page — type into forms, click buttons, paginate, and extract data, all in one request. Accepts either an explicit steps array or a plain-English task string.

New in v1.3.0. Use task mode to describe your goal in plain English — the AI plans the steps automatically.

Request body

| Field | Type | Required | Description |
|---|---|---|---|
| url | string | ✓ | Starting URL for the browser |
| task | string | ✓ or steps | Plain-English goal — AI plans and executes steps automatically |
| steps | array | ✓ or task | Explicit step array (see step types below) |
| proxy_url | string | — | Proxy for geo-targeted requests |

Step types

| Step | Fields | AI cost | Description |
|---|---|---|---|
| js | js, wait_for?, wait_ms? | None | Execute raw JavaScript — click, type, submit, scroll, anything |
| navigate | navigate, wait_for?, wait_ms? | None | Navigate to a new URL |
| wait | wait_for?, wait_ms? | None | Wait for an element or a fixed delay |
| css_schema | css_schema.base, css_schema.fields[] | None | Extract structured data via CSS selectors — fast and free |
| extract | extract (prompt string) | Yes | AI vision extraction — use when structure is unknown or complex |
| screenshot | screenshot: true | None | Capture the current page as JPEG (base64) |

Example: task mode (natural language)

POST /interact — task
{
  "url": "https://news.ycombinator.com",
  "task": "Get the top 10 post titles with their URLs and point scores"
}

Example: steps mode (explicit control)

POST /interact — steps
{
  "url": "https://news.ycombinator.com",
  "steps": [
    {
      "css_schema": {
        "base": "tr.athing",
        "fields": [
          { "name": "title", "selector": ".titleline a", "type": "text" },
          { "name": "url", "selector": ".titleline a", "type": "href" }
        ]
      }
    },
    {
      "js": "document.querySelector('a.morelink').click()",
      "wait_for": "tr.athing",
      "wait_ms": 1000
    },
    { "extract": "All post titles and URLs from page 2" }
  ]
}

Response

{
  "success": true,
  "url": "https://news.ycombinator.com/",
  "steps_executed": 3,
  "steps_total": 3,
  "ai_steps": 1,
  "results": [
    { "step": 1, "type": "css_schema", "data": [ /* array of items */ ] },
    { "step": 2, "type": "js", "url": "https://news.ycombinator.com/?p=2" },
    { "step": 3, "type": "extract", "data": [ /* array of items */ ] }
  ],
  "plan": {
    "task": "...",
    "steps": [ /* generated plan */ ],
    "cached": false,
    "planning_ms": 2322
  },
  "meta": { "duration_ms": 8354, "mode": "task" }
}

Persistent Sessions

Sessions keep a real browser alive between API calls — ideal for multi-step workflows like logging in, paginating through results, or comparing data across pages. Available on Pro plan and above.

⏱️
Sessions expire after 10 minutes of inactivity. Always call DELETE /session/:id when done to free resources immediately.

Session lifecycle

| Endpoint | Method | Description |
|---|---|---|
| /session/start | POST | Open browser, navigate to URL, get session_id |
| /session/:id/step | POST | Execute one step or a task on the live page |
| /session/:id/state | GET | Get current URL, title, and screenshot |
| /session/:id | DELETE | Close session and free browser resources |
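The state endpoint has no dedicated example below, so here is a minimal sketch (YOUR_KEY is a placeholder, and the helper names are ours, not part of the API):

```javascript
const BASE = 'https://papalily.p.rapidapi.com';

// Build the state endpoint URL for a given session id.
function sessionStateUrl(sessionId) {
  return `${BASE}/session/${encodeURIComponent(sessionId)}/state`;
}

// Fetch the live session's current URL, title, and screenshot.
async function getSessionState(sessionId) {
  const res = await fetch(sessionStateUrl(sessionId), {
    headers: {
      'X-RapidAPI-Key': 'YOUR_KEY',
      'X-RapidAPI-Host': 'papalily.p.rapidapi.com',
    },
  });
  return res.json();
}
```

Polling state between steps is a convenient way to debug a long workflow without closing the session.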

Session limits by plan

| Plan | Concurrent sessions |
|---|---|
| Basic (Free) | Not available |
| Pro | 3 |
| Ultra | 10 |
| Mega | 20 |

Example: multi-page workflow

Node.js — session workflow
const BASE = 'https://papalily.p.rapidapi.com';
const HEADERS = {
  'X-RapidAPI-Key': 'YOUR_KEY',
  'X-RapidAPI-Host': 'papalily.p.rapidapi.com',
  'Content-Type': 'application/json'
};

// Minimal request helpers
const post = (url, body) =>
  fetch(url, { method: 'POST', headers: HEADERS, body: JSON.stringify(body) }).then(r => r.json());
const del = (url) => fetch(url, { method: 'DELETE', headers: HEADERS });

// 1. Start session
const { session_id } = await post(`${BASE}/session/start`, { url: 'https://news.ycombinator.com' });

// 2. Extract page 1 (zero-AI, CSS schema)
const page1 = await post(`${BASE}/session/${session_id}/step`, {
  css_schema: { base: 'tr.athing', fields: [{ name: 'title', selector: '.titleline a', type: 'text' }] }
});

// 3. Click next page
await post(`${BASE}/session/${session_id}/step`, {
  js: "document.querySelector('a.morelink').click()",
  wait_for: 'tr.athing',
  wait_ms: 1000
});

// 4. Extract page 2
const page2 = await post(`${BASE}/session/${session_id}/step`, {
  css_schema: { base: 'tr.athing', fields: [{ name: 'title', selector: '.titleline a', type: 'text' }] }
});

// 5. Close session
await del(`${BASE}/session/${session_id}`);

GET /usage

Returns your current plan, requests used, and remaining quota for the billing period.

# Request
curl https://papalily.p.rapidapi.com/usage \
  -H "X-RapidAPI-Key: YOUR_RAPIDAPI_KEY" \
  -H "X-RapidAPI-Host: papalily.p.rapidapi.com"

# Response
{
  "success": true,
  "plan": "pro",
  "requests_used": 47,
  "requests_limit": 1000,
  "requests_remaining": 953
}

GET /health

Health check and cache statistics. No authentication required.

{
  "status": "ok",
  "ts": "2026-03-09T12:00:00.000Z",
  "cache": { "size": 12, "maxSize": 500 }
}

Writing Good Prompts

  • Be specific: “Get all product names and their USD prices” beats “Get products”
  • Mention structure: “Return as an array of objects with name and price fields”
  • Specify scope: “Get the top 10 results” or “Get ALL items on the page”
  • Use domain language: “Get the article headline, author, and publication date”
  • For pricing pages: “Get all pricing plans with plan name, monthly price, annual price, and included features”
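Putting the tips together, a request body might look like this (the site and field names are illustrative):

```javascript
// A /scrape body that is specific about scope, structure, and units.
const body = {
  url: 'https://shop.example.com/laptops',
  prompt:
    'Get ALL laptop listings on the page. Return an array of objects ' +
    'with name, price_usd, rating, and review_count fields.',
};
```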

Caching

Papalily caches successful results for 10 minutes. Repeated requests with the same URL + prompt return instantly — and don’t count against your quota.

Cached responses include "cached": true in meta.

| Behaviour | Detail |
|---|---|
| Cache TTL | 10 minutes per URL + prompt pair |
| Max entries | 500 (LRU eviction when full) |
| Failed responses | Never cached — errors always retry fresh |
| Force refresh | Pass "no_cache": true |
| Quota impact | Cache hits do not count against monthly quota |
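To see whether a response came from cache, check meta.cached; a small sketch (the helper name is ours, not part of the API):

```javascript
// Summarise where a /scrape response came from. Cache hits are free
// and do not count against quota, so logging them can be useful.
function describeResponse(res) {
  const source = res.meta && res.meta.cached ? 'cache (free)' : 'fresh scrape';
  return `${res.url} via ${source} in ${res.meta.duration_ms} ms`;
}
```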

Rate Limits & Plans

| Plan | Requests / month | Price |
|---|---|---|
| Basic | 50 | Free |
| Pro | 1,000 | $20 / month |
| Ultra | 20,000 | $100 / month |
| Mega | 100,000 | $300 / month |

All plans are subject to a concurrent request limit of 3 simultaneous scrapes. Cache hits are free and don’t count toward monthly quota.
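To stay under the 3-scrape concurrency cap when processing many URLs, you can run requests in batches; a sketch (runLimited is our own helper, not part of the API):

```javascript
// Run async task factories at most `limit` at a time by batching.
// Each element of `tasks` is a function that starts one request.
async function runLimited(tasks, limit = 3) {
  const results = [];
  for (let i = 0; i < tasks.length; i += limit) {
    const batch = tasks.slice(i, i + limit).map((start) => start());
    results.push(...(await Promise.all(batch))); // wait for the whole batch
  }
  return results;
}
```

Each batch finishes before the next begins, so no more than `limit` scrapes are ever in flight.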

Error Codes

| HTTP Status | Description |
|---|---|
| 400 | Missing or invalid url or prompt |
| 401 | Missing API key header |
| 403 | Invalid or unauthorised API key |
| 410 | Endpoint removed — see changelog for migration guide |
| 429 | Monthly quota exceeded or rate limit hit. Upgrade at RapidAPI |
| 500 | Browser rendering or AI extraction failed. Retry with no_cache: true |
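A common client pattern is to retry 429 and 500 responses with exponential backoff; a sketch (scrapeOnce is an assumed wrapper around POST /scrape that returns the raw Response and, per the table above, sends no_cache: true when passed `true`):

```javascript
// Exponential backoff delay: 1s, 2s, 4s, ...
function backoffMs(attempt, baseMs = 1000) {
  return baseMs * 2 ** attempt;
}

// Retry transient failures (429 rate limits, 500 render/extract errors).
async function scrapeWithRetry(scrapeOnce, maxAttempts = 3) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const res = await scrapeOnce(attempt > 0); // force no_cache on retries
    if (res.status !== 429 && res.status !== 500) return res;
    await new Promise((resolve) => setTimeout(resolve, backoffMs(attempt)));
  }
  throw new Error('Scrape failed after retries');
}
```

Note that a 429 caused by an exhausted monthly quota will not recover on retry; only rate-limit spikes will.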

Code Examples

Real-world scenarios using /scrape and the new /interact endpoint.

📦 POST /scrape — Static Extraction

E-commerce: Product Listings

{
  "url": "https://shop.example.com/laptops",
  "prompt": "Get all laptop listings with name, price, rating, and review count"
}

SaaS: Competitor Pricing

{
  "url": "https://www.shopify.com/pricing",
  "prompt": "Get all pricing plans with plan name, monthly price, annual price, and top 3 features"
}

News: Latest Headlines

{
  "url": "https://techcrunch.com",
  "prompt": "Get the 10 most recent article titles, authors, publish dates, and URLs"
}

Multiple URLs in parallel

Node.js
const targets = [
  { url: 'https://news.ycombinator.com', prompt: 'Top 5 post titles and scores' },
  { url: 'https://github.com/trending', prompt: 'Top 5 trending repos with stars' },
  { url: 'https://lobste.rs', prompt: 'Top 5 post titles and tags' },
];

const results = await Promise.all(targets.map(item =>
  fetch('https://papalily.p.rapidapi.com/scrape', {
    method: 'POST',
    headers: {
      'X-RapidAPI-Key': 'YOUR_KEY',
      'X-RapidAPI-Host': 'papalily.p.rapidapi.com',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(item),
  }).then(r => r.json())
));

⚡ POST /interact — Browser Automation

🛍️ E-commerce: Scrape All Pages of Search Results

Click “Next page” and extract products across multiple pages in one request.

steps mode
{
  "url": "https://shop.example.com/search?q=laptop",
  "steps": [
    {
      "css_schema": {
        "base": ".product-card",
        "fields": [
          { "name": "name", "selector": ".product-title", "type": "text" },
          { "name": "price", "selector": ".price", "type": "text" },
          { "name": "rating", "selector": ".star-rating", "type": "text" },
          { "name": "url", "selector": "a", "type": "href" }
        ]
      }
    },
    {
      "js": "document.querySelector('.pagination-next').click()",
      "wait_for": ".product-card",
      "wait_ms": 1500
    },
    {
      "css_schema": {
        "base": ".product-card",
        "fields": [
          { "name": "name", "selector": ".product-title", "type": "text" },
          { "name": "price", "selector": ".price", "type": "text" }
        ]
      }
    }
  ]
}

🔍 Site Search: Submit a Query and Extract Results

Type into a search box, submit the form, and extract results — zero selectors required with task mode.

task mode
{
  "url": "https://news.ycombinator.com",
  "task": "Search for 'AI' using the search box and return the top 10 results with title and URL"
}

📊 Finance: Stock Price + Key Metrics

Navigate to a ticker page and extract current market data; no login or finance API key is needed.

steps mode
{
  "url": "https://finance.yahoo.com/quote/AAPL",
  "steps": [
    { "wait_for": "[data-symbol]", "wait_ms": 2000 },
    { "extract": "Current stock price, change, percent change, market cap, P/E ratio, 52-week high and low" }
  ]
}

💼 Job Board: Search and Filter Listings

Search for a role, apply a filter, and extract all matching job postings.

steps mode
{
  "url": "https://jobs.example.com",
  "steps": [
    { "js": "document.querySelector('input[placeholder*=Search]').value = 'frontend engineer'" },
    {
      "js": "document.querySelector('button[type=submit]').click()",
      "wait_for": ".job-card",
      "wait_ms": 2000
    },
    { "js": "document.querySelector('[data-filter=remote]').click()", "wait_ms": 1000 },
    {
      "css_schema": {
        "base": ".job-card",
        "fields": [
          { "name": "title", "selector": ".job-title", "type": "text" },
          { "name": "company", "selector": ".company", "type": "text" },
          { "name": "location", "selector": ".location", "type": "text" },
          { "name": "salary", "selector": ".salary", "type": "text" },
          { "name": "url", "selector": "a.job-link", "type": "href" }
        ]
      }
    }
  ]
}

📰 News Aggregator: Multi-Source Headlines in One Call

Use task mode to collect and summarise top stories in one clean payload.

task mode
{
  "url": "https://news.ycombinator.com",
  "task": "Get the top 20 post titles, their scores, comment counts, and URLs"
}

🏠 Real Estate: Property Listings with Infinite Scroll

Scroll to load more listings before extracting — handles infinite scroll sites cleanly.

steps mode
{
  "url": "https://www.zillow.com/homes/for_sale/New-York_rb/",
  "steps": [
    { "wait_for": ".property-card", "wait_ms": 2000 },
    { "js": "window.scrollTo(0, document.body.scrollHeight)", "wait_ms": 2000 },
    { "js": "window.scrollTo(0, document.body.scrollHeight)", "wait_ms": 2000 },
    { "extract": "All property listings with address, price, bedrooms, bathrooms, square footage, and listing URL" }
  ]
}

📦 GitHub: Trending Repos with Details

Pull trending repos by language — zero AI cost using CSS schema.

steps mode — zero AI
{
  "url": "https://github.com/trending/javascript?since=daily",
  "steps": [
    {
      "css_schema": {
        "base": "article.Box-row",
        "fields": [
          { "name": "repo", "selector": "h2 a", "type": "text" },
          { "name": "url", "selector": "h2 a", "type": "href" },
          { "name": "description", "selector": "p", "type": "text" },
          { "name": "stars", "selector": "a[href*=stargazers]", "type": "text" },
          { "name": "stars_today", "selector": "span.d-inline-block.float-sm-right", "type": "text" }
        ]
      }
    }
  ]
}

📋 Wikipedia: Structured Table Extraction

Pull data from wiki tables as clean JSON — no AI needed.

steps mode — zero AI
{
  "url": "https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)",
  "steps": [
    {
      "css_schema": {
        "base": ".wikitable tbody tr",
        "fields": [
          { "name": "rank", "selector": "td:nth-child(1)", "type": "text" },
          { "name": "country", "selector": "td:nth-child(2)", "type": "text" },
          { "name": "gdp_usd", "selector": "td:nth-child(3)", "type": "text" }
        ]
      }
    }
  ]
}

🤖 AI-Powered Review Summariser

Click “Load more reviews”, then ask AI to extract and summarise sentiment.

steps mode
{
  "url": "https://shop.example.com/product/123/reviews",
  "steps": [
    {
      "js": "document.querySelector('.load-more-reviews').click()",
      "wait_for": ".review-item",
      "wait_ms": 1500
    },
    { "js": "document.querySelector('.load-more-reviews').click()", "wait_ms": 1500 },
    { "extract": "All reviews with star rating, reviewer name, date, and review text. Also include an overall sentiment summary." }
  ]
}

🔐 Persistent Session: Multi-Step Authenticated Workflow

Log in, navigate to a protected page, and extract data — all within one session. (Pro plan and above)

Node.js — session workflow
const BASE = 'https://papalily.p.rapidapi.com';
const H = {
  'X-RapidAPI-Key': 'YOUR_KEY',
  'X-RapidAPI-Host': 'papalily.p.rapidapi.com',
  'Content-Type': 'application/json'
};

const api = (path, body) =>
  fetch(`${BASE}${path}`, {
    method: body ? 'POST' : 'GET',
    headers: H,
    body: body && JSON.stringify(body)
  }).then(r => r.json());

// 1. Start session and navigate to login page
const { session_id } = await api('/session/start', { url: 'https://app.example.com/login' });

// 2. Fill username
await api(`/session/${session_id}/step`, {
  js: "document.querySelector('#email').value = 'user@example.com'"
});

// 3. Fill password and submit
await api(`/session/${session_id}/step`, {
  js: "document.querySelector('#password').value = 'secret'; document.querySelector('form').submit()",
  wait_for: '.dashboard',
  wait_ms: 3000
});

// 4. Navigate to reports page
await api(`/session/${session_id}/step`, {
  navigate: 'https://app.example.com/reports',
  wait_for: '.report-table'
});

// 5. Extract report data
const data = await api(`/session/${session_id}/step`, {
  extract: 'All report rows with date, metric name, and value'
});

// 6. Close session
await fetch(`${BASE}/session/${session_id}`, { method: 'DELETE', headers: H });

console.log(data.data);

💰 Price Monitor: Track a Product Over Time

A minimal cron-friendly snippet to monitor a product price and alert on change.

Node.js
async function checkPrice(url) {
  const res = await fetch('https://papalily.p.rapidapi.com/scrape', {
    method: 'POST',
    headers: {
      'X-RapidAPI-Key': 'YOUR_KEY',
      'X-RapidAPI-Host': 'papalily.p.rapidapi.com',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url, prompt: 'Current product name and price', no_cache: true }),
  }).then(r => r.json());

  const current = res.data.price;
  const previous = await getLastPrice(url); // your own DB

  if (current !== previous) {
    await sendAlert(`Price changed: ${previous} → ${current}`);
    await savePrice(url, current);
  }
}

// Run every 15 minutes with setInterval or a cron job
setInterval(() => checkPrice('https://shop.example.com/product/456'), 15 * 60 * 1000);

🚀 See what’s changed

View the full version history, breaking changes, and roadmap on the changelog.

View Changelog →