API Documentation
Papalily lets you extract structured data from any website using a real browser and AI. Send a URL and a plain-English description, and get back clean JSON.
https://api.papalily.com · Average response time: 8-15 seconds
Introduction
The Papalily API is a REST API that accepts JSON and returns JSON. It uses a real Chromium browser to render JavaScript-heavy sites (React, Vue, Angular, Next.js, etc.) before extracting data with Gemini AI.
Unlike traditional scrapers that break when a site's HTML structure changes, Papalily uses AI to understand the page semantically, so your prompts keep working even after site redesigns.
Authentication
All API requests (except /health) require an API key passed in the x-api-key request header.
Get your API key from RapidAPI. Free tier includes 100 requests/month.
Quick Start
Make your first request in under 60 seconds: Try the live demo →
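A minimal first request, sketched in Python using only the standard library (the API key value is a placeholder; get yours from RapidAPI):

```python
import json
import urllib.request

API_KEY = "your-api-key"  # placeholder; use your key from RapidAPI

def build_scrape_request(url: str, prompt: str) -> urllib.request.Request:
    """Build a POST /scrape request with the documented x-api-key header."""
    body = json.dumps({"url": url, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        "https://api.papalily.com/scrape",
        data=body,
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )

req = build_scrape_request(
    "https://example.com/products",
    "Get all product names and their USD prices",
)
# result = json.load(urllib.request.urlopen(req))  # sends the request
```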
POST /scrape Try it →
The main endpoint. Renders the target URL in a real browser and extracts the requested data using AI. Average response time: 8-15 seconds, depending on page complexity.
Request Body
| Parameter | Type | Description |
|---|---|---|
| url | string | Required. The URL to scrape. Must be a valid http/https URL. |
| prompt | string | Required. Plain-English description of what data to extract. |
| wait_ms | number | Extra milliseconds to wait after page load. Default: 2000. Max: 10000. |
| screenshot | boolean | Include page screenshot in AI analysis. Default: true. Improves accuracy. |
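The defaults and limits in the table above can be expressed as a small client-side validator (illustrative only; the server enforces these rules itself):

```python
def validate_scrape_body(body: dict) -> dict:
    """Apply the documented defaults and limits to a /scrape request body."""
    if "url" not in body or "prompt" not in body:
        raise ValueError("url and prompt are required")
    out = dict(body)
    out.setdefault("wait_ms", 2000)                   # default: 2000
    out["wait_ms"] = min(int(out["wait_ms"]), 10000)  # max: 10000
    out.setdefault("screenshot", True)                # default: true
    return out
```

For example, passing only url and prompt fills in wait_ms=2000 and screenshot=True, while an oversized wait_ms is clamped to 10000.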
Response
| Field | Type | Description |
|---|---|---|
| success | boolean | Whether the request succeeded. |
| request_id | string | Unique ID for this request (UUID). Use with GET /status. |
| url | string | Final URL after any redirects. |
| title | string | Page title. |
| data | object | The extracted data as structured JSON. |
| meta.duration_ms | number | Total processing time in milliseconds. |
| meta.cached | boolean | true if result was served from cache. |
| meta.scraped_at | string | ISO 8601 timestamp. |
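A sketch of reading those fields from a parsed response (the sample values below are invented for illustration):

```python
def summarize_response(resp: dict) -> str:
    """Summarize a /scrape response using the fields documented above."""
    if not resp.get("success"):
        raise RuntimeError("scrape failed")
    meta = resp["meta"]
    source = "cache" if meta["cached"] else "live"
    return f"{resp['title']} ({source}, {meta['duration_ms']} ms)"

sample = {
    "success": True,
    "request_id": "123e4567-e89b-12d3-a456-426614174000",
    "url": "https://example.com/",
    "title": "Example Domain",
    "data": {"items": []},
    "meta": {"duration_ms": 9400, "cached": False,
             "scraped_at": "2024-01-01T00:00:00Z"},
}
```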
POST /batch Try it →
Scrape up to 5 URLs simultaneously in a single request. All jobs run in parallel, which is dramatically faster than making individual /scrape calls.
Request Body
| Parameter | Type | Description |
|---|---|---|
| items | array | Required. Array of 1โ5 objects, each with url and prompt. |
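A sketch of assembling a /batch body with the 1-5 item limit checked client-side:

```python
def build_batch_body(items: list) -> dict:
    """Wrap scrape jobs for POST /batch; the API accepts 1-5 items per request."""
    if not 1 <= len(items) <= 5:
        raise ValueError("batch accepts 1-5 items")
    for item in items:
        if "url" not in item or "prompt" not in item:
            raise ValueError("each item needs url and prompt")
    return {"items": items}
```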
Response
GET /usage
Returns usage statistics for your API key. Requires authentication.
GET /status/:requestId
Look up a past request by its ID. Every /scrape and /batch response includes a request_id you can use here.
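A status lookup can be sketched the same way (standard library only; the key value is a placeholder):

```python
import urllib.request

API_KEY = "your-api-key"  # placeholder; use your key from RapidAPI

def build_status_request(request_id: str) -> urllib.request.Request:
    """Build a GET /status/:requestId request from a prior response's request_id."""
    return urllib.request.Request(
        f"https://api.papalily.com/status/{request_id}",
        headers={"x-api-key": API_KEY},
    )
```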
GET /health
Health check endpoint. Returns API status. No authentication required.
Writing Good Prompts
The quality of extracted data depends on how clearly you describe what you want. Here are some tips:
- Be specific: "Get all product names and their USD prices" beats "Get products"
- Mention structure: "Return as an array of objects with name and price fields"
- Specify limits: "Get the top 10 results" or "Get all items on the page"
- Use domain language: "Get the article headline, author, and publication date"
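Putting those tips together, one prompt that is specific, structured, and bounded (example wording only, not a required format):

```python
# Combines the tips above: specific fields, explicit structure, an explicit limit.
good_prompt = (
    "Get the top 10 articles as an array of objects "
    "with headline, author, and publication_date fields"
)
```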
Caching
Papalily caches scrape results in memory for 10 minutes. If you request the same URL + prompt combination within the TTL window, you'll get an instant response from cache.
- Cache key: url::prompt (both lowercased)
- TTL: 10 minutes from first scrape
- Cache hit is indicated by meta.cached: true in the response
- Cached responses are instant (no browser launch or AI processing)
- Cache is per-server and resets on restart
Rate Limits
| Plan | Requests/month | Requests/minute | Batch size |
|---|---|---|---|
| Free | 100 | 5 | 5 URLs |
| Pro | Unlimited | 30 | 5 URLs |
| Enterprise | Unlimited | Custom | 5 URLs |
When rate limited, the API returns HTTP 429 Too Many Requests. Implement exponential backoff in your client.
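A sketch of the exponential backoff the docs recommend (this delay schedule is an assumption, not an API requirement):

```python
def backoff_delays(retries: int = 5, base: float = 1.0, cap: float = 30.0) -> list:
    """Doubling delays (seconds) capped at `cap` for successive 429 retries."""
    return [min(base * 2 ** i, cap) for i in range(retries)]
```

Sleep for each delay in turn before retrying; adding random jitter on top is a common refinement.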
Error Codes
| HTTP Status | Code | Description |
|---|---|---|
| 400 | Bad Request | Missing or invalid url or prompt |
| 401 | Unauthorized | Missing x-api-key header |
| 403 | Forbidden | Invalid API key |
| 404 | Not Found | Request ID not found (GET /status) |
| 429 | Too Many Requests | Rate limit or monthly quota exceeded |
| 500 | Server Error | Browser failed to load page or AI extraction failed |
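A sketch of client-side handling based on the table above (which statuses to retry is a judgment call, not stated API policy):

```python
def should_retry(status: int) -> bool:
    """429 (rate limit/quota) and 500 (transient page or AI failure) may succeed
    on retry; 400/401/403/404 indicate a request problem a retry won't fix."""
    return status in (429, 500)
```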