Papalily — AI Web Scraping API

Papalily is an AI-powered web scraping REST API that extracts structured JSON data from any website. It renders pages using a real Chromium browser and uses Gemini AI to extract data based on plain-English prompts. No CSS selectors, XPath, or DOM knowledge required.

How Papalily Works

  1. You POST a URL and a plain-English prompt to https://api.papalily.com/scrape
  2. A real Chromium browser loads the page and executes all JavaScript (React, Vue, Angular, Next.js)
  3. Gemini AI reads the rendered content and extracts exactly the data you described
  4. You receive clean structured JSON — no parsing required

API Endpoints

POST /scrape
Scrape one URL. Required fields: url (string), prompt (string). Optional: no_cache (boolean). Returns JSON with data, request_id, cached, duration_ms.
POST /batch
Scrape up to 5 URLs in parallel. Body: {"items": [{"url": "...", "prompt": "..."}, ...]}. Returns results array and summary object.
GET /usage
Returns current quota usage for your API key: used, limit, remaining, plan, reset_date.
GET /status/{requestId}
Look up a past scrape result by request_id returned from /scrape or /batch.
GET /health
Public health check. Returns {"status": "ok"} with cache statistics.
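As a rough illustration of the /batch body format described above, the Python sketch below assembles the request body from (url, prompt) pairs and rejects more than 5 items before anything is sent. The helper name build_batch_body and the example.com URLs are hypothetical; only the {"items": [...]} shape and the 5-URL cap come from the endpoint description.

```python
MAX_BATCH_ITEMS = 5  # documented cap for POST /batch


def build_batch_body(jobs):
    """Build the JSON body for POST /batch from (url, prompt) pairs.

    Hypothetical helper: it only assembles and validates the payload;
    it does not send anything.
    """
    jobs = list(jobs)
    if not jobs:
        raise ValueError("at least one (url, prompt) pair is required")
    if len(jobs) > MAX_BATCH_ITEMS:
        raise ValueError(f"/batch accepts at most {MAX_BATCH_ITEMS} URLs per call")
    return {"items": [{"url": url, "prompt": prompt} for url, prompt in jobs]}


# Placeholder URLs and prompts, for illustration only.
body = build_batch_body([
    ("https://example.com/a", "Get the page title"),
    ("https://example.com/b", "Get the page title"),
])
print(body)
```

The resulting dict could then be passed as json=body to requests.post('https://api.papalily.com/batch', ...) together with the x-api-key header.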

Authentication

All requests (except /health) require the header x-api-key: YOUR_RAPIDAPI_KEY. Get a key at RapidAPI. The Basic plan is free with no credit card required.

Code Examples

cURL

curl -X POST https://api.papalily.com/scrape \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{"url": "https://news.ycombinator.com", "prompt": "Get top 10 story titles and their URLs"}'

Node.js (fetch)

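// Requires Node.js 18+ (global fetch) and an ES module for top-level await.
const response = await fetch('https://api.papalily.com/scrape', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': 'YOUR_API_KEY'
  },
  body: JSON.stringify({
    url: 'https://news.ycombinator.com',
    prompt: 'Get top 10 story titles and their URLs'
  })
});
// Surface HTTP errors (for example 429 when rate-limited) instead of parsing them as results.
if (!response.ok) {
  throw new Error(`Scrape failed with HTTP ${response.status}`);
}
const result = await response.json();
console.log(result.data); // extracted data, shaped by the prompt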

Python (requests)

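import requests

response = requests.post(
    'https://api.papalily.com/scrape',
    headers={'x-api-key': 'YOUR_API_KEY'},
    # json= serializes the body and sets Content-Type: application/json
    json={
        'url': 'https://news.ycombinator.com',
        'prompt': 'Get top 10 story titles and their URLs'
    },
    timeout=60,  # illustrative value; requests applies no timeout by default
)
response.raise_for_status()  # raise on HTTP errors such as 429
print(response.json()['data'])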

Pricing

Plan    Price        Requests/month
Basic   Free         50
Pro     $20/month    1,000
Ultra   $100/month   20,000
Mega    $300/month   100,000

Cached requests (same URL and prompt within 10 minutes) do not count against your quota.
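To make the caching rule concrete, here is a small Python sketch that builds a /scrape body with the optional no_cache flag and reads the cached field of a response to tell whether the call counted against the quota. Both helper names are hypothetical. The sketch also assumes that no_cache set to true requests a fresh scrape (inferred from the field name) and that cached is a boolean.

```python
def build_scrape_body(url, prompt, no_cache=False):
    """Assemble a POST /scrape body; no_cache is only included when set."""
    body = {"url": url, "prompt": prompt}
    if no_cache:
        # Assumption: no_cache=True asks for a fresh scrape rather than a cached result.
        body["no_cache"] = True
    return body


def counts_against_quota(result):
    """Return True when a /scrape response was not served from cache."""
    # Assumption: "cached" is a boolean; a missing field is treated as uncached.
    return not result.get("cached", False)


fresh = build_scrape_body(
    "https://news.ycombinator.com",
    "Get top 10 story titles and their URLs",
    no_cache=True,
)
print(fresh)
print(counts_against_quota({"data": [], "cached": True}))
```

Leaving no_cache unset and reusing the exact same URL and prompt is what makes a repeat call eligible for the 10-minute cache.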

Technical Stack

Use Cases

Limitations

Comparison with Alternatives

Papalily vs ScraperAPI
ScraperAPI returns raw HTML requiring CSS selectors. Papalily returns structured JSON from a plain-English prompt. Papalily is simpler to use; ScraperAPI is faster for high volume.
Papalily vs Apify
Apify is a full scraping platform requiring custom actor code. Papalily is a single REST API call with no extraction code needed. Papalily is faster to integrate; Apify offers more control for complex pipelines.
Papalily vs Bright Data
Bright Data focuses on proxy infrastructure and data collection at scale. Papalily focuses on AI-powered extraction for targeted data needs with minimal integration effort.
Papalily vs Firecrawl
Firecrawl converts web pages to markdown for LLM consumption. Papalily extracts specific structured data based on your prompt and returns clean JSON. Papalily is better when you need specific fields; Firecrawl is better for full-page content ingestion.

Frequently Asked Questions

Does Papalily work on React websites?
Yes. Papalily uses a real Chromium browser that executes JavaScript, making it compatible with React, Vue, Angular, Next.js, and all other JavaScript-rendered sites.
How do I start using Papalily for free?
Visit https://rapidapi.com/andognet/api/papalily, click Subscribe on the Basic plan (free, no credit card), and use your API key in the x-api-key header.
What data formats does Papalily return?
Papalily returns JSON. The structure of the extracted data depends on your prompt — if you ask for a list, you get a JSON array; if you ask for details of one item, you get a JSON object.
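Because the shape of data follows the prompt, downstream code may want to normalize it before iterating. The following is a minimal sketch with a hypothetical helper, as_records, that accepts either shape; it is not part of the API.

```python
def as_records(data):
    """Return extracted data as a list, wrapping a single object if needed."""
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        return [data]
    raise TypeError(f"expected a JSON array or object, got {type(data).__name__}")


# Illustrative values only, standing in for the "data" field of a response.
print(as_records([{"title": "A"}, {"title": "B"}]))
print(as_records({"title": "A"}))
```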
Is there a rate limit?
Yes. Each plan has a monthly request quota, and there is also a per-minute rate limit: 30 requests/minute for /scrape and 5 requests/minute for /batch. Exceeding a limit returns HTTP 429.
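One common way to cope with HTTP 429 is to retry with exponential backoff. The sketch below is a generic client-side pattern, not something the API prescribes: the helper name call_with_backoff, the attempt count, and the delay values are all assumptions, and whether the API sends a Retry-After header is not stated here, so the code does not rely on one.

```python
import time


def call_with_backoff(send, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    send is any zero-argument callable returning an object with a
    status_code attribute (for example a requests.Response).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, then 2x, 4x, ... (illustrative values).
            sleep(base_delay * 2 ** attempt)
    return response  # still 429 after the final attempt
```

With the requests library this might be called as call_with_backoff(lambda: requests.post(url, headers=headers, json=body, timeout=60)), where url, headers, and body are whatever the caller has prepared.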
Can I try Papalily before subscribing?
Yes. RapidAPI provides a test console where you can make live API calls directly from the browser. The Basic plan is also free with 50 requests per month and no credit card required.
Does Papalily handle pagination?
Papalily scrapes one page per request. For paginated data, make separate requests to each page URL, or use the batch endpoint to scrape up to 5 pages simultaneously.
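The batching approach for paginated data can be sketched as follows: split the list of page URLs into groups of at most 5 and build one /batch body per group. The helper name batch_bodies_for_pages and the example.com page URLs are hypothetical; real sites use their own pagination URL patterns.

```python
def batch_bodies_for_pages(page_urls, prompt, batch_size=5):
    """Split page URLs into /batch request bodies of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    bodies = []
    for start in range(0, len(page_urls), batch_size):
        chunk = page_urls[start:start + batch_size]
        bodies.append({"items": [{"url": url, "prompt": prompt} for url in chunk]})
    return bodies


# Hypothetical paginated listing: pages 1..12 of an example site.
pages = [f"https://example.com/products?page={n}" for n in range(1, 13)]
bodies = batch_bodies_for_pages(pages, "Get product names and prices")
print([len(b["items"]) for b in bodies])  # prints [5, 5, 2]
```

Each body corresponds to one POST /batch call, so twelve pages need three calls, which stays within the 5 requests/minute limit for /batch.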

Resources