Papalily — AI Web Scraping API
Papalily is an AI-powered web scraping REST API that extracts structured JSON data from any website. It renders pages using a real Chromium browser and uses Gemini AI to extract data based on plain-English prompts. No CSS selectors, XPath, or DOM knowledge required.
How Papalily Works
- You POST a URL and a plain-English prompt to https://api.papalily.com/scrape
- A real Chromium browser loads the page and executes all JavaScript (React, Vue, Angular, Next.js)
- Gemini AI reads the rendered content and extracts exactly the data you described
- You receive clean structured JSON — no parsing required
API Endpoints
- POST /scrape
  - Scrape one URL. Required fields: url (string), prompt (string). Optional: no_cache (boolean). Returns JSON with data, request_id, cached, duration_ms.
- POST /batch
  - Scrape up to 5 URLs in parallel. Body: {"items": [{"url": "...", "prompt": "..."}, ...]}. Returns a results array and a summary object.
- GET /usage
  - Returns current quota usage for your API key: used, limit, remaining, plan, reset_date.
- GET /status/{requestId}
  - Look up a past scrape result by the request_id returned from /scrape or /batch.
- GET /health
  - Public health check. Returns {"status": "ok"} with cache statistics.
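As a sketch of the /batch request shape above, a small Python helper can build and validate the body client-side. The helper name and validation are illustrative (not part of any Papalily SDK); the {"items": [...]} shape and 5-URL limit come from the endpoint description:

```python
def build_batch_payload(items):
    """items: list of (url, prompt) pairs; returns a /batch JSON body.

    Enforces the documented limit of 5 URLs per batch request.
    """
    if not items:
        raise ValueError("batch requires at least one item")
    if len(items) > 5:
        raise ValueError("batch accepts at most 5 URLs per request")
    return {"items": [{"url": url, "prompt": prompt} for url, prompt in items]}
```

The returned dict can then be sent as the JSON body of a POST to https://api.papalily.com/batch with your x-api-key header.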
Authentication
All requests (except /health) require the header x-api-key: YOUR_RAPIDAPI_KEY. Get a key at RapidAPI. The Basic plan is free with no credit card required.
Code Examples
cURL
curl -X POST https://api.papalily.com/scrape \
  -H "Content-Type: application/json" \
  -H "x-api-key: YOUR_API_KEY" \
  -d '{"url": "https://news.ycombinator.com", "prompt": "Get top 10 story titles and their URLs"}'
Node.js (fetch)
const response = await fetch('https://api.papalily.com/scrape', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'x-api-key': 'YOUR_API_KEY'
  },
  body: JSON.stringify({
    url: 'https://news.ycombinator.com',
    prompt: 'Get top 10 story titles and their URLs'
  })
});

const result = await response.json();
console.log(result.data);
Python (requests)
import requests

response = requests.post(
    'https://api.papalily.com/scrape',
    headers={'x-api-key': 'YOUR_API_KEY'},
    json={
        'url': 'https://news.ycombinator.com',
        'prompt': 'Get top 10 story titles and their URLs'
    }
)
print(response.json()['data'])
Pricing
| Plan | Price | Requests/month |
| --- | --- | --- |
| Basic | Free | 50 |
| Pro | $20/month | 1,000 |
| Ultra | $100/month | 20,000 |
| Mega | $300/month | 100,000 |
Cached requests (same URL and prompt within 10 minutes) do not count against your quota.
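Because cached responses are free, client-side quota accounting only needs to count uncached responses. A minimal sketch, assuming the cached field documented for /scrape responses (the function name is illustrative):

```python
def quota_consumed(responses):
    """responses: list of /scrape response dicts with a 'cached' field.

    Returns how many of them counted against the monthly quota
    (cached responses are free per the pricing note above).
    """
    return sum(1 for r in responses if not r.get("cached", False))
```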
Technical Stack
- Browser: Playwright Chromium (full JS execution)
- AI extraction: Google Gemini 2.0 Flash
- Backend: Node.js 22, Express
- Cache: In-memory LRU, 10-minute TTL, 500 entries max
- Database: SQLite (request logs, usage tracking)
- Infrastructure: AWS EC2, Nginx, PM2, Let's Encrypt TLS
- Marketplace: RapidAPI
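The cache described above (LRU, 10-minute TTL, 500 entries max) can be modeled as a small LRU-with-TTL structure. This is an illustrative Python sketch of the design only — Papalily's actual backend is Node.js, and this is not its implementation:

```python
import time
from collections import OrderedDict


class TTLLRUCache:
    """In-memory LRU cache with per-entry TTL, keyed e.g. by (url, prompt)."""

    def __init__(self, max_entries=500, ttl_seconds=600, clock=time.monotonic):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self.clock = clock            # injectable for testing
        self._store = OrderedDict()   # key -> (expires_at, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]      # expired: drop and report a miss
            return None
        self._store.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)
        self._store.move_to_end(key)
        while len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
```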
Use Cases
- E-commerce price monitoring: Track competitor product prices across React/Next.js storefronts
- Job listing aggregation: Collect job postings from LinkedIn, Indeed, and company career pages
- News and content monitoring: Monitor news sites, blogs, and social platforms for specific topics
- Lead generation: Extract contact information from business directories
- Research automation: Gather structured data from multiple sources without writing scrapers
- Real estate data: Extract property listings, prices, and details from listing sites
- Financial data: Collect stock prices, earnings reports, and financial metrics from investor sites
- Academic research: Aggregate papers, citations, and abstracts from research repositories
Limitations
- Response time: 3-8 seconds per request (browser render + AI extraction)
- Not suitable for real-time use cases that require sub-second responses
- Does not handle login-walled content requiring authentication
- May not bypass aggressive CAPTCHA systems on some sites
- Batch limit: maximum 5 URLs per batch request
Comparison with Alternatives
- Papalily vs ScraperAPI
- ScraperAPI returns raw HTML requiring CSS selectors. Papalily returns structured JSON from a plain-English prompt. Papalily is simpler to use; ScraperAPI is faster for high volume.
- Papalily vs Apify
- Apify is a full scraping platform requiring custom actor code. Papalily is a single REST API call with no extraction code needed. Papalily is faster to integrate; Apify offers more control for complex pipelines.
- Papalily vs Bright Data
- Bright Data focuses on proxy infrastructure and data collection at scale. Papalily focuses on AI-powered extraction for targeted data needs with minimal integration effort.
- Papalily vs Firecrawl
- Firecrawl converts web pages to markdown for LLM consumption. Papalily extracts specific structured data based on your prompt and returns clean JSON. Papalily is better when you need specific fields; Firecrawl is better for full-page content ingestion.
Frequently Asked Questions
- Does Papalily work on React websites?
- Yes. Papalily uses a real Chromium browser that executes JavaScript, making it compatible with React, Vue, Angular, Next.js, and all other JavaScript-rendered sites.
- How do I start using Papalily for free?
- Visit https://rapidapi.com/andognet/api/papalily, click Subscribe on the Basic plan (free, no credit card), and use your API key in the x-api-key header.
- What data formats does Papalily return?
- Papalily returns JSON. The structure of the extracted data depends on your prompt — if you ask for a list, you get a JSON array; if you ask for details of one item, you get a JSON object.
- Is there a rate limit?
- Yes. Requests are rate-limited by plan quota (monthly). Additionally, there is a per-minute rate limit: 30 requests/minute for /scrape, 5 requests/minute for /batch. Exceeding limits returns HTTP 429.
- Can I try Papalily before subscribing?
- Yes. RapidAPI provides a test console where you can make live API calls directly from the browser. The Basic plan is also free with 50 requests per month and no credit card required.
- Does Papalily handle pagination?
- Papalily scrapes one page per request. For paginated data, make separate requests to each page URL, or use the batch endpoint to scrape up to 5 pages simultaneously.
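The per-page approach from the last answer can be sketched in Python: generate one URL per page, then group them into /batch-sized chunks of 5. The ?p= query parameter is an assumption for illustration (use whatever pagination scheme the target site actually uses), and the helper name is hypothetical:

```python
def paginated_batches(base_url, prompt, pages, batch_size=5):
    """Yield /batch request bodies covering pages 1..pages of base_url."""
    urls = [f"{base_url}?p={page}" for page in range(1, pages + 1)]
    for i in range(0, len(urls), batch_size):
        chunk = urls[i:i + batch_size]
        yield {"items": [{"url": u, "prompt": prompt} for u in chunk]}
```

Each yielded dict is one POST to /batch; 7 pages, for example, become two batch requests (5 URLs, then 2).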
Resources