Major new capability: interactive browser automation. Papalily can now execute JavaScript, fill forms, click buttons, paginate, and maintain browser sessions across multiple API calls. Includes a natural language task planner that converts plain-English goals into executable steps automatically.
New Endpoints
- POST /interact — Execute a sequence of interactive steps on a real browser page. Accepts `steps` (explicit) or `task` (natural language, AI-planned).
- POST /session/start — Open a persistent browser session. Context stays alive between API calls. Available on Pro and above.
- POST /session/:id/step — Execute one step or a natural-language task on a live session.
- GET /session/:id/state — Get current URL, title, and screenshot of a live session.
- DELETE /session/:id — Close a session and free browser resources.
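The session endpoints above compose into a simple lifecycle: start, step, inspect, close. A minimal client sketch, assuming a fetch-style HTTP client and a `session_id` field in the start response (the base URL, field names, and omitted auth headers are illustrative, not the documented contract):

```javascript
// Sketch of the persistent-session flow (Pro plans and above).
// BASE, the JSON field names, and the missing API-key header are
// assumptions for illustration -- check the live docs for the real contract.
const BASE = 'https://api.papalily.com';

async function runSessionTask(task, fetchImpl = fetch) {
  const headers = { 'Content-Type': 'application/json' };

  // 1. Open a persistent browser session.
  const start = await fetchImpl(`${BASE}/session/start`, { method: 'POST', headers });
  const { session_id } = await start.json();

  // 2. Execute a natural-language task on the live session.
  await fetchImpl(`${BASE}/session/${session_id}/step`, {
    method: 'POST', headers, body: JSON.stringify({ task }),
  });

  // 3. Read back the current URL, title, and screenshot.
  const state = await fetchImpl(`${BASE}/session/${session_id}/state`);
  const snapshot = await state.json();

  // 4. Always free the browser resources when done.
  await fetchImpl(`${BASE}/session/${session_id}`, { method: 'DELETE' });
  return snapshot;
}
```

Injecting `fetchImpl` keeps the sketch testable without a live server.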
New Features
- Natural language task planner — Pass `"task": "..."` to /interact or /session/:id/step. The AI snapshots the live page, generates a step plan, and executes it automatically. Plans are cached per domain+task for 1 hour.
- CSS schema extraction — New `css_schema` step type extracts structured data via CSS selectors with zero AI cost. Faster and cheaper than `extract` when page structure is known.
- Per-request browser contexts — Each request now gets an isolated Playwright context, preventing cookie/state leaks between users.
- Crash guards — `unhandledRejection` and `uncaughtException` handlers prevent full process crashes on async errors.
- PM2 memory restart — Server auto-restarts at 1GB memory to prevent OOM crashes.
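The crash guards amount to two process-level handlers in Node. A minimal sketch (the logging destination and recovery policy are illustrative):

```javascript
// Crash guards: catch stray async errors instead of letting them
// terminate the whole Node process. What you log, and whether you
// eventually exit(1) and let PM2 restart, is a policy choice.
process.on('unhandledRejection', (reason) => {
  console.error('[crash-guard] unhandled rejection:', reason);
});

process.on('uncaughtException', (err) => {
  console.error('[crash-guard] uncaught exception:', err);
  // For truly unrecoverable state, flush logs then process.exit(1)
  // and let the process manager restart cleanly.
});
```

The 1GB memory restart corresponds to PM2's `max_memory_restart: '1G'` setting in an ecosystem file.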
Fixes
- Fixed rate limiter `ERR_ERL_UNEXPECTED_X_FORWARDED_FOR` false-alarm errors flooding the log.
- Fixed mobile hamburger menu not working on Blog, Compare, and Resources pages.
- Fixed mobile nav font size inconsistency across pages.
- Standardised navbar links across all pages (Home | Docs | Pricing | Blog | Compare | Changelog | Resources).
Removed the /batch endpoint to protect server stability. Use POST /scrape for each URL individually — either sequentially or in parallel from your own client code.
Breaking Changes
- POST /batch removed — endpoint now returns `410 Gone`. The batch endpoint spawned multiple concurrent Playwright browser instances, causing high memory pressure. Replace with individual calls to POST /scrape.
Migration
- For sequential scraping: call `POST /scrape` in a loop
- For parallel scraping: call `POST /scrape` concurrently from your client (e.g. `Promise.all` in JS, `asyncio.gather` in Python)
- Cached results return instantly and are quota-free — repeated URLs benefit automatically
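For the parallel path, a bare `Promise.all` over every URL at once can recreate the load spike that got /batch removed. A client-side sketch with a small concurrency cap (the cap value and the `scrapeOne` callback are yours to supply; nothing here is part of the API):

```javascript
// Migration sketch: scrape many URLs with a bounded number of in-flight
// requests. scrapeOne is a stand-in for your own POST /scrape call.
async function scrapeAll(urls, scrapeOne, limit = 3) {
  const results = new Array(urls.length);
  let next = 0;

  // Each worker pulls the next unclaimed index until the list is drained.
  async function worker() {
    while (next < urls.length) {
      const i = next++;
      results[i] = await scrapeOne(urls[i]);
    }
  }

  const workers = Array.from({ length: Math.min(limit, urls.length) }, worker);
  await Promise.all(workers);
  return results; // same order as the input urls
}
```

Repeated URLs still hit the server-side cache, so re-running a list is cheap.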
Major rendering reliability update. The API now uses an adaptive content-stability algorithm
to detect when React/Vue/Next.js pages have finished hydrating — replacing fixed wait timers.
Full-page screenshots, proxy support, and automatic non-English translation complete the release.
New Features
- proxy_url parameter — route the browser through any HTTP/HTTPS/SOCKS5 proxy for geo-specific content (e.g. get USD pricing from a US IP)
- Adaptive content-stability wait — polls `innerText` length every 600ms and exits only when the page stops changing, instead of a fixed delay
- Lazy-load trigger — automatically scrolls the full page before capture to trigger intersection-observer lazy-loaded components
- Auto-translation — results containing Korean, Japanese, Chinese, or Arabic are automatically translated to English via a second AI pass
- API endpoint request logging — every call to every route is logged with status code and response time
- Analytics dashboard endpoint stats — new table shows total calls, success rate, avg response time, and error count per endpoint
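The adaptive wait above can be sketched as a small polling loop. This is a reconstruction from the description (poll `innerText` length every 600ms, exit when stable), not Papalily's actual implementation; the stability threshold and timeout are assumptions:

```javascript
// Adaptive content-stability wait: poll a length metric and return once
// it stops changing for `stableRounds` consecutive polls. readLength is
// injected (e.g. () => page.evaluate(() => document.body.innerText.length))
// so the loop is also testable outside a browser.
async function waitForStableContent(readLength, {
  intervalMs = 600, stableRounds = 2, maxWaitMs = 15000,
} = {}) {
  const sleep = (ms) => new Promise((r) => setTimeout(r, ms));
  let last = -1;
  let stable = 0;
  const deadline = Date.now() + maxWaitMs;

  while (Date.now() < deadline) {
    const len = await readLength();
    stable = (len === last) ? stable + 1 : 0;
    if (stable >= stableRounds) return len; // page stopped changing
    last = len;
    await sleep(intervalMs);
  }
  return last; // timed out: proceed with whatever we have
}
```

On a hydrating React/Vue page the length climbs while components mount, then plateaus; the loop exits on the plateau instead of waiting a fixed interval.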
Performance
- Page load: `networkidle` → `load` event — saves 1–3s per request on most sites
- Resource blocking: images, fonts, media, and tracking scripts aborted during navigation
- Browser context reuse — shared context kept alive between requests instead of recreating
- Screenshot quality optimised: full page up to 5000px tall at quality 70 for richer AI analysis
- HTML preprocessed before AI analysis — `<script>`, `<style>`, SVG, and comments stripped
- 15 extra Chrome `--disable-*` flags for unused browser services
- Gemini model instance reused at module level (no re-initialisation per request)
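Resource blocking splits naturally into a pure predicate plus Playwright route interception. A sketch, with an illustrative tracker list (not Papalily's actual list):

```javascript
// Abort images, fonts, media, and known tracking scripts during
// navigation. The blocked types come from the notes above; the tracker
// host list is an illustrative assumption.
const BLOCKED_TYPES = new Set(['image', 'font', 'media']);
const TRACKER_HOSTS = ['google-analytics.com', 'googletagmanager.com', 'doubleclick.net'];

function shouldBlock(resourceType, url) {
  if (BLOCKED_TYPES.has(resourceType)) return true;
  return TRACKER_HOSTS.some((host) => url.includes(host));
}

// Wiring into Playwright (not executed here):
// await context.route('**/*', (route) =>
//   shouldBlock(route.request().resourceType(), route.request().url())
//     ? route.abort()
//     : route.continue());
```

Keeping the predicate pure makes the blocking policy unit-testable without launching a browser.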
Bug Fixes
- Fixed geo-targeted sites (Shopify, etc.) serving localised content to non-US server IPs — added the `locale: en-US` browser context option plus `Accept-Language` and `CF-IPCountry` headers
- Fixed Korean/CJK text leaking into extraction results — AI extraction prompt now enforces English; translation safety net as backup
- Cookies cleared between requests — prevented stale geo-targeting cookies from affecting subsequent scrapes
- Removed `responseMimeType: application/json`, which was suppressing Gemini's language instruction-following
- Fixed duplicate route handlers causing request conflicts
- Analytics dashboard: bar charts now use %-based widths (was fixed 300px, broke on mobile)
- Analytics mobile: Referrer/IP/Device ID columns hidden on mobile; all tables wrapped in `overflow-x: auto`
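The geo-targeting fix above boils down to a US-English browser context configuration. A sketch of what such a context might look like in Playwright terms (the exact `Accept-Language` value is an assumption):

```javascript
// Force a US-English rendering context so geo-targeted sites serve the
// same content they would to a US visitor. Values are illustrative.
const usContext = {
  locale: 'en-US',
  extraHTTPHeaders: {
    'Accept-Language': 'en-US,en;q=0.9',
    'CF-IPCountry': 'US',
  },
};

// Playwright usage (not executed here):
// const context = await browser.newContext(usContext);
```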
Infrastructure
- Vision-first extraction — system prompt explicitly instructs AI to study screenshot for pricing cards, grids, and visual tables
- Geo-redirect interception — path-based locale redirects (`/ko/`, `/ja/`, etc.) rewritten to `/en/`
- Proxy requests use isolated one-time browser contexts — no shared state bleed
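The geo-redirect rewrite is essentially a path-prefix substitution. A sketch, with an illustrative locale list (the real set of intercepted prefixes isn't specified here):

```javascript
// Rewrite path-based locale prefixes such as /ko/ or /ja/ to /en/.
// The locale list is an illustrative assumption.
const LOCALE_PREFIX = /^\/(ko|ja|zh|ar|fr|de|es)(\/|$)/;

function rewriteLocalePath(url) {
  const u = new URL(url);
  u.pathname = u.pathname.replace(LOCALE_PREFIX, '/en$2');
  return u.toString();
}
```

Anchoring the match at the path start avoids false positives like `/kohl/` or `/blog/ko/`.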
Security hardening, analytics v2, self-hosted tracking dashboard, and mobile optimisation.
New Features
- Self-hosted analytics dashboard at `/analytics` — pageviews, click events, top pages, referrers
- Analytics v2: real IP tracking, device ID (`__ppid` localStorage UUID), browser fingerprint hash
- Scheduled blog publishing system — 8 SEO posts drip-fed 2x/week via cron job
- GEO/AI optimisation — `ai-page.html` served to AI crawlers (GPTBot, ClaudeBot, PerplexityBot) for ChatGPT Search and Perplexity extraction
- Comparison pages: `/compare/` hub, vs ScraperAPI, vs Apify
- Resources page with curated developer tools and backlinks
Performance & Security
- Nginx rate limiting zones: per-endpoint limits (scrape 10r/m, batch 3r/m, general 30r/m)
- HSTS, CSP, gzip, static asset caching on www
- `MAX_CONCURRENT_SCRAPES = 3` cap to prevent OOM on concurrent Playwright instances
- `trust proxy 1` set for correct client IP behind Nginx
- Mobile: orbs disabled on screens <768px (removed heavy `filter: blur` GPU load)
- Mobile navigation added to all pages (was completely missing)
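The `MAX_CONCURRENT_SCRAPES` cap is a classic promise semaphore: work beyond the limit queues until a slot frees. A sketch (the implementation details are illustrative, not the server's actual code):

```javascript
// Tiny promise semaphore: at most `max` jobs run at once; the rest wait
// in FIFO order. Used here to keep Playwright browser jobs to 3.
function createLimiter(max) {
  let active = 0;
  const queue = [];

  const release = () => {
    active--;
    if (queue.length) queue.shift()(); // wake the next waiter
  };

  return async function run(job) {
    if (active >= max) await new Promise((resolve) => queue.push(resolve));
    active++;
    try {
      return await job();
    } finally {
      release(); // always free the slot, even if the job threw
    }
  };
}

// const limitScrapes = createLimiter(3);
// app.post('/scrape', (req, res) => limitScrapes(() => doScrape(req, res)));
```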
Bug Fixes
- Fixed `css/` and `js/` directories with 700 permissions — Nginx could not read static assets
- Cache now never stores failed scrapes — subsequent requests always retry fresh
- Batch: each URL in batch counts as 1 quota request; cache hits are free
- RapidAPI plan limits correctly read from `x-rapidapi-subscription` header on every request
RapidAPI integration, CORS, improved cache logic, and batch endpoint hardening.
New Features
- RapidAPI proxy secret validation (`x-rapidapi-proxy-secret` header)
- CORS headers added for browser-side API access
- SEO/GEO meta — robots.txt, sitemap.xml, llms.txt, IndexNow key, JSON-LD schemas
- Blog launched — first post on scraping React sites with AI
- GitHub profile README as DA96 backlink
Improvements
- Cache improved: max 500 entries, LRU eviction, never caches failure responses
- Batch: pre-flight quota check before scraping; per-item quota counting
- RapidAPI plan auto-sync: `BASIC→50, PRO→1000, ULTRA→20000, MEGA→100000` requests
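The improved cache's three rules (500-entry cap, LRU eviction, never cache failures) fit in a few lines using a `Map`'s insertion order. A sketch, assuming results carry a `success` flag (that field name is an assumption):

```javascript
// LRU cache sketch: at most `maxEntries` items, least-recently-used
// evicted first, and failed scrape results are never stored.
class ScrapeCache {
  constructor(maxEntries = 500) {
    this.max = maxEntries;
    this.map = new Map(); // Map preserves insertion order: oldest first
  }

  get(url) {
    if (!this.map.has(url)) return undefined;
    const value = this.map.get(url);
    this.map.delete(url);   // re-insert to mark as most recently used
    this.map.set(url, value);
    return value;
  }

  set(url, result) {
    if (!result || result.success === false) return; // never cache failures
    if (this.map.has(url)) this.map.delete(url);
    this.map.set(url, result);
    if (this.map.size > this.max) {
      this.map.delete(this.map.keys().next().value); // evict oldest entry
    }
  }
}
```

Refusing to store failures means a transient error never poisons the cache: the next request for that URL always retries fresh.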
Papalily API launches publicly on RapidAPI. Chromium-based scraping with Gemini AI extraction.
Initial Features
- POST `/scrape` — render any URL in Chromium + extract structured JSON via Gemini AI
- POST `/batch` — scrape up to 5 URLs in parallel
- GET `/usage` — check quota and plan limits
- GET `/health` — API status and cache stats
- 10-minute LRU result cache — repeated requests instant and quota-free
- Playwright Chromium headless rendering — handles React, Vue, Next.js, Angular
- Gemini 2.0 Flash AI extraction engine — screenshot + text for maximum accuracy
- Let's Encrypt SSL on all three domains (www, bare, api)
- PM2 process manager with systemd auto-restart
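A minimal `/scrape` client sketch to show the core call. The base URL, the `prompt` body field, the response shape, and the omitted API-key header are assumptions for illustration, not the documented contract:

```javascript
// Render a URL and extract structured JSON via POST /scrape.
// Field names and auth are illustrative; consult the live docs.
const BASE = 'https://api.papalily.com';

async function scrape(url, prompt, fetchImpl = fetch) {
  const res = await fetchImpl(`${BASE}/scrape`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ url, prompt }),
  });
  if (!res.ok) throw new Error(`scrape failed: ${res.status}`);
  return res.json();
}
```

Because of the 10-minute cache, calling `scrape` twice with the same URL returns the second result instantly and without spending quota.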