Scraping JavaScript-heavy websites and SPAs (Single Page Applications) is the new normal in 2026.
The days of simple HTML pages are behind us: React, Vue, Angular, and Svelte now power a large share of modern web applications.
These frameworks deliver rich user experiences, but they also defeat scrapers that only fetch and parse the raw HTML response.
If you've ever tried to scrape a website and found empty divs where content should be, you've hit the JavaScript wall.
This guide will show you exactly how to break through it — with modern techniques that work on even the most complex SPAs.
Why Traditional Scraping Fails on Modern Websites
The fundamental problem is simple: traditional HTTP requests only fetch the initial HTML document.
In a Single Page Application, that initial document is essentially a shell — a container that loads JavaScript,
which then fetches and renders the actual content dynamically.
The JavaScript Rendering Gap
Here's what happens when you fetch a modern React app with a simple HTTP request:
What you get with HTTP requests
<!DOCTYPE html>
<html>
<head><title>My App</title></head>
<body>
<!-- The content you want is NOT here -->
<div id="root"></div>
<!-- JavaScript that renders content later -->
<script src="/static/js/main.chunk.js"></script>
</body>
</html>
The <div id="root"></div> is empty. The actual content loads after JavaScript executes,
makes API calls, and renders components. Traditional scrapers never see this rendered content because they don't execute JavaScript.
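A quick way to confirm you have hit this gap is to inspect the raw response before reaching for a browser. The sketch below is a heuristic only: the function name `looksLikeEmptyShell` and the mount-point ids it checks (`root`, `app`) are assumptions for illustration, and a real site may use a different container.

```javascript
// Heuristic: does a raw HTML response look like an unrendered SPA shell?
// MOUNT_IDS lists common default mount points; it is an assumption, not exhaustive.
const MOUNT_IDS = ['root', 'app'];

function looksLikeEmptyShell(html) {
  return MOUNT_IDS.some((id) => {
    // Matches e.g. <div id="root"></div> with only whitespace between the tags.
    const emptyMount = new RegExp(
      `<div[^>]*\\bid=["']${id}["'][^>]*>\\s*</div>`,
      'i'
    );
    return emptyMount.test(html);
  });
}

const shell =
  '<html><body><div id="root"></div>' +
  '<script src="/static/js/main.chunk.js"></script></body></html>';
const rendered =
  '<html><body><div id="root"><h1>Products</h1></div></body></html>';

console.log(looksLikeEmptyShell(shell));    // true
console.log(looksLikeEmptyShell(rendered)); // false
```

If the check returns true for the body of a plain HTTP response, the content is almost certainly rendered client-side and a plain HTTP scraper will not see it.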
Common SPA Patterns That Break Scrapers
- Client-side routing — URL changes don't trigger page reloads; content swaps dynamically
- Infinite scroll — New content loads as users scroll, never present in initial HTML
- Lazy loading — Images and content load only when they enter the viewport
- AJAX data fetching — Content comes from API calls after the page loads
- Virtual scrolling — Only visible items exist in the DOM; scrolling creates/destroys elements
- Authentication gates — Content requires login or session tokens fetched dynamically
Warning: Seeing content in your browser's DevTools Elements panel doesn't mean traditional scraping will work.
DevTools shows the live DOM after JavaScript has run; "View Source" shows the original HTML response, which is all a plain HTTP scraper receives.
The Solution: Headless Browser Scraping
To scrape JavaScript-heavy websites, you need a headless browser — a real browser that runs without a visible interface,
executes JavaScript, and gives you access to the fully rendered page. Think of it as programmatically controlling Chrome or Firefox.
Popular Headless Browser Options
| Tool | Best For | Learning Curve |
|---|---|---|
| Playwright | Modern web apps, all browsers | Moderate |
| Puppeteer | Chrome-only projects | Moderate |
| Selenium | Legacy support, multiple languages | Steeper |
| Papalily API | Quick deployment, no infrastructure | Easy |
Scraping React Applications
React apps are everywhere — from e-commerce sites to dashboards to social platforms.
Here's how to handle them effectively.
Waiting for Content to Render
The key challenge with React is knowing when the content has loaded.
You can't just fetch the page and immediately extract data — you need to wait for React to render.
react-scraper.js — Waiting for React to render
const { chromium } = require('playwright');

async function scrapeReactSite(url) {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Navigate and wait for network to be idle
  await page.goto(url, { waitUntil: 'networkidle' });

  // Additional wait for React to render
  await page.waitForTimeout(2000);

  // Wait for specific element that indicates content loaded
  await page.waitForSelector('.product-list', { timeout: 10000 });

  // Now extract data from rendered page
  const products = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('.product-card')).map(card => ({
      title: card.querySelector('.product-title')?.innerText,
      price: card.querySelector('.product-price')?.innerText,
      image: card.querySelector('img')?.src
    }));
  });

  await browser.close();
  return products;
}
Handling React Router Navigation
React Router handles navigation without page reloads. To scrape multiple "pages," you need to trigger navigation
and wait for the new content to render.
react-router-scraper.js — SPA navigation
const { chromium } = require('playwright');

async function scrapeReactRouterSite(startUrl) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(startUrl, { waitUntil: 'networkidle' });

  const allData = [];

  // Find all navigation links (l.href resolves to an absolute URL)
  const links = await page.$$eval('a[href^="/products/"]', links =>
    links.map(l => l.href)
  );

  // Visit the first five links in the same tab, reusing one browser instance.
  // Note: page.goto() performs a full load of each URL rather than a
  // client-side route change.
  for (const link of links.slice(0, 5)) {
    await page.goto(link, { waitUntil: 'networkidle' });
    await page.waitForSelector('.product-detail');

    const data = await page.evaluate(() => ({
      title: document.querySelector('h1')?.innerText,
      description: document.querySelector('.description')?.innerText
    }));
    allData.push(data);
  }

  await browser.close();
  return allData;
}
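If you want the client-side router itself to handle the transition, one option is to click the link instead of loading its URL. The sketch below assumes a Playwright `page` object; `navigateInApp`, `linkSelector`, and `readySelector` are hypothetical names, and the selectors are site-specific values you would supply.

```javascript
// Sketch: follow an in-app link by clicking it, so the SPA router swaps the
// view without a full page load. All names here are illustrative assumptions.
async function navigateInApp(page, linkSelector, readySelector) {
  // Clicking lets the app's own router handle the navigation.
  await page.click(linkSelector);

  // Wait for an element that only exists on the destination view.
  await page.waitForSelector(readySelector, { timeout: 10000 });

  // page.url() reflects history.pushState changes made by the router.
  return page.url();
}

// Example usage (inside an async function, with a real Playwright page):
//   const url = await navigateInApp(page, 'a[href="/products/1"]', '.product-detail');
```

Clicking keeps in-memory application state (such as a logged-in session held in a store) intact between views, at the cost of depending on the link being present and visible on the current page.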
Scraping Vue.js Applications
Vue.js apps often use directives like v-if and v-for to conditionally render content.
The challenge is ensuring all conditions have been evaluated before extraction.
vue-scraper.js — Handling Vue directives
const { chromium } = require('playwright');

async function scrapeVueSite(url) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle' });

  // Wait until Vue's loading states are gone (no spinners, no v-cloak).
  // In Playwright the second argument is `arg`, so options go third.
  await page.waitForFunction(() => {
    const loadingElements = document.querySelectorAll('.loading, .v-loading, [v-cloak]');
    return loadingElements.length === 0;
  }, null, { timeout: 15000 });

  // Scroll to trigger any lazy-loaded Vue components
  await page.evaluate(async () => {
    await new Promise((resolve) => {
      let totalHeight = 0;
      const distance = 300;
      const timer = setInterval(() => {
        const scrollHeight = document.body.scrollHeight;
        window.scrollBy(0, distance);
        totalHeight += distance;
        if (totalHeight >= scrollHeight) {
          clearInterval(timer);
          resolve();
        }
      }, 200);
    });
  });

  // Extract data from fully rendered Vue app
  const data = await page.evaluate(() => {
    // Vue adds data-v-<hash> attributes for scoped CSS. CSS selectors cannot
    // match an attribute-name prefix, so filter by attribute name instead.
    return Array.from(document.querySelectorAll('*'))
      .filter(el => el.getAttributeNames().some(name => name.startsWith('data-v-')))
      .map(el => ({
        text: el.innerText,
        html: el.innerHTML
      }));
  });

  await browser.close();
  return data;
}
Handling Infinite Scroll
Infinite scroll is one of the most common SPA patterns and one of the trickiest to scrape.
You need to programmatically scroll the page to load more content.
infinite-scroll-scraper.js — Loading all content
const { chromium } = require('playwright');

async function scrapeInfiniteScroll(url, maxScrolls = 10) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle' });

  let previousHeight = 0;
  let scrollCount = 0;

  while (scrollCount < maxScrolls) {
    previousHeight = await page.evaluate(() => document.body.scrollHeight);

    // Scroll to bottom
    await page.evaluate(() => {
      window.scrollTo(0, document.body.scrollHeight);
    });

    // Wait for new content to load
    await page.waitForTimeout(2000);

    const newHeight = await page.evaluate(() => document.body.scrollHeight);

    // Stop if no new content loaded
    if (newHeight === previousHeight) {
      break;
    }
    scrollCount++;
  }

  // Now extract all loaded content
  const items = await page.$$eval('.item', items =>
    items.map(item => ({
      title: item.querySelector('.title')?.innerText,
      link: item.querySelector('a')?.href
    }))
  );

  await browser.close();
  return items;
}
Pro Tip: Some sites detect automated scrolling. Add random delays between scrolls and
vary the scroll distance to appear more human-like.
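A minimal sketch of that tip, assuming a Playwright `page` object: each step scrolls by a random distance and then pauses for a random interval. The helper names `randomBetween` and `humanLikeScroll` and the numeric ranges are illustrative assumptions, not tuned values, and randomization alone is no guarantee against detection.

```javascript
// Inclusive random integer in [min, max].
function randomBetween(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

// Scrolls `steps` times with a varied distance and pause on each step.
// The pixel and millisecond ranges are placeholder assumptions.
async function humanLikeScroll(page, steps = 10) {
  for (let i = 0; i < steps; i++) {
    const distance = randomBetween(250, 700); // pixels per scroll
    const pause = randomBetween(800, 2500);   // milliseconds between scrolls

    // Dispatch a real wheel event rather than calling window.scrollTo().
    await page.mouse.wheel(0, distance);
    await page.waitForTimeout(pause);
  }
}

// Example usage (inside an async function, with a real Playwright page):
//   await humanLikeScroll(page, 15);
```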
Intercepting API Calls (The Smart Way)
Here's a powerful technique: instead of scraping the rendered HTML, intercept the API calls that the SPA makes
to fetch its data. This often returns clean JSON that's much easier to work with.
api-intercept.js — Catching XHR/fetch requests
const { chromium } = require('playwright');

async function interceptAPICalls(url) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  const apiResponses = [];

  // Intercept all network requests and inspect their responses
  await page.route('**/*', async (route) => {
    const request = route.request();
    const response = await route.fetch();

    // Capture JSON API responses
    if (request.url().includes('/api/') ||
        response.headers()['content-type']?.includes('json')) {
      try {
        const json = await response.json();
        apiResponses.push({
          url: request.url(),
          data: json
        });
      } catch (e) {
        // Body was not valid JSON; skip it
      }
    }

    // Pass the fetched response through to the page unchanged
    await route.fulfill({ response });
  });

  await page.goto(url, { waitUntil: 'networkidle' });
  await page.waitForTimeout(3000);
  await browser.close();

  // Return the API data instead of scraped HTML
  return apiResponses;
}
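If you only need to read responses and never modify them, a passive listener is a lighter alternative to routing every request. The sketch below assumes a Playwright `page` object and its `response` event; `collectJsonResponses` and the `/api/` URL fragment are assumptions for illustration.

```javascript
// Sketch: passively record JSON responses without intercepting requests.
// Returns an array that fills up as responses arrive.
function collectJsonResponses(page, urlFragment = '/api/') {
  const captured = [];

  page.on('response', async (response) => {
    const contentType = response.headers()['content-type'] || '';
    const looksLikeApi =
      response.url().includes(urlFragment) || contentType.includes('json');
    if (!looksLikeApi) return;

    try {
      captured.push({ url: response.url(), data: await response.json() });
    } catch (err) {
      // Body was not valid JSON or was unavailable; skip it.
    }
  });

  return captured;
}

// Example usage (inside an async function, with a real Playwright page):
//   const captured = collectJsonResponses(page);
//   await page.goto(url, { waitUntil: 'networkidle' });
//   console.log(captured);
```

Register the listener before calling `page.goto`, otherwise early requests are missed.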
The Easy Way: AI-Powered SPA Scraping
All the techniques above work, but they require significant setup, debugging, and maintenance.
CSS selectors break when sites redesign. Timing issues cause flaky scrapes. Anti-bot measures block headless browsers.
AI-powered scraping APIs like Papalily handle all of this automatically.
They run real headless browsers, execute JavaScript, wait for content to render, and extract data based on natural language descriptions.
AI-powered SPA scraping with Papalily
const response = await fetch('https://api.papalily.com/scrape', {
  method: 'POST',
  headers: {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://react-ecommerce-site.com/products',
    prompt: `Get all products from this React site.
      Scroll down to load all items from infinite scroll.
      Extract name, price, image URL, and rating for each product.`,
    // Papalily automatically handles:
    // - JavaScript rendering
    // - Waiting for React to mount
    // - Scrolling to load lazy content
    // - Extracting structured data
  }),
});

const result = await response.json();
// Returns clean JSON with all product data
Best Practices for SPA Scraping
1. Always Wait for Content
Never extract data immediately after navigation. Use waitForSelector, waitForFunction,
or networkidle to ensure the SPA has finished rendering.
2. Handle Loading States
SPAs often show skeleton screens or spinners while loading. Wait for these to disappear before extracting data.
3. Scroll to Trigger Lazy Loading
Many SPAs only load content when it enters the viewport. Programmatically scroll the page to ensure all content is loaded.
4. Watch for Continuous DOM Updates
Some SPAs continuously update the DOM. Use waitForFunction to wait for specific conditions
like "no more loading indicators" or "at least 50 items rendered."
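The waiting conditions described in practices 1, 2, and 4 can be combined into a single check. This is a sketch assuming a Playwright `page` object; `waitForRenderedList` and its option names are hypothetical, and the selectors are site-specific values you would supply.

```javascript
// Sketch: resolve once no loading indicators remain AND at least `minItems`
// items are in the DOM. Names and defaults are illustrative assumptions.
async function waitForRenderedList(
  page,
  { itemSelector, loadingSelector, minItems = 1, timeout = 15000 }
) {
  await page.waitForFunction(
    // Runs in the browser; receives the second argument below as its input.
    ({ itemSelector, loadingSelector, minItems }) => {
      const stillLoading = document.querySelectorAll(loadingSelector).length > 0;
      const itemCount = document.querySelectorAll(itemSelector).length;
      return !stillLoading && itemCount >= minItems;
    },
    { itemSelector, loadingSelector, minItems }, // arg passed into the page
    { timeout }                                  // options go third in Playwright
  );
}

// Example usage (inside an async function, with a real Playwright page):
//   await waitForRenderedList(page, {
//     itemSelector: '.item',
//     loadingSelector: '.spinner, .skeleton',
//     minItems: 50,
//   });
```

Waiting on an explicit condition like this tends to be less flaky than a fixed sleep, because it returns as soon as the page is ready and fails with a timeout when it never is.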
5. Respect Rate Limits
SPAs often make many API calls. Aggressive scraping can trigger rate limiting or IP bans.
Add delays between requests and consider using proxy rotation.
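One simple way to add delays is to process URLs one at a time with a randomized pause between them. The sketch below is library-agnostic: `scrapeSequentially` and `scrapeOne` are hypothetical names, and the default delay range is an arbitrary assumption rather than a recommended value for any particular site.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Sketch: scrape URLs one by one, pausing a random interval between requests.
// `scrapeOne` is any async function you supply that scrapes a single URL.
async function scrapeSequentially(
  urls,
  scrapeOne,
  { minDelayMs = 1000, maxDelayMs = 3000 } = {}
) {
  const results = [];
  for (const [index, url] of urls.entries()) {
    results.push(await scrapeOne(url));

    // No need to wait after the final URL.
    if (index < urls.length - 1) {
      const delay = minDelayMs + Math.random() * (maxDelayMs - minDelayMs);
      await sleep(delay);
    }
  }
  return results;
}

// Example usage (inside an async function):
//   const pages = await scrapeSequentially(urls, (url) => scrapeReactSite(url));
```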
Common Pitfalls and Solutions
| Problem | Solution |
|---|---|
| Empty content / missing elements | Increase wait time; use waitForSelector with longer timeout |
| Stale element reference | Re-query elements after each navigation/scroll |
| Infinite scroll stops working | Add human-like delays; randomize scroll distance |
| Bot detection blocking scraper | Use stealth plugins; rotate user agents; add random delays |
| Memory leaks with long sessions | Restart browser periodically; close pages when done |
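The last row of the table (restart the browser periodically and close pages when done) can be sketched as a batching loop. `scrapeInBatches`, `launchBrowser`, `scrapePage`, and the default batch size of 25 are assumptions for illustration; with Playwright, `launchBrowser` could be `() => chromium.launch()`.

```javascript
// Sketch: use a fresh browser for every batch of URLs and close each page
// as soon as it has been scraped. All names and defaults are hypothetical.
async function scrapeInBatches(urls, launchBrowser, scrapePage, batchSize = 25) {
  const results = [];

  for (let start = 0; start < urls.length; start += batchSize) {
    const browser = await launchBrowser();
    try {
      for (const url of urls.slice(start, start + batchSize)) {
        const page = await browser.newPage();
        try {
          results.push(await scrapePage(page, url));
        } finally {
          await page.close(); // release the page even if scraping failed
        }
      }
    } finally {
      await browser.close(); // release the whole browser after each batch
    }
  }

  return results;
}

// Example usage (inside an async function, assuming Playwright is installed):
//   const { chromium } = require('playwright');
//   const data = await scrapeInBatches(urls, () => chromium.launch(), async (page, url) => {
//     await page.goto(url, { waitUntil: 'networkidle' });
//     return page.title();
//   });
```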
Conclusion
Scraping JavaScript-heavy websites and SPAs requires a different approach than traditional web scraping.
You need headless browsers, smart waiting strategies, and techniques for handling dynamic content loading.
While you can build all of this yourself with Playwright or Puppeteer, it requires significant maintenance
and expertise. For most use cases, an AI-powered scraping API like Papalily handles the complexity automatically,
letting you focus on using the data rather than fighting with browser automation.
Scrape Any SPA in Minutes
Get a free API key on RapidAPI — 100 free requests per month.
Works on React, Vue, Angular, and any JavaScript-heavy site.
Full documentation at papalily.com/docs