Scraping JavaScript-Heavy Websites and SPAs: Definitive 2026 Guide

📅 June 21, 2026 ⏱ 12 min read By Papalily Team

Scraping JavaScript-heavy websites and single-page applications (SPAs) is the norm in 2026. Plain static HTML pages are increasingly rare: React, Vue, Angular, and Svelte now power a large share of modern web applications. These frameworks deliver rich user experiences, but they render content in the browser, so a scraper that only fetches HTML often comes back with nothing useful.

If you've ever tried to scrape a website and found empty divs where content should be, you've hit the JavaScript wall. This guide will show you exactly how to break through it — with modern techniques that work on even the most complex SPAs.

Why Traditional Scraping Fails on Modern Websites

The fundamental problem is simple: traditional HTTP requests only fetch the initial HTML document. In a Single Page Application, that initial document is essentially a shell — a container that loads JavaScript, which then fetches and renders the actual content dynamically.

The JavaScript Rendering Gap

Here's what happens when you fetch a modern React app with a simple HTTP request:

What you get with HTTP requests
<!DOCTYPE html>
<html>
  <head><title>My App</title></head>
  <body>
    <!-- The content you want is NOT here -->
    <div id="root"></div>
    <!-- JavaScript that renders content later -->
    <script src="/static/js/main.chunk.js"></script>
  </body>
</html>

The <div id="root"></div> is empty. The actual content loads after JavaScript executes, makes API calls, and renders components. Traditional scrapers never see this rendered content because they don't execute JavaScript.
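
A quick way to confirm you have hit this gap is to check whether the raw response contains an empty mount point. The sketch below is a rough heuristic, not a reliable detector: the function name and the list of mount-point ids (root, app, __next) are assumptions chosen for illustration, and a regular expression is no substitute for a real HTML parser.

```javascript
// Common SPA mount-point ids. This list is an assumption, not exhaustive.
const MOUNT_IDS = ['root', 'app', '__next'];

// Returns true when the HTML contains a mount-point <div> whose body is
// empty apart from whitespace or comments, i.e. nothing has rendered yet.
function looksLikeEmptyShell(html) {
  return MOUNT_IDS.some((id) => {
    const pattern = new RegExp(
      `<div[^>]*\\bid=["']${id}["'][^>]*>(?:\\s|<!--[\\s\\S]*?-->)*</div>`,
      'i'
    );
    return pattern.test(html);
  });
}
```

You could feed it the body of a plain HTTP response, for example the text returned by fetch(url). A true result suggests the page needs a browser to render; a false result does not prove the content is present.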

Common SPA Patterns That Break Scrapers

Warning: Seeing content in your browser's DevTools Elements panel ("Inspect") does not mean traditional scraping will work. That panel shows the live DOM after JavaScript has run. "View Source" shows the original HTML response, which is all a plain HTTP scraper receives.

The Solution: Headless Browser Scraping

To scrape JavaScript-heavy websites, you need a headless browser — a real browser that runs without a visible interface, executes JavaScript, and gives you access to the fully rendered page. Think of it as programmatically controlling Chrome or Firefox.

Popular Headless Browser Options

Tool | Best For | Learning Curve
Playwright | Modern web apps, all browsers | Moderate
Puppeteer | Chrome-only projects | Moderate
Selenium | Legacy support, multiple languages | Steeper
Papalily API | Quick deployment, no infrastructure | Easy

Scraping React Applications

React apps are everywhere — from e-commerce sites to dashboards to social platforms. Here's how to handle them effectively.

Waiting for Content to Render

The key challenge with React is knowing when the content has loaded. You can't just fetch the page and immediately extract data — you need to wait for React to render.

react-scraper.js — Waiting for React to render
const { chromium } = require('playwright');

async function scrapeReactSite(url) {
  const browser = await chromium.launch();
  try {
    const page = await browser.newPage();

    // Navigate and wait for the network to go idle
    await page.goto(url, { waitUntil: 'networkidle' });

    // Wait for an element that only exists once React has rendered the list.
    // This replaces a fixed 2-second sleep, which is either too short or wasted time.
    await page.waitForSelector('.product-list', { timeout: 10000 });

    // Extract data from the rendered page
    return await page.evaluate(() =>
      Array.from(document.querySelectorAll('.product-card')).map(card => ({
        title: card.querySelector('.product-title')?.innerText,
        price: card.querySelector('.product-price')?.innerText,
        image: card.querySelector('img')?.src
      }))
    );
  } finally {
    // Close the browser even if a wait times out
    await browser.close();
  }
}

Handling React Router Navigation

React Router handles navigation without page reloads. To scrape multiple "pages," you need to trigger navigation and wait for the new content to render.

react-router-scraper.js — SPA navigation
const { chromium } = require('playwright');

async function scrapeReactRouterSite(startUrl) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(startUrl, { waitUntil: 'networkidle' });

  const allData = [];

  // Collect the absolute URLs of all product links
  const links = await page.$$eval('a[href^="/products/"]', links =>
    links.map(l => l.href)
  );

  // Visit the first five routes in the same tab. Note that page.goto performs
  // a full load of each URL rather than a client-side route change.
  for (const link of links.slice(0, 5)) {
    await page.goto(link, { waitUntil: 'networkidle' });
    await page.waitForSelector('.product-detail');

    const data = await page.evaluate(() => ({
      title: document.querySelector('h1')?.innerText,
      description: document.querySelector('.description')?.innerText
    }));
    allData.push(data);
  }

  await browser.close();
  return allData;
}
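
If you want to exercise the router itself rather than load each URL from scratch, one option is to click the link and wait for the destination view. The helper below is a minimal sketch under assumptions: the function name and both selector parameters are hypothetical, and page is expected to be a Playwright Page created elsewhere.

```javascript
// Trigger a client-side route change by clicking a link, then read the new view.
// `page` is assumed to be a Playwright Page; both selectors are caller-supplied.
async function clickThroughRoute(page, linkSelector, readySelector) {
  // Clicking lets the SPA's own router handle the transition,
  // so the document is not reloaded.
  await page.click(linkSelector);

  // Wait for an element that should only exist on the destination view.
  await page.waitForSelector(readySelector);

  return page.evaluate(() => ({
    path: window.location.pathname,
    title: document.querySelector('h1')?.innerText ?? null,
  }));
}
```

Pick a readySelector that is absent from the view you are leaving. If the same element exists on both views, the wait resolves immediately and you may read stale content.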

Scraping Vue.js Applications

Vue.js apps often use directives like v-if and v-for to conditionally render content. The challenge is ensuring all conditions have been evaluated before extraction.

vue-scraper.js — Handling Vue directives
const { chromium } = require('playwright');

async function scrapeVueSite(url) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle' });

  // Wait until no loading indicators or uncompiled v-cloak elements remain
  await page.waitForFunction(() => {
    const loadingElements = document.querySelectorAll('.loading, .v-loading, [v-cloak]');
    return loadingElements.length === 0;
  }, { timeout: 15000 });

  // Scroll in steps to trigger any lazy-loaded Vue components
  await page.evaluate(async () => {
    await new Promise((resolve) => {
      let totalHeight = 0;
      const distance = 300;
      const timer = setInterval(() => {
        const scrollHeight = document.body.scrollHeight;
        window.scrollBy(0, distance);
        totalHeight += distance;
        if (totalHeight >= scrollHeight) {
          clearInterval(timer);
          resolve();
        }
      }, 200);
    });
  });

  // Extract data from the fully rendered Vue app
  const data = await page.evaluate(() => {
    // Scoped-CSS attributes look like data-v-7ba5bd90. CSS cannot match an
    // attribute-name prefix, so '[data-v-]' matches nothing; filter on
    // attribute names instead.
    return Array.from(document.querySelectorAll('*'))
      .filter(el => el.getAttributeNames().some(name => name.startsWith('data-v-')))
      .map(el => ({
        text: el.innerText,
        html: el.innerHTML
      }));
  });

  await browser.close();
  return data;
}

Handling Infinite Scroll

Infinite scroll is one of the most common SPA patterns and one of the trickiest to scrape. You need to programmatically scroll the page to load more content.

infinite-scroll-scraper.js — Loading all content
const { chromium } = require('playwright');

async function scrapeInfiniteScroll(url, maxScrolls = 10) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle' });

  let scrollCount = 0;

  while (scrollCount < maxScrolls) {
    const previousHeight = await page.evaluate(() => document.body.scrollHeight);

    // Scroll to the bottom
    await page.evaluate(() => {
      window.scrollTo(0, document.body.scrollHeight);
    });

    // Give new content time to load
    await page.waitForTimeout(2000);

    const newHeight = await page.evaluate(() => document.body.scrollHeight);

    // Stop once the page height no longer grows
    if (newHeight === previousHeight) {
      break;
    }
    scrollCount++;
  }

  // Extract all loaded content
  const items = await page.$$eval('.item', items =>
    items.map(item => ({
      title: item.querySelector('.title')?.innerText,
      link: item.querySelector('a')?.href
    }))
  );

  await browser.close();
  return items;
}

Pro Tip: Some sites detect automated scrolling. Add random delays between scrolls and vary the scroll distance to appear more human-like.
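
One way to apply that tip is to scroll with the mouse wheel in uneven steps and pause for a random interval between them. This is a sketch rather than a proven evasion technique: the helper names and the default ranges are assumptions, and page is expected to be a Playwright Page.

```javascript
// Random integer in [min, max], inclusive.
function randomInt(min, max) {
  return Math.floor(Math.random() * (max - min + 1)) + min;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Scroll `steps` times with a random distance and a random pause each time.
// The default ranges are illustrative guesses, not tuned values.
async function scrollWithJitter(page, steps = 5, opts = {}) {
  const {
    minDistance = 300,
    maxDistance = 900,
    minDelayMs = 500,
    maxDelayMs = 1500,
  } = opts;

  for (let i = 0; i < steps; i++) {
    // mouse.wheel dispatches a real wheel event instead of calling window.scrollTo
    await page.mouse.wheel(0, randomInt(minDistance, maxDistance));
    await sleep(randomInt(minDelayMs, maxDelayMs));
  }
}
```

Because the distance varies, you still need a stop condition such as the page-height check in the loop above.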

Intercepting API Calls (The Smart Way)

Here's a powerful technique: instead of scraping the rendered HTML, intercept the API calls that the SPA makes to fetch its data. This often returns clean JSON that's much easier to work with.

api-intercept.js — Catching XHR/fetch requests
const { chromium } = require('playwright');

async function interceptAPICalls(url) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  const apiResponses = [];

  // Route every request through a handler so each response can be inspected
  await page.route('**/*', async (route) => {
    const request = route.request();
    const response = await route.fetch();

    // Capture JSON API responses
    if (request.url().includes('/api/') ||
        response.headers()['content-type']?.includes('json')) {
      try {
        const json = await response.json();
        apiResponses.push({ url: request.url(), data: json });
      } catch (e) {
        // Body was not valid JSON; skip it
      }
    }

    // Pass the response through to the page unchanged
    await route.fulfill({ response });
  });

  await page.goto(url, { waitUntil: 'networkidle' });
  await page.waitForTimeout(3000);
  await browser.close();

  // Return the API data instead of scraped HTML
  return apiResponses;
}
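
If you only need to read responses and never modify them, a passive listener may be simpler than routing every request. The sketch below uses Playwright's 'response' page event; the function name is hypothetical and page is assumed to be a Playwright Page that has not navigated yet.

```javascript
// Attach a listener that records every JSON response the page receives.
// Returns the array that the listener fills in as responses arrive.
function collectJsonResponses(page) {
  const captured = [];

  page.on('response', async (response) => {
    const contentType = response.headers()['content-type'] || '';
    if (!contentType.includes('json')) return;

    try {
      captured.push({ url: response.url(), data: await response.json() });
    } catch (err) {
      // Body was empty, not valid JSON, or no longer available; skip it.
    }
  });

  return captured;
}
```

Call it before page.goto so early requests are not missed, then read the returned array once the page has settled. Unlike page.route, this does not sit in the request path, so it cannot block or alter traffic.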

The Easy Way: AI-Powered SPA Scraping

All the techniques above work, but they require significant setup, debugging, and maintenance. CSS selectors break when sites redesign. Timing issues cause flaky scrapes. Anti-bot measures block headless browsers.

AI-powered scraping APIs like Papalily handle all of this automatically. They run real headless browsers, execute JavaScript, wait for content to render, and extract data based on natural language descriptions.

AI-powered SPA scraping with Papalily
const response = await fetch('https://api.papalily.com/scrape', {
  method: 'POST',
  headers: {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://react-ecommerce-site.com/products',
    prompt: `Get all products from this React site.
      Scroll down to load all items from infinite scroll.
      Extract name, price, image URL, and rating for each product.`,
    // Papalily automatically handles:
    // - JavaScript rendering
    // - Waiting for React to mount
    // - Scrolling to load lazy content
    // - Extracting structured data
  }),
});

const result = await response.json();
// Returns clean JSON with all product data

Best Practices for SPA Scraping

1. Always Wait for Content

Never extract data immediately after navigation. Use waitForSelector, waitForFunction, or networkidle to ensure the SPA has finished rendering.

2. Handle Loading States

SPAs often show skeleton screens or spinners while loading. Wait for these to disappear before extracting data.
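
In Playwright this can be expressed with waitForSelector and state: 'hidden', which resolves once the element is detached or no longer visible. A minimal sketch follows; the function name and the default selectors (.spinner, .skeleton) are assumptions, so substitute whatever the target site actually uses.

```javascript
// Wait until every listed loading indicator is hidden or removed from the DOM.
// The default selectors are placeholders, not a convention of any framework.
async function waitForLoadingToFinish(
  page,
  selectors = ['.spinner', '.skeleton'],
  timeout = 15000
) {
  for (const selector of selectors) {
    // state: 'hidden' also resolves right away if the element never existed
    await page.waitForSelector(selector, { state: 'hidden', timeout });
  }
}
```

Since a missing element counts as hidden, pair this with a positive wait for the real content, otherwise it can pass before the spinner has even appeared.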

3. Scroll to Trigger Lazy Loading

Many SPAs only load content when it enters the viewport. Programmatically scroll the page to ensure all content is loaded.

4. Watch for Continuous DOM Updates

Some SPAs continuously update the DOM. Use waitForFunction to wait for specific conditions like "no more loading indicators" or "at least 50 items rendered."
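
The "at least 50 items rendered" condition could look like the sketch below. The function name is hypothetical and page is assumed to be a Playwright Page. The selector and count are passed as the argument to waitForFunction because the predicate runs inside the browser and cannot see Node variables.

```javascript
// Resolve once at least `minCount` elements match `selector` in the page.
async function waitForItemCount(page, selector, minCount, timeout = 15000) {
  await page.waitForFunction(
    // Runs in the browser; receives the second argument as its parameter.
    ({ selector, minCount }) =>
      document.querySelectorAll(selector).length >= minCount,
    { selector, minCount },
    { timeout }
  );
}
```

For example, waitForItemCount(page, '.item', 50) would block until fifty .item elements exist or the timeout expires.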

5. Respect Rate Limits

SPAs often make many API calls. Aggressive scraping can trigger rate limiting or IP bans. Add delays between requests and consider using proxy rotation.
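
A simple way to add those delays is to process URLs one at a time with a randomized pause in between. The sketch below assumes a caller-supplied scrapeOne(url) function; that name, scrapeSequentially, and the default delay range are illustrative, and proxy rotation is left out.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `scrapeOne` on each URL in order, pausing a random interval between calls.
// The default 1-3 second range is an arbitrary starting point.
async function scrapeSequentially(
  urls,
  scrapeOne,
  { minDelayMs = 1000, maxDelayMs = 3000 } = {}
) {
  const results = [];
  for (const [index, url] of urls.entries()) {
    if (index > 0) {
      // No pause is needed before the first request
      await sleep(minDelayMs + Math.random() * (maxDelayMs - minDelayMs));
    }
    results.push(await scrapeOne(url));
  }
  return results;
}
```

Running sequentially trades speed for a lower chance of tripping rate limits; tune the range to what the target site tolerates.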

Common Pitfalls and Solutions

Problem | Solution
Empty content / missing elements | Increase wait time; use waitForSelector with a longer timeout
Stale element reference | Re-query elements after each navigation/scroll
Infinite scroll stops working | Add human-like delays; randomize scroll distance
Bot detection blocking scraper | Use stealth plugins; rotate user agents; add random delays
Memory leaks with long sessions | Restart browser periodically; close pages when done
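
For the memory-leak row, one possible pattern is to process URLs in batches and launch a fresh browser for each batch. This is a sketch under assumptions: scrapeInBatches is a hypothetical helper, launchBrowser is a caller-supplied function such as () => chromium.launch(), and scrapeOne(page, url) is your own extraction routine.

```javascript
// Process `urls` in batches, using a new browser per batch and a new page per URL.
// `launchBrowser` and `scrapeOne` are supplied by the caller.
async function scrapeInBatches(urls, batchSize, launchBrowser, scrapeOne) {
  const results = [];

  for (let start = 0; start < urls.length; start += batchSize) {
    const browser = await launchBrowser();
    try {
      for (const url of urls.slice(start, start + batchSize)) {
        const page = await browser.newPage();
        try {
          results.push(await scrapeOne(page, url));
        } finally {
          // Close each page as soon as it has been scraped
          await page.close();
        }
      }
    } finally {
      // Closing the browser after every batch releases its memory
      await browser.close();
    }
  }

  return results;
}
```

The finally blocks make sure pages and browsers are closed even when a scrape throws, which is often where leaks come from in long-running jobs.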

Conclusion

Scraping JavaScript-heavy websites and SPAs requires a different approach than traditional web scraping. You need headless browsers, smart waiting strategies, and techniques for handling dynamic content loading.

While you can build all of this yourself with Playwright or Puppeteer, it requires significant maintenance and expertise. For most use cases, an AI-powered scraping API like Papalily handles the complexity automatically, letting you focus on using the data rather than fighting with browser automation.

Scrape Any SPA in Minutes

Get a free API key on RapidAPI — 100 free requests per month. Works on React, Vue, Angular, and any JavaScript-heavy site.

Full documentation at papalily.com/docs