Scraping JavaScript-Heavy Websites and SPAs: Definitive 2026 Guide

Scraping JavaScript-heavy websites and SPAs (Single Page Applications) is the new normal in 2026. Static HTML pages are no longer the default: React, Vue, Angular, and Svelte power a large share of modern web applications. These frameworks deliver rich user experiences, but when a page is rendered in the browser, a plain HTTP fetch returns little of the content you actually want.

If you've ever tried to scrape a website and found empty divs where content should be, you've hit the JavaScript wall. This guide will show you exactly how to break through it — with modern techniques that work on even the most complex SPAs.

Why Traditional Scraping Fails on Modern Websites

The fundamental problem is simple: traditional HTTP requests only fetch the initial HTML document. In a Single Page Application, that initial document is essentially a shell — a container that loads JavaScript, which then fetches and renders the actual content dynamically.

The JavaScript Rendering Gap

Here's what happens when you fetch a modern React app with a simple HTTP request:

<!DOCTYPE html>
<html>
<head><title>My App</title></head>
<body>
  <!-- The content you want is NOT here -->
  <div id="root"></div>
  
  <!-- JavaScript that renders content later -->
  <script src="/static/js/main.chunk.js"></script>
</body>
</html>

The <div id="root"></div> is empty. The actual content loads after JavaScript executes, makes API calls, and renders components. Traditional scrapers never see this rendered content because they don't execute JavaScript.
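
Before reaching for a browser, it can help to check whether the raw HTML really is an empty shell. The sketch below is a heuristic, not a guarantee: the function name `looksLikeSpaShell` and the list of mount-point ids are assumptions chosen for illustration, and a real site may use a different container.

```javascript
// Heuristic: does this raw HTML look like an empty SPA shell?
// The mount ids below are common defaults, assumed for illustration.
const MOUNT_IDS = ['root', 'app'];

function looksLikeSpaShell(html) {
  // A shell needs at least one script to render anything later
  const hasScript = /<script\b/i.test(html);
  // ...and a mount point with nothing but whitespace inside it
  const hasEmptyMount = MOUNT_IDS.some(id => {
    const pattern = new RegExp(`<div[^>]*\\bid=["']${id}["'][^>]*>\\s*</div>`, 'i');
    return pattern.test(html);
  });
  return hasScript && hasEmptyMount;
}
```

If this returns false for the HTML returned by a plain HTTP request, the content may already be server-rendered and a headless browser may be unnecessary.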

Common SPA Patterns That Break Scrapers

Client-side routing — URL changes don't trigger page reloads; content swaps dynamically
Infinite scroll — New content loads as users scroll, never present in initial HTML
Lazy loading — Images and content load only when they enter the viewport
AJAX data fetching — Content comes from API calls after the page loads
Virtual scrolling — Only visible items exist in the DOM; scrolling creates/destroys elements
Authentication gates — Content requires login or session tokens fetched dynamically

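Warning: Seeing content in your browser's DevTools Elements panel (Inspect) doesn't mean traditional scraping will work. That panel shows the live DOM after JavaScript has run. "View Source" shows the original HTML response, which is all a plain HTTP client receives, so check there first.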

The Solution: Headless Browser Scraping

To scrape JavaScript-heavy websites, you need a headless browser — a real browser that runs without a visible interface, executes JavaScript, and gives you access to the fully rendered page. Think of it as programmatically controlling Chrome or Firefox.

Popular Headless Browser Options

Tool	Best For	Learning Curve
Playwright	Modern web apps; Chromium, Firefox, and WebKit	Moderate
Puppeteer	Chrome-first Node.js projects	Moderate
Selenium	Legacy support, multiple languages	Steeper
Papalily API	Quick deployment, no infrastructure	Easy

Scraping React Applications

React apps are everywhere — from e-commerce sites to dashboards to social platforms. Here's how to handle them effectively.

Waiting for Content to Render

The key challenge with React is knowing when the content has loaded. You can't just fetch the page and immediately extract data — you need to wait for React to render.

const { chromium } = require('playwright');

async function scrapeReactSite(url) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  
  // Navigate and wait for network activity to settle
  await page.goto(url, { waitUntil: 'networkidle' });
  
  // Wait for an element that only exists once React has rendered the data.
  // A condition-based wait replaces a fixed sleep, which is either too short
  // on slow loads or wasted time on fast ones.
  await page.waitForSelector('.product-list', { timeout: 10000 });
  
  // Now extract data from rendered page
  const products = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('.product-card')).map(card => ({
      title: card.querySelector('.product-title')?.innerText,
      price: card.querySelector('.product-price')?.innerText,
      image: card.querySelector('img')?.src
    }));
  });
  
  await browser.close();
  return products;
}
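
Condition-based waiting also works for checks that a single selector cannot express, such as "at least 20 cards are present". A minimal sketch of the idea in plain Node.js follows; `pollUntil` and its option names are hypothetical and not part of Playwright.

```javascript
// Resolve as soon as check() returns a truthy value; reject after timeoutMs.
// pollUntil and its option names are illustrative, not a Playwright API.
async function pollUntil(check, { timeoutMs = 10000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const value = await check();
    if (value) return value;
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs} ms`);
    }
    // Pause before the next check so the loop does not spin
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```

With a Playwright page, the check could be something like `() => page.$$eval('.product-card', cards => cards.length >= 20)`, where the selector and threshold are placeholders for whatever signals "loaded" on the target site.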

Handling React Router Navigation

React Router handles navigation without page reloads. To scrape multiple "pages," either click the in-app links or load each route's URL directly, then wait for the new content to render.

async function scrapeReactRouterSite(startUrl) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  
  await page.goto(startUrl, { waitUntil: 'networkidle' });
  
  const allData = [];
  
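  // Collect absolute URLs of product links (anchor.href resolves relative paths)
  const links = await page.$$eval('a[href^="/products/"]', anchors =>
    anchors.map(a => a.href)
  );
  
  // Visit the first five routes in the same tab. page.goto does a full load of
  // each URL, so this relies on the server serving the app for deep links.
  for (const link of links.slice(0, 5)) {
    await page.goto(link, { waitUntil: 'networkidle' });
    await page.waitForSelector('.product-detail');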
    
    const data = await page.evaluate(() => ({
      title: document.querySelector('h1')?.innerText,
      description: document.querySelector('.description')?.innerText
    }));
    
    allData.push(data);
  }
  
  await browser.close();
  return allData;
}
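
Link lists collected from an SPA often contain duplicates, fragment-only variants, and off-site URLs. The sketch below cleans such a list before crawling it; `normalizeRouteUrls` and the `pathPrefix` parameter are illustrative names, and it assumes the app uses path-based routing rather than hash-based routing.

```javascript
// Resolve, filter, and de-duplicate hrefs collected from an SPA.
// normalizeRouteUrls and pathPrefix are illustrative names.
function normalizeRouteUrls(hrefs, baseUrl, pathPrefix = '/') {
  const base = new URL(baseUrl);
  const seen = new Set();
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, base); // resolves relative hrefs against the base
    } catch {
      continue; // skip malformed hrefs
    }
    if (url.origin !== base.origin) continue;           // stay on the same site
    if (!url.pathname.startsWith(pathPrefix)) continue; // keep only wanted routes
    url.hash = '';                                      // drop in-page fragments
    seen.add(url.href);
  }
  return [...seen];
}
```

If the target app routes on the fragment (URLs like `/#/products/1`), the line that clears `url.hash` would discard the route itself and should be removed.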

Scraping Vue.js Applications

Vue.js apps often use directives like v-if and v-for to render content conditionally and in lists. The challenge is ensuring those directives have finished rendering before you extract anything.

async function scrapeVueSite(url) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  
  await page.goto(url, { waitUntil: 'networkidle' });
  
  // Wait until loading indicators are gone. The class names are site-specific
  // examples; [v-cloak] is removed by Vue once the app has mounted.
  await page.waitForFunction(() => {
    const loadingElements = document.querySelectorAll('.loading, .v-loading, [v-cloak]');
    return loadingElements.length === 0;
  }, null, { timeout: 15000 }); // Playwright's signature is (fn, arg, options)
  
  // Scroll to trigger any lazy-loaded Vue components
  await page.evaluate(async () => {
    await new Promise((resolve) => {
      let totalHeight = 0;
      const distance = 300;
      const timer = setInterval(() => {
        const scrollHeight = document.body.scrollHeight;
        window.scrollBy(0, distance);
        totalHeight += distance;
        
        if (totalHeight >= scrollHeight) {
          clearInterval(timer);
          resolve();
        }
      }, 200);
    });
  });
  
  // Extract data from fully rendered Vue app
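  const data = await page.evaluate(() => {
    // Vue adds hashed data-v-xxxxxxxx attributes for scoped CSS. CSS cannot
    // match an attribute-name prefix, so '[data-v-]' finds nothing; filter on
    // attribute names instead.
    return Array.from(document.querySelectorAll('*'))
      .filter(el => el.getAttributeNames().some(name => name.startsWith('data-v-')))
      .map(el => ({
        text: el.innerText,
        html: el.innerHTML
      }));
  });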
  
  await browser.close();
  return data;
}
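
Vue's scoped-style attributes carry a hash suffix (for example `data-v-7ba5bd90`), so they have to be recognized by the attribute name's prefix rather than by a fixed attribute selector. A small sketch of that check follows; the helper names are hypothetical, and the pattern assumes the suffix is alphanumeric.

```javascript
// Vue's scoped-style attributes look like data-v-7ba5bd90 (prefix + hash).
// isVueScopedAttr and hasVueScopeId are illustrative helper names.
function isVueScopedAttr(name) {
  return /^data-v-[0-9a-z]+$/i.test(name);
}

// attributeNames is a plain array, e.g. the result of el.getAttributeNames()
function hasVueScopeId(attributeNames) {
  return attributeNames.some(isVueScopedAttr);
}
```

Because code passed to `page.evaluate` runs in the browser, helpers like these would need to be defined inside that callback rather than in the Node.js scope.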

Handling Infinite Scroll

Infinite scroll is one of the most common SPA patterns and one of the trickiest to scrape. You need to programmatically scroll the page to load more content.

async function scrapeInfiniteScroll(url, maxScrolls = 10) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  
  await page.goto(url, { waitUntil: 'networkidle' });
  
  let previousHeight = 0;
  let scrollCount = 0;
  
  while (scrollCount < maxScrolls) {
    previousHeight = await page.evaluate(() => document.body.scrollHeight);
    
    // Scroll to bottom
    await page.evaluate(() => {
      window.scrollTo(0, document.body.scrollHeight);
    });
    
    // Wait for new content to load
    await page.waitForTimeout(2000);
    
    const newHeight = await page.evaluate(() => document.body.scrollHeight);
    
    // Stop if no new content loaded
    if (newHeight === previousHeight) {
      break;
    }
    
    scrollCount++;
  }
  
  // Now extract all loaded content
  const items = await page.$$eval('.item', items => 
    items.map(item => ({
      title: item.querySelector('.title')?.innerText,
      link: item.querySelector('a')?.href
    }))
  );
  
  await browser.close();
  return items;
}

Pro Tip: Some sites detect automated scrolling. Add random delays between scrolls and vary the scroll distance to appear more human-like.
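
One way to vary scroll behaviour is to generate a randomized plan up front and then replay it. The sketch below does only the planning step; all names and the default ranges are assumptions chosen for illustration, and the random source is injectable so the output can be made deterministic in tests.

```javascript
// Build a randomized scroll plan: each step has its own distance and pause.
// All names and default ranges here are illustrative assumptions.
function randomInt(min, max, rng = Math.random) {
  // Inclusive on both ends
  return Math.floor(rng() * (max - min + 1)) + min;
}

function planScrolls(steps, {
  minDistance = 400, maxDistance = 1200,
  minDelayMs = 800, maxDelayMs = 2500,
} = {}, rng = Math.random) {
  return Array.from({ length: steps }, () => ({
    distance: randomInt(minDistance, maxDistance, rng),
    delayMs: randomInt(minDelayMs, maxDelayMs, rng),
  }));
}
```

Each step of the plan could then drive `page.mouse.wheel(0, distance)` followed by a pause of `delayMs`, while the height check from the function above decides when to stop.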

Intercepting API Calls (The Smart Way)

Here's a powerful technique: instead of scraping the rendered HTML, intercept the API calls that the SPA makes to fetch its data. This often returns clean JSON that's much easier to work with.

async function interceptAPICalls(url) {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  
  const apiResponses = [];
  
  // Intercept all network responses
  await page.route('**/*', async (route) => {
    const request = route.request();
    const response = await route.fetch();
    
    // Capture JSON API responses
    if (request.url().includes('/api/') || 
        response.headers()['content-type']?.includes('json')) {
      try {
        const json = await response.json();
        apiResponses.push({
          url: request.url(),
          data: json
        });
      } catch (e) {
        // Body was empty or not valid JSON; skip this response
      }
    }
    
    await route.fulfill({ response });
  });
  
  await page.goto(url, { waitUntil: 'networkidle' });
  await page.waitForTimeout(3000);
  
  await browser.close();
  
  // Return the API data instead of scraped HTML
  return apiResponses;
}
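
When the goal is only to read responses, not to modify them, a passive listener is a lighter alternative to routing every request. The sketch below uses the page's `response` event; `collectJsonResponses` and `urlFilter` are hypothetical names, and `page` is assumed to be a Playwright page (or any object with the same `on('response', handler)` interface).

```javascript
// Collect JSON responses passively via the page's 'response' event.
// collectJsonResponses and urlFilter are illustrative names; `page` is any
// object exposing Playwright's page.on('response', handler) interface.
function collectJsonResponses(page, urlFilter = () => true) {
  const captured = [];
  const pending = [];

  page.on('response', (response) => {
    const contentType = response.headers()['content-type'] || '';
    if (!contentType.includes('json') || !urlFilter(response.url())) return;

    // Body parsing is async, so track the promise and let callers await it.
    pending.push(
      response.json()
        .then(data => { captured.push({ url: response.url(), data }); })
        .catch(() => { /* body unavailable or not valid JSON; skip */ })
    );
  });

  return {
    captured,
    // Wait for every body read that has started so far.
    settle: () => Promise.all(pending),
  };
}
```

The collector would be attached before `page.goto`, and `settle()` awaited before closing the browser. Unlike the routing approach, the listener does not sit in the request path, so it cannot alter traffic but also cannot stall it.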

The Easy Way: AI-Powered SPA Scraping

All the techniques above work, but they require significant setup, debugging, and maintenance. CSS selectors break when sites redesign. Timing issues cause flaky scrapes. Anti-bot measures block headless browsers.

AI-powered scraping APIs like Papalily handle all of this automatically. They run real headless browsers, execute JavaScript, wait for content to render, and extract data based on natural language descriptions.

const response = await fetch('https://api.papalily.com/scrape', {
  method: 'POST',
  headers: {
    'x-api-key': 'YOUR_API_KEY',
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    url: 'https://react-ecommerce-site.com/products',
    prompt: `Get all products from this React site. 
             Scroll down to load all items from infinite scroll. 
             Extract name, price, image URL, and rating for each product.`,
    // Papalily automatically handles:
    // - JavaScript rendering
    // - Waiting for React to mount
    // - Scrolling to load lazy content
    // - Extracting structured data
  }),
});

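// Fail loudly on HTTP errors instead of parsing an error body as data
if (!response.ok) {
  throw new Error(`Scrape request failed with HTTP ${response.status}`);
}
const result = await response.json();
// Returns clean JSON with all product data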

Best Practices for SPA Scraping

1. Always Wait for Content

Never extract data immediately after navigation. Use waitForSelector, waitForFunction, or networkidle to ensure the SPA has finished rendering.

2. Handle Loading States

SPAs often show skeleton screens or spinners while loading. Wait for these to disappear before extracting data.
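
With Playwright, waiting for an indicator to disappear can be expressed with the `state: 'hidden'` option of `waitForSelector`. A minimal sketch follows; the function name `waitForLoadersGone` and the default selectors are assumptions, since spinner and skeleton class names differ from site to site.

```javascript
// Wait until each loading indicator is hidden or removed from the DOM.
// waitForLoadersGone and the default selectors are illustrative; `page` is
// expected to provide Playwright's page.waitForSelector(selector, options).
async function waitForLoadersGone(page, selectors = ['.spinner', '.skeleton'], timeoutMs = 15000) {
  for (const selector of selectors) {
    // state: 'hidden' resolves when the element is not visible or not present
    await page.waitForSelector(selector, { state: 'hidden', timeout: timeoutMs });
  }
}
```

Note that this resolves immediately if the indicator has not appeared yet, so it is safest when paired with a wait for the content itself.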

3. Scroll to Trigger Lazy Loading

Many SPAs only load content when it enters the viewport. Programmatically scroll the page to ensure all content is loaded.

4. Watch for Continuous DOM Updates

Some SPAs continuously update the DOM. Use waitForFunction to wait for specific conditions like "no more loading indicators" or "at least 50 items rendered."
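
For pages that never fully settle, one workable condition is "the item count has stopped changing." The sketch below tracks that across repeated samples; `createStabilityTracker` is a hypothetical helper, and how many identical samples count as stable is a judgment call.

```javascript
// Report "stable" once the same value has been seen N samples in a row.
// createStabilityTracker is an illustrative helper, not a library API.
function createStabilityTracker(requiredRepeats = 3) {
  let last;
  let repeats = 0;
  return function sample(value) {
    // Extend the streak on a repeat; otherwise start a new streak of one
    repeats = value === last ? repeats + 1 : 1;
    last = value;
    return repeats >= requiredRepeats;
  };
}
```

In a scraper, the sampled value could be the result of `page.locator('.item').count()` taken every few hundred milliseconds, with extraction starting once the tracker returns true.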

5. Respect Rate Limits

SPAs often make many API calls. Aggressive scraping can trigger rate limiting or IP bans. Add delays between requests and consider using proxy rotation.
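
A common way to space out retries is exponential backoff with jitter. The sketch below shows one form of it under stated assumptions: the function names, option names, and defaults are illustrative, and which errors deserve a retry (for example HTTP 429) is left to the task itself.

```javascript
// Exponential backoff with full jitter: the delay ceiling doubles per attempt.
// Function names, option names, and defaults are illustrative assumptions.
function backoffDelay(attempt, { baseMs = 500, maxMs = 30000 } = {}, rng = Math.random) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(rng() * ceiling);
}

// Retry an async task, sleeping between failed attempts.
async function withRetry(task, {
  retries = 3,
  sleep = ms => new Promise(resolve => setTimeout(resolve, ms)),
  ...backoff
} = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      if (attempt >= retries) throw err; // out of retries: surface the error
      await sleep(backoffDelay(attempt, backoff));
    }
  }
}
```

The randomness spreads retries out so that several workers hitting the same limit do not all retry at the same moment.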

Common Pitfalls and Solutions

Problem	Solution
Empty content / missing elements	Increase wait time; use waitForSelector with longer timeout
Stale element reference	Re-query elements after each navigation/scroll
Infinite scroll stops working	Add human-like delays; randomize scroll distance
Bot detection blocking scraper	Use stealth plugins; rotate user agents; add random delays
Memory leaks with long sessions	Restart browser periodically; close pages when done
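
Restarting the browser periodically is easier when the URL list is split into batches first, with one browser instance per batch. A minimal sketch of the batching step follows; `chunk` is an illustrative helper name and the batch size is something to tune per workload.

```javascript
// Split a list into fixed-size batches so each batch can run in a fresh
// browser instance. chunk is an illustrative helper name.
function chunk(items, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch would then launch a browser, scrape its URLs, and close the browser in a `finally` block so memory is released even when a page fails.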

Conclusion

Scraping JavaScript-heavy websites and SPAs requires a different approach than traditional web scraping. You need headless browsers, smart waiting strategies, and techniques for handling dynamic content loading.

While you can build all of this yourself with Playwright or Puppeteer, it requires significant maintenance and expertise. For most use cases, an AI-powered scraping API like Papalily handles the complexity automatically, letting you focus on using the data rather than fighting with browser automation.

Scrape Any SPA in Minutes

Get a free API key on RapidAPI — 100 free requests per month. Works on React, Vue, Angular, and any JavaScript-heavy site.

Full documentation at papalily.com/docs