Headless Browsers · JavaScript · Tutorial
How to Scrape Dynamic JavaScript-Rendered Websites
📅 April 16, 2026 • ⏱ 11 min read • By Papalily Team
Dynamic JavaScript-rendered websites have become the norm in modern web development.
Built with React, Vue, Angular, and other frameworks, these sites load content dynamically after
the initial page request. While this creates fast, interactive user experiences, it presents a
significant challenge for web scrapers. This guide covers everything you need to know about
scraping these dynamic sites effectively.
Understanding the Challenge
When you fetch a modern website with a traditional HTTP library like Python's requests
or Node's axios, you often receive an almost empty HTML shell. The actual content is loaded
by JavaScript running in the browser. Without executing that JavaScript, your scraper sees nothing
but loading spinners and placeholder elements.
Signs you're dealing with a JavaScript-rendered site:
- The page content appears gradually as you watch it load
- Viewing the page source (Ctrl+U) shows minimal HTML compared to what you see in DevTools
- URLs don't change when navigating between sections (single-page application behavior)
- Content loads as you scroll (infinite scroll pagination)
- Data appears after a brief loading animation
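You can often confirm this programmatically before reaching for a headless browser. The sketch below checks raw HTML (as returned by fetch or axios) for common tell-tale markers; the specific patterns are illustrative examples (empty React/Vue mount points, serialized framework state), not an exhaustive or authoritative list.

```javascript
// Sketch: heuristic check for a JavaScript-rendered page, given its raw
// HTML. The markers below are common examples (empty SPA mount points,
// noscript warnings, serialized app state); real sites vary.
function looksJsRendered(html) {
  const markers = [
    /<div id="(root|app)">\s*<\/div>/,          // empty React/Vue mount point
    /<noscript>.*enable JavaScript.*<\/noscript>/is,
    /__NEXT_DATA__|window\.__INITIAL_STATE__/,  // serialized framework state
  ];
  return markers.some((re) => re.test(html));
}

console.log(looksJsRendered('<div id="root"></div>'));            // true
console.log(looksJsRendered('<article><h1>Post</h1></article>')); // false
```

If this returns true for your target, the content almost certainly arrives via JavaScript and a plain HTTP fetch won't see it.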
Traditional vs Headless Approaches
Choosing the right scraping approach depends on the target website's architecture.
Here's when to use each method:
Traditional HTTP Scraping
Use traditional scraping when:
- The website serves complete HTML on the initial request (server-side rendering)
- You can find the data in the HTML source without JavaScript execution
- Speed is critical and you need to make many requests quickly
- The site doesn't require user interactions to reveal content
Traditional scraping is faster (milliseconds vs seconds) and uses fewer resources.
It's perfect for static sites, blogs, and documentation pages.
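For server-rendered pages, the whole pipeline can be a single request plus a parse. The sketch below uses Node's built-in fetch (Node 18+) and a throwaway regex to stay dependency-free; the URL and the `h2.title` selector are made-up examples, and a real project would use a proper HTML parser such as cheerio instead of regexes.

```javascript
// Sketch of the traditional approach: one HTTP request, then parse the
// HTML you already have. No browser, no JavaScript execution.
async function scrapeStaticTitles(url) {
  const res = await fetch(url); // built into Node 18+
  const html = await res.text();
  return extractTitles(html);
}

function extractTitles(html) {
  // Pull the text of every <h2 class="title">…</h2>.
  // Regex parsing is fragile; shown here only to avoid dependencies.
  return [...html.matchAll(/<h2 class="title">(.*?)<\/h2>/g)].map((m) => m[1]);
}

console.log(extractTitles('<h2 class="title">First</h2><h2 class="title">Second</h2>'));
// → [ 'First', 'Second' ]
```

The entire scrape is a network round-trip plus string processing, which is why it runs in milliseconds where a headless browser takes seconds.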
Headless Browser Scraping
Use headless browsers when:
- Content loads dynamically after the initial page load
- You need to interact with the page (click buttons, fill forms, scroll)
- The site uses infinite scroll or "Load More" pagination
- You need to wait for specific elements to appear
- The site has anti-scraping measures that detect non-browser requests
Headless browsers run a real browser environment without the visible UI. They execute
JavaScript, render the DOM, and allow programmatic interaction just like a human user.
Headless Browser Options
The two most popular tools for headless scraping are Playwright and Puppeteer:
Playwright
Developed by Microsoft, Playwright supports multiple browsers (Chromium, Firefox, WebKit)
and offers excellent cross-browser consistency. It's become the preferred choice for many
developers due to its reliability and modern API design.
const { chromium } = require('playwright');

async function scrapeDynamicSite() {
  const browser = await chromium.launch();
  const page = await browser.newPage();

  // Navigate and wait for content to load
  await page.goto('https://example.com/products');
  await page.waitForSelector('.product-item');

  // Extract data
  const products = await page.evaluate(() => {
    return Array.from(document.querySelectorAll('.product-item')).map(item => ({
      name: item.querySelector('.product-name')?.innerText,
      price: item.querySelector('.product-price')?.innerText
    }));
  });

  console.log(products);
  await browser.close();
}

scrapeDynamicSite();
Puppeteer
Google's Puppeteer is a Node.js library that provides a high-level API to control Chrome
or Chromium. It's mature, well-documented, and integrates seamlessly with the Chrome DevTools Protocol.
const puppeteer = require('puppeteer');

async function scrapeWithPuppeteer() {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com/products');

  // Wait for dynamic content
  await page.waitForFunction(() => {
    return document.querySelectorAll('.product-item').length > 0;
  });

  // Scroll to load more items
  await page.evaluate(async () => {
    await new Promise((resolve) => {
      let totalHeight = 0;
      const distance = 100;
      const timer = setInterval(() => {
        const scrollHeight = document.body.scrollHeight;
        window.scrollBy(0, distance);
        totalHeight += distance;
        if (totalHeight >= scrollHeight) {
          clearInterval(timer);
          resolve();
        }
      }, 100);
    });
  });

  await browser.close();
}

scrapeWithPuppeteer();
Performance Tips and Best Practices
Headless browsers are powerful but resource-intensive. Here are strategies to optimize performance:
1. Use Browser Contexts
Instead of launching a new browser for each scrape, create isolated contexts within a single
browser instance. This is much faster and uses less memory.
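The pattern looks like this. To keep the loop itself in focus, the sketch is written against a Playwright-style `browser` object passed in as a parameter; in real use it would come from `chromium.launch()`, and the idea of returning page titles is just a placeholder for whatever extraction you need.

```javascript
// Sketch: one long-lived browser, one isolated context per job.
// Each context gets fresh cookies/storage (like a clean profile) but
// shares the browser process, so creating/destroying it is cheap.
async function scrapeInContexts(browser, urls) {
  const titles = [];
  for (const url of urls) {
    const context = await browser.newContext(); // isolated, lightweight
    const page = await context.newPage();
    await page.goto(url);
    titles.push(await page.title());
    await context.close(); // frees the context; the browser stays up
  }
  return titles;
}
```

Compared with launching a fresh browser per URL, this avoids the multi-second startup cost on every iteration while still isolating each job's session state.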
2. Block Unnecessary Resources
Images, CSS, and fonts often aren't needed for data extraction. Block them to speed up page loads:
await page.route('**/*', async (route) => {
  const resourceType = route.request().resourceType();
  if (['image', 'stylesheet', 'font'].includes(resourceType)) {
    await route.abort();
  } else {
    await route.continue();
  }
});
3. Handle Timeouts Gracefully
Dynamic sites can be unpredictable. Always set reasonable timeouts and handle failures:
try {
  await page.waitForSelector('.product-list', { timeout: 10000 });
} catch (error) {
  console.error('Content failed to load within timeout');
  // Take a screenshot for debugging
  await page.screenshot({ path: 'error.png' });
}
4. Reuse Pages When Possible
If scraping multiple pages from the same site, navigate to new URLs instead of closing and
reopening the browser. This maintains cookies and session state.
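A minimal sketch of this pattern, written against a Playwright/Puppeteer-style `page` object so the navigation loop is the focus; the `extract` callback stands in for whatever per-page extraction logic you need.

```javascript
// Sketch: reuse a single page across many URLs on the same site.
// Cookies and session state carry over between navigations, and you
// skip the cost of re-launching a browser or opening new tabs.
async function scrapeMany(page, urls, extract) {
  const results = [];
  for (const url of urls) {
    await page.goto(url); // same page, same session
    results.push(await extract(page));
  }
  return results;
}
```

For example, `scrapeMany(page, productUrls, (p) => p.title())` would collect the title of every product page in one session.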
5. Run Headless in Production
Always use headless mode in production for better performance. Only use headed mode for debugging:
// Production: headless for speed
const browser = await chromium.launch({ headless: true });

// Debugging: headed to see what's happening
const browser = await chromium.launch({ headless: false, slowMo: 100 });
When to Consider an API Solution
Managing headless browsers at scale introduces significant complexity: proxy rotation,
CAPTCHA solving, browser fingerprint randomization, and infrastructure costs. For many use cases,
using a managed scraping API like Papalily
is more cost-effective than building and maintaining your own headless infrastructure.
Papalily handles all the complexity of headless browsing, proxy management, and anti-detection
measures, letting you focus on using the data rather than extracting it. Simply send a URL and
a natural language prompt describing what you want to extract.
Simplify Dynamic Site Scraping
Skip the headless browser complexity. Papalily handles JavaScript rendering,
proxy rotation, and data extraction automatically.
Get Free API Key on RapidAPI →
Full documentation at papalily.com/docs