You've been there. You write a scraper on Monday. By Friday the site changed its markup,
your selectors are broken, and the data pipeline is down. You spend two hours debugging
CSS selectors for a site that never published its internal HTML structure in the first place
and probably never will. This is the reality of scraping modern web applications.
It gets worse. The site is built with React. Your requests call returns skeleton HTML
containing a single <div id="root"></div> and nothing else.
The actual content is rendered client-side by JavaScript that runs in the browser —
not in your scraper.
This post covers why traditional scrapers fail on modern JS-heavy sites, how the old
workarounds (Puppeteer, Playwright) still leave you wrestling with selectors,
and how AI-powered scraping changes the game entirely.
Why Traditional Scrapers Fail on React and Vue Sites
Classic web scraping tools like requests (Python), axios (Node.js),
or curl operate at the HTTP level. They fetch the raw HTML response from the
server. That works fine for static sites and server-side rendered pages.
But React, Vue, Angular, Next.js (in CSR mode), and countless other modern frameworks
deliver a minimal HTML shell to the browser, then hydrate the UI using
JavaScript. The actual product listings, prices, job titles, or article headlines are
never in the initial HTML response — they're injected into the DOM after JS executes.
Three things make this especially painful:
- Hydration delay: Even if you use a headless browser, you need to wait
for React/Vue to finish rendering. The timing is unpredictable: it depends on
network speed, API calls the page makes, and component lifecycle hooks.
- Dynamic class names: Tools like CSS Modules, Tailwind JIT, or styled-components
often generate hashed or utility class names (sc-aX7bV, tw-flex-1)
that are meaningless for targeting and change between builds.
- Structural changes: When a design team ships a UI update, the DOM
structure changes. Your scraper breaks silently, or loudly at 3am when your
monitoring pipeline crashes.
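To see the first problem concretely, here's a minimal sketch. The HTML string below stands in for a typical client-side-rendered response (the script path and the JobListing marker are illustrative, not from any real site):

```javascript
// A typical response body from a client-side-rendered (CSR) React app:
// the server sends only a mount point, no actual content.
const csrResponse = `<!DOCTYPE html>
<html>
  <head><title>Jobs</title></head>
  <body>
    <div id="root"></div>
    <script src="/static/js/main.abc123.js"></script>
  </body>
</html>`;

// Any HTTP-level scraper (requests, axios, curl) sees exactly this string.
// Naively searching it for job data finds nothing:
const hasListings = csrResponse.includes('JobListing');
console.log(hasListings); // false: the listings only exist after JS runs
```

No amount of HTML parsing helps here; the data simply isn't in the response.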
The Old Approach: Puppeteer / Playwright + Selectors
The standard solution has been to use a headless browser — Puppeteer (Chrome)
or Playwright (multi-browser) — to render the page, then use CSS selectors or
XPath to extract the data. This works, but it introduces a different set of problems.
Here's a typical Playwright scraper for a React-based job board:
scraper-old.js (Playwright + selectors)
const { chromium } = require('playwright');

// CommonJS has no top-level await, so wrap the scrape in an async IIFE
(async () => {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto('https://jobs.example.com/software-engineer');

  // Wait for the React app to hydrate
  await page.waitForSelector('.JobListing__container--xK7pQ');

  const jobs = await page.$$eval('.JobListing__container--xK7pQ', els =>
    els.map(el => ({
      title: el.querySelector('.JobListing__title--aB3Rd')?.textContent?.trim(),
      company: el.querySelector('.JobListing__company--qZ9Lx')?.textContent?.trim(),
      location: el.querySelector('.JobListing__location--mP2Yw')?.textContent?.trim(),
      salary: el.querySelector('.JobListing__salary--nV8Kj')?.textContent?.trim(),
    }))
  );

  await browser.close();
  console.log(jobs);
})();
This works — until the site ships a new build with different class names. Then you're
back to inspecting the DOM, finding the new selectors, and pushing a fix. For high-churn
sites, this maintenance burden is significant.
You also have to handle: waiting for the right element, scrolling to load lazy content,
closing cookie banners, and a dozen edge cases specific to each site. Every site is its
own mini-project.
The AI-Powered Approach: Describe What You Want, Get JSON Back
What if instead of specifying where the data is in the HTML, you just described
what the data is? That's the core idea behind AI-powered scraping.
Papalily works in three steps:
- Send a URL and a plain-English prompt describing what you want to extract.
- A real Chromium browser renders the page — executing all JavaScript, waiting for React/Vue to hydrate, and capturing the final DOM state exactly as a human would see it.
- Gemini AI reads the rendered page (text + screenshot) and extracts precisely the data you described, returning clean structured JSON.
No selectors. No XPath. No fragile DOM queries. If the site redesigns next week, your
prompt still works because the AI understands the semantic meaning of the content —
not its position in the DOM tree.
Code Examples
cURL — Quickest way to test
curl -X POST https://api.papalily.com/scrape \
-H "x-api-key: YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"url": "https://jobs.example.com/software-engineer",
"prompt": "Get all job listings with title, company, location, salary range, and job URL"
}'
# Response
{
"success": true,
"data": {
"jobs": [
{
"title": "Senior Backend Engineer",
"company": "Acme Corp",
"location": "Remote",
"salary": "$130,000 – $160,000",
"url": "https://jobs.example.com/listing/sr-backend-123"
}
]
},
"meta": { "duration_ms": 8921 }
}
Node.js — E-commerce price monitoring
// Monitor competitor prices on a React-based product page
async function getProductPrices(url) {
const res = await fetch('https://api.papalily.com/scrape', {
method: 'POST',
headers: {
'x-api-key': process.env.PAPALILY_API_KEY,
'Content-Type': 'application/json',
},
body: JSON.stringify({
url,
prompt: 'Get all products with name, current price, original price, and discount percentage. Include out-of-stock status.',
}),
});
const { data } = await res.json();
return data.products;
}
// Run it
const products = await getProductPrices('https://shop.competitor.com/laptops');
console.log(`Found ${products.length} products`);
// Compare with your own prices
products.forEach(p => {
if (parseFloat(p.current_price) < getOwnPrice(p.name)) {
console.log(`⚠ Competitor undercuts us on: ${p.name}`);
}
});
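The example above assumes the request always succeeds. In a scheduled pipeline you'll want to check both the HTTP status and the success flag before touching data. Here's a small sketch; the success/data envelope matches the cURL example earlier, but the error field is an assumption, not documented API behavior:

```javascript
// Validate a Papalily-style response envelope before using its data.
// Assumes the { success, data } shape shown in the cURL example;
// the `error` field is a guess at the failure shape, not confirmed API behavior.
function unwrapScrapeResponse(status, body) {
  if (status < 200 || status >= 300) {
    throw new Error(`Scrape request failed with HTTP ${status}`);
  }
  if (!body.success) {
    throw new Error(`Scrape unsuccessful: ${body.error ?? 'unknown error'}`);
  }
  return body.data;
}

// Example with a canned successful response:
const data = unwrapScrapeResponse(200, {
  success: true,
  data: { products: [{ name: 'Laptop A', current_price: '899.00' }] },
});
console.log(data.products.length); // 1
```

In the price-monitoring function above you'd call this as `unwrapScrapeResponse(res.status, await res.json())` so that a failed scrape surfaces as an exception instead of an `undefined` products array.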
Python — News aggregation
import requests
import os
def scrape_news(url, topic):
"""Scrape latest articles from any news site, React or not."""
resp = requests.post(
'https://api.papalily.com/scrape',
headers={'x-api-key': os.environ['PAPALILY_API_KEY']},
json={
'url': url,
'prompt': f'Get the 10 most recent articles about {topic}. '
          'Return title, author, published date, summary, and article URL for each.',
'wait_ms': 3000, # Extra wait for lazy-loaded content
}
)
return resp.json()['data']['articles']
# Works on React-based news sites, Vue-based blogs, static sites — anything
articles = scrape_news('https://techcrunch.com', 'artificial intelligence')
for article in articles:
print(f"{article['title']} — {article['author']} ({article['published_date']})")
Batch scraping — Multiple URLs in one call
// Scrape 3 competitor product pages in one API call
const res = await fetch('https://api.papalily.com/batch', {
method: 'POST',
headers: {
'x-api-key': process.env.PAPALILY_API_KEY,
'Content-Type': 'application/json',
},
body: JSON.stringify({
requests: [
{ url: 'https://shop-a.com/headphones', prompt: 'All headphones with name and price' },
{ url: 'https://shop-b.com/headphones', prompt: 'All headphones with name and price' },
{ url: 'https://shop-c.com/headphones', prompt: 'All headphones with name and price' },
],
}),
});
const { results } = await res.json();
// All 3 run in parallel — total time ≈ slowest single scrape
results.forEach((r, i) => console.log(`Shop ${i+1}:`, r.data));
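When batching, one URL can fail while the others succeed. The per-entry result shape isn't shown above, so this sketch assumes each entry carries its own success flag (with a hypothetical error field); adjust it to the actual response shape:

```javascript
// Split batch results into successes and failures so one bad URL
// doesn't sink the whole run. The per-entry { success, data, error }
// shape is an assumption; check the API docs for the real one.
function partitionBatchResults(results) {
  const ok = [];
  const failed = [];
  results.forEach((r, i) => {
    if (r.success) ok.push({ index: i, data: r.data });
    else failed.push({ index: i, error: r.error ?? 'unknown' });
  });
  return { ok, failed };
}

// Example with canned results:
const { ok, failed } = partitionBatchResults([
  { success: true, data: { headphones: [] } },
  { success: false, error: 'timeout' },
]);
console.log(ok.length, failed.length); // 1 1
```

This way a timeout on shop B still leaves you with fresh prices from shops A and C, plus a list of indices to retry.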
Real-World Use Cases
E-commerce Price Monitoring
Monitor competitor prices across React-based e-commerce stores. Traditional scrapers break
every time a store runs a frontend A/B test or updates their design. With AI extraction,
your prompt ("get all product prices and availability") keeps working regardless of
how the store's markup changes.
Run it on a schedule (cron, GitHub Actions, whatever) and push the results to a database.
Build price alerts, trend charts, or auto-repricing logic on top.
Job Listings Aggregation
Job boards are almost universally React or Vue now (Greenhouse, Lever, Ashby, Workday
— all JS-heavy). Aggregating listings from multiple boards used to require a
different scraper per platform. With a prompt like "get all open positions with title,
department, location, and apply URL," you get consistent JSON from every board without
writing a single platform-specific selector.
News and Content Aggregation
Build your own news reader, industry digest, or research tool by pulling articles from
multiple sources. The AI understands what an "article title" and "publication date" mean
semantically, so it works across different news site layouts without configuration.
Real Estate and Rental Listings
Real estate sites are notoriously complex — map-based UIs, lazy loading, infinite
scroll, all built in React. A prompt like "get all apartment listings with price, beds,
baths, square footage, and listing URL" works the same way on large portals like Zillow
or Redfin as on local agency sites, subject to each site's anti-bot measures.
The Maintenance Advantage
The real ROI of AI scraping isn't just initial development speed (though that's significant).
It's the ongoing maintenance savings.
With selector-based scrapers, every site update is a potential break. Teams at larger
companies often dedicate engineering time specifically to "scraper maintenance" —
a cost that grows linearly with the number of sites scraped.
AI-powered extraction is fundamentally more resilient to change because it understands
meaning, not structure. The same way a human can find the price on a
product page regardless of how it's styled, Gemini can extract the price whether it's
in a <span class="price">, a <div data-testid="product-price">,
or an element with a randomly generated class name.
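To make the contrast concrete, here's a toy illustration (not how Gemini works internally): an extractor keyed to one build's class name breaks when the markup changes, while one keyed to what a price looks like keeps working.

```javascript
// Two builds of the same product page: same price, different markup.
const buildA = '<span class="price">$49.99</span>';
const buildB = '<div data-testid="product-price">$49.99</div>';

// Structure-based extraction: keyed to one build's class name.
const byClass = html => (html.match(/class="price">([^<]+)</) || [])[1];

// Meaning-based extraction: keyed to what a price looks like.
const byMeaning = html => (html.match(/\$[\d,]+\.\d{2}/) || [])[0];

console.log(byClass(buildA));   // "$49.99"
console.log(byClass(buildB));   // undefined: broke on the redesign
console.log(byMeaning(buildA)); // "$49.99"
console.log(byMeaning(buildB)); // "$49.99": survives the redesign
```

A regex is obviously far cruder than an LLM reading the rendered page, but the failure mode it sidesteps is the same one that kills selector-based scrapers.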
Try Papalily Free
100 free requests per month. No credit card. Works on any site — React, Vue,
Angular, Next.js, or plain HTML. Get your API key in seconds.
Get Free API Key on RapidAPI →
Available on RapidAPI — secure billing, instant access
Limitations to Know
AI scraping isn't magic. A few things to keep in mind:
- Response time: Because a real browser renders the page (typically 8–15 seconds),
this isn't suitable for real-time APIs. It's designed for batch jobs,
scheduled pipelines, and background tasks.
- Login-walled content: Papalily doesn't handle authentication. It scrapes
publicly accessible pages. Logged-in content requires a different approach.
- Anti-bot measures: Aggressive bot detection (Cloudflare challenge pages,
CAPTCHAs) may block even real browser renders. A real browser helps a lot here, but
it's not a silver bullet for every site.
- Very large datasets: If you need thousands of paginated pages scraped
daily, you'll want dedicated scraping infrastructure. Papalily is optimized for
targeted, high-value extractions rather than bulk crawling.
Getting Started in 2 Minutes
- Sign up for a free API key at RapidAPI (no credit card needed).
- Copy the cURL example above, replace
YOUR_API_KEY, and point the URL and prompt at a site you actually need.
- Run it. You'll have clean JSON back in under 15 seconds.
The API is documented at papalily.com/docs
if you want to explore all the parameters (wait_ms, no_cache,
batch mode, etc.).
Writing a scraper for a React site in 2026 shouldn't require you to become an expert in
that site's internal DOM structure. Describe what you want. Get the data. That's it.