Amazon is one of the most valuable data sources on the internet — and one of the hardest to scrape reliably. If you've tried scraping Amazon in 2025 or 2026, you've probably hit the wall: CAPTCHAs, empty product pages, JavaScript that never loads, or worse — quietly incorrect data that looks right but isn't.
This guide covers why Amazon is so hard to scrape, what the traditional approaches miss, and how combining real browser rendering with AI extraction changes the equation. We'll include working code examples throughout.
Amazon has invested heavily in bot detection and anti-scraping infrastructure. Understanding these defenses helps you choose the right approach.
Amazon product pages aren't static HTML. They're React applications that render dynamically. If you send a simple HTTP GET request to an Amazon product page and try to parse the response, you'll get a skeleton with empty product containers. The actual product name, price, images, and reviews are loaded by JavaScript after the initial HTML arrives.
This means any scraper that doesn't execute JavaScript — like raw requests in Python or fetch in Node.js — will silently return incomplete data or no data at all.
Amazon tracks dozens of signals to identify bots. These include:
- TLS and header fingerprints: the handshake and header set of a library like requests vs a real browser differs significantly
- Browser automation signals: the navigator.webdriver flag, Playwright/Puppeteer signatures

Beyond detection, Amazon constantly A/B tests its page layouts. The class names and DOM structure on a product page today may be completely different from what they were three months ago — or from what a different user in a different region sees. Selectors that work on Monday break by Friday.
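One of those signals is visible without any special tooling: the default headers sent by Python's requests library announce the client as a script, not a browser. A quick check (assuming requests is installed):

```python
import requests

# The default headers the requests library attaches to every call
# identify the client as a Python script before Amazon reads anything else.
headers = requests.utils.default_headers()
print(headers["User-Agent"])  # e.g. "python-requests/2.31.0"
```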
Amazon uses both reCAPTCHA and their own custom CAPTCHA system. Once triggered, you typically can't proceed until the CAPTCHA is solved — either by a human or an automated CAPTCHA solver service.
The simplest approach — and the one that fails fastest on Amazon. You get the HTML skeleton, not the rendered product data. Amazon also quickly identifies and blocks requests that don't look like real browsers based on headers and TLS fingerprint.
# This looks simple but returns empty product containers on Amazon
import requests
from bs4 import BeautifulSoup
resp = requests.get('https://www.amazon.com/dp/B0XXXXX', headers={
    'User-Agent': 'Mozilla/5.0 ...',
})
soup = BeautifulSoup(resp.text, 'html.parser')
price = soup.select_one('#priceblock_ourprice') # Returns None — JS hasn't run
print(price) # None
Using a real browser via Playwright or Puppeteer is much better — JavaScript executes, and you get the real rendered page. But you still need to:
- Mask automation signals (like the navigator.webdriver flag)
- Solve CAPTCHAs when they're triggered
- Keep your CSS selectors up to date as layouts change

It works, but it's a constant maintenance burden. Amazon's class names are often auto-generated (a-price-whole vs a hash-based selector) and can change in A/B tests.
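The brittleness is easy to demonstrate without touching Amazon at all. Here is a minimal sketch using BeautifulSoup; the two HTML snippets and the hash-style class name are invented for illustration:

```python
from bs4 import BeautifulSoup

# Two hypothetical renderings of the same product price:
# one before and one after an A/B test swapped the class name.
layout_a = '<span class="a-price-whole">29.99</span>'
layout_b = '<span class="px-9f3c2a">29.99</span>'

selector = '.a-price-whole'
for html in (layout_a, layout_b):
    el = BeautifulSoup(html, 'html.parser').select_one(selector)
    print(el.text if el else 'selector broke')
# prints "29.99", then "selector broke"
```

The scraper doesn't crash — it just stops finding data, which is why selector breakage often goes unnoticed until someone checks the output.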
Before diving into solutions, it's worth understanding what people actually need this data for: price monitoring, competitor and category analysis, stock tracking, and review analysis.
The approach that works reliably in 2026 combines two things: a real browser that renders the page like a human visitor would, and AI extraction that pulls structured data from the rendered result.
The AI approach is specifically valuable for Amazon because it solves the selector brittleness problem. Instead of .a-price-whole (which changes), you say "get the current price" — and the AI understands what that means regardless of the DOM structure.
curl -X POST https://api.papalily.com/scrape \
  -H "x-api-key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://www.amazon.com/dp/B0XXXXXXXXX",
    "prompt": "Extract product details: name, current price, original price if on sale, rating, number of reviews, availability (in stock or not), ASIN, main features (bullet points), and brand name."
  }'
# Returns:
{
  "success": true,
  "data": {
    "name": "Product Name Here",
    "current_price": "$29.99",
    "original_price": "$49.99",
    "discount": "40% off",
    "rating": 4.3,
    "review_count": 2847,
    "in_stock": true,
    "asin": "B0XXXXXXXXX",
    "brand": "BrandName",
    "features": [
      "Feature one description",
      "Feature two description"
    ]
  },
  "meta": { "duration_ms": 11240 }
}
const API_KEY = process.env.PAPALILY_API_KEY;
async function getAmazonProductPrice(asin) {
  const url = `https://www.amazon.com/dp/${asin}`;
  const response = await fetch('https://api.papalily.com/scrape', {
    method: 'POST',
    headers: {
      'x-api-key': API_KEY,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      url,
      prompt: `Get the product name, current price, original price if crossed out,
        rating out of 5, number of reviews, and stock status.
        Return as { name, price, original_price, rating, reviews, in_stock }`,
    }),
  });
  const result = await response.json();
  return result.data;
}

async function monitorPrices(asins) {
  console.log(`Checking prices for ${asins.length} products...\n`);
  for (const asin of asins) {
    try {
      const product = await getAmazonProductPrice(asin);
      const timestamp = new Date().toISOString();
      console.log(`${timestamp} | ${asin}`);
      console.log(`  ${product.name}`);
      console.log(`  Price: ${product.price}${product.original_price ? ` (was ${product.original_price})` : ''}`);
      console.log(`  Rating: ${product.rating} (${product.reviews} reviews)`);
      console.log(`  Stock: ${product.in_stock ? 'In Stock' : 'Out of Stock'}\n`);
      // In production: save to database, trigger alerts on price changes
    } catch (err) {
      console.error(`Failed for ${asin}:`, err.message);
    }
  }
}
// Monitor these ASINs
monitorPrices(['B0XXXXXXXXX', 'B0YYYYYYYYY', 'B0ZZZZZZZZZ']);
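The production note in the comment above (save to a database, trigger alerts on price changes) boils down to diffing two price snapshots. A sketch of that step in Python; the helper name and the 5% threshold are our own illustration:

```python
def detect_price_changes(previous: dict, current: dict,
                         threshold_pct: float = 5.0) -> list[dict]:
    """Compare two {asin: price} snapshots and flag meaningful moves."""
    alerts = []
    for asin, new_price in current.items():
        old_price = previous.get(asin)
        if old_price is None or old_price == 0:
            continue  # new product or bad data: nothing to compare against
        change_pct = (new_price - old_price) / old_price * 100
        if abs(change_pct) >= threshold_pct:
            alerts.append({"asin": asin, "old": old_price, "new": new_price,
                           "change_pct": round(change_pct, 1)})
    return alerts

# A 40% drop triggers an alert; a 1% wiggle stays under the threshold.
print(detect_price_changes({"B0X": 49.99, "B0Y": 10.00},
                           {"B0X": 29.99, "B0Y": 10.10}))
```

In a real pipeline the previous snapshot would come from whatever database the monitor writes to, and the alert list would feed an email or webhook notifier.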
import requests
import json
from datetime import datetime
API_KEY = "YOUR_API_KEY"
def analyze_amazon_category(asins: list[str]) -> list[dict]:
    """Analyze multiple Amazon products for competitive intelligence."""
    # Use the batch endpoint for up to 5 URLs at a time
    results = []
    for i in range(0, len(asins), 5):
        batch = asins[i:i+5]
        urls = [f"https://www.amazon.com/dp/{asin}" for asin in batch]
        resp = requests.post(
            "https://api.papalily.com/batch",
            headers={"x-api-key": API_KEY},
            json={
                "urls": urls,
                "prompt": """Extract: product name, price, original price if on sale,
                rating (number), review count, brand, key features (first 3 bullet points),
                and whether it's in stock. Return as structured JSON.""",
            },
            timeout=120,
        )
        batch_result = resp.json()
        # Batch results come back in the same order as the submitted URLs,
        # so pair each item with its ASIN by position.
        for idx, item in enumerate(batch_result.get("results", [])):
            if item.get("data"):
                results.append({
                    "asin": batch[idx],
                    "scraped_at": datetime.utcnow().isoformat(),
                    **item["data"],
                })
    return results
# Example: analyze top products in a category
asins = [
    "B0ASIN00001",
    "B0ASIN00002",
    "B0ASIN00003",
]
products = analyze_amazon_category(asins)

# Save for analysis
with open("amazon_analysis.json", "w") as f:
    json.dump(products, f, indent=2)

# Quick summary
print(f"\nAnalyzed {len(products)} products")
prices = [float(p.get("price", "0").replace("$", "").replace(",", ""))
          for p in products if p.get("price")]
if prices:
    print(f"Price range: ${min(prices):.2f} - ${max(prices):.2f}")
    print(f"Average: ${sum(prices)/len(prices):.2f}")
Use GET /usage to track your API consumption.

Amazon scraping in 2026 is hard, but not impossible. The key insights are: you must use a real browser (JavaScript rendering is non-negotiable), and you need an approach that doesn't depend on brittle CSS selectors that break with every A/B test and redesign.
AI-powered extraction — where you describe what you want in plain English — solves both problems elegantly. The browser handles rendering and anti-detection. The AI handles semantic understanding of the page, regardless of its current structure.
Papalily gives you 50 free requests to test with. No credit card, no setup. Drop in any Amazon product URL and describe what you want extracted.
Get Free API Key on RapidAPI →