Is Web Scraping Legal in 2026? What Every Developer Should Know

⚠ Disclaimer: This article is for informational purposes only. It does not constitute legal advice and should not be relied upon as such. Laws vary by jurisdiction. Consult a qualified attorney for advice specific to your situation.

Is web scraping legal? It's one of the most searched questions in developer circles, and the honest answer is: it depends. It depends on what you're scraping, why, how, and where you're located. In 2026, the legal landscape around web scraping has become somewhat clearer thanks to landmark court decisions — but significant gray areas remain. Here's what you need to know.

The Short Answer

Scraping publicly available data — information visible to any unauthenticated visitor — is generally not illegal under computer crime laws in the United States, based on current case law. However, it may still violate a website's Terms of Service, raise copyright concerns, or conflict with privacy regulations like GDPR depending on what data you collect.

"Legal" and "permitted" are different things. Something can be legal (not criminal) while still being prohibited by contract (a website's ToS) or subject to civil liability.

Key Legal Cases

hiQ Labs vs. LinkedIn (US, 2022)

hiQ Labs vs. LinkedIn Corp., 9th Circuit, 2022

hiQ scraped public LinkedIn profiles to build an HR analytics product. LinkedIn sent a cease-and-desist letter and blocked hiQ's access. hiQ sued. In upholding a preliminary injunction in hiQ's favor, the Ninth Circuit Court of Appeals concluded that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act (CFAA), the key US federal computer crime statute, because the statute's "without authorization" concept is a poor fit for pages that anyone can view without logging in. hiQ did not win the overall dispute, however. Later in 2022 the district court found that hiQ had breached LinkedIn's User Agreement, and the case ended in a settlement under which hiQ accepted a judgment against it and agreed to stop scraping LinkedIn.

This ruling is significant because it limits how aggressively platforms can invoke computer crime law against scrapers of public data. It does not mean scraping is universally legal — only that the CFAA may not apply to publicly accessible pages.

Ryanair vs. PR Aviation (EU, 2015)

Ryanair Ltd vs. PR Aviation BV, European Court of Justice, 2015

Ryanair's Terms of Service prohibited scraping. PR Aviation scraped Ryanair's public flight data for a price comparison site. The ECJ ruled that because Ryanair's database was not protected by copyright or the EU database right, the Database Directive's limits on contractual restrictions did not apply either. That leaves a site owner free to restrict scraping through its Terms of Service, with enforceability decided under national contract law, including whether users (or bots acting on their behalf) are deemed to have accepted the terms.

This case illustrates that ToS violations can have legal consequences in Europe, even when the underlying data is technically "public."

Van Buren vs. United States (US, 2021)

Van Buren vs. United States, US Supreme Court, 2021

This case narrowed the interpretation of the CFAA's "exceeds authorized access" provision. The Supreme Court held that the provision covers someone who enters parts of a computer system, such as files, folders, or databases, that are off limits to them. It does not cover someone who misuses information they are otherwise entitled to access, for example by violating a policy or terms of use. This further weakened the CFAA's applicability to public web scraping.

Terms of Service vs. Law

This distinction matters enormously:

ToS violations are typically civil, not criminal. Violating a website's Terms of Service can expose you to a lawsuit and potentially injunctive relief (being forced to stop), but it's generally not a criminal offense under current US law for public data.
ToS enforceability varies. For ToS terms to be enforceable, there typically needs to be evidence that the party agreed to them. Bots don't click "I Agree", so browsewrap agreements (terms buried in footers) may not bind automated scrapers. Courts have been more willing to enforce such terms once the scraper has actual notice of them, for example after receiving a cease-and-desist letter.
Clickwrap is different. If you've manually created an account, agreed to terms during signup, and then use automation under that account, you've likely agreed to the ToS and can be held to it.

What's Generally Considered Low-Risk

Scraping publicly visible data that requires no login
Respecting robots.txt directives (even if not legally required in most places)
Collecting data for research, journalism, or non-commercial purposes
Making requests at reasonable rates that don't impact server performance
Not copying entire copyrighted works — extracting factual data is different from reproducing text
Collecting data about businesses or products rather than private individuals
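
For the robots.txt item, Python's standard library can do the check. The sketch below parses an inline, made-up robots.txt so that it runs without network access. The user agent name example-bot and the example.com URLs are assumptions for illustration only; a real scraper would download the live file from the target site before requesting anything else.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used here instead of a live download.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if robots.txt permits user_agent to fetch url."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(EXAMPLE_ROBOTS_TXT)

# "example-bot" is an assumed user agent name.
print(is_allowed(parser, "example-bot", "https://example.com/products"))
print(is_allowed(parser, "example-bot", "https://example.com/private/report"))
print(parser.crawl_delay("example-bot"))
```

Passing this check says nothing about the site's Terms of Service or about privacy law. It only covers the robots.txt item in the list above.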

What Carries Higher Risk

Scraping behind authentication walls. Accessing content without valid credentials, or after your access has been revoked, is where CFAA risk is highest. Scraping behind a login you were legitimately granted, under a ToS that prohibits scraping, is more likely to be treated as a breach of contract, but it still carries more risk than scraping public pages.
Circumventing technical measures. Defeating CAPTCHA systems or intentionally bypassing IP blocks is treated more seriously in some jurisdictions.
Collecting personal data at scale. Names, emails, and phone numbers of private individuals can fall under GDPR (EU), CCPA (California), and similar laws. Under GDPR, collecting personal data requires a legal basis.
Republishing copyrighted content. Scraping the text of news articles and republishing it can amount to copyright infringement, which is a separate issue from whether the scraping itself was lawful.
Building competing products using scraped data. Some jurisdictions, particularly in the EU, recognize database rights that protect substantial investments in data collection.

GDPR and Privacy Considerations

If you're established in the EU, GDPR applies to any personal data you scrape. If you're outside the EU but scraping data about people in the EU, it may still apply. Publicly available data isn't automatically GDPR-exempt. Under GDPR, processing personal data requires a valid legal basis, such as consent, legitimate interests, or contractual necessity.

For developers: if you're scraping product prices, stock levels, or business information, you're typically in a safer position. If you're scraping names, emails, phone numbers, or social profiles of individuals, get legal advice before proceeding.
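
One practical way to act on this is to screen scraped records for values that look like personal data before storing them. The sketch below is a crude heuristic under stated assumptions: the two regular expressions, the function names, and the sample record are all hypothetical, and pattern matching will both miss real personal data (names, for instance) and flag harmless values such as long SKUs or timestamps. Treat it as a prompt for human review, not as a compliance check.

```python
import re

# Rough patterns for email addresses and phone-like digit runs (assumptions).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def flag_personal_fields(record: dict) -> list:
    """Return the names of fields whose string value looks like an email or phone number."""
    flagged = []
    for key, value in record.items():
        if isinstance(value, str) and (EMAIL_RE.search(value) or PHONE_RE.search(value)):
            flagged.append(key)
    return flagged


def drop_flagged(record: dict) -> dict:
    """Return a copy of the record without the flagged fields."""
    flagged = set(flag_personal_fields(record))
    return {key: value for key, value in record.items() if key not in flagged}


# Hypothetical scraped record.
record = {
    "product": "USB-C cable",
    "price": "12.99",
    "contact": "sales@example.com",
    "support_line": "+1 (555) 010-0199",
}

print(flag_personal_fields(record))
print(drop_flagged(record))
```

Dropping flagged fields at ingestion keeps them out of storage entirely, which is usually simpler than deciding later what to do with data you never needed.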

Practical Best Practices

Check robots.txt first. Respecting it won't give you legal immunity, but disregarding it (especially when you know about it) can be used against you.
Read the Terms of Service. Know what you're agreeing to or potentially violating. Make a conscious, informed decision.
Don't circumvent technical measures. Rotating IPs and using real browsers are common practice on their own. Defeating CAPTCHAs, or rotating IPs specifically to get around an explicit ban, is riskier.
Rate limit yourself. Don't hammer servers. Slow, respectful scraping is much harder to portray as malicious.
Stick to public data. No login walls, no personal data of private individuals.
Don't republish copyrighted text. Extracting facts is different from copying articles.
Get legal advice for commercial use at scale. The stakes are higher when you're building a product on scraped data.
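
The rate limiting practice is simple to build in from the start. Below is a minimal sketch of a throttle that enforces a minimum gap between requests. The class name Throttle and its parameters are hypothetical, and the clock and sleep functions are injectable only so the behavior can be checked without actually waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time the previous request was released

    def wait(self) -> float:
        """Block until min_interval has passed since the previous call.

        Returns the number of seconds slept.
        """
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay > 0:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

In a fetch loop, call wait() immediately before each request. If the site's robots.txt declares a Crawl-delay, using that value as min_interval is a reasonable default.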

The Bottom Line

In 2026, scraping publicly accessible websites for non-personal data, at reasonable rates, without circumventing authentication, is generally not a crime under current US case law, and the position elsewhere varies by jurisdiction. Whether it's contractually permissible depends on the specific site's ToS and how a court would assess enforceability against automated access.

The most responsible approach: know your target, know the risks, respect rate limits, avoid personal data, and consult a lawyer if you're building something significant.

