Is Web Scraping Legal in 2026?
What Every Developer Should Know
March 21, 2026 • 10 min read • By Papalily Team
⚠ Disclaimer: This article is for informational purposes only. It does not constitute
legal advice and should not be relied upon as such. Laws vary by jurisdiction. Consult a qualified
attorney for advice specific to your situation.
Is web scraping legal? It's one of the most searched questions in developer circles,
and the honest answer is: it depends. It depends on what you're scraping, why, how, and where you're
located. In 2026, the legal landscape around web scraping has become somewhat clearer thanks to
landmark court decisions — but significant gray areas remain. Here's what you need to know.
The Short Answer
Scraping publicly available data — information visible to any unauthenticated visitor —
is generally not illegal under computer crime laws in the United States, based on current case law.
However, it may still violate a website's Terms of Service, raise copyright concerns, or conflict with
privacy regulations like GDPR depending on what data you collect.
"Legal" and "permitted" are different things. Something can be legal (not criminal) while still being
prohibited by contract (a website's ToS) or subject to civil liability.
Key Legal Cases
hiQ Labs vs. LinkedIn (US, 2022)
hiQ Labs vs. LinkedIn Corp., 9th Circuit, 2022
hiQ scraped public LinkedIn profiles to build an HR analytics product. LinkedIn sent a cease-and-desist
letter and blocked hiQ's access, and hiQ sued. The Ninth Circuit Court of Appeals held, in a
preliminary-injunction ruling, that scraping publicly accessible data likely does not violate the
Computer Fraud and Abuse Act (CFAA), the key US federal computer crime statute, because pages open
to any visitor are not accessed "without authorization." The litigation did not end there: in late
2022 the district court found that hiQ had breached LinkedIn's User Agreement, and the case closed
with a settlement and a consent judgment against hiQ.
This ruling is significant because it limits how aggressively platforms can invoke computer crime
law against scrapers of public data. It does not mean scraping is universally legal —
only that the CFAA may not apply to publicly accessible pages.
Ryanair vs. PR Aviation (EU, 2015)
Ryanair Ltd vs. PR Aviation BV, European Court of Justice, 2015
Ryanair's Terms of Service prohibited scraping. PR Aviation scraped Ryanair's public flight prices
for a price comparison site. The ECJ ruled that because Ryanair's database was not protected by EU
database rights, the Database Directive's limits on contractual restrictions did not apply either.
As a result, Terms of Service restrictions on scraping can be enforceable under national contract
law, provided users (or bots acting on their behalf) are deemed to have accepted the terms.
This case illustrates that ToS violations can have legal consequences in Europe, even when the
underlying data access is technically "public."
Van Buren vs. United States (US, 2021)
Van Buren vs. United States, US Supreme Court, 2021
This case narrowed the interpretation of the CFAA's "exceeds authorized access" provision.
The Supreme Court held that a person exceeds authorized access only by entering parts of a
computer system (such as files, folders, or databases) that are off-limits to them, not by
misusing information they are otherwise entitled to access. That reading undercuts the argument
that a terms-of-use violation alone is a CFAA offense, which further weakened the CFAA's
applicability to public web scraping.
Terms of Service vs. Law
This distinction matters enormously:
- ToS violations are typically civil, not criminal. Violating a website's Terms of Service can expose you to a lawsuit and potentially injunctive relief (being forced to stop), but it's generally not a criminal offense under current US law for public data.
- ToS enforceability varies. For ToS terms to be enforceable, there typically needs to be evidence that the party agreed to them. Bots don't click "I Agree," so browsewrap agreements (terms buried in footers) may not bind automated scrapers.
- Clickwrap is different. If you've manually created an account, agreed to terms during signup, and then use automation under that account, you've likely agreed to the ToS and can be held to them.
What's Generally Considered Low-Risk
- Scraping publicly visible data that requires no login
- Respecting robots.txt directives (even if not legally required in most places)
- Collecting data for research, journalism, or non-commercial purposes
- Making requests at reasonable rates that don't impact server performance
- Not copying entire copyrighted works — extracting factual data is different from reproducing text
- Collecting data about businesses or products rather than private individuals
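The robots.txt check from the list above can be automated before any request is sent. The following is a minimal sketch using Python's standard-library urllib.robotparser. The `is_allowed` helper, the `example-bot` user agent, and the sample robots.txt content are illustrative assumptions, not taken from any particular site or tool.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines directly, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print(is_allowed(EXAMPLE_ROBOTS, "example-bot", "https://example.com/products"))
    print(is_allowed(EXAMPLE_ROBOTS, "example-bot", "https://example.com/private/data"))
```

In a real scraper the robots.txt text would be downloaded from the target site first. A permissive result only reflects the site's crawling preferences; it says nothing about its Terms of Service.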
What Carries Higher Risk
- Bypassing authentication walls: getting past a login you have no credentials for, or scraping content behind a login you were granted under a ToS that prohibits scraping. This is where CFAA risk increases.
- Circumventing technical measures: defeating CAPTCHA systems or intentionally bypassing IP blocks. Some jurisdictions treat this more seriously.
- Collecting personal data at scale: names, emails, and phone numbers of private individuals fall under GDPR (EU), CCPA (California), and similar laws. Mass collection of personal data requires a legal basis.
- Republishing copyrighted content: scraping the text of news articles and republishing it can be copyright infringement, which is a separate issue from whether the scraping itself was lawful.
- Building competing products using scraped data: some jurisdictions (particularly in the EU) recognize database rights that protect significant investments in data collection.
GDPR and Privacy Considerations
If you're in the EU or scraping personal data about people in the EU, GDPR can apply. Publicly
available data isn't automatically GDPR-exempt. Under GDPR, processing personal data requires a
valid legal basis, such as consent, legitimate interests, or contractual necessity.
For developers: if you're scraping product prices, stock levels, or business information — you're
typically in a safer position. If you're scraping names, emails, phone numbers, or social profiles
of individuals — get legal advice before proceeding.
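One way to act on that distinction is to drop personal fields before anything is stored. The sketch below assumes scraped records are plain dictionaries. The key names in `PERSONAL_KEYS` and the email pattern are illustrative guesses, not a complete definition of personal data, and filtering like this does not by itself establish a legal basis under GDPR.

```python
import re

# Hypothetical field names treated as personal data in this sketch.
PERSONAL_KEYS = {"name", "email", "phone", "profile_url"}

# Rough pattern for email-like strings; it will miss edge cases.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def strip_personal_fields(record: dict) -> dict:
    """Return a copy of record without personal keys or email-like string values."""
    cleaned = {}
    for key, value in record.items():
        if key.lower() in PERSONAL_KEYS:
            continue  # drop fields whose name marks them as personal
        if isinstance(value, str) and EMAIL_RE.search(value):
            continue  # drop free-text fields that contain an email address
        cleaned[key] = value
    return cleaned
```

Applied to a record such as `{"product": "Widget", "price": 9.99, "email": "jane@example.com"}`, the helper keeps only the product and price fields.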
Practical Best Practices
- Check robots.txt first. Respecting it won't give you legal immunity, but disregarding it (especially when you know about it) can be used against you.
- Read the Terms of Service. Know what you're agreeing to or potentially violating. Make a conscious, informed decision.
- Don't circumvent technical measures. Rotating IPs and using real browsers are common practice. Defeating CAPTCHAs and bypassing explicit IP bans are riskier.
- Rate limit yourself. Don't hammer servers. Slow, respectful scraping is much harder to argue as malicious.
- Stick to public data. No login walls, no personal data of private individuals.
- Don't republish copyrighted text. Extracting facts is different from copying articles.
- Get legal advice for commercial use at scale. The stakes are higher when you're building a product on scraped data.
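The "rate limit yourself" advice can be sketched as a small helper that enforces a minimum gap between requests. `MinIntervalLimiter` and its parameters are hypothetical names chosen for this example, and the right interval depends on the target site. This is one simple approach, not a standard.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Ensure at least min_interval seconds pass between consecutive wait() calls."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        # clock and sleep are injectable so the limiter can be tested without waiting.
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> float:
        """Block until the next request may be sent; return the seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
        if delay > 0:
            self._sleep(delay)
        # Record when this request is released, after any sleep.
        self._last = self._clock()
        return delay
```

A scraper would create one limiter per target host, for example `limiter = MinIntervalLimiter(2.0)`, and call `limiter.wait()` immediately before each request to that host.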
The Bottom Line
In 2026, scraping publicly accessible websites for non-personal data, at reasonable rates, without
circumventing authentication, is generally not treated as a crime under current US case law; the
position elsewhere depends on local law. Whether it's contractually permissible depends on the
specific site's ToS and how a court would assess enforceability against automated access.
The most responsible approach: know your target, know the risks, respect rate limits, avoid personal
data, and consult a lawyer if you're building something significant.
Whatever you decide to scrape — Papalily makes it easy.
Scrape Responsibly with Papalily
AI-powered extraction that respects rate limits, renders real browsers, and returns clean JSON.
Built for developers who care about doing things right. Free tier — 100 requests/month.