
Is It Legal to Scrape a Website in 2026? A Practical Guide

What U.S., EU, and UK courts have actually ruled on web scraping in the past two years — and the four-question checklist I run before any scraping job.

By the Krawly Editorial Team, in-house engineers, writers & reviewers

The short answer

Scraping publicly accessible web pages is, in most jurisdictions, legal — provided you respect a handful of constraints. The constraints have tightened since 2024, and the long answer matters because the gap between "legal" and "you will not be sued" is not zero.

This article is not legal advice. We are engineers, not lawyers. What follows is a practical summary of how the Krawly team evaluates each scraping job before running it, based on actual court rulings and the lawyers we have consulted with for client work.

The four-question checklist I run before every job

1. Is the data behind a login or a paywall?

Scraping content that sits behind authentication is the single fastest way to make a scraping job indefensible. The 2022 *hiQ Labs v. LinkedIn* ruling protected scraping of public profiles, but every subsequent case (*Meta v. Bright Data 2024*, *X Corp v. CCDH 2023*) has reinforced that authenticated content gets stronger legal protection. If you have to log in to see it, do not scrape it.

What our Lead Generation tool does — pulling emails from a public contact page — is fine. Scraping a member-only directory is not.

2. Does the site's `robots.txt` ask you to stay out?

`robots.txt` is not legally binding in the U.S. by itself, but it is admissible evidence of bad faith in *Computer Fraud and Abuse Act* (CFAA) cases and contractual claims. Some EU member states treat it as part of the implicit terms of access.

Practical rule: I read `robots.txt` (our robots.txt analyzer makes this trivial) before any non-trivial scraping job. If the site explicitly disallows my path, I either negotiate access via the site owner or move on. Sneaking past `robots.txt` is the kind of thing that turns a civil dispute into a criminal one in jurisdictions with anti-circumvention statutes.
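That check is easy to script with Python's standard library. A minimal sketch using `urllib.robotparser`; the agent name `my-scraper` and the sample rules are placeholders, not a real site's policy:

```python
from urllib import robotparser

# Sample rules standing in for a real site's robots.txt.
SAMPLE_ROBOTS = [
    "User-agent: *",
    "Disallow: /private/",
    "Crawl-delay: 1",
]

rp = robotparser.RobotFileParser()
rp.parse(SAMPLE_ROBOTS)
# Against a live site you would instead call:
#   rp.set_url("https://example.com/robots.txt"); rp.read()

print(rp.can_fetch("my-scraper", "https://example.com/page"))       # True
print(rp.can_fetch("my-scraper", "https://example.com/private/x"))  # False
print(rp.crawl_delay("my-scraper"))                                 # 1
```

If `can_fetch` returns `False` for your path, that is your cue to negotiate with the site owner rather than route around the rule.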

3. Are you taking enough data to substitute for the original site?

This is the *hot news* / database-rights doctrine and it has teeth in both the EU (Database Directive) and the U.S. (state-law misappropriation). Scraping a sample of weather forecasts to enrich your app is fine. Scraping every weather forecast for every postcode and republishing them as an alternative service is exactly the kind of thing courts have shut down.

A useful test: would your scraped data set, if published, reduce the demand for the original site? If yes, you are in dangerous territory and need a license, not just a scraper.

4. Are you triggering anti-fraud, anti-abuse, or rate-limit protections?

This is where most engineers trip themselves up. Even if the *content* is public, the *infrastructure* that serves it is the site owner's. If you bypass IP rate limits, rotate residential proxies to evade detection, or solve CAPTCHAs at scale, you are evading protective measures that the site has explicitly put in place. CFAA in the U.S. and Computer Misuse Act in the UK both treat circumvention of technical protections as a separate offence regardless of whether the content was public.

Practical rule: respect the rate limit. If a site rate-limits you to one request per second, scrape at one request per 1.2 seconds. Krawly's tools all default to single-request-with-timeout behaviour for this reason.
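That throttling discipline takes only a few lines to enforce. A sketch in Python's standard library, assuming a hypothetical 1.2-second floor and a placeholder contact address in the user agent:

```python
import time
import urllib.request

class Throttle:
    """Enforces a minimum interval between successive requests."""
    def __init__(self, min_interval=1.2):
        self.min_interval = min_interval
        self._last = 0.0

    def wait(self):
        remaining = self.min_interval - (time.monotonic() - self._last)
        if remaining > 0:
            time.sleep(remaining)
        self._last = time.monotonic()

def fetch(url, throttle, user_agent="my-scraper/1.0 (ops@example.com)"):
    # Identify yourself honestly, and time out rather than hammer a slow server.
    throttle.wait()
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.read()
```

The contact address in the user agent is deliberate: a site operator who can email you is far less likely to escalate straight to a lawyer.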

What the recent cases actually say

  • ***hiQ Labs v. LinkedIn* (Ninth Circuit, 2022; settled later that year)** — Scraping public LinkedIn profiles is not a CFAA violation; the data is publicly accessible. But hiQ ultimately settled and shut down because LinkedIn won on the contract-breach claim. Lesson: the CFAA is one risk, but breach-of-ToS contract claims are also real.
  • ***Meta v. Bright Data* (N.D. Cal., 2024)** — Court ruled in favour of Bright Data on the scraping question (Facebook public data is fair game), but the case turned on whether Bright Data had logged in to scrape protected data. They had not, and they won.
  • ***X Corp v. CCDH* (N.D. Cal., dismissed 2024)** — Twitter / X lost a scraping-related claim against CCDH because it could not show concrete economic harm; the case was a warning that ToS claims need actual damages to stick.
  • ***LinkedIn v. Mantheos* (N.D. Cal., 2024)** — LinkedIn won here against a tool that *did* let users log in via residential proxies; circumvention of authentication and rate limits was the deciding factor.
  • The pattern across all four cases is the same: public content scraped at reasonable speeds = protected. Authentication-bypassing or rate-limit-evading scraping = liability.

What this means for the tools on Krawly

Every Krawly tool is designed around the four checks above:

  • We never offer authentication bypass. Tools that need a login (private LinkedIn data, paywalled news) are not on the platform and never will be.
  • We respect `robots.txt` for tools whose job is to crawl a site (broken-link checker, sitemap-health checker). For one-shot lookups against a single URL, we make exactly the request a browser would.
  • We rate-limit anonymous traffic to 30 calls per IP per day. The free tier is not designed to support large-scale extraction.
  • We do not provide residential-proxy rotation, CAPTCHA solving, or fingerprint randomisation features. These exist as products elsewhere; we deliberately do not build them.
  • If your scraping job needs anything stronger than what Krawly provides, you almost certainly need a commercial license from the source site, not a stronger scraper.

The decision flowchart, simplified

1. Is the data behind a login? → Stop. Get a license or don't scrape.

2. Does `robots.txt` disallow your path? → Stop. Negotiate or move on.

3. Are you republishing in volumes that compete with the source? → Stop without a license.

4. Are you tempted to bypass rate limits or anti-bot protections? → Stop. That's the line.

If all four answers are no, you are most likely on the right side of the law. Document your decisions, log the user agent you use, throttle your requests, and you'll be in a defensible position even if questioned.
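The flowchart reduces to a few lines of code. A sketch of how it might be encoded; the field and function names here are illustrative, not legal terms of art:

```python
from dataclasses import dataclass

@dataclass
class ScrapeJob:
    behind_login: bool            # Q1: authentication or paywall?
    robots_disallows: bool        # Q2: robots.txt blocks your path?
    substitutes_for_source: bool  # Q3: would it compete with the site?
    bypasses_protections: bool    # Q4: evading rate limits / anti-bot?

def should_scrape(job: ScrapeJob) -> tuple[bool, str]:
    """Run the four-question checklist in order; first failure stops the job."""
    if job.behind_login:
        return False, "Stop: get a license or don't scrape."
    if job.robots_disallows:
        return False, "Stop: negotiate access or move on."
    if job.substitutes_for_source:
        return False, "Stop without a license."
    if job.bypasses_protections:
        return False, "Stop: that's the line."
    return True, "Proceed: document, throttle, identify yourself honestly."
```

Putting the checks in this order matters: the earlier questions are the brighter lines, so there is no point evaluating the later ones once an earlier one fails.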

Tools you'll use to do this responsibly

  • robots.txt Analyzer — check what a site allows before you scrape
  • User Agent Parser — verify your scraper identifies itself honestly
  • HTTP Headers Analyzer — see how a site responds to your requests
  • DNS / WHOIS Lookup — find the right contact for license requests
  • Krawly Methodology page — full description of how we built the tools to respect these constraints
Have a specific case?

If you have a borderline scraping question and want a second opinion before you ship, you can email us at info@krawly.io. We cannot give legal advice but we can usually point you at a relevant case or a lawyer who specialises in this area.
