Why I keep writing about this
Most of what gets called "bot detection" online is folklore. The same five blog posts have been recycled since 2020. The fingerprints they list mostly stopped mattering by 2023 — and the techniques that actually flag a modern Playwright or Puppeteer crawl in 2026 are mostly not in those posts.
I run Krawly. Every tool we ship that touches a real website — SEO Analyzer, Tech Detector, Page Speed Analyzer, Screenshot Capture — runs through a headless browser pool. Over the past 18 months we have been blocked, throttled, fingerprinted, and CAPTCHA-challenged enough times to learn which signals still matter and which are dead weight.
This article is the field report. It is not a "bypass detection" guide — we deliberately do not build evasion features into Krawly tools. It is a description of what a 2026 anti-bot stack is actually looking at, so you can either (a) build crawls that are detectable but legitimate, or (b) understand why your existing crawler is getting blocked.
What "detection" actually means
A site doesn't detect "headless browsers" as a single binary state. It scores you across dozens of signals and decides whether to serve you the content, send you to a CAPTCHA, throttle your IP, or quietly serve degraded content. Three industry-standard scoring engines drive most of what you'll hit in 2026: Cloudflare Bot Management, DataDome, and PerimeterX (now HUMAN).
They share a lot of signals but weight them differently. The signals below are the ones all three watch in 2026.

The first thing Krawly's Tech Detector will tell you about a target site is which bot-management product (if any) is in front of it. Run this before you decide whether to scrape — it saves a lot of trial-and-error.
Signals that still matter in 2026
1. TLS fingerprint (JA3 / JA4) — the most important signal you probably don't think about
Every TLS handshake your browser performs leaves a fingerprint. The exact set of cipher suites, extensions, elliptic curves, and signature algorithms a client offers — in the exact order it offers them — is captured as a JA3 hash (legacy) or JA4 hash (current, fingerprints the full ClientHello).
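To make the "exact fields in the exact order" point concrete, here is a minimal sketch of how a JA3 string is assembled and hashed, following the Salesforce spec referenced at the end of this article. The input values are invented for illustration; JA4 uses a different, richer encoding of the ClientHello.

```js
// Minimal sketch of JA3 per the Salesforce spec: five ClientHello fields,
// values joined by "-", fields joined by ",", then MD5 of the string.
const crypto = require("crypto");

// GREASE values (0x0a0a, 0x1a1a, ... 0xfafa) are excluded before hashing.
const isGrease = (v) => (v & 0x0f0f) === 0x0a0a;

function ja3(hello) {
  const field = (values) => values.filter((v) => !isGrease(v)).join("-");
  const str = [
    hello.tlsVersion,            // e.g. 771 (0x0303) in the ClientHello
    field(hello.cipherSuites),   // offered cipher suites, in order
    field(hello.extensions),     // extension IDs, in order
    field(hello.ellipticCurves), // supported groups
    field(hello.ecPointFormats), // point formats
  ].join(",");
  return { str, hash: crypto.createHash("md5").update(str).digest("hex") };
}

// Illustrative input only; not a real Chrome ClientHello.
console.log(ja3({
  tlsVersion: 771,
  cipherSuites: [4865, 4866, 4867, 49195],
  extensions: [0, 23, 65281, 10, 11],
  ellipticCurves: [29, 23, 24],
  ecPointFormats: [0],
}));
```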
Chrome, Firefox, and Safari each have well-known JA4 fingerprints. So do their headless variants. Python's `requests` library has one. Node's `https` module has one. `curl` has one. The fingerprint of a stock `requests` script does not match a real Chrome, regardless of how perfect your User-Agent header is.
Cloudflare publishes a JA4 fingerprint score on every request and uses it as a top-weight signal. In 2026, sending Chrome's User-Agent from a `requests` script is the equivalent of writing "I am a bot" on the request — it just takes Cloudflare ~5ms to read.
What you can do: use a library that performs a real Chrome-compatible TLS handshake. `curl-impersonate`, `tls-client` (Go), or running Playwright/Puppeteer with real Chromium. `requests` + custom headers is solved-problem territory; you're fingerprinted before your HTTP request payload is even read.
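If you go the real-browser route, the minimal version looks like this (a sketch of Playwright driving stock Chromium; the URL is a placeholder):

```js
// Let real Chromium do the TLS and HTTP/2 handshakes instead of a Node
// HTTP client, so the fingerprints on the wire are a real browser's.
const { chromium } = require("playwright");

(async () => {
  const browser = await chromium.launch(); // headless by default
  const page = await browser.newPage();
  // Handshake, HTTP/2 framing, and header order all come from the
  // Chromium binary, not from hand-built requests.
  await page.goto("https://example.com/", { waitUntil: "domcontentloaded" });
  console.log(await page.title());
  await browser.close();
})();
```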
2. HTTP/2 frame ordering (Akamai fingerprint)
A close cousin of TLS fingerprinting. After the TLS handshake completes, you negotiate HTTP/2 and start sending SETTINGS, WINDOW_UPDATE, and HEADERS frames. The order in which a real Chrome sends those frames, the values it picks for the SETTINGS, and the priority frames it emits, are all consistent and well-documented.
Akamai's bot manager builds an HTTP/2 fingerprint at this layer. PerimeterX uses a variant. If you wrote a hand-rolled HTTP/2 client, your fingerprint won't match Chrome's. Even Go's standard `net/http` package has a fingerprint that flags as "non-browser" against the strictest detectors.
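For a rough idea of what that fingerprint looks like, here is a sketch that assembles one common textual rendering of it: SETTINGS pairs, the connection WINDOW_UPDATE increment, PRIORITY frames, and the pseudo-header order of the first request. The sample values and exact delimiters are illustrative, not a verified capture of current Chrome.

```js
// Sketch of an Akamai-style HTTP/2 fingerprint string built from what a
// server observes on a new connection. Values below are illustrative only.
function h2Fingerprint(obs) {
  const settings = obs.settings.map(([id, val]) => `${id}:${val}`).join(";");
  const priorities = obs.priorityFrames.length
    ? obs.priorityFrames
        .map((p) => `${p.streamId}:${p.exclusive}:${p.depStreamId}:${p.weight}`)
        .join(",")
    : "0";
  // m,a,s,p = :method, :authority, :scheme, :path
  const pseudoOrder = obs.pseudoHeaderOrder.join(",");
  return [settings, obs.windowUpdate, priorities, pseudoOrder].join("|");
}

// Illustrative observation, not a real Chrome capture.
console.log(h2Fingerprint({
  settings: [[1, 65536], [3, 1000], [4, 6291456], [6, 262144]],
  windowUpdate: 15663105,
  priorityFrames: [],
  pseudoHeaderOrder: ["m", "a", "s", "p"],
}));
```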
What you can do: same answer as TLS — use a real browser binary or a library specifically built to mimic one. Hand-rolled HTTP/2 in 2026 is detectable by anyone who cares.
3. `navigator.webdriver` — yes, still
The single most ancient detection signal, originally part of the WebDriver spec, is still the first thing detection scripts check. In headless Chrome with default flags, `navigator.webdriver === true`. Some Playwright builds default it to false, some don't.
Detection scripts ship this check as a one-liner near the top:
```js
if (navigator.webdriver) { flagged = true; }
```
This still catches the laziest crawlers in 2026. It's not the only check, but it's free, runs in microseconds, and shifts you to a higher-risk bucket.
4. Inconsistent feature surface — User-Agent says Chrome, but the JS engine is V8 from a year ago
If your User-Agent says Chrome 127, the page can run feature-detection JavaScript and check whether features that landed in Chrome 127 are actually present. If they're not — because your headless build is older — you're flagged for User-Agent spoofing.
Common checks in 2026 all follow the same pattern: probe APIs and CSS features that shipped at or before the Chrome version the User-Agent claims, and flag anything that should be there but isn't. A sketch of the idea is below.
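This is a browser-side sketch of that pattern; the feature/version pairs are examples I picked for illustration, and a real detector carries a much longer table.

```js
// Read the Chrome major version the UA claims, then verify that features
// which shipped at or before that version actually exist.
function uaClaimsChromeMajor() {
  const m = navigator.userAgent.match(/Chrome\/(\d+)/);
  return m ? parseInt(m[1], 10) : null;
}

const probes = [
  // [Chrome major that ships it, probe for its presence]
  [97, () => typeof Array.prototype.findLast === "function"],
  [98, () => typeof structuredClone === "function"],
  [105, () => { try { document.querySelector(":has(*)"); return true; } catch { return false; } }],
  [110, () => typeof Array.prototype.toSorted === "function"],
  [113, () => "gpu" in navigator], // WebGPU
];

function featureSurfaceMismatch() {
  const claimed = uaClaimsChromeMajor();
  if (claimed === null) return false;
  // Anything the claimed version should have but doesn't is a spoofing hint.
  return probes.some(([minVersion, present]) => claimed >= minVersion && !present());
}
```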
This is one of the bigger 2026 changes from 2022 best practices: simply matching the User-Agent string is no longer enough. The whole feature surface has to match.
5. Mouse / scroll / focus event entropy
Every real user generates a constant trickle of mousemove, scroll, focus, and blur events. Headless browsers, by default, generate none unless you simulate them. Detection scripts buffer these events for 5-10 seconds after page load and score the variance.
A page loaded with zero mousemove events is flagged. A page loaded with perfectly evenly spaced events ("mousemove every 250ms exactly") is flagged harder — synthetic patterns are easier to spot than nothing.
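A detection-side sketch of that scoring, with illustrative thresholds:

```js
// Buffer input events for a few seconds after load, then score both
// "no events at all" and "suspiciously regular timing".
const timestamps = [];
["mousemove", "scroll", "focus", "blur"].forEach((type) =>
  window.addEventListener(type, () => timestamps.push(performance.now()), true)
);

setTimeout(() => {
  if (timestamps.length === 0) {
    report("no-input-events"); // headless default: nothing at all
    return;
  }
  const gaps = timestamps.slice(1).map((t, i) => t - timestamps[i]);
  const mean = gaps.reduce((a, b) => a + b, 0) / gaps.length;
  const variance = gaps.reduce((a, b) => a + (b - mean) ** 2, 0) / gaps.length;
  // Real humans produce noisy gaps; near-zero variance means the events
  // were generated on a timer ("mousemove every 250ms exactly").
  if (gaps.length > 5 && variance < 1) report("synthetic-input-cadence");
}, 8000); // within the 5-10 second window mentioned above

function report(reason) { console.log("flag:", reason); } // stand-in reporter
```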
What you can do: if your job actually requires bypassing detection (commercial competitive scraping, etc.), libraries like `puppeteer-extra-plugin-stealth` and `undetected-chromedriver` simulate human-shaped event patterns. If your job is non-adversarial (analytics, your own site monitoring, public data collection where the site doesn't actively block you), don't bother — it's brittle and breaks every few weeks.
6. Canvas + WebGL fingerprint
Each combination of GPU, driver, OS, and browser produces a slightly different canvas rendering for the same drawing instructions. Detection scripts draw a hidden canvas with text + shapes, hash the PNG output, and compare it to a database of known fingerprints.
Headless Chrome on Linux with software rendering produces a hash that is wildly different from any consumer GPU + Windows + Chrome combination. Even with `--use-gl=swiftshader` and `--enable-webgl`, the fingerprint is recognisable.
The bot-management products maintain a list of "datacenter GPU fingerprints" (Linux + software rendering = bot). They don't block on this alone, but it bumps your risk score.
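A browser-side sketch of both probes: a hidden-canvas hash (a toy hash here, not what a real product uses) plus the unmasked WebGL renderer string, which is where "SwiftShader" or "llvmpipe" shows up on software-rendered headless boxes.

```js
// Render a hidden canvas, hash the pixels, and read the GPU renderer string.
function canvasHash() {
  const c = document.createElement("canvas");
  c.width = 200; c.height = 50;
  const ctx = c.getContext("2d");
  ctx.textBaseline = "top";
  ctx.font = "16px Arial";
  ctx.fillStyle = "#f60"; ctx.fillRect(10, 10, 100, 30);
  ctx.fillStyle = "#069"; ctx.fillText("fingerprint \u{1F600}", 4, 4);
  let h = 0;
  for (const ch of c.toDataURL()) h = (h * 31 + ch.charCodeAt(0)) | 0; // toy rolling hash
  return h >>> 0;
}

function webglRenderer() {
  const gl = document.createElement("canvas").getContext("webgl");
  if (!gl) return null;
  const ext = gl.getExtension("WEBGL_debug_renderer_info");
  return ext ? gl.getParameter(ext.UNMASKED_RENDERER_WEBGL) : null;
}

const renderer = webglRenderer() || "";
const softwareRendered = /swiftshader|llvmpipe|software/i.test(renderer);
console.log({ canvasHash: canvasHash(), renderer, softwareRendered });
```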
Signals that mostly stopped mattering in 2026
Plugin list / `navigator.plugins`
The plugin list was a famous fingerprint until Chrome froze `navigator.plugins` to a hardcoded list of built-in PDF-viewer entries (Chrome 94, 2021). Every Chrome now reports the same canned list, so the signal collapsed and most detection scripts have stopped weighting it.
`window.chrome` object existence
In 2018-2021 you could detect headless Chrome because `window.chrome` was missing. Every stealth plugin has fixed this for years. The signal is now noise.
Timezone / language mismatch
Still checked, but with low weight. In 2026 plenty of real users run a VPN that exits in one country, keep their system timezone set to another, and browse with a language set to a third. The mismatch is no longer a strong signal.
`Notification.permission === "denied"`
A 2019-era favourite. Now meaningless because most real users have notifications denied by default after Chrome's 2023 quieter-notifications update.
How to tell what a target site is using
The reliable workflow:
1. Run HTTP Headers Analyzer against the target. The response headers will leak Cloudflare (`server: cloudflare`), DataDome (`x-datadome`), or PerimeterX/HUMAN (`x-px-...`) on most sites.

2. Run Tech Detector for second-source confirmation. It looks at JS files, cookie names, and meta tags in addition to headers.
3. Run JS Framework Detector to find out which client-side framework is rendering the page. A heavy React or Vue SPA needs a different scraping strategy than a server-rendered page.

If you see Cloudflare + heavy JS rendering + `x-datadome` headers, you are looking at a site that is actively investing in detection. Either scale your stack to match (real browser, residential proxy, mature stealth plugins) or pick a different source. Hand-rolled `requests` is not going to work.
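If you want a quick programmatic approximation of step 1 of the workflow above, something like this is enough for a first pass (a Node 18+ sketch; the URL is a placeholder and the markers are the ones named in this article plus Cloudflare's well-known `cf-ray` header):

```js
// Fetch a URL and look for response headers that leak a bot-management product.
async function sniffBotManagement(url) {
  const res = await fetch(url, { redirect: "follow" });
  const names = [...res.headers.keys()].map((n) => n.toLowerCase());
  const server = (res.headers.get("server") || "").toLowerCase();

  const hits = [];
  if (server.includes("cloudflare") || names.includes("cf-ray")) hits.push("Cloudflare");
  if (names.some((n) => n.startsWith("x-datadome"))) hits.push("DataDome");
  if (names.some((n) => n.startsWith("x-px"))) hits.push("PerimeterX / HUMAN");
  return hits;
}

sniffBotManagement("https://example.com/").then((hits) =>
  console.log(hits.length ? hits : "no obvious bot-management headers")
);
```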
A decision tree, finally
I get asked this often enough to just publish it. For any new scraping job:
1. Is the data behind login? Stop. Get a license; do not scrape authenticated content.
2. Does the site use Cloudflare Bot Management or DataDome or PerimeterX? Check HTTP Headers + Tech Detector. If yes, your stack has to include a real browser + matching TLS fingerprint. Plain `requests` will not work.
3. Is the content rendered by JavaScript? Check JS Framework Detector. If yes, you need a real browser (Playwright/Puppeteer/Chromedriver). `requests + BeautifulSoup` will return shell HTML.
4. Is the data available through an official API or RSS feed? Check RSS Feed Finder and the site's robots.txt for hints. An API is always preferable to scraping.
5. Is there a sitemap? Use Sitemap Extractor to get all URLs first, instead of discovering them by crawling.
Most "scraping is hard" stories happen because someone skipped steps 2-4.
What we do at Krawly
For the public, free Krawly tools that run on your behalf, the posture is: visible identification, respect for rate limits, and no detection bypass.
For our paid Krawly users with API access, we offer higher request rates against their own sites or sites they have written permission to scrape, but the same posture applies: visible identification, respect for rate limits, no detection bypass.
What I don't recommend
What I do recommend for legitimate work
Questions and corrections
The bot-detection landscape changes faster than any other corner of web infrastructure. Numbers and weightings in this article are current as of May 2026; some will be obsolete in six months. If you spot an outdated claim, or can show me data that contradicts one, email info@krawly.io and I will update with a dated correction.
For deeper technical context on TLS fingerprinting specifically, Salesforce's original JA3 spec and the JA4+ documentation from FoxIO are the primary sources I lean on.