JA4 Fingerprinting Against AI Scrapers: A Practical Guide
TLS fingerprinting detects AI scrapers when BaaS platforms spoof JS APIs. Learn JA4, analyze real fingerprints from Browserbase and LLM browsers.
WebDecoy Team
WebDecoy Security Team
TLS fingerprinting is experiencing a renaissance. The reason is simple: Browser-as-a-Service platforms like Browserbase, Hyperbrowser, and the growing ecosystem of LLM-powered browsers can spoof nearly every JavaScript API, rotate residential IPs, and generate convincing user agents. What they cannot easily fake is the TLS handshake.
This guide walks through JA4—the modern successor to JA3—with practical examples from real AI scraping tools. Whether you are a security analyst investigating suspicious traffic or a platform engineer building detection systems, this is the technical reference you need.
Why TLS Fingerprinting Matters Now
The bot detection landscape has shifted dramatically. Traditional signals are compromised:
User-Agent strings: Trivially spoofed. Every BaaS platform rotates user agents automatically.
JavaScript environment: Stealth libraries like Puppeteer Extra patch navigator.webdriver, fake plugins arrays, spoof canvas fingerprints, and override dozens of browser APIs. These patches are well-documented and widely deployed.
IP reputation: Residential proxy networks are cheap and abundant. Traffic originates from ISP-assigned addresses that pass reputation checks.
Behavioral patterns: AI agents are getting better at mimicking human interaction timing. While still detectable, the signal is weaker than it was.
TLS fingerprinting exploits a fundamental asymmetry: changing the TLS handshake means changing the TLS stack itself, whether by swapping in a purpose-built library or rebuilding the client. You cannot just change a header or override a JavaScript property. The cipher suites, extensions, and protocol parameters are baked into the client implementation.
When Browserbase runs a stealth session claiming to be Chrome 120, the TLS handshake reveals the actual Chromium version of their cloud infrastructure. When an LLM browser built on Playwright makes requests, its TLS fingerprint matches Playwright—not Chrome.
This is the detection vector BaaS platforms cannot easily neutralize.
JA3: The Foundation
Before diving into JA4, understanding JA3 provides essential context. Developed by Salesforce in 2017, JA3 was the first widely-adopted TLS fingerprinting method.
How JA3 Works
JA3 creates a fingerprint from the TLS ClientHello message by concatenating five fields:
- TLS Version (2 bytes) - The protocol version offered
- Cipher Suites (variable) - Encryption algorithms in preference order
- Extensions (variable) - TLS extensions in order
- Elliptic Curves (variable) - Supported key exchange curves
- EC Point Formats (variable) - Elliptic curve point format support
These values are joined with commas and hyphens, then MD5 hashed:
TLSVersion,Ciphers,Extensions,EllipticCurves,ECPointFormats
Example raw string:
771,4866-4865-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,0-23-65281-10-11-35-16-5-13-18-51-45-43-27-17513-21,29-23-24-25,0
MD5 hash:
cd08e31494f9531f560d64c695473da9
JA3 Limitations
JA3 served well for years but has significant limitations:
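GREASE randomization: Modern browsers use GREASE (Generate Random Extensions And Sustain Extensibility) values that change between sessions. These are dummy values designed to prevent protocol ossification. The JA3 specification says to ignore them, but implementations that fail to filter them produce different fingerprints for the same browser.
Extension order sensitivity: JA3 hashes extensions in the order they are sent, but browsers can reorder extensions without functional impact, and recent Chrome releases shuffle the order between connections. This creates unnecessary fingerprint variance.
TLS 1.3 challenges: TLS 1.3 encrypts more of the handshake, so some parameters visible in TLS 1.2 (such as the server certificate) are now hidden. The ClientHello also keeps its legacy version field at 0x0303 (771), so JA3's version field no longer distinguishes TLS 1.2 clients from TLS 1.3 clients.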
Evasion libraries: Tools like uTLS allow programmatic manipulation of TLS parameters. An attacker can construct a ClientHello that produces any desired JA3 hash.
These limitations motivated the development of JA4.
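The extension-order sensitivity described above can be illustrated with a minimal sketch. The function name ja3_hash and the code points below are illustrative assumptions, not a capture from a real client; the point is only that the same set of extensions in a different order yields a different MD5.

```python
import hashlib


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    """Build the JA3 string from decimal code points and return (string, md5)."""
    fields = [
        str(version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
        "-".join(str(c) for c in curves),
        "-".join(str(p) for p in point_formats),
    ]
    ja3_string = ",".join(fields)
    return ja3_string, hashlib.md5(ja3_string.encode("ascii")).hexdigest()


# Illustrative values only, not taken from a real handshake
ciphers = [4865, 4866, 4867]
curves = [29, 23, 24]
point_formats = [0]
extensions_a = [0, 23, 65281, 10, 11]
extensions_b = [10, 0, 11, 23, 65281]  # same extensions, different order

string_a, hash_a = ja3_hash(771, ciphers, extensions_a, curves, point_formats)
string_b, hash_b = ja3_hash(771, ciphers, extensions_b, curves, point_formats)

print(string_a)
print("same fingerprint:", hash_a == hash_b)
```

Because JA3 hashes the raw ordering, a client that shuffles its extensions looks like a new client on every connection.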
JA4: Next-Generation TLS Fingerprinting
JA4 was released in 2023 by FoxIO to address JA3’s shortcomings. It represents a fundamental rethinking of TLS fingerprinting methodology.
The JA4 Family
JA4 is actually a suite of fingerprinting methods:
| Fingerprint | Target | Use Case |
|---|---|---|
| JA4 | TLS ClientHello | Primary client identification |
| JA4S | TLS ServerHello | Server configuration analysis |
| JA4H | HTTP headers | Application-layer fingerprinting |
| JA4X | X.509 certificates | Certificate chain analysis |
| JA4T | TCP parameters | Network stack identification |
| JA4SSH | SSH handshake | SSH client fingerprinting |
For AI scraper detection, JA4 and JA4H are the most relevant.
JA4 Structure
Unlike JA3’s opaque MD5 hash, JA4 uses a human-readable format with three sections:
[protocol][version][SNI][cipher_count][extension_count][ALPN]_[cipher_hash]_[extension_hash]
A real JA4 fingerprint looks like:
t13d1516h2_8daaf6152771_b0da82dd1658
Breaking this down:
Section 1: t13d1516h2
- t - TCP (vs q for QUIC)
- 13 - TLS 1.3
- d - Domain SNI present (vs i for IP)
- 15 - 15 cipher suites offered
- 16 - 16 extensions present
- h2 - HTTP/2 ALPN (Application-Layer Protocol Negotiation)
Section 2: 8daaf6152771
- Truncated SHA256 of sorted cipher suite list
Section 3: b0da82dd1658
- Truncated SHA256 of the sorted extension list (SNI and ALPN excluded), followed by the signature algorithms in the order sent
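The three-section layout can be unpacked with a few string slices. The sketch below assumes the 10-character prefix shown in the example; the names JA4Prefix and parse_ja4 are hypothetical helpers introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class JA4Prefix:
    transport: str        # "t" for TCP, "q" for QUIC
    tls_version: str      # "13", "12", ...
    sni: str              # "d" when a domain SNI is present, "i" otherwise
    cipher_count: int
    extension_count: int
    alpn: str             # "h2", "h1", ...


def parse_ja4(ja4: str):
    """Split a JA4 string into its readable prefix and two hash sections."""
    prefix, cipher_hash, extension_hash = ja4.split("_")
    if len(prefix) != 10:
        raise ValueError(f"unexpected JA4 prefix length: {prefix!r}")
    parsed = JA4Prefix(
        transport=prefix[0],
        tls_version=prefix[1:3],
        sni=prefix[3],
        cipher_count=int(prefix[4:6]),
        extension_count=int(prefix[6:8]),
        alpn=prefix[8:10],
    )
    return parsed, cipher_hash, extension_hash


parsed, cipher_hash, extension_hash = parse_ja4("t13d1516h2_8daaf6152771_b0da82dd1658")
print(parsed)
print(cipher_hash, extension_hash)
```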
Why JA4 Is Harder to Evade
GREASE filtering: JA4 strips GREASE values before counting and hashing. Randomized GREASE entries no longer affect the fingerprint.
Sorted hashing: Cipher suites and extensions are sorted before hashing. Reordering no longer changes the fingerprint.
Readable format: The prefix provides immediate context without database lookups. You can see at a glance: TLS version, transport protocol, and connection characteristics.
Multiple dimensions: Combining JA4 with JA4H creates a multi-layered fingerprint that requires spoofing both TLS and HTTP layers correctly.
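To make the GREASE filtering and sorted hashing concrete, here is a simplified sketch of JA4 generation. It follows one reading of the published JA4 description and skips edge cases (for example, unusual ALPN values), so treat compute_ja4 and its parameters as assumptions rather than a reference implementation.

```python
import hashlib


def is_grease(value: int) -> bool:
    """GREASE code points look like 0x0a0a, 0x1a1a, ... 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def truncated_sha256(text: str) -> str:
    return hashlib.sha256(text.encode("ascii")).hexdigest()[:12]


def compute_ja4(tls_version, ciphers, extensions, signature_algorithms,
                alpn, sni_present=True, transport="t"):
    # Drop GREASE before counting or hashing anything
    ciphers = [c for c in ciphers if not is_grease(c)]
    extensions = [e for e in extensions if not is_grease(e)]

    # ALPN is summarized by the first and last character of the first value
    alpn_tag = (alpn[0] + alpn[-1]) if alpn else "00"
    prefix = "{}{}{}{:02d}{:02d}{}".format(
        transport, tls_version, "d" if sni_present else "i",
        min(len(ciphers), 99), min(len(extensions), 99), alpn_tag)

    cipher_part = "000000000000"
    if ciphers:
        cipher_part = truncated_sha256(",".join(sorted(f"{c:04x}" for c in ciphers)))

    # SNI (0x0000) and ALPN (0x0010) are counted in the prefix but not hashed
    hashed_exts = sorted(f"{e:04x}" for e in extensions if e not in (0x0000, 0x0010))
    ext_part = "000000000000"
    if extensions:
        ext_input = ",".join(hashed_exts)
        if signature_algorithms:
            # Signature algorithms keep the order the client sent them in
            ext_input += "_" + ",".join(f"{s:04x}" for s in signature_algorithms)
        ext_part = truncated_sha256(ext_input)
    return f"{prefix}_{cipher_part}_{ext_part}"


# Illustrative code points, not a real browser handshake
base = compute_ja4("13", [0x1301, 0x1302, 0x1303],
                   [0x0000, 0x0010, 0x000A, 0x000D, 0x002B], [0x0403, 0x0804], "h2")
shuffled = compute_ja4("13", [0x1303, 0x0A0A, 0x1301, 0x1302],
                       [0x002B, 0x1A1A, 0x000D, 0x0000, 0x000A, 0x0010],
                       [0x0403, 0x0804], "h2")
print(base)
print("stable under reordering and GREASE:", base == shuffled)
```

The second call reorders both lists and injects GREASE values, yet produces the same fingerprint, which is the property JA3 lacks.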
Real Fingerprints from AI Scraping Tools
Let us examine actual fingerprints from the tools security analysts encounter in production.
Browserbase Sessions
Browserbase runs managed Chromium instances for AI agents and web automation. Despite claiming various Chrome versions in the User-Agent, their sessions produce consistent TLS fingerprints.
Observed JA4 fingerprint:
t13d1517h2_8daaf6152771_02713d6af862
Analysis:
- t13 - TLS 1.3 over TCP
- d - Proper SNI handling
- 15 - 15 cipher suites (matches Chromium)
- 17 - 17 extensions (slightly different from stock Chrome)
- h2 - HTTP/2 negotiation
The extension count differs from stock Chrome builds because Browserbase’s environment modifies TLS configuration. When a session claims Chrome/121.0.0.0 but presents 17 extensions instead of Chrome 121’s standard 16, this mismatch is a detection signal.
JA4H fingerprint:
ge11nn060000_c48a6182b93a_c48a6182b93a_0000000000000000
The HTTP header fingerprint reveals:
- ge - GET method
- 11 - HTTP/1.1 request
- nn - No cookies, no referer
- 06 - 6 headers, not counting Cookie and Referer
- 0000 - No Accept-Language header
Real Chrome sessions show different JA4H patterns, particularly around header ordering and cookie presence.
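The readable first section of a JA4H fingerprint can be derived from a request's method, HTTP version, and headers. The sketch below follows the published JA4H layout as understood here (method, version, cookie flag, referer flag, header count, language tag); ja4h_prefix is a hypothetical helper and the header values are made up for illustration.

```python
def ja4h_prefix(method: str, http_version: str, headers: list[tuple[str, str]]) -> str:
    """Build the readable first section of a JA4H fingerprint."""
    names = [name.lower() for name, _ in headers]
    cookie_flag = "c" if "cookie" in names else "n"
    referer_flag = "r" if "referer" in names else "n"
    # Cookie and Referer are signalled by the flags, so they are not counted
    counted = [n for n in names if n not in ("cookie", "referer")]
    header_count = min(len(counted), 99)

    language = "0000"  # used when no Accept-Language header is present
    for name, value in headers:
        if name.lower() == "accept-language":
            primary = value.split(",")[0].split(";")[0]
            cleaned = "".join(ch for ch in primary.lower() if ch.isalnum())
            language = cleaned[:4].ljust(4, "0")
            break

    version = http_version.replace(".", "").ljust(2, "0")[:2]
    return f"{method[:2].lower()}{version}{cookie_flag}{referer_flag}{header_count:02d}{language}"


bare_headers = [
    ("Host", "example.com"),
    ("User-Agent", "example-agent"),
    ("Accept", "*/*"),
    ("Accept-Encoding", "gzip"),
    ("Connection", "keep-alive"),
    ("Cache-Control", "no-cache"),
]
browser_like = bare_headers + [
    ("Accept-Language", "en-US,en;q=0.9"),
    ("Cookie", "session=abc"),
    ("Referer", "https://example.com/"),
]

print(ja4h_prefix("GET", "1.1", bare_headers))
print(ja4h_prefix("GET", "2", browser_like))
```

A session with no cookies, no referer, and no language preference stands out next to the browser-like request, even before the hashed sections are compared.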
Playwright-Based AI Browsers
AI agents built on Playwright (including many open-source LLM browsers) share Playwright’s TLS characteristics.
Observed JA4 fingerprint:
t13d1516h2_8daaf6152771_e5627efa2ab1
The cipher hash matches Chromium (8daaf6152771), but the extension hash differs (e5627efa2ab1). This occurs because Playwright’s launch configuration modifies extension handling.
Specifically, Playwright often:
- Disables certain extensions for stability
- Adds automation-related extensions
- Modifies extension order for performance
These modifications are invisible at the JavaScript layer (stealth patches hide them) but visible in the TLS handshake.
Python Requests Library
When AI agents fall back to direct HTTP requests (common for API scraping), Python’s requests library has a distinctive fingerprint.
JA4 fingerprint (Python 3.11 + requests):
t12d1307h1_c16a28f6ef30_0000000000000000
Analysis:
- t12 - TLS 1.2 (Python’s ssl module defaults)
- 13 - 13 cipher suites
- 07 - 7 extensions (minimal)
- h1 - HTTP/1.1 only (no HTTP/2)
- All-zero extension hash (the d flag shows that SNI itself was present)
This fingerprint is trivially detectable. Any traffic claiming to be Chrome but presenting this fingerprint is definitively not Chrome.
curl and wget
Command-line tools used for testing and scripting have distinct fingerprints:
curl 7.x JA4:
t12d1309h1_c35a2a7e3d2f_0000000000000000
wget JA4:
t12d0907h1_b8ea3a52c2bc_0000000000000000
Both show:
- TLS 1.2 preference
- Minimal extension support
- HTTP/1.1 only
- Low cipher suite counts
Go HTTP Client
Go’s net/http package has a recognizable fingerprint:
Go 1.21 JA4:
t13d1310h2_9dc936c68ed4_000000000000
- TLS 1.3 support
- 13 cipher suites
- 10 extensions
- HTTP/2 capable
Go clients claiming to be browsers are immediately detectable by this fingerprint.
Node.js (undici/fetch)
Modern Node.js uses undici for HTTP:
Node 20 JA4:
t13d1411h2_7b5a4dc2bc8e_d43e45c10a9f
- TLS 1.3
- 14 cipher suites
- 11 extensions
- HTTP/2
The cipher and extension hashes differ significantly from browser implementations.
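The comparisons in this section (same cipher hash, different extension hash) can be automated by diffing an observed fingerprint against a baseline section by section. This is a minimal sketch: diff_ja4 is a hypothetical helper, and the baseline is simply the Chrome example used earlier in this guide.

```python
# Chrome example from the JA4 Structure section, used here as the baseline
CHROME_BASELINE = "t13d1516h2_8daaf6152771_b0da82dd1658"


def diff_ja4(observed: str, baseline: str = CHROME_BASELINE) -> list[str]:
    """Name the JA4 sections in which observed differs from baseline."""
    labels = ("prefix", "cipher_hash", "extension_hash")
    obs_parts = observed.split("_")
    base_parts = baseline.split("_")
    if len(obs_parts) != 3 or len(base_parts) != 3:
        raise ValueError("expected three underscore-separated sections")
    return [label for label, o, b in zip(labels, obs_parts, base_parts) if o != b]


# Fingerprints quoted in the sections above
samples = {
    "Browserbase": "t13d1517h2_8daaf6152771_02713d6af862",
    "Playwright": "t13d1516h2_8daaf6152771_e5627efa2ab1",
    "Python requests": "t12d1307h1_c16a28f6ef30_0000000000000000",
}

for name, ja4 in samples.items():
    print(name, diff_ja4(ja4))
```

A client that differs only in the extension hash is likely a Chromium derivative, while one that differs in every section is a different TLS stack altogether.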
Building a JA4 Detection System
Here is a practical architecture for security teams implementing JA4-based detection.
Data Collection
Capturing JA4 requires TLS handshake visibility. Options include:
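Reverse proxy in front of TLS termination:
# nginx stream module with ssl_preread (this hop does not terminate TLS)
stream {
    # Stock nginx exposes only SNI, protocol version, and ALPN from the ClientHello.
    # Cipher and extension lists for a full JA4 need a third-party module or a sidecar.
    log_format tls_fingerprint '$remote_addr [$time_local] '
                               'sni=$ssl_preread_server_name '
                               'proto=$ssl_preread_protocol '
                               'alpn=$ssl_preread_alpn_protocols';
    server {
        listen 443;
        ssl_preread on;
        # Hypothetical backend that terminates TLS
        proxy_pass 127.0.0.1:8443;
        # Log TLS parameters for JA4 generation
        access_log /var/log/nginx/tls_fingerprints.log tls_fingerprint;
    }
}
Load balancer integration: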
- HAProxy: ssl_fc_sni, ssl_fc_cipher variables
- AWS ALB: TLS metadata in access logs
- Cloudflare: JA3/JA4 available in Firewall Rules
Dedicated TLS inspection:
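# Using scapy for packet capture
from scapy.all import load_layer, sniff
from scapy.layers.tls.handshake import TLSClientHello

load_layer("tls")  # enable TLS dissection so ClientHello layers are parsed

def extract_ja4(packet):
    """Return a JA4 string for a ClientHello packet, or None for anything else."""
    if not packet.haslayer(TLSClientHello):
        return None
    hello = packet[TLSClientHello]
    # Extract cipher suites (scapy exposes them as integer code points)
    ciphers = list(hello.ciphers or [])
    # Extract extension type codes, in the order the client sent them
    extensions = [e.type for e in (hello.ext or [])]
    # Generate JA4; generate_ja4 is a helper you supply, it is not part of scapy
    return generate_ja4(hello.version, ciphers, extensions)
Fingerprint Database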
Maintain a database mapping fingerprints to known clients:
CREATE TABLE tls_fingerprints (
    id SERIAL PRIMARY KEY,
    ja4 VARCHAR(50) NOT NULL,
    ja4h VARCHAR(100),
    client_name VARCHAR(100),
    client_version VARCHAR(50),
    is_browser BOOLEAN DEFAULT FALSE,
    is_automation BOOLEAN DEFAULT FALSE,
    is_known_bot BOOLEAN DEFAULT FALSE,
    threat_score INTEGER DEFAULT 0,
    first_seen TIMESTAMP DEFAULT NOW(),
    last_seen TIMESTAMP DEFAULT NOW(),
    occurrence_count INTEGER DEFAULT 1,
    notes TEXT
);
-- Index for fast lookups
CREATE INDEX idx_ja4 ON tls_fingerprints(ja4);
CREATE INDEX idx_ja4h ON tls_fingerprints(ja4h);
-- Sample entries
INSERT INTO tls_fingerprints (ja4, client_name, is_browser, is_automation) VALUES
    ('t13d1516h2_8daaf6152771_b0da82dd1658', 'Chrome 120', TRUE, FALSE),
    ('t13d1517h2_8daaf6152771_02713d6af862', 'Browserbase', FALSE, TRUE),
    ('t13d1516h2_8daaf6152771_e5627efa2ab1', 'Playwright', FALSE, TRUE),
    ('t12d1307h1_c16a28f6ef30_0000000000000000', 'Python requests', FALSE, TRUE);
Detection Logic
class JA4Detector:
    """Scores a request from its JA4/JA4H fingerprints and User-Agent."""

    # lookup_fingerprint() and analyze_ja4h() are assumed to be implemented
    # elsewhere (database query and JA4H checks); they are not shown here.

    def __init__(self, db_connection):
        self.db = db_connection
        self.cache = {}

    def analyze_request(self, ja4: str, ja4h: str, user_agent: str) -> dict:
        """
        Analyze a request for fingerprint anomalies.

        Returns:
            dict with threat_score, signals, and verdict
        """
        signals = []
        threat_score = 0

        # 1. Check known fingerprint database
        known = self.lookup_fingerprint(ja4)
        if known:
            if known['is_known_bot']:
                signals.append({
                    'type': 'known_bot_fingerprint',
                    'detail': known['client_name'],
                    'confidence': 95
                })
                threat_score += 40
            if known['is_automation'] and 'Chrome' in user_agent:
                signals.append({
                    'type': 'automation_claiming_browser',
                    'detail': f"{known['client_name']} claiming {user_agent}",
                    'confidence': 90
                })
                threat_score += 45

        # 2. Analyze JA4 prefix for anomalies
        prefix_analysis = self.analyze_ja4_prefix(ja4)
        if prefix_analysis['anomalies']:
            signals.extend(prefix_analysis['anomalies'])
            threat_score += prefix_analysis['score_boost']

        # 3. Cross-reference with User-Agent
        ua_consistency = self.check_ua_consistency(ja4, user_agent)
        if not ua_consistency['consistent']:
            signals.append({
                'type': 'ja4_ua_mismatch',
                'detail': ua_consistency['detail'],
                'confidence': ua_consistency['confidence']
            })
            threat_score += int(ua_consistency['confidence'] * 0.5)

        # 4. Check JA4H if available
        if ja4h:
            http_analysis = self.analyze_ja4h(ja4h, user_agent)
            signals.extend(http_analysis['signals'])
            threat_score += http_analysis['score_boost']

        # Determine verdict
        if threat_score >= 80:
            verdict = 'block'
        elif threat_score >= 50:
            verdict = 'challenge'
        elif threat_score >= 25:
            verdict = 'flag'
        else:
            verdict = 'allow'

        return {
            'threat_score': min(threat_score, 100),
            'signals': signals,
            'verdict': verdict,
            'ja4': ja4,
            'ja4h': ja4h
        }
    def analyze_ja4_prefix(self, ja4: str) -> dict:
        """Analyze the human-readable JA4 prefix."""
        anomalies = []
        score_boost = 0

        # Parse prefix: t13d1516h2
        prefix = ja4.split('_')[0]

        # Extract components
        transport = prefix[0]        # t or q
        tls_version = prefix[1:3]    # 13, 12, 11, 10
        sni = prefix[3]              # d or i
        cipher_count = int(prefix[4:6])
        ext_count = int(prefix[6:8])
        alpn = prefix[8:]            # h1, h2, etc.

        # Check for outdated TLS
        if tls_version in ['10', '11']:
            anomalies.append({
                'type': 'outdated_tls',
                'detail': f'TLS 1.{tls_version[-1]} is deprecated',
                'confidence': 80
            })
            score_boost += 25

        # Check for HTTP/1.1 only (unusual for modern browsers)
        if alpn == 'h1':
            anomalies.append({
                'type': 'no_http2_support',
                'detail': 'Client does not support HTTP/2',
                'confidence': 60
            })
            score_boost += 15

        # Check for low extension count (automation tools)
        if ext_count < 10:
            anomalies.append({
                'type': 'low_extension_count',
                'detail': f'Only {ext_count} TLS extensions (browsers have 15+)',
                'confidence': 70
            })
            score_boost += 20

        return {
            'anomalies': anomalies,
            'score_boost': score_boost
        }
    def check_ua_consistency(self, ja4: str, user_agent: str) -> dict:
        """Check if JA4 fingerprint is consistent with claimed User-Agent."""
        # Known browser JA4 patterns (cipher hash portion)
        chrome_cipher_hash = '8daaf6152771'
        firefox_cipher_hash = '5b6e3c2d1a9f'
        safari_cipher_hash = '3d4e5f6a7b8c'

        # Extract cipher hash from JA4
        parts = ja4.split('_')
        if len(parts) >= 2:
            cipher_hash = parts[1]
        else:
            return {'consistent': True, 'detail': 'Unable to parse JA4', 'confidence': 0}

        # Check claimed browser vs actual fingerprint
        if 'Chrome' in user_agent and cipher_hash != chrome_cipher_hash:
            return {
                'consistent': False,
                'detail': f'Claims Chrome but cipher hash is {cipher_hash}',
                'confidence': 85
            }
        if 'Firefox' in user_agent and cipher_hash != firefox_cipher_hash:
            return {
                'consistent': False,
                'detail': f'Claims Firefox but cipher hash is {cipher_hash}',
                'confidence': 85
            }
        return {'consistent': True, 'detail': None, 'confidence': 0}
Alert Integration
Feed detection results into your security infrastructure:
from datetime import datetime, timezone

import requests

# SPLUNK_ENABLED, SPLUNK_HEC_URL, SPLUNK_TOKEN, ELASTIC_ENABLED, and es_client
# are deployment settings assumed to be defined elsewhere.

def send_to_siem(detection_result: dict, request_metadata: dict):
    """Send detection event to SIEM."""
    event = {
        'timestamp': datetime.now(timezone.utc).isoformat(),
        'event_type': 'tls_fingerprint_detection',
        'source_ip': request_metadata['client_ip'],
        'destination': request_metadata['host'],
        'user_agent': request_metadata['user_agent'],
        'ja4': detection_result['ja4'],
        'ja4h': detection_result.get('ja4h'),
        'threat_score': detection_result['threat_score'],
        'verdict': detection_result['verdict'],
        'signals': detection_result['signals'],
        'severity': 'high' if detection_result['threat_score'] >= 80 else 'medium'
    }

    # Splunk HEC
    if SPLUNK_ENABLED:
        requests.post(
            SPLUNK_HEC_URL,
            headers={'Authorization': f'Splunk {SPLUNK_TOKEN}'},
            json={'event': event},
            timeout=5
        )

    # Elastic
    if ELASTIC_ENABLED:
        es_client.index(
            index='security-tls-fingerprints',
            document=event
        )
Analysis Workflows for Security Teams
Investigating Suspicious Traffic
When you identify potentially automated traffic, JA4 analysis follows this workflow:
Step 1: Extract fingerprints from logs
# Parse nginx logs for JA4 data
grep "ja4=" /var/log/nginx/access.log | \
    awk -F'ja4=' '{print $2}' | \
    cut -d' ' -f1 | \
    sort | uniq -c | sort -rn | head -20
Step 2: Identify anomalous patterns
-- Find fingerprints claiming Chrome but not matching Chrome's signature
SELECT
    ja4,
    user_agent,
    COUNT(*) as requests,
    COUNT(DISTINCT source_ip) as unique_ips
FROM access_logs
WHERE user_agent LIKE '%Chrome%'
    AND ja4 NOT IN (SELECT ja4 FROM known_chrome_fingerprints)
GROUP BY ja4, user_agent
ORDER BY requests DESC;
Step 3: Cross-reference with known automation tools
-- Match against automation fingerprint database
SELECT
    l.ja4,
    l.user_agent,
    k.client_name as detected_client,
    COUNT(*) as request_count
FROM access_logs l
JOIN tls_fingerprints k ON l.ja4 = k.ja4
WHERE k.is_automation = TRUE
GROUP BY l.ja4, l.user_agent, k.client_name
ORDER BY request_count DESC;
Building Detection Rules
Cloudflare Firewall Rule:
(cf.bot_management.ja3_hash in {"e7d705a3286e19ea42f587b344ee6865"
    "cd08e31494f9531f560d64c695473da9"})
or (http.user_agent contains "Chrome" and
    not cf.bot_management.ja3_hash in {"known_chrome_hash_1" "known_chrome_hash_2"})
HAProxy ACL:
# Block known automation fingerprints
acl automation_ja4 req.fhdr(x-ja4) -m str t13d1517h2_8daaf6152771_02713d6af862
acl automation_ja4 req.fhdr(x-ja4) -m str t13d1516h2_8daaf6152771_e5627efa2ab1
acl automation_ja4 req.fhdr(x-ja4) -m str t12d1307h1_c16a28f6ef30_0000000000000000
http-request deny if automation_ja4
Reporting Dashboard Metrics
Track these JA4-related metrics:
- Fingerprint diversity: Unique JA4 hashes per day
- Mismatch rate: Requests where JA4 contradicts User-Agent
- Automation percentage: Traffic from known automation fingerprints
- New fingerprint alerts: Previously unseen JA4 hashes
- Block rate by fingerprint: Effectiveness of fingerprint-based rules
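The metrics above can be computed from a batch of per-request records. The sketch below assumes each record is a dict with ja4, mismatch, and blocked keys; that record shape and the ja4_metrics name are assumptions made for illustration, not a WebDecoy log format.

```python
from collections import Counter


def ja4_metrics(records, known_automation, previously_seen):
    """Summarize one reporting period of request records.

    records: iterable of dicts with 'ja4' (str), 'mismatch' (bool), 'blocked' (bool).
    known_automation / previously_seen: sets of JA4 strings.
    """
    records = list(records)
    total = len(records)
    counts = Counter(r["ja4"] for r in records)
    blocked = Counter(r["ja4"] for r in records if r["blocked"])
    mismatches = sum(1 for r in records if r["mismatch"])
    automated = sum(n for ja4, n in counts.items() if ja4 in known_automation)
    return {
        "fingerprint_diversity": len(counts),
        "mismatch_rate": mismatches / total if total else 0.0,
        "automation_percentage": 100.0 * automated / total if total else 0.0,
        "new_fingerprints": sorted(set(counts) - set(previously_seen)),
        "block_rate_by_fingerprint": {ja4: blocked[ja4] / n for ja4, n in counts.items()},
    }


CHROME = "t13d1516h2_8daaf6152771_b0da82dd1658"
PLAYWRIGHT = "t13d1516h2_8daaf6152771_e5627efa2ab1"

sample_records = [
    {"ja4": CHROME, "mismatch": False, "blocked": False},
    {"ja4": CHROME, "mismatch": False, "blocked": False},
    {"ja4": PLAYWRIGHT, "mismatch": True, "blocked": True},
    {"ja4": PLAYWRIGHT, "mismatch": True, "blocked": False},
]

report = ja4_metrics(sample_records, known_automation={PLAYWRIGHT}, previously_seen={CHROME})
print(report)
```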
Evasion Techniques and Countermeasures
Current Evasion Methods
uTLS library: Allows Go programs to mimic arbitrary TLS fingerprints:
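import (
    "log"
    "net"

    tls "github.com/refraction-networking/utls"
)

// Mimic Chrome's TLS fingerprint
tcpConn, err := net.Dial("tcp", "example.com:443")
if err != nil {
    log.Fatal(err)
}
config := &tls.Config{ServerName: "example.com"}
// HelloChrome_Auto is uTLS's preset for a recent Chrome ClientHello
conn := tls.UClient(tcpConn, config, tls.HelloChrome_Auto)
if err := conn.Handshake(); err != nil {
    log.Fatal(err)
}
Countermeasure: Combine JA4 with behavioral analysis. Even with perfect TLS mimicry, automation exhibits detectable interaction patterns.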
TLS proxy chaining: Route traffic through a browser-based proxy to inherit its fingerprint.
Countermeasure: Analyze latency patterns. Proxy-chained requests show characteristic timing signatures.
Browser farm services: Use actual browser instances to establish TLS connections, then inject automation.
Countermeasure: Monitor for impossible behavior—no human clicks 1000 times per second with perfect accuracy.
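The timing-based countermeasures above can be approximated with a simple check on the gaps between a session's events. This is a rough sketch: timing_signals is a hypothetical helper, and both thresholds are illustrative assumptions that would need tuning against real traffic.

```python
from statistics import mean, pstdev


def timing_signals(event_times, min_interval=0.05, min_jitter_ratio=0.1):
    """Flag event streams that are too fast or too regular to be human.

    event_times: ascending timestamps in seconds for one session.
    The two thresholds are illustrative assumptions, not tuned values.
    """
    if len(event_times) < 3:
        return []
    gaps = [b - a for a, b in zip(event_times, event_times[1:])]
    avg = mean(gaps)
    signals = []
    if avg < min_interval:
        signals.append("implausibly_fast")
    # Humans vary; near-constant spacing suggests a scripted loop
    if avg > 0 and pstdev(gaps) / avg < min_jitter_ratio:
        signals.append("implausibly_regular")
    return signals


scripted = [i * 0.001 for i in range(100)]   # 1000 events per second
human_like = [0.0, 0.8, 2.9, 3.4, 6.0]       # irregular, slower spacing

print(timing_signals(scripted))
print(timing_signals(human_like))
```

Signals like these are meant to feed the same threat score as the TLS checks, so a client with a perfectly mimicked handshake can still be caught by its behavior.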
Defense in Depth
JA4 is one layer of a comprehensive detection system:
- TLS fingerprinting (JA4): Network layer, hard to spoof
- HTTP fingerprinting (JA4H): Application layer correlation
- JavaScript environment checks: Detect stealth patches
- Behavioral analysis: Interaction pattern detection
- Honeypots: Definitive automation signals
When all layers agree the client is legitimate, confidence is high. When layers disagree—Chrome User-Agent, Playwright JA4, synthetic mouse movements—the verdict is clear.
WebDecoy’s JA4 Implementation
WebDecoy integrates JA4 fingerprinting as a core detection signal. Our SDKs for Node.js, Go, and PHP automatically extract TLS parameters and generate JA4/JA4H fingerprints.
Detection flow:
Request → TLS Handshake → Extract JA4
↓
Compare against database
↓
Cross-reference with User-Agent
↓
Add to threat score
↓
Combine with behavioral signals
↓
Verdict: allow/challenge/block
What we detect:
- Browserbase sessions claiming browser User-Agents
- Playwright/Puppeteer automation
- Python/Go/Node HTTP clients
- curl/wget command-line tools
- Unknown automation with low TLS extension counts
- Version mismatches between claimed and actual browsers
Integration:
const WebDecoy = require('@webdecoy/node-sdk');
const webdecoy = new WebDecoy({
    apiKey: 'your-api-key',
    enableTLSFingerprinting: true,
    tlsAnalysis: {
        checkJA4: true,
        checkJA4H: true,
        blockKnownAutomation: true,
        alertOnMismatch: true
    }
});
app.use(webdecoy.middleware());
Conclusion
TLS fingerprinting via JA4 provides a detection vector that BaaS platforms cannot easily neutralize. While they can patch JavaScript APIs, rotate IPs, and generate convincing user agents, the TLS handshake reveals the underlying client implementation.
For security analysts, JA4 offers:
- Immediate classification: Human-readable prefix provides instant context
- Database correlation: Match against known automation tools
- Mismatch detection: Compare fingerprint against claimed browser
- Evasion resistance: Sorted hashing defeats randomization
For platform engineers, JA4 integrates into:
- Firewall rules: Block known automation fingerprints
- Threat scoring: Add TLS signals to multi-factor detection
- SIEM alerting: Feed fingerprint mismatches to security operations
- Forensics: Investigate suspicious traffic patterns
The cat-and-mouse game continues. Evasion libraries like uTLS raise the bar. But defense in depth—JA4 combined with behavioral analysis, honeypots, and environment validation—maintains detection advantage.
AI scrapers built on BaaS infrastructure cannot hide their nature forever. The TLS handshake tells the truth.
Ready to add JA4 detection to your security stack?
Start Your Free Trial and see WebDecoy’s TLS fingerprinting in action. Our SDKs handle extraction, analysis, and alerting automatically.
Questions about implementing JA4 analysis? Read our documentation or contact us directly.