JA4 Fingerprinting Against AI Scrapers: A Practical Guide

TLS fingerprinting is experiencing a renaissance. The reason is simple: Browser-as-a-Service (BaaS) platforms like Browserbase, Hyperbrowser, and the growing ecosystem of LLM-powered browsers can spoof nearly every JavaScript API, rotate residential IPs, and generate convincing user agents. What they cannot easily fake is the TLS handshake.

This guide walks through JA4—the modern successor to JA3—with practical examples from real AI scraping tools. Whether you are a security analyst investigating suspicious traffic or a platform engineer building detection systems, this is the technical reference you need.

Why TLS Fingerprinting Matters Now

The bot detection landscape has shifted dramatically. Traditional signals are compromised:

User-Agent strings: Trivially spoofed. Every BaaS platform rotates user agents automatically.

JavaScript environment: Stealth libraries like Puppeteer Extra patch navigator.webdriver, fake plugins arrays, spoof canvas fingerprints, and override dozens of browser APIs. These patches are well-documented and widely deployed.

IP reputation: Residential proxy networks are cheap and abundant. Traffic originates from ISP-assigned addresses that pass reputation checks.

Behavioral patterns: AI agents are getting better at mimicking human interaction timing. While still detectable, the signal is weaker than it was.

TLS fingerprinting exploits a fundamental asymmetry: spoofing the TLS handshake means modifying or replacing the client's TLS stack. You cannot just change a header or override a JavaScript property. The cipher suites, extensions, and protocol parameters are baked into the client implementation.

When Browserbase runs a stealth session claiming to be Chrome 120, the TLS handshake reveals the actual Chromium version of their cloud infrastructure. When an LLM browser built on Playwright makes requests, its TLS fingerprint matches Playwright—not Chrome.

This is the detection vector BaaS platforms cannot easily neutralize.

JA3: The Foundation

Before diving into JA4, understanding JA3 provides essential context. Developed by Salesforce in 2017, JA3 was the first widely-adopted TLS fingerprinting method.

How JA3 Works

JA3 creates a fingerprint from the TLS ClientHello message by concatenating five fields:

  1. TLS Version (2 bytes) - The protocol version offered
  2. Cipher Suites (variable) - Encryption algorithms in preference order
  3. Extensions (variable) - TLS extensions in order
  4. Elliptic Curves (variable) - Supported key exchange curves
  5. EC Point Formats (variable) - Elliptic curve point format support

These values are joined with commas and hyphens, then MD5 hashed:

TLSVersion,Ciphers,Extensions,EllipticCurves,ECPointFormats

Example raw string:

771,4866-4865-4867-49195-49199-49196-49200-52393-52392-49171-49172-156-157-47-53,0-23-65281-10-11-35-16-5-13-18-51-45-43-27-17513-21,29-23-24-25,0

MD5 hash:

cd08e31494f9531f560d64c695473da9
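
The concatenate-then-hash step can be sketched in a few lines of Python. This is a minimal illustration, assuming the five fields have already been parsed from the ClientHello into decimal integers; the function names ja3_string and ja3_hash are made up for this example, and the input values are shortened sample data rather than a real capture.

```python
import hashlib

def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build the raw JA3 string: fields joined by commas, values by hyphens."""
    fields = [
        str(version),
        "-".join(str(c) for c in ciphers),
        "-".join(str(e) for e in extensions),
        "-".join(str(c) for c in curves),
        "-".join(str(p) for p in point_formats),
    ]
    return ",".join(fields)

def ja3_hash(raw: str) -> str:
    """MD5 the raw JA3 string and return the 32-character hex digest."""
    return hashlib.md5(raw.encode("ascii")).hexdigest()

# Shortened sample values (hypothetical, for illustration only)
raw = ja3_string(771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0])
print(raw)            # 771,4865-4866-4867,0-23-65281,29-23-24,0
print(ja3_hash(raw))  # a 32-character hex digest
```

Because the values are hashed in the order received, any change in ordering produces a different digest.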

JA3 Limitations

JA3 served well for years but has significant limitations:

GREASE randomization: Modern browsers insert GREASE (Generate Random Extensions And Sustain Extensibility) values that change between sessions. These are dummy values designed to prevent protocol ossification. The JA3 specification says to ignore them, but any implementation that fails to strip them produces different fingerprints for the same browser.

Extension order sensitivity: JA3 hashes extensions in the order they are sent, but browsers can reorder extensions without functional impact. Chrome has randomized extension order since version 110, so a single Chrome build now yields many different JA3 hashes.

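TLS 1.3 challenges: TLS 1.3 encrypts more of the handshake, although the ClientHello itself is still sent in cleartext. The complication for JA3 is version signaling: a TLS 1.3 client keeps the legacy version field at 771 (TLS 1.2) and advertises 1.3 through the supported_versions extension, so JA3's version field no longer separates the two.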

Evasion libraries: Tools like uTLS allow programmatic manipulation of TLS parameters. An attacker can construct a ClientHello that produces any desired JA3 hash.

These limitations motivated the development of JA4.

JA4: Next-Generation TLS Fingerprinting

JA4 was released in 2023 by FoxIO to address JA3’s shortcomings. It represents a fundamental rethinking of TLS fingerprinting methodology.

The JA4 Family

JA4 is actually a suite of fingerprinting methods:

Fingerprint | Target             | Use Case
JA4         | TLS ClientHello    | Primary client identification
JA4S        | TLS ServerHello    | Server configuration analysis
JA4H        | HTTP headers       | Application-layer fingerprinting
JA4X        | X.509 certificates | Certificate chain analysis
JA4T        | TCP parameters     | Network stack identification
JA4SSH      | SSH handshake      | SSH client fingerprinting

For AI scraper detection, JA4 and JA4H are the most relevant.

JA4 Structure

Unlike JA3’s opaque MD5 hash, JA4 uses a human-readable format with three sections:

[protocol][version][SNI][cipher_count][extension_count][ALPN]_[cipher_hash]_[extension_hash]

A real JA4 fingerprint looks like:

t13d1516h2_8daaf6152771_b0da82dd1658

Breaking this down:

Section 1: t13d1516h2

  • t - TCP (vs q for QUIC)
  • 13 - TLS 1.3
  • d - Domain SNI present (vs i for IP)
  • 15 - 15 cipher suites offered
  • 16 - 16 extensions present
  • h2 - HTTP/2 ALPN (Application-Layer Protocol Negotiation)

Section 2: 8daaf6152771

  • Truncated SHA256 of sorted cipher suite list

Section 3: b0da82dd1658

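  • Truncated SHA256 of the sorted extension list (SNI and ALPN are left out of the hash input), followed by the signature algorithms in the order sent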

Why JA4 Is Harder to Evade

GREASE filtering: JA4 strips GREASE values before hashing. Random padding no longer affects the fingerprint.

Sorted hashing: Cipher suites and extensions are sorted before hashing. Reordering no longer changes the fingerprint.

Readable format: The prefix provides immediate context without database lookups. You can see at a glance: TLS version, transport protocol, and connection characteristics.

Multiple dimensions: Combining JA4 with JA4H creates a multi-layered fingerprint that requires spoofing both TLS and HTTP layers correctly.
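
To make the GREASE filtering and sorted hashing concrete, here is a minimal Python sketch of a JA4 computation. It assumes the ClientHello has already been parsed into integer code points, and it follows the published JA4 layout as I understand it (SNI and ALPN excluded from the extension hash, signature algorithms appended in their original order). The names compute_ja4, is_grease, and truncated_sha256 are hypothetical, and edge cases such as QUIC or non-ASCII ALPN values are ignored.

```python
import hashlib

def is_grease(value: int) -> bool:
    """GREASE code points follow the pattern 0x0a0a, 0x1a1a, ... 0xfafa."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)

def truncated_sha256(text: str) -> str:
    """First 12 hex characters of SHA256, or zeros for an empty list."""
    if not text:
        return "000000000000"
    return hashlib.sha256(text.encode("ascii")).hexdigest()[:12]

def compute_ja4(ciphers, extensions, sig_algs, alpn="h2",
                tls_version="13", transport="t"):
    # Drop GREASE values so random padding cannot change the result
    ciphers = [c for c in ciphers if not is_grease(c)]
    extensions = [e for e in extensions if not is_grease(e)]

    # Human-readable prefix: transport, version, SNI flag, counts, ALPN tag
    sni_flag = "d" if 0x0000 in extensions else "i"
    alpn_tag = alpn[0] + alpn[-1] if alpn else "00"
    prefix = (f"{transport}{tls_version}{sni_flag}"
              f"{min(len(ciphers), 99):02d}{min(len(extensions), 99):02d}"
              f"{alpn_tag}")

    # Sort before hashing so reordering cannot change the result
    cipher_hash = truncated_sha256(",".join(sorted(f"{c:04x}" for c in ciphers)))
    hashed_exts = sorted(f"{e:04x}" for e in extensions
                         if e not in (0x0000, 0x0010))  # skip SNI and ALPN
    ext_input = ",".join(hashed_exts)
    if ext_input and sig_algs:
        ext_input += "_" + ",".join(f"{s:04x}" for s in sig_algs)
    return f"{prefix}_{cipher_hash}_{truncated_sha256(ext_input)}"

# Same client, once in plain order and once reordered with GREASE added
a = compute_ja4([0x1301, 0x1302, 0x1303],
                [0x0000, 0x0010, 0x002B, 0x000D], [0x0403, 0x0804])
b = compute_ja4([0x0A0A, 0x1303, 0x1301, 0x1302],
                [0x000D, 0x1A1A, 0x002B, 0x0010, 0x0000], [0x0403, 0x0804])
print(a == b)  # True
```

The two calls differ only in ordering and GREASE padding, which is exactly the variance that breaks JA3 and that JA4 is designed to absorb.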

Real Fingerprints from AI Scraping Tools

Let us examine actual fingerprints from the tools security analysts encounter in production.

Browserbase Sessions

Browserbase runs managed Chromium instances for AI agents and web automation. Despite claiming various Chrome versions in the User-Agent, their sessions produce consistent TLS fingerprints.

Observed JA4 fingerprint:

t13d1517h2_8daaf6152771_02713d6af862

Analysis:

  • t13 - TLS 1.3 over TCP
  • d - Proper SNI handling
  • 15 - 15 cipher suites (matches Chromium)
  • 17 - 17 extensions (slightly different from stock Chrome)
  • h2 - HTTP/2 negotiation

The extension count differs from stock Chrome builds because Browserbase’s environment modifies TLS configuration. When a session claims Chrome/121.0.0.0 but presents 17 extensions instead of Chrome 121’s standard 16, this mismatch is a detection signal.
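
A check for this kind of version mismatch might look like the following Python sketch. The baseline table EXPECTED_EXT_COUNT is hypothetical: its single entry is the Chrome 121 figure quoted above, and a real deployment would populate it from its own observed browser traffic. The function name ext_count_mismatch is also an assumption for this example.

```python
import re

# Hypothetical baseline: Chrome major version -> expected TLS extension count.
# The one entry here is the figure quoted in this article; fill in the rest
# from traffic you have verified yourself.
EXPECTED_EXT_COUNT = {121: 16}

def ext_count_mismatch(user_agent: str, ja4: str):
    """Return True/False when a baseline exists, or None when it does not."""
    match = re.search(r"Chrome/(\d+)\.", user_agent)
    if not match:
        return None
    expected = EXPECTED_EXT_COUNT.get(int(match.group(1)))
    if expected is None:
        return None
    prefix = ja4.split("_")[0]
    observed = int(prefix[6:8])  # extension count sits after the cipher count
    return observed != expected

ua = "Mozilla/5.0 (X11; Linux x86_64) Chrome/121.0.0.0 Safari/537.36"
print(ext_count_mismatch(ua, "t13d1517h2_8daaf6152771_02713d6af862"))  # True
print(ext_count_mismatch(ua, "t13d1516h2_8daaf6152771_b0da82dd1658"))  # False
```

Returning None for unknown versions keeps the check from flagging browsers the baseline has never seen.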

JA4H fingerprint:

ge11nn060000_c48a6182b93a_c48a6182b93a_0000000000000000

The HTTP header fingerprint reveals:

  • ge - GET method
  • 11 - HTTP/1.1
  • nn - No cookies, no referer
  • 06 - 6 headers present
  • 0000 - No Accept-Language header

Real Chrome sessions show different JA4H patterns, particularly around header ordering and cookie presence.
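
The first section of a JA4H string can be split into named fields with a short parser. This sketch assumes the field layout from the published JA4H specification (method, HTTP version, cookie flag, referer flag, header count, language tag); the JA4HPrefix class and parse_ja4h_prefix function are hypothetical names for this example.

```python
from dataclasses import dataclass

@dataclass
class JA4HPrefix:
    method: str        # first two letters of the HTTP method, e.g. "ge"
    http_version: str  # "11" for HTTP/1.1, "20" for HTTP/2
    has_cookie: bool   # "c" if a Cookie header was sent, "n" otherwise
    has_referer: bool  # "r" if a Referer header was sent, "n" otherwise
    header_count: int  # two-digit header count
    language: str      # Accept-Language tag, "0000" when absent

def parse_ja4h_prefix(ja4h: str) -> JA4HPrefix:
    """Parse the human-readable first section of a JA4H fingerprint."""
    prefix = ja4h.split("_")[0]
    if len(prefix) != 12:
        raise ValueError(f"unexpected JA4H prefix length: {prefix!r}")
    return JA4HPrefix(
        method=prefix[0:2],
        http_version=prefix[2:4],
        has_cookie=prefix[4] == "c",
        has_referer=prefix[5] == "r",
        header_count=int(prefix[6:8]),
        language=prefix[8:12],
    )

parsed = parse_ja4h_prefix(
    "ge11nn060000_c48a6182b93a_c48a6182b93a_0000000000000000")
print(parsed.method, parsed.http_version, parsed.header_count)  # ge 11 6
```

A browser User-Agent paired with no cookie, no referer, and a "0000" language tag is the kind of combination worth scoring.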

Playwright-Based AI Browsers

AI agents built on Playwright (including many open-source LLM browsers) share Playwright’s TLS characteristics.

Observed JA4 fingerprint:

t13d1516h2_8daaf6152771_e5627efa2ab1

The cipher hash matches Chromium (8daaf6152771), but the extension hash differs (e5627efa2ab1). This occurs because Playwright’s launch configuration modifies extension handling.

Specifically, Playwright often:

  • Disables certain extensions for stability
  • Adds automation-related extensions
  • Modifies extension order for performance

These modifications are invisible at the JavaScript layer (stealth patches hide them) but visible in the TLS handshake.

Python Requests Library

When AI agents fall back to direct HTTP requests (common for API scraping), Python’s requests library has a distinctive fingerprint.

JA4 fingerprint (Python 3.11 + requests):

t12d1307h1_c16a28f6ef30_0000000000000000

Analysis:

  • t12 - TLS 1.2 (Python’s ssl module defaults)
  • 13 - 13 cipher suites
  • 07 - 7 extensions (minimal)
  • h1 - HTTP/1.1 only (no HTTP/2)
  • All-zero final section, as recorded in this capture; the d flag in the prefix shows that SNI was still sent

This fingerprint is trivially detectable. Any traffic claiming to be Chrome but presenting this fingerprint is definitively not Chrome.

curl and wget

Command-line tools used for testing and scripting have distinct fingerprints:

curl 7.x JA4:

t12d1309h1_c35a2a7e3d2f_0000000000000000

wget JA4:

t12d0907h1_b8ea3a52c2bc_0000000000000000

Both show:

  • TLS 1.2 preference
  • Minimal extension support
  • HTTP/1.1 only
  • Low cipher suite counts

Go HTTP Client

Go’s net/http package has a recognizable fingerprint:

Go 1.21 JA4:

t13d1310h2_9dc936c68ed4_000000000000

  • TLS 1.3 support
  • 13 cipher suites
  • 10 extensions
  • HTTP/2 capable

Go clients claiming to be browsers are immediately detectable by this fingerprint.

Node.js (undici/fetch)

Modern Node.js uses undici for HTTP:

Node 20 JA4:

t13d1411h2_7b5a4dc2bc8e_d43e45c10a9f

  • TLS 1.3
  • 14 cipher suites
  • 11 extensions
  • HTTP/2

The cipher and extension hashes differ significantly from browser implementations.

Building a JA4 Detection System

Here is a practical architecture for security teams implementing JA4-based detection.

Data Collection

Capturing JA4 requires TLS handshake visibility. Options include:

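Reverse proxy with TLS passthrough:

# nginx stream module: ssl_preread reads the ClientHello without terminating TLS.
# Stock nginx exposes only the SNI, the ALPN list, and the highest offered version;
# computing a full JA4 requires a third-party module or an external sensor.
stream {
    log_format tls_fingerprint '$remote_addr [$time_local] '
                               'sni=$ssl_preread_server_name '
                               'alpn=$ssl_preread_alpn_protocols '
                               'version=$ssl_preread_protocol';

    server {
        listen 443;                  # no "ssl": preread needs the raw, unterminated stream
        ssl_preread on;
        proxy_pass 127.0.0.1:8443;   # placeholder upstream that terminates TLS

        # Log TLS parameters for JA4 generation
        access_log /var/log/nginx/tls_fingerprints.log tls_fingerprint;
    }
}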

Load balancer integration:

  • HAProxy: ssl_fc_sni, ssl_fc_cipher variables
  • AWS ALB: TLS metadata in access logs
  • Cloudflare: JA3/JA4 available in Firewall Rules

Dedicated TLS inspection:

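# Using scapy for packet capture (sniffing usually requires root privileges)
from scapy.all import load_layer, sniff
load_layer("tls")  # register the TLS dissectors before capturing
from scapy.layers.tls.handshake import TLSClientHello

def extract_ja4(packet):
    """Return a JA4 string for packets carrying a ClientHello, else None."""
    if packet.haslayer(TLSClientHello):
        hello = packet[TLSClientHello]
        # scapy stores cipher suites as integer code points
        ciphers = list(hello.ciphers or [])
        # Each extension object carries its numeric type
        extensions = [e.type for e in (hello.ext or [])]
        # generate_ja4() is not shown here; supply your own implementation
        return generate_ja4(hello.version, ciphers, extensions)
    return None

# Example capture loop:
# sniff(filter="tcp port 443", prn=lambda p: print(extract_ja4(p)), store=False)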

Fingerprint Database

Maintain a database mapping fingerprints to known clients:

CREATE TABLE tls_fingerprints (
    id SERIAL PRIMARY KEY,
    ja4 VARCHAR(50) NOT NULL,
    ja4h VARCHAR(100),
    client_name VARCHAR(100),
    client_version VARCHAR(50),
    is_browser BOOLEAN DEFAULT FALSE,
    is_automation BOOLEAN DEFAULT FALSE,
    is_known_bot BOOLEAN DEFAULT FALSE,
    threat_score INTEGER DEFAULT 0,
    first_seen TIMESTAMP DEFAULT NOW(),
    last_seen TIMESTAMP DEFAULT NOW(),
    occurrence_count INTEGER DEFAULT 1,
    notes TEXT
);

-- Index for fast lookups
CREATE INDEX idx_ja4 ON tls_fingerprints(ja4);
CREATE INDEX idx_ja4h ON tls_fingerprints(ja4h);

-- Sample entries
INSERT INTO tls_fingerprints (ja4, client_name, is_browser, is_automation) VALUES
('t13d1516h2_8daaf6152771_b0da82dd1658', 'Chrome 120', TRUE, FALSE),
('t13d1517h2_8daaf6152771_02713d6af862', 'Browserbase', FALSE, TRUE),
('t13d1516h2_8daaf6152771_e5627efa2ab1', 'Playwright', FALSE, TRUE),
('t12d1307h1_c16a28f6ef30_0000000000000000', 'Python requests', FALSE, TRUE);

Detection Logic

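class JA4Detector:
    # lookup_fingerprint() and analyze_ja4h() are called below but are not part
    # of this listing; implement them against your own fingerprint store.
    def __init__(self, db_connection):
        self.db = db_connection
        self.cache = {}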

    def analyze_request(self, ja4: str, ja4h: str, user_agent: str) -> dict:
        """
        Analyze a request for fingerprint anomalies.

        Returns:
            dict with threat_score, signals, and verdict
        """
        signals = []
        threat_score = 0

        # 1. Check known fingerprint database
        known = self.lookup_fingerprint(ja4)
        if known:
            if known['is_known_bot']:
                signals.append({
                    'type': 'known_bot_fingerprint',
                    'detail': known['client_name'],
                    'confidence': 95
                })
                threat_score += 40

            if known['is_automation'] and 'Chrome' in user_agent:
                signals.append({
                    'type': 'automation_claiming_browser',
                    'detail': f"{known['client_name']} claiming {user_agent}",
                    'confidence': 90
                })
                threat_score += 45

        # 2. Analyze JA4 prefix for anomalies
        prefix_analysis = self.analyze_ja4_prefix(ja4)
        if prefix_analysis['anomalies']:
            signals.extend(prefix_analysis['anomalies'])
            threat_score += prefix_analysis['score_boost']

        # 3. Cross-reference with User-Agent
        ua_consistency = self.check_ua_consistency(ja4, user_agent)
        if not ua_consistency['consistent']:
            signals.append({
                'type': 'ja4_ua_mismatch',
                'detail': ua_consistency['detail'],
                'confidence': ua_consistency['confidence']
            })
            threat_score += int(ua_consistency['confidence'] * 0.5)

        # 4. Check JA4H if available
        if ja4h:
            http_analysis = self.analyze_ja4h(ja4h, user_agent)
            signals.extend(http_analysis['signals'])
            threat_score += http_analysis['score_boost']

        # Determine verdict
        if threat_score >= 80:
            verdict = 'block'
        elif threat_score >= 50:
            verdict = 'challenge'
        elif threat_score >= 25:
            verdict = 'flag'
        else:
            verdict = 'allow'

        return {
            'threat_score': min(threat_score, 100),
            'signals': signals,
            'verdict': verdict,
            'ja4': ja4,
            'ja4h': ja4h
        }

    def analyze_ja4_prefix(self, ja4: str) -> dict:
        """Analyze the human-readable JA4 prefix."""
        anomalies = []
        score_boost = 0

        # Parse prefix: t13d1516h2
        prefix = ja4.split('_')[0]

        # Extract components
        transport = prefix[0]  # t or q
        tls_version = prefix[1:3]  # 13, 12, 11, 10
        sni = prefix[3]  # d or i
        cipher_count = int(prefix[4:6])
        ext_count = int(prefix[6:8])
        alpn = prefix[8:]  # h1, h2, etc.

        # Check for outdated TLS
        if tls_version in ['10', '11']:
            anomalies.append({
                'type': 'outdated_tls',
                'detail': f'TLS 1.{tls_version[-1]} is deprecated',
                'confidence': 80
            })
            score_boost += 25

        # Check for HTTP/1.1 only (unusual for modern browsers)
        if alpn == 'h1':
            anomalies.append({
                'type': 'no_http2_support',
                'detail': 'Client does not support HTTP/2',
                'confidence': 60
            })
            score_boost += 15

        # Check for low extension count (automation tools)
        if ext_count < 10:
            anomalies.append({
                'type': 'low_extension_count',
                'detail': f'Only {ext_count} TLS extensions (browsers have 15+)',
                'confidence': 70
            })
            score_boost += 20

        return {
            'anomalies': anomalies,
            'score_boost': score_boost
        }

    def check_ua_consistency(self, ja4: str, user_agent: str) -> dict:
        """Check if JA4 fingerprint is consistent with claimed User-Agent."""

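        # Known browser JA4 patterns (cipher hash portion).
        # Treat these as examples: confirm each value against your own
        # verified browser traffic, since cipher lists change between releases.
        chrome_cipher_hash = '8daaf6152771'
        firefox_cipher_hash = '5b6e3c2d1a9f'
        safari_cipher_hash = '3d4e5f6a7b8c'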

        # Extract cipher hash from JA4
        parts = ja4.split('_')
        if len(parts) >= 2:
            cipher_hash = parts[1]
        else:
            return {'consistent': True, 'detail': 'Unable to parse JA4', 'confidence': 0}

        # Check claimed browser vs actual fingerprint
        if 'Chrome' in user_agent and cipher_hash != chrome_cipher_hash:
            return {
                'consistent': False,
                'detail': f'Claims Chrome but cipher hash is {cipher_hash}',
                'confidence': 85
            }

        if 'Firefox' in user_agent and cipher_hash != firefox_cipher_hash:
            return {
                'consistent': False,
                'detail': f'Claims Firefox but cipher hash is {cipher_hash}',
                'confidence': 85
            }

        return {'consistent': True, 'detail': None, 'confidence': 0}
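
The class above calls lookup_fingerprint and analyze_ja4h without defining them. One possible shape for those helpers is sketched below as standalone functions. Everything here is an assumption made for illustration: the demo uses an in-memory SQLite table (so the placeholder is ?, where a PostgreSQL driver such as psycopg would use %s), the JA4H rule shown is a single example heuristic, and the score values are arbitrary.

```python
import sqlite3

def lookup_fingerprint(db, cache: dict, ja4: str):
    """Return the stored record for a JA4 as a dict, or None; results are cached."""
    if ja4 in cache:
        return cache[ja4]
    row = db.execute(
        "SELECT client_name, is_browser, is_automation, is_known_bot "
        "FROM tls_fingerprints WHERE ja4 = ?",
        (ja4,),
    ).fetchone()
    result = None
    if row is not None:
        result = {
            "client_name": row[0],
            "is_browser": bool(row[1]),
            "is_automation": bool(row[2]),
            "is_known_bot": bool(row[3]),
        }
    cache[ja4] = result
    return result

def analyze_ja4h(ja4h: str, user_agent: str) -> dict:
    """Example heuristic: a browser User-Agent that sends no Accept-Language."""
    signals, score_boost = [], 0
    prefix = ja4h.split("_")[0]
    # The last four prefix characters hold the language tag; "0000" means absent
    if "Mozilla" in user_agent and prefix[8:12] == "0000":
        signals.append({
            "type": "missing_accept_language",
            "detail": "Browser User-Agent without Accept-Language header",
            "confidence": 60,
        })
        score_boost += 15
    return {"signals": signals, "score_boost": score_boost}

# Demo against an in-memory table with one sample row from this article
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE tls_fingerprints (ja4 TEXT, client_name TEXT, "
           "is_browser INTEGER, is_automation INTEGER, is_known_bot INTEGER)")
db.execute("INSERT INTO tls_fingerprints VALUES (?, ?, 0, 1, 0)",
           ("t13d1516h2_8daaf6152771_e5627efa2ab1", "Playwright"))
cache = {}
print(lookup_fingerprint(db, cache, "t13d1516h2_8daaf6152771_e5627efa2ab1"))
```

To use these inside the class, wrap them as methods that pass self.db and self.cache through.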

Alert Integration

Feed detection results into your security infrastructure:

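from datetime import datetime, timezone

import requests

# SPLUNK_*, ELASTIC_ENABLED, and es_client are expected to come from your configuration

def send_to_siem(detection_result: dict, request_metadata: dict):
    """Send detection event to SIEM."""

    event = {
        'timestamp': datetime.now(timezone.utc).isoformat(),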
        'event_type': 'tls_fingerprint_detection',
        'source_ip': request_metadata['client_ip'],
        'destination': request_metadata['host'],
        'user_agent': request_metadata['user_agent'],
        'ja4': detection_result['ja4'],
        'ja4h': detection_result.get('ja4h'),
        'threat_score': detection_result['threat_score'],
        'verdict': detection_result['verdict'],
        'signals': detection_result['signals'],
        'severity': 'high' if detection_result['threat_score'] >= 80 else 'medium'
    }

    # Splunk HEC
    if SPLUNK_ENABLED:
        requests.post(
            SPLUNK_HEC_URL,
            headers={'Authorization': f'Splunk {SPLUNK_TOKEN}'},
            json={'event': event},
            timeout=5  # avoid blocking request handling if the collector is down
        )

    # Elastic
    if ELASTIC_ENABLED:
        es_client.index(
            index='security-tls-fingerprints',
            document=event
        )

Analysis Workflows for Security Teams

Investigating Suspicious Traffic

When you identify potentially automated traffic, JA4 analysis follows this workflow:

Step 1: Extract fingerprints from logs

# Parse nginx logs for JA4 data
grep "ja4=" /var/log/nginx/access.log | \
  awk -F'ja4=' '{print $2}' | \
  cut -d' ' -f1 | \
  sort | uniq -c | sort -rn | head -20
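
The same tally can be done in Python when the results need to feed further analysis. This is a minimal sketch that assumes, like the shell pipeline, that each log line carries a ja4=<value> field terminated by whitespace; top_fingerprints is a hypothetical helper name and the sample lines are invented.

```python
import re
from collections import Counter

# Matches the ja4=<value> field; the value ends at the next whitespace
JA4_FIELD = re.compile(r"ja4=(\S+)")

def top_fingerprints(lines, limit=20):
    """Count JA4 values across log lines and return the most frequent ones."""
    counts = Counter()
    for line in lines:
        match = JA4_FIELD.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts.most_common(limit)

sample = [
    '203.0.113.5 "GET /" ja4=t13d1516h2_8daaf6152771_e5627efa2ab1 ua="Chrome"',
    '203.0.113.6 "GET /" ja4=t13d1516h2_8daaf6152771_e5627efa2ab1 ua="Chrome"',
    '203.0.113.7 "GET /" ja4=t12d1307h1_c16a28f6ef30_0000000000000000 ua="x"',
    '203.0.113.8 "GET /health" ua="probe"',
]
for ja4, count in top_fingerprints(sample):
    print(count, ja4)
```

To run it against a real file, pass an open file handle: top_fingerprints(open("/var/log/nginx/access.log")).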

Step 2: Identify anomalous patterns

-- Find fingerprints claiming Chrome but not matching Chrome's signature
SELECT
    ja4,
    user_agent,
    COUNT(*) as requests,
    COUNT(DISTINCT source_ip) as unique_ips
FROM access_logs
WHERE user_agent LIKE '%Chrome%'
  AND ja4 NOT IN (SELECT ja4 FROM known_chrome_fingerprints)
GROUP BY ja4, user_agent
ORDER BY requests DESC;

Step 3: Cross-reference with known automation tools

-- Match against automation fingerprint database
SELECT
    l.ja4,
    l.user_agent,
    k.client_name as detected_client,
    COUNT(*) as request_count
FROM access_logs l
JOIN tls_fingerprints k ON l.ja4 = k.ja4
WHERE k.is_automation = TRUE
GROUP BY l.ja4, l.user_agent, k.client_name
ORDER BY request_count DESC;

Building Detection Rules

Cloudflare Firewall Rule (this example matches on the JA3 field; the known_chrome_hash values are placeholders):

(cf.bot_management.ja3_hash in {"e7d705a3286e19ea42f587b344ee6865"
  "cd08e31494f9531f560d64c695473da9"})
or (http.user_agent contains "Chrome" and
    not cf.bot_management.ja3_hash in {"known_chrome_hash_1" "known_chrome_hash_2"})

HAProxy ACL:

# Block known automation fingerprints
acl automation_ja4 req.fhdr(x-ja4) -m str t13d1517h2_8daaf6152771_02713d6af862
acl automation_ja4 req.fhdr(x-ja4) -m str t13d1516h2_8daaf6152771_e5627efa2ab1
acl automation_ja4 req.fhdr(x-ja4) -m str t12d1307h1_c16a28f6ef30_0000000000000000

http-request deny if automation_ja4

Reporting Dashboard Metrics

Track these JA4-related metrics:

  • Fingerprint diversity: Unique JA4 hashes per day
  • Mismatch rate: Requests where JA4 contradicts User-Agent
  • Automation percentage: Traffic from known automation fingerprints
  • New fingerprint alerts: Previously unseen JA4 hashes
  • Block rate by fingerprint: Effectiveness of fingerprint-based rules
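
Several of these metrics reduce to simple counts over a day of request records. The sketch below is one way to compute them in Python, assuming each record is a dict with a ja4 key and a boolean mismatch key set by the detection step; the record shape and the function name fingerprint_metrics are assumptions for this example.

```python
def fingerprint_metrics(records, automation_ja4s, known_ja4s):
    """Summarize one batch of request records into dashboard metrics."""
    total = len(records)
    unique = {r["ja4"] for r in records}
    mismatches = sum(1 for r in records if r.get("mismatch"))
    automated = sum(1 for r in records if r["ja4"] in automation_ja4s)
    return {
        "fingerprint_diversity": len(unique),
        "mismatch_rate": mismatches / total if total else 0.0,
        "automation_percentage": 100.0 * automated / total if total else 0.0,
        "new_fingerprints": sorted(unique - known_ja4s),
    }

records = [
    {"ja4": "t13d1516h2_8daaf6152771_b0da82dd1658", "mismatch": False},
    {"ja4": "t13d1516h2_8daaf6152771_e5627efa2ab1", "mismatch": True},
    {"ja4": "t13d1516h2_8daaf6152771_e5627efa2ab1", "mismatch": False},
    {"ja4": "t13d1310h2_9dc936c68ed4_000000000000", "mismatch": False},
]
automation = {"t13d1516h2_8daaf6152771_e5627efa2ab1"}
known = {"t13d1516h2_8daaf6152771_b0da82dd1658",
         "t13d1516h2_8daaf6152771_e5627efa2ab1"}
print(fingerprint_metrics(records, automation, known))
```

Block rate by fingerprint is left out because it depends on how your enforcement layer records verdicts.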

Evasion Techniques and Countermeasures

Current Evasion Methods

uTLS library: Allows Go programs to mimic arbitrary TLS fingerprints:

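package main

import (
    "log"
    "net"

    tls "github.com/refraction-networking/utls"
)

func main() {
    // Open the TCP connection first; uTLS wraps an existing net.Conn
    tcpConn, err := net.Dial("tcp", "example.com:443")
    if err != nil {
        log.Fatal(err)
    }
    // Mimic Chrome's TLS fingerprint with a built-in ClientHello preset
    config := &tls.Config{ServerName: "example.com"}
    conn := tls.UClient(tcpConn, config, tls.HelloChrome_120)
    if err := conn.Handshake(); err != nil {
        log.Fatal(err)
    }
}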

Countermeasure: Combine JA4 with behavioral analysis. Even with perfect TLS mimicry, automation exhibits detectable interaction patterns.

TLS proxy chaining: Route traffic through a browser-based proxy to inherit its fingerprint.

Countermeasure: Analyze latency patterns. Proxy-chained requests show characteristic timing signatures.

Browser farm services: Use actual browser instances to establish TLS connections, then inject automation.

Countermeasure: Monitor for impossible behavior—no human clicks 1000 times per second with perfect accuracy.

Defense in Depth

JA4 is one layer of a comprehensive detection system:

  1. TLS fingerprinting (JA4): Network layer, hard to spoof
  2. HTTP fingerprinting (JA4H): Application layer correlation
  3. JavaScript environment checks: Detect stealth patches
  4. Behavioral analysis: Interaction pattern detection
  5. Honeypots: Definitive automation signals

When all layers agree the client is legitimate, confidence is high. When layers disagree—Chrome User-Agent, Playwright JA4, synthetic mouse movements—the verdict is clear.

WebDecoy’s JA4 Implementation

WebDecoy integrates JA4 fingerprinting as a core detection signal. Our SDKs for Node.js, Go, and PHP automatically extract TLS parameters and generate JA4/JA4H fingerprints.

Detection flow:

Request → TLS Handshake → Extract JA4
                    ↓
          Compare against database
                    ↓
       Cross-reference with User-Agent
                    ↓
            Add to threat score
                    ↓
      Combine with behavioral signals
                    ↓
       Verdict: allow/challenge/block

What we detect:

  • Browserbase sessions claiming browser User-Agents
  • Playwright/Puppeteer automation
  • Python/Go/Node HTTP clients
  • curl/wget command-line tools
  • Unknown automation with low TLS extension counts
  • Version mismatches between claimed and actual browsers

Integration:

const WebDecoy = require('@webdecoy/node-sdk');

const webdecoy = new WebDecoy({
  apiKey: 'your-api-key',
  enableTLSFingerprinting: true,
  tlsAnalysis: {
    checkJA4: true,
    checkJA4H: true,
    blockKnownAutomation: true,
    alertOnMismatch: true
  }
});

app.use(webdecoy.middleware());

Conclusion

TLS fingerprinting via JA4 provides a detection vector that BaaS platforms cannot easily neutralize. While they can patch JavaScript APIs, rotate IPs, and generate convincing user agents, the TLS handshake reveals the underlying client implementation.

For security analysts, JA4 offers:

  • Immediate classification: Human-readable prefix provides instant context
  • Database correlation: Match against known automation tools
  • Mismatch detection: Compare fingerprint against claimed browser
  • Evasion resistance: Sorted hashing defeats randomization

For platform engineers, JA4 integrates into:

  • Firewall rules: Block known automation fingerprints
  • Threat scoring: Add TLS signals to multi-factor detection
  • SIEM alerting: Feed fingerprint mismatches to security operations
  • Forensics: Investigate suspicious traffic patterns

The cat-and-mouse game continues. Evasion libraries like uTLS raise the bar. But defense in depth—JA4 combined with behavioral analysis, honeypots, and environment validation—maintains detection advantage.

AI scrapers built on BaaS infrastructure cannot hide their nature forever. The TLS handshake tells the truth.


Ready to add JA4 detection to your security stack?

Start Your Free Trial and see WebDecoy’s TLS fingerprinting in action. Our SDKs handle extraction, analysis, and alerting automatically.

Questions about implementing JA4 analysis? Read our documentation or contact us directly.

