Reverse Engineering Credential Stuffing Attacks: A Technical Deep Dive

Most write-ups on credential stuffing stop at “attackers replay leaked passwords.” That framing is technically correct and operationally useless. If you actually want to defend a login endpoint, you need to understand what the attack looks like on the wire — what tools generate it, what their configs encode, how they handle MFA and CAPTCHA, and which signals survive the residential-proxy-and-stealth-browser arms race.

This is a hands-on breakdown. We'll trace a single credential from a paste site to a successful account takeover, look at how checker configs for OpenBullet 2 and SilverBullet are structured, and then walk through the detection signals that actually work in 2026, not just rate limits.

The Attack Economy in One Diagram

Credential stuffing isn’t a person at a keyboard. It’s a supply chain.

breach dump       (raw combo lists)
        ↓
combo cleaner     (dedupe, normalize, shard)
        ↓
checker tool      (OpenBullet 2, SilverBullet)
        ↓
proxy provider    (residential rotation per request)
        ↓
target login      (your endpoint)
        ↓
valid combos      (verified hits)
        ↓
account market    (resold to ATO operators)

Each layer is specialized and commoditized:

  • Breach dumps are aggregated into “combo lists” — typically email:password or user:password files, sometimes hundreds of millions of lines.
  • Combo cleaners dedupe, normalize, and shard combos by domain hint (@gmail.com, @yahoo.com) so checks can be parallelized.
  • Checkers are configurable HTTP automation tools: OpenBullet 2, SilverBullet, BL Tools, and legacy holdouts like Sentry MBA.
  • Proxy providers sell residential bandwidth by the gigabyte. Bright Data, IPRoyal, and dozens of grey-market resellers supply rotating, ASN-clean IPs that the checker cycles through.
  • Account marketplaces (Genesis-style stores, private Telegram channels) buy valid:hits files (verified credentials) and resell access.

The unit economics are brutal: combos cost $5–$20 per million, residential bandwidth runs $3–$15/GB, and a 0.1–0.5% hit rate is profitable when validated streaming, retail, or banking accounts resell for $1–$50 each.
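To make "brutal" concrete, here is a back-of-envelope calculation using only the price ranges above. The ~30 KB per attempt (a login-page GET plus a login POST) is an assumption, not a measured figure. The sketch also assumes one attempt per combo and that every hit sells, which overstates revenue.

```python
def campaign_economics(
    combos: int,
    combo_price_per_million: float,
    bandwidth_price_per_gb: float,
    kb_per_attempt: float,
    hit_rate: float,
    resale_price_per_hit: float,
) -> dict:
    """Back-of-envelope P&L for one checker run (one attempt per combo, no retries)."""
    combo_cost = combos / 1_000_000 * combo_price_per_million
    # Decimal units: 1 GB = 1,000,000 KB.
    gb_used = combos * kb_per_attempt / 1_000_000
    bandwidth_cost = gb_used * bandwidth_price_per_gb
    hits = combos * hit_rate
    revenue = hits * resale_price_per_hit
    cost = combo_cost + bandwidth_cost
    return {"cost": cost, "hits": hits, "revenue": revenue, "profit": revenue - cost}


# Worst case for the attacker: priciest combos and bandwidth, lowest hit rate and resale price.
worst = campaign_economics(1_000_000, 20, 15, 30, 0.001, 1)
# Best case: cheapest inputs, highest hit rate and resale price.
best = campaign_economics(1_000_000, 5, 3, 30, 0.005, 50)
print(worst)
print(best)
```

Under these assumptions even the worst case stays positive: about $470 in and $1,000 out per million combos.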

That’s the context. Now the wire.

Anatomy of a Single Login Attempt

Pick any modern login endpoint. From the attacker’s side, here’s what one credential check looks like end-to-end.

Step 1: The Config File

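OpenBullet configs are written in a small block-based DSL: LoliScript in the original OpenBullet, and LoliCode (which compiles to C#) in OpenBullet 2, whose configs ship as .opk files. OpenBullet 2 can still run legacy LoliScript configs, and that older syntax is the more compact of the two. Here is a simplified credential-stuffing config in it for a site with a POST /api/login endpoint and a CSRF token: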

# Settings such as "requires proxies" and "wordlist type = Credentials" live in the
# config's settings section, not in the script itself.

# 1. Hit the login page to harvest the CSRF token + session cookie
REQUEST GET "https://target.example/login"
  HEADER "User-Agent: <USERAGENT>"
  HEADER "Accept-Language: en-US,en;q=0.9"

PARSE "<SOURCE>" LR "name=\"csrf_token\" value=\"" "\"" -> VAR "CSRF"

# 2. Submit credentials
REQUEST POST "https://target.example/api/login"
  CONTENT "email=<USER>&password=<PASS>&csrf_token=<CSRF>"
  CONTENTTYPE "application/x-www-form-urlencoded"
  HEADER "Origin: https://target.example"
  HEADER "Referer: https://target.example/login"

# 3. Classify response
KEYCHECK
  KEYCHAIN Success OR
    KEY "<HEADERS(Set-Cookie)>" Contains "session_id="
    KEY "<SOURCE>" Contains "\"authenticated\":true"
  KEYCHAIN Failure OR
    KEY "<SOURCE>" Contains "Invalid credentials"
  KEYCHAIN Ban OR
    KEY "<RESPONSECODE>" EqualTo "429"
    KEY "<SOURCE>" Contains "captcha"

A few things to notice:

  1. CSRF and session bootstrapping are handled. Anyone who thinks “we have a CSRF token, we’re fine” is years behind. Configs harvest tokens on the fly.
  2. The KEYCHECK block encodes the response taxonomy. This sketch shows success, fail, and ban; production configs also separate retry, MFA-challenge, and captcha-challenge outcomes. Each maps to a different bucket so the operator can post-process.
  3. BAN is a routing decision, not an outage. A banned response just rotates the proxy and replays the combo. Your 429 is the attacker’s continue.

Step 2: The Proxy Layer

The same config runs against a proxy list — usually socks5://user:pass@host:port lines. Modern checkers integrate with residential providers via API and pull a fresh IP per request. The IPs:

  • Belong to consumer ISPs (Comcast, Spectrum, BT, Deutsche Telekom).
  • Rotate ASN and geography in ways that defeat naive IP blocklists.
  • Often share an IP with a legitimate user at the same time — the residential-proxy SDK is bundled in a free VPN or “free game” the homeowner installed.

This is why “block IPs with too many failed logins” stopped working around 2018. The attacker sees one IP per request. You see one request per IP.

Step 3: The Stealth Layer

Higher-effort campaigns don't even use raw HTTP. They drive a real Chromium through Puppeteer or Playwright with stealth patches, or use browser-as-a-service platforms such as Browserbase or Hyperbrowser to outsource the fingerprint problem entirely. From your server, you see:

  • A real TLS handshake from a real Chrome build.
  • A full DOM-rendering client that executes your JS challenges.
  • Mouse movement, scroll events, keystroke timings — all generated, often with recorded human traces replayed back.

If your detection stack is “User-Agent + IP rep + rate limit,” the attack is invisible.

Why the Classical Defenses Fail

Let’s enumerate the defenses most teams reach for first, and the specific reason each one degrades against modern tooling.

| Defense | Why it degrades |
| --- | --- |
| Rate limiting per IP | One request per residential IP. To bite, you'd need to throttle at single-digit attempts per IP per day, which locks out users behind shared NAT. |
| Rate limiting per account | Effective against targeted brute force, useless against stuffing, where each combo is a different account. |
| CAPTCHA on login | Solver services (2Captcha, CapSolver) charge $1–$3 per 1,000 reCAPTCHA v2 solves, and AI vision now solves most variants without a human in the loop. |
| Geo / ASN blocking | Residential proxy pools cover every country and consumer ISP, so the main effect is blocking your own users. |
| Block known bad UAs | Configs randomize UAs from a curated pool of real Chrome/Firefox strings. |
| MFA | Helps a lot, but doesn't stop validation: a correct password that triggers an MFA prompt still confirms a valid:hit. Attackers sell those credentials to phishers running MFA-bypass kits (Evilginx, Tycoon). |

None of these are useless. They just need to be the floor, not the ceiling.

Detection Signals That Actually Work

The signals that survive in 2026 are the ones the attacker can’t cheaply spoof at scale. Roughly in order of cost-to-attacker:

1. TLS Fingerprint (JA4)

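Checkers built on HTTP libraries, and many headless or patched Chromium builds, send a TLS ClientHello that differs from the desktop Chrome their User-Agent claims: a different cipher suite list, a different extension list, different ALPN values. JA4 captures this in a hashable form. Because JA4 sorts ciphers and extensions before hashing, Chrome's randomized extension order doesn't fragment the fingerprint. A fully driven, unmodified Chrome can still produce a genuine fingerprint, which is why the signals further down matter.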

What to do: log the JA4 of every login request, cluster, and look for clusters that account for an outsized share of failed logins. We covered this in detail in JA4 Fingerprinting Against AI Scrapers — the same playbook applies to login endpoints.
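The clustering step can be sketched as follows, assuming your TLS terminator already logs a JA4 string per login request. The `LoginEvent` shape and all thresholds are hypothetical starting points, not tuned values. The sketch flags a fingerprint only when it carries a large share of all failures *and* fails at a very high rate. Requiring both conditions keeps popular real-browser fingerprints from being flagged: they also rack up failures from typos, but at a much lower rate.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class LoginEvent:
    ja4: str          # JA4 string logged for this login request (hypothetical field)
    succeeded: bool   # whether the login attempt succeeded


def overrepresented_ja4(
    events: Iterable[LoginEvent],
    min_attempts: int = 50,          # hypothetical: ignore tiny clusters
    min_failure_share: float = 0.2,  # hypothetical: share of *all* failures
    min_failure_rate: float = 0.9,   # hypothetical: failures / attempts in this cluster
) -> List[Tuple[str, int, float, float]]:
    """Return (ja4, attempts, share_of_all_failures, failure_rate) for suspicious clusters."""
    attempts: Counter = Counter()
    failures: Counter = Counter()
    for e in events:
        attempts[e.ja4] += 1
        if not e.succeeded:
            failures[e.ja4] += 1
    total_failures = sum(failures.values())
    if total_failures == 0:
        return []
    flagged = []
    for ja4, n in attempts.items():
        share = failures[ja4] / total_failures
        rate = failures[ja4] / n
        if n >= min_attempts and share >= min_failure_share and rate >= min_failure_rate:
            flagged.append((ja4, n, share, rate))
    # Largest contributors to failed logins first.
    return sorted(flagged, key=lambda row: row[2], reverse=True)
```

Run it over a sliding window (say, the last hour) and feed flagged fingerprints into a challenge or step-up decision rather than a hard block.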

2. HTTP/2 Fingerprint

HTTP/2 SETTINGS frames, WINDOW_UPDATE and PRIORITY behavior, and pseudo-header order vary by client library. Go's net/http HTTP/2 implementation, Python's httpx, and a real Chrome are trivially distinguishable. Akamai's published HTTP/2 fingerprint format and several open-source http2-fingerprint projects formalize this.

A login request whose H2 fingerprint says “Go client” but whose User-Agent says “Chrome 124 on macOS” is automated. Full stop.
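A minimal sketch of that consistency check follows. It builds an Akamai-style `SETTINGS|WINDOW_UPDATE|PRIORITY|PSEUDO_HEADER_ORDER` string from frame values your HTTP/2 terminator would have to expose; that exposure is an assumption about your stack. Conventions for absent frames vary between implementations. The fingerprint values below are illustrative placeholders, not real browser or library baselines. Build the `known` table from your own verified traffic.

```python
from typing import Dict, Sequence, Set, Tuple

# Pseudo-header names abbreviated the way Akamai-style fingerprints write them.
PSEUDO_ABBREV = {":method": "m", ":authority": "a", ":scheme": "s", ":path": "p"}


def h2_fingerprint(
    settings: Sequence[Tuple[int, int]],  # (SETTINGS identifier, value) in the order received
    window_update: int,                   # connection-level WINDOW_UPDATE increment
    priority_frames: Sequence[str],       # pre-formatted PRIORITY frame descriptors, if any
    pseudo_headers: Sequence[str],        # pseudo-headers in the order they were sent
) -> str:
    settings_part = ";".join(f"{ident}:{value}" for ident, value in settings)
    priority_part = ",".join(priority_frames) or "0"  # "0" when no PRIORITY frames were sent
    pseudo_part = ",".join(PSEUDO_ABBREV[h] for h in pseudo_headers)
    return f"{settings_part}|{window_update}|{priority_part}|{pseudo_part}"


def ua_family(user_agent: str) -> str:
    """Very rough UA bucketing; Edge and Opera also advertise Chrome/ and land in 'chrome'."""
    ua = user_agent.lower()
    if "firefox/" in ua:
        return "firefox"
    if "chrome/" in ua:
        return "chrome"
    return "other"


def h2_contradicts_ua(user_agent: str, fingerprint: str, known: Dict[str, Set[str]]) -> bool:
    """True when the UA claims a family we have baselines for, but the fingerprint isn't one of them."""
    expected = known.get(ua_family(user_agent))
    if not expected:
        return False  # no baseline for this family: no verdict
    return fingerprint not in expected


# Illustrative values only; populate baselines from your own traffic.
baseline_fp = h2_fingerprint([(1, 65536), (4, 6291456)], 15663105, [], [":method", ":authority", ":scheme", ":path"])
other_fp = h2_fingerprint([(2, 0), (4, 4194304)], 1073741824, [], [":method", ":path", ":scheme", ":authority"])
known = {"chrome": {baseline_fp}}
chrome_ua = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36")
print(h2_contradicts_ua(chrome_ua, other_fp, known))
```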

3. Combo Replay Detection

This is the highest-signal, lowest-effort detection most teams skip. You don’t need to know the attacker — you need to know the credential.

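When a login attempt arrives, check the credential two ways:

  • Known breach corpora. Have I Been Pwned's Pwned Passwords range API uses k-anonymity: you send only the first five hex characters of the password's SHA-1 hash and match the returned suffixes locally, so neither the password nor its full hash leaves your server. This lookup has to use the plain, unsalted SHA-1, because that is how the corpus is indexed.
  • A short-window seen-cache keyed on a salted hash (ideally an HMAC with a per-tenant secret) of the username:password pair. If the same pair was attempted in the last 24 hours from a different IP/JA4, it's almost certainly a checker cycling proxies.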
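The breach-corpus lookup can be sketched with the standard library alone. The range endpoint and its `SUFFIX:COUNT` response format are HIBP's documented Pwned Passwords API. The `Add-Padding` header asks HIBP to pad responses with zero-count entries so response size doesn't leak the prefix. The `User-Agent` string is a placeholder.

```python
import hashlib
import urllib.request
from typing import Tuple

HIBP_RANGE_URL = "https://api.pwnedpasswords.com/range/"


def sha1_prefix_suffix(password: str) -> Tuple[str, str]:
    """Split the password's uppercase SHA-1 hex digest into a 5-char prefix and 35-char suffix."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def count_in_range_response(body: str, suffix: str) -> int:
    """Parse 'SUFFIX:COUNT' lines; 0 means not found (padding entries also report 0)."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix.upper():
            return int(count)
    return 0


def pwned_count(password: str, timeout: float = 2.0) -> int:
    """Query the range API; only the 5-char prefix leaves this process."""
    prefix, suffix = sha1_prefix_suffix(password)
    request = urllib.request.Request(
        HIBP_RANGE_URL + prefix,
        headers={"Add-Padding": "true", "User-Agent": "login-defense-sketch"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return count_in_range_response(body, suffix)
```

Keep the timeout short, and decide explicitly whether a lookup failure should fail open or trigger a step-up.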

This single check catches the bulk of low-effort campaigns and is invisible to the attacker.
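The short-window seen-cache can be sketched in memory; a real deployment would use a shared store with TTLs. The class name, the username normalization (trim and lowercase), and the `client_id` format (for example `ip|ja4`) are hypothetical choices. Keys are HMAC-SHA256 digests under a per-tenant secret, so the stored keys are useless without that secret. One user signing in from a phone and a laptop on the same day will also match, so treat a hit as a step-up trigger, not a block.

```python
import hashlib
import hmac
import time
from typing import Dict, List, Optional, Tuple


class ComboSeenCache:
    """In-memory sketch of a short-window (user, password) replay detector."""

    def __init__(self, tenant_secret: bytes, window_seconds: float = 24 * 3600):
        self._secret = tenant_secret
        self._window = window_seconds
        self._seen: Dict[str, List[Tuple[float, str]]] = {}  # key -> [(timestamp, client_id)]

    def combo_key(self, username: str, password: str) -> str:
        # NUL separator keeps ("a:b", "c") from colliding with ("a", "b:c").
        message = username.strip().lower().encode("utf-8") + b"\x00" + password.encode("utf-8")
        return hmac.new(self._secret, message, hashlib.sha256).hexdigest()

    def record(self, username: str, password: str, client_id: str, now: Optional[float] = None) -> bool:
        """Record an attempt; return True if the same combo was seen in-window from another client_id."""
        current = time.time() if now is None else now
        key = self.combo_key(username, password)
        # Drop entries older than the window before checking.
        recent = [(t, c) for t, c in self._seen.get(key, []) if current - t <= self._window]
        replayed = any(c != client_id for _, c in recent)
        recent.append((current, client_id))
        self._seen[key] = recent
        return replayed
```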

4. Pre-Login Behavior Telemetry

Real users land on /login from a referrer, scroll, focus the email field, paste or type, blur, then submit. The whole sequence takes 4–30 seconds. A checker hits /login once for the CSRF token and POST /api/login 200ms later — sometimes from a different proxy.

Useful pre-login signals to capture from the page itself:

  • Time-on-page before submit
  • Field-fill order and inter-keystroke timing
  • Whether the password field was focused via tab vs. click vs. never
  • Whether pointermove events fired between page load and submit
  • Whether the form was submitted via Enter keydown vs. mouse click on the button

We dive into the keystroke side of this in the FCaptcha keystroke biometrics post. The same telemetry pipeline feeds login defense.
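On the server, those signals can be turned into score inputs along these lines. The payload shape, field names, and every threshold here are hypothetical; the 1.5 s cutoff simply sits well below the 4–30 second human range and well above a checker's ~200 ms. Password managers and paste produce few or no keystrokes, so missing keystrokes alone are deliberately not flagged. Treat each flag as a weight, not a verdict.

```python
from dataclasses import dataclass, field
from statistics import pstdev
from typing import List, Optional


@dataclass
class LoginTelemetry:
    # Hypothetical payload posted by the page-side script alongside the login form.
    time_on_page_ms: int
    keystroke_intervals_ms: List[int] = field(default_factory=list)
    pointer_moves: int = 0
    password_focus: str = "none"  # "tab" | "click" | "none"
    submit_via: str = "none"      # "enter" | "click" | "none"


def telemetry_flags(t: Optional[LoginTelemetry]) -> List[str]:
    if t is None:
        return ["missing_telemetry"]  # checker skipped the page script entirely
    flags = []
    if t.time_on_page_ms < 1500:
        flags.append("submitted_too_fast")
    if t.password_focus == "none":
        flags.append("password_never_focused")
    if t.submit_via == "none":
        flags.append("no_submit_interaction")
    if t.submit_via == "click" and t.pointer_moves == 0:
        flags.append("click_without_pointer_movement")
    # Near-constant gaps between keystrokes suggest scripted typing.
    if len(t.keystroke_intervals_ms) >= 5 and pstdev(t.keystroke_intervals_ms) < 5:
        flags.append("uniform_keystroke_timing")
    return flags
```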

5. Endpoint Decoys

Real users don’t fetch /api/login directly — they go through the form. So expose a never-linked, never-rendered endpoint like /api/v1/authenticate-legacy that no human will ever hit, and treat any POST to it as automated. Same idea for hidden form fields named password_confirm that should always be empty on submit.

This is the credential-stuffing analogue of the endpoint and form honeypot patterns used elsewhere on the site.
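Both decoys reduce to a cheap check on every request. The endpoint path and field name come from the text above; the function name is hypothetical. A hidden field with a password-like name can attract password-manager autofill. Combine a filled honeypot with other signals before blocking; the decoy endpoint, which no page ever references, is the stronger signal.

```python
from typing import List, Mapping

# Must never be linked, rendered, or listed anywhere a human or crawler would follow.
DECOY_PATHS = frozenset({"/api/v1/authenticate-legacy"})
HONEYPOT_FIELD = "password_confirm"  # hidden field that should always be empty on submit


def decoy_signals(method: str, path: str, form: Mapping[str, str]) -> List[str]:
    reasons = []
    normalized = path.rstrip("/") or "/"
    if method.upper() == "POST" and normalized in DECOY_PATHS:
        reasons.append("decoy_endpoint")
    if form.get(HONEYPOT_FIELD, "").strip():
        reasons.append("honeypot_field_filled")
    return reasons
```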

6. Response Symmetry

Operators rely on response differences to classify attempts. If a 200 OK with "Invalid credentials", a redirect to the MFA page, and a 200 OK that sets a session cookie look meaningfully different in status, size, headers, or timing, you're feeding the KEYCHECK block.

Make every login response (success, fail, MFA-required, locked, throttled) return the same status code, the same set of headers, the same body length (within a small jitter window), and the same baseline timing. Set a cookie on every outcome, not only on success. Carry the actual outcome inside the body, in a form the page's own script resolves using a server-set cookie or signed token, rather than in the status line, headers, or visible copy. The attacker's checker sees noise; the legitimate browser sees a normal flow.

This one defense alone makes config development dramatically more expensive.
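The length and timing parts can be sketched as two helpers. The JSON field names, the 512-byte target, and the 400 ms floor are hypothetical. The outcome token is assumed to be opaque and ASCII. The padding equalizes *uncompressed* length only, so disable compression on this response or measure after compressing.

```python
import json
import secrets
import string
import time
from typing import Optional

_PAD_ALPHABET = string.ascii_letters + string.digits  # characters that never need JSON escaping
_rng = secrets.SystemRandom()


def symmetric_body(outcome_token: str, target_len: int = 512) -> bytes:
    """Wrap an opaque outcome token in a JSON body padded to exactly target_len bytes."""
    body = {"r": outcome_token, "p": ""}
    base_len = len(json.dumps(body, separators=(",", ":")).encode("utf-8"))
    pad_len = target_len - base_len
    if pad_len < 0:
        raise ValueError("target_len is too small for this outcome token")
    body["p"] = "".join(_rng.choice(_PAD_ALPHABET) for _ in range(pad_len))
    return json.dumps(body, separators=(",", ":")).encode("utf-8")


def remaining_delay(started_at: float, floor_s: float = 0.4, jitter_s: float = 0.05,
                    now: Optional[float] = None) -> float:
    """Seconds to sleep so every outcome takes at least floor_s (plus jitter) end to end."""
    current = time.monotonic() if now is None else now
    target = floor_s + _rng.uniform(0.0, jitter_s)
    return max(0.0, target - (current - started_at))
```

A handler would record `time.monotonic()` on entry, do the real work, then sleep for `remaining_delay(start)` before returning `symmetric_body(token)`. The floor has to exceed your slowest legitimate path (typically password hashing on a hit) or timing will still leak.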

Putting It Together: A Layered Stack for the Login Endpoint

A defense stack that holds up against current tooling looks roughly like this:

  1. Edge: TLS / HTTP/2 fingerprint logged and scored. Drop or challenge requests whose fingerprint cluster is overrepresented in failed logins over the last hour.
  2. Pre-form telemetry: Page-side script captures interaction signals and submits a signed token with the login POST. Missing or replayed tokens fail closed.
  3. Endpoint decoys: A hidden honeypot field and a never-linked auth endpoint, monitored for any traffic.
  4. Combo intelligence: Hash and check (user, password) against breach corpora and a short-term seen-cache. Force a step-up on hits.
  5. Symmetric responses: Identical shape, length, and timing across all login outcomes.
  6. MFA + risk-based step-up: WebAuthn for high-value accounts. TOTP / push for everyone else, triggered by risk score rather than every login.
  7. SIEM correlation: Login telemetry into the same pipeline as the rest of your bot signals so a compromised account that suddenly starts scraping or carding is caught downstream. We walk through that integration in the SIEM bot detection post.
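The "missing or replayed tokens fail closed" rule from step 2 can be sketched as a server-issued, single-use, HMAC-signed token embedded in the login page and echoed back with the POST. The token format and class name are hypothetical, and the telemetry payload would travel alongside the token. The used-nonce set stands in for a shared store with TTLs.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional, Set


class PageTokenIssuer:
    """Hypothetical single-use token: '<issued_unix>.<nonce>.<hmac_sha256_hex>'."""

    def __init__(self, secret: bytes, max_age_s: int = 600):
        self._secret = secret
        self._max_age = max_age_s
        self._used: Set[str] = set()  # production: shared store with TTL = max_age_s

    def _sign(self, payload: str) -> str:
        return hmac.new(self._secret, payload.encode("utf-8"), hashlib.sha256).hexdigest()

    def issue(self, now: Optional[float] = None) -> str:
        issued = int(time.time() if now is None else now)
        payload = f"{issued}.{secrets.token_urlsafe(16)}"  # urlsafe alphabet contains no '.'
        return f"{payload}.{self._sign(payload)}"

    def verify(self, token: Optional[str], now: Optional[float] = None) -> bool:
        if not token:
            return False  # missing token: fail closed
        try:
            issued_s, nonce, sig = token.split(".")
            issued = int(issued_s)
        except ValueError:
            return False  # malformed
        if not hmac.compare_digest(sig, self._sign(f"{issued_s}.{nonce}")):
            return False  # forged or tampered
        current = time.time() if now is None else now
        if not 0 <= current - issued <= self._max_age:
            return False  # expired or from the future
        if nonce in self._used:
            return False  # replayed
        self._used.add(nonce)
        return True
```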

Notice what isn’t on the list: a giant CAPTCHA wall, a bigger IP blocklist, or aggressive rate limits that break NAT’d users. Those are the defenses attackers have already priced in.

Where This Goes Next

Two trends to watch over the next 12 months:

LLM-driven checkers. Instead of hand-written LoliCode configs, operators are starting to use LLM agents that can navigate a login flow, parse the response semantically, and self-heal when the form changes. This collapses the time between a target site shipping a defense and a working bypass. TLS and HTTP/2 fingerprinting hold up here because the LLM still has to make HTTP calls through some runtime — and that runtime has a fingerprint.

Session token harvesting. As MFA adoption rises, the economic value shifts from raw valid:hits to active session cookies. AitM phishing kits like Evilginx and Tycoon already monetize this. Your login defense doesn't stop the phish, but binding sessions to a JA4 + device fingerprint + IP-ASN tuple means a stolen cookie stops validating once it is replayed from outside the victim's browser.
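A minimal binding check could store a keyed digest of the tuple captured at login and recompute it on each request. The function names and tuple layout are hypothetical. JA4 can change when the browser updates, and mobile users move between ASNs. On a mismatch, a step-up challenge is usually cheaper than a hard logout.

```python
import hashlib
import hmac


def binding_digest(secret: bytes, ja4: str, device_fp: str, asn: int) -> str:
    """Keyed digest of the (JA4, device fingerprint, ASN) tuple captured at login."""
    message = f"{ja4}|{device_fp}|{asn}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def session_still_bound(secret: bytes, stored_digest: str, ja4: str, device_fp: str, asn: int) -> bool:
    """Constant-time comparison of the stored binding against the current request's tuple."""
    return hmac.compare_digest(stored_digest, binding_digest(secret, ja4, device_fp, asn))
```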

Conclusion

Credential stuffing is no longer a brute-force problem. It’s a content-delivery problem dressed up as authentication: leaked credentials, residential bandwidth, and stealth automation, delivered to your POST /login at a price the attacker has already optimized.

The good news is that the same asymmetries that make the attack cheap — generic tooling, shared infrastructure, replayed credentials — also make it detectable, if you instrument the right layer. TLS fingerprints, pre-form telemetry, breach-corpus lookups, decoy endpoints, and response symmetry are the pieces of a defense that doesn’t fall over the first time the attacker swaps proxies.

WebDecoy ships these signals (JA4, behavioral telemetry, endpoint decoys, and combo intelligence) as a single layer in front of your login endpoint, so you’re not stitching them together yourself.

If you want to see what hits your /login today, start a free trial and point WebDecoy at it for 14 days. The first surprise is almost always the volume.
