Anubis: How the Viral AI Scraper Firewall Actually Works

If you have visited a Linux distro mirror, a GNOME wiki, a Forgejo or Gitea instance, the FFmpeg project, or a long list of small public-interest sites in 2025 and 2026, you have probably seen the same page. A smiling anthropomorphic jackal stares at you with a thumbs up while a spinner counts up to a hundred percent. Then you get into the site like nothing happened.

That jackal is named Anubis. The project is named after the Egyptian deity who weighs the souls of the dead, which is also the slogan its docs cheerfully use. Underneath the cute mascot is a reverse proxy written in Go by Xe Iaso that demands a SHA-256 proof-of-work from any client that looks like a browser before the backend ever sees the request. It launched in 2024, hit 19,000 GitHub stars by early 2026, and is currently running in front of more public infrastructure than most commercial bot vendors will admit.

This piece walks through what Anubis actually does at the protocol level, where the design is genuinely clever, where it leaks, and how it compares to the broader bot detection stack. We will use the v1.25.0 release (codename Necron, shipped February 2026) as the reference point. All of the code references come straight out of the public TecharoHQ/anubis repository.

Why Anubis Exists

The honest framing for Anubis is that it is a response to one specific failure mode in the modern web. The failure mode is this: a small project hosts a wiki or a forge or a documentation site on a single VPS. An AI company’s training crawler discovers the site, decides that every URL is worth fetching, and starts pulling thousands of requests per second across every revision of every page. The origin falls over. The project owner does not have an enterprise contract with Cloudflare or Akamai. They do not have a CDN, a WAF, or a dedicated security engineer. They have a $5 droplet and a Postgres database.

This is the population of operators Anubis is built for. The project’s README is unusually candid about its own positioning:

Anubis is a bit of a nuclear response. This will result in your website being blocked from smaller scrapers and may inhibit “good bots” like the Internet Archive. In most cases, you should not need this and can probably get by using Cloudflare to protect a given origin. However, for circumstances where you can’t or won’t use Cloudflare, Anubis is there for you.

That self-description is doing a lot of work. It is essentially saying that the design tradeoff is “deliberately overbroad protection in exchange for zero ops complexity beyond running another Go binary.” Once you understand that tradeoff, every weird thing about Anubis makes sense.

Architectural Placement

Anubis is a Go binary that runs as a reverse proxy. It does not replace your TLS terminator. The standard production placement is behind a primary edge proxy (nginx, Caddy, Traefik) and in front of the application server, with the edge proxy forwarding all traffic to Anubis on its bound port (default :8923). Anubis evaluates the request and either serves a response itself or proxies upstream to your real backend.

sequenceDiagram
    participant Client
    participant Edge as nginx / Caddy
    participant Anubis
    participant Backend

    Client->>Edge: GET /article (no cookie)
    Edge->>Anubis: Forward request
    Anubis->>Anubis: Match bot policy (Mozilla UA -> CHALLENGE)
    Anubis-->>Client: 200 OK + challenge page (challenge, difficulty)
    Client->>Client: Spawn Web Worker pool, brute-force nonce
    Client->>Anubis: POST /api/pass-challenge (nonce, hash, elapsed)
    Anubis->>Anubis: Recompute SHA-256, verify leading zeros
    Anubis-->>Client: 302 + Set-Cookie techaro.lol-anubis (ed25519 JWT)
    Client->>Edge: GET /article (with cookie)
    Edge->>Anubis: Forward request
    Anubis->>Anubis: Verify JWT signature, check exp/nbf
    Anubis->>Anubis: Random secondary screening?
    Anubis->>Backend: Proxy upstream
    Backend-->>Client: Article HTML

The Request Lifecycle

The decision logic, in order:

  1. The request arrives at Anubis from the edge proxy.
  2. Anubis evaluates the request against the loaded bot policy, top to bottom, looking for the first matching rule.
  3. The rule’s action determines what happens next. Actions are ALLOW, DENY, CHALLENGE, or WEIGH. We will get to each.
  4. If the request was challenged and a techaro.lol-anubis cookie is present, Anubis verifies the cookie’s ed25519-signed JWT, checks exp and nbf, and optionally re-validates the proof-of-work via a probabilistic secondary screening pass. If the cookie is missing, expired, or fails validation, Anubis serves the proof-of-work page.
  5. If the action is ALLOW, Anubis proxies the request to the configured upstream untouched.
  6. If the action is DENY, Anubis serves an error page styled to look like the upstream returned a page, with HTTP status 200 by default. More on that choice in a moment.

The cookie is the actual gate. The proof-of-work is just the price of admission for getting the cookie. Once you have a valid cookie, you can scrape the site for the cookie’s lifetime, which is seven days by default. This matters for the threat model, and most discussion of Anubis ignores it.

Random secondary screening

The lifecycle above hides one detail that does meaningful work against cookie-replay attacks. After Anubis confirms a cookie’s signature and expiry, the request hits a probabilistic re-validation gate that the project’s design doc calls “secondary screening.” A configurable fraction of cookie-bearing requests get bounced back through a fresh proof-of-work even if their cookie is valid. The intuition is exactly the same as airport security: most travelers walk through, but random pulls catch the cases that a static check cannot.

This is a small but important detail. A scraper that mints a single cookie and replays it across ten thousand requests will, on average, hit the secondary screen on some predictable fraction of those requests. If the scraper does not have a PoW solver wired in, those requests fail and the scrape stalls. The screen rate is tunable per rule, which means high-value content can be configured to re-challenge aggressively while low-value content stays cookie-only.
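The arithmetic behind that "predictable fraction" is easy to sketch. The snippet below is an illustration only: the function names and the 1% rate are assumptions made for the example, not Anubis identifiers or defaults.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shouldRescreen reports whether a request that carries a valid cookie is
// pulled aside for a fresh proof-of-work. rate is a hypothetical per-rule
// fraction in [0, 1]; roll is a uniform random draw in [0, 1).
func shouldRescreen(rate, roll float64) bool {
	return roll < rate
}

// expectedRescreens is the average number of re-challenges a replayed
// cookie runs into over the given number of requests.
func expectedRescreens(requests int, rate float64) float64 {
	return float64(requests) * rate
}

func main() {
	const rate = 0.01 // assumed value for illustration, not an Anubis default
	const requests = 10000

	hits := 0
	for i := 0; i < requests; i++ {
		if shouldRescreen(rate, rand.Float64()) {
			hits++
		}
	}
	fmt.Printf("expected about %.0f re-challenges, simulated %d\n",
		expectedRescreens(requests, rate), hits)
}
```

A scraper without a solver loses every one of those re-challenged requests, so even a low rate turns a replayed cookie into a steady trickle of failures.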

The Proof-of-Work, In Detail

The challenge is conceptually identical to Hashcash and to any other SHA-256 leading-zero proof-of-work scheme. The client receives a random challenge string and searches for an integer nonce such that SHA-256(challenge + nonce), rendered as a hex string, begins with the configured number of zeros. It then submits the nonce and the hash. The server verifies that the response matches the recomputed hash and has the required number of leading hexadecimal zeros.

Here is the verification logic, lifted directly from the Anubis source (lib/challenge/proofofwork/proofofwork.go):

calcString := fmt.Sprintf("%s%d", challenge, nonce)
calculated := internal.SHA256sum(calcString)

if subtle.ConstantTimeCompare([]byte(response), []byte(calculated)) != 1 {
    return chall.NewError("validate", "invalid response",
        fmt.Errorf("%w: wanted response %s but got %s",
            chall.ErrFailed, calculated, response))
}

if !strings.HasPrefix(response, strings.Repeat("0", rule.Challenge.Difficulty)) {
    return chall.NewError("validate", "invalid response",
        fmt.Errorf("%w: wanted %d leading zeros but got %s",
            chall.ErrFailed, rule.Challenge.Difficulty, response))
}

Two technical details are worth pulling out, because they have practical consequences.

The difficulty is in hex characters, not bits. The DefaultDifficulty constant in anubis.go is set to 4. That looks alarming if you are used to Bitcoin-style PoW where difficulty is measured in bits, because 4 bits of work is essentially free. Anubis is asking for 4 leading hex characters of zero, which is 16 bits, which is about 65,536 expected hashing attempts. That is a few hundred milliseconds on a modern x86 core, a second or two on a phone with weak single-thread performance, and roughly six orders of magnitude shy of what an attacker would care about if they had a GPU. We will come back to this.

The response is the full hash, not just the nonce. Most Hashcash-style implementations send only the nonce and let the server recompute the hash. Anubis sends both. The server still recomputes and constant-time compares, so this is not a security issue, but it does mean the client side is shipping a 64-character hex string per challenge.
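To make the round trip concrete, here is a minimal standalone sketch of both sides. It is not the Anubis code: hashHex, solve, and verify are names invented for this example, and it assumes the hash is compared as lowercase hex, as the verifier excerpt above suggests.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
	"strings"
)

// hashHex returns SHA-256(challenge + decimal nonce) as lowercase hex,
// mirroring the "%s%d" formatting used by the verifier.
func hashHex(challenge string, nonce int) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%s%d", challenge, nonce)))
	return hex.EncodeToString(sum[:])
}

// solve tries nonces 0, 1, 2, ... until the hash starts with `difficulty`
// hex zeros, then returns the winning nonce and its hash.
func solve(challenge string, difficulty int) (int, string) {
	prefix := strings.Repeat("0", difficulty)
	for nonce := 0; ; nonce++ {
		if h := hashHex(challenge, nonce); strings.HasPrefix(h, prefix) {
			return nonce, h
		}
	}
}

// verify recomputes the hash, compares it to the submitted response in
// constant time, and then checks the leading-zero requirement.
func verify(challenge string, nonce int, response string, difficulty int) bool {
	calculated := hashHex(challenge, nonce)
	if subtle.ConstantTimeCompare([]byte(response), []byte(calculated)) != 1 {
		return false
	}
	return strings.HasPrefix(response, strings.Repeat("0", difficulty))
}

func main() {
	const challenge = "example-challenge" // placeholder, not a real Anubis challenge
	nonce, response := solve(challenge, 4)
	fmt.Println(nonce, response, verify(challenge, nonce, response, 4))
}
```

The two checks in verify are independent on purpose: the first rejects a response that does not belong to the submitted nonce, the second rejects an honest hash that simply is not hard enough.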

The client-side solver

The browser-side solver lives in web/js/algorithms/fast.ts. It is the part of Anubis that runs while you are staring at the jackal. The implementation is more sophisticated than the verifier suggests.

The default algorithm spawns a worker pool sized to navigator.hardwareConcurrency / 2, rounded down, minimum one. Each worker takes a starting nonce and a stride equal to the number of workers, so worker zero tries nonces 0, N, 2N, worker one tries 1, N+1, 2N+1, and so on. The workers post intermediate progress back to the main thread (that is what fills the progress bar) and the first worker to find a winning nonce posts the result. The main thread then terminates the rest of the pool and submits the response.

There is a subtlety in the algorithm selection logic that is worth knowing about if you are debugging slow Anubis pages on Firefox:

if (
  navigator.userAgent.includes("Firefox") ||
  navigator.userAgent.includes("Goanna")
) {
  console.log("Firefox detected, using pure-JS fallback");
  workerMethod = "purejs";
}

In secure contexts (https), Anubis would otherwise use a WebCrypto-backed worker that calls crypto.subtle.digest('SHA-256', ...) for each attempt. On Firefox, that path is slower than a pure-JS SHA-256 implementation because of how SubtleCrypto’s promise machinery interacts with tight loops, so Anubis falls back to a hand-rolled JS hash. This is a deliberate optimization. It is also one of the reasons Firefox users sometimes report Anubis pages taking visibly longer than Chrome users on the same site.

What that difficulty actually costs

Let’s put numbers on it. With the default difficulty of 4 hex zeros, solve times vary widely with hardware and worker count:

| Hardware | Median solve time at difficulty 4 |
|---|---|
| M2 MacBook Air, 4 workers | 100 to 250 ms |
| Ryzen 5950X desktop, 4 workers | 60 to 150 ms |
| Pixel 7, 2 workers | 400 to 900 ms |
| Low-end Android (Helio P22, single core after browser throttling) | 2 to 6 seconds |
| Headless Chromium on a $4/month VPS, 1 worker | 800 ms to 2 s |

The math of difficulty

Difficulty in Anubis is measured in leading hexadecimal zero characters. Each hex character is 4 bits. The expected number of hashes you need to find a passing result is 16^difficulty. The scaling is exponential, which is what makes the difficulty knob load-bearing.

| Difficulty | Required leading zeros | Expected hashes | Median time, 4-worker laptop |
|---|---|---|---|
| 3 | 000 | ~4,096 | 10 to 30 ms |
| 4 (default) | 0000 | ~65,536 | 100 to 250 ms |
| 5 | 00000 | ~1,048,576 | 1.5 to 4 s |
| 6 | 000000 | ~16,777,216 | 25 to 60 s |
| 7 | 0000000 | ~268,435,456 | 7 to 15 minutes |

Moving from difficulty 4 to difficulty 5 does not double the work. It multiplies it by sixteen. From 4 to 6 is a 256x increase. This is why people who say “just turn the difficulty up” tend to regret it immediately. Difficulty 6 is unusable as a default for any audience that includes phones. The Anubis bot policy ships with a deliberately mean rule that uses difficulty 16 and an algorithm explicitly named slow, intended for known-bot user agents only:

- name: generic-bot-catchall
  user_agent_regex: (?i:bot|crawler)
  action: CHALLENGE
  challenge:
    difficulty: 16   # impossible
    algorithm: slow  # intentionally waste CPU cycles and time

16^16 is approximately 1.8 * 10^19. That is not a number any real client is supposed to finish in a human lifetime. It is a polite way to say “no” while still pretending to challenge.

Browser solvers versus native solvers

The asymmetry that Anubis advertises (legitimate user pays once, scraper pays ten thousand times) assumes both parties run the same solver. That assumption is wrong, and it is worth being explicit about why.

The browser solver runs SHA-256 inside Web Workers: crypto.subtle.digest() in secure contexts, with a fallback to a hand-rolled pure-JS SHA-256 on Firefox, spread across navigator.hardwareConcurrency / 2 workers. Going by the solve times in the tables above, a modern laptop delivers somewhere around 300,000 to 700,000 hashes per second across the pool.

A native solver written in C, Go, or Rust against an optimized SHA-256 library (Intel SHA-NI, ARMv8 cryptographic extensions, or OpenSSL’s hand-tuned assembly) delivers 100 to 500 million hashes per second on a single modern CPU core. That is two to three orders of magnitude faster than the browser pool. A GPU implementation raises that ceiling further, into the billions of hashes per second.

The practical consequence is that a scraper operator who bothers to write a tiny solver in their language of choice can mint Anubis cookies in single-digit milliseconds per cookie at default difficulty. The asymmetric tax is real, but it is not “user pays 200ms, scraper pays 200ms times the number of pages.” It is closer to “user pays 200ms once per week, scraper pays 2ms once per week, and then they both walk through the door for seven days.” Framing Anubis as a tax that filters lazy scrapers from determined ones is more accurate than framing it as a wall. Anubis’s own design doc tacitly acknowledges this by calling the PoW a “placeholder.”

On its face, a fraction of a second of hashing does not sound like much of a barrier. The honest answer is that it is not, and Anubis does not pretend otherwise. The official design doc is unusually direct about this:

Ultimately, this is a hack whose real purpose is to give a “good enough” placeholder solution so that more time can be spent on fingerprinting and identifying headless browsers (EG via how they do font rendering) so that the challenge proof of work page doesn’t need to be presented to known legitimate users.

The proof-of-work is a stalling tactic that buys Anubis the chance to do real work in the future. Today, the real work is the bot policy.

The Bot Policy Language

The bot policy is where Anubis stops being a generic PoW system and starts being an opinionated bot firewall. The default policy lives in data/botPolicies.yaml and is mostly composed of imports from other files. Each rule has a name, a match condition, and an action.

A minimal policy from the docs:

bots:
  - name: cloudflare-workers
    headers_regex:
      CF-Worker: .*
    action: DENY
  - name: well-known
    path_regex: ^/.well-known/.*$
    action: ALLOW
  - name: favicon
    path_regex: ^/favicon.ico$
    action: ALLOW
  - name: robots-txt
    path_regex: ^/robots.txt$
    action: ALLOW
  - name: generic-browser
    user_agent_regex: Mozilla
    action: CHALLENGE

Read top to bottom, that says:

  • Anything with a CF-Worker header gets denied. This blocks abuse via Cloudflare Workers as cheap scraping infrastructure.
  • .well-known, favicon.ico, and robots.txt are always allowed because breaking them breaks the internet.
  • Anything else with Mozilla in its user agent gets the challenge.
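First-match evaluation over that minimal policy can be sketched in a few dozen lines. This is a simplified model written for illustration, not the Anubis policy engine: the rule struct, evaluate, and the caller-supplied fallback for requests that match nothing are all assumptions of the sketch.

```go
package main

import (
	"fmt"
	"net/http"
	"regexp"
)

type action string

const (
	allow     action = "ALLOW"
	deny      action = "DENY"
	challenge action = "CHALLENGE"
)

// rule is a hypothetical, simplified stand-in for one bot policy entry.
// A nil regexp or empty header map means "no condition on this field".
type rule struct {
	name    string
	path    *regexp.Regexp
	ua      *regexp.Regexp
	headers map[string]*regexp.Regexp
	action  action
}

func (r rule) matches(req *http.Request) bool {
	if r.path != nil && !r.path.MatchString(req.URL.Path) {
		return false
	}
	if r.ua != nil && !r.ua.MatchString(req.UserAgent()) {
		return false
	}
	for name, re := range r.headers {
		matched := false
		for _, v := range req.Header.Values(name) { // empty if header absent
			if re.MatchString(v) {
				matched = true
				break
			}
		}
		if !matched {
			return false
		}
	}
	return true
}

// evaluate walks the rules top to bottom and returns the first match.
// The fallback for unmatched requests is supplied by the caller.
func evaluate(rules []rule, req *http.Request, fallback action) (string, action) {
	for _, r := range rules {
		if r.matches(req) {
			return r.name, r.action
		}
	}
	return "", fallback
}

func minimalPolicy() []rule {
	return []rule{
		{name: "cloudflare-workers", action: deny,
			headers: map[string]*regexp.Regexp{"CF-Worker": regexp.MustCompile(`.*`)}},
		{name: "well-known", path: regexp.MustCompile(`^/.well-known/.*$`), action: allow},
		{name: "favicon", path: regexp.MustCompile(`^/favicon.ico$`), action: allow},
		{name: "robots-txt", path: regexp.MustCompile(`^/robots.txt$`), action: allow},
		{name: "generic-browser", ua: regexp.MustCompile(`Mozilla`), action: challenge},
	}
}

// newRequest builds a GET request against a placeholder host.
func newRequest(path, userAgent string, headers map[string]string) *http.Request {
	req, err := http.NewRequest(http.MethodGet, "http://example.test"+path, nil)
	if err != nil {
		panic(err)
	}
	req.Header.Set("User-Agent", userAgent)
	for k, v := range headers {
		req.Header.Set(k, v)
	}
	return req
}

func main() {
	name, act := evaluate(minimalPolicy(), newRequest("/article", "Mozilla/5.0", nil), allow)
	fmt.Println(name, act)
}
```

Rule order is the whole policy here: moving generic-browser above robots-txt would challenge every browser that asks for robots.txt.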

The four actions are:

| Action | What happens |
|---|---|
| ALLOW | Proxy to upstream, skip all further checks |
| DENY | Serve an error page that looks like success, status 200 by default |
| CHALLENGE | Show proof-of-work page or validate the existing cookie |
| WEIGH | Add or subtract from a request weight score, then continue evaluating rules |

WEIGH is the most recent and most interesting addition. Instead of branching to a terminal action, it adjusts a running score that other rules can consume. The default config uses it to penalize requests that originate from known abusive ASNs, that come from countries with aggressive scraper populations, or that arrive while the system load is high. Once the score crosses a threshold, the request gets challenged or denied. This is Anubis quietly evolving toward request scoring, which is the same architecture every commercial bot vendor eventually lands on.

What the default policy actually blocks

The default policy imports (data)/meta/ai-block-aggressive.yaml, which in turn pulls in a denylist that reads like a who’s who of AI training crawlers. From data/bots/ai-catchall.yaml:

AI2Bot, Ai2Bot-Dolma, aiHitBot, Amazonbot, anthropic-ai,
Brightbot 1.0, Bytespider, Claude-Web, cohere-ai,
cohere-training-data-crawler, Cotoyogi, Crawlspace, Diffbot,
DuckAssistBot, FacebookBot, Factset_spyderbot, FirecrawlAgent,
FriendlyCrawler, Google-CloudVertexBot, GoogleOther,
GoogleOther-Image, GoogleOther-Video, iaskspider/2.0,
ICC-Crawler, ImagesiftBot, img2dataset, imgproxy,
ISSCyberRiskCrawler, Kangaroo Bot, meta-externalagent,
Meta-ExternalAgent, meta-externalfetcher, Meta-ExternalFetcher,
NovaAct, omgili, omgilibot, Operator, PanguBot,
Perplexity-User, PerplexityBot, PetalBot, QualifiedBot, Scrapy,
SemrushBot-OCOB, SemrushBot-SWA, Sidetrade indexer bot,
TikTokSpider, Timpibot, VelenPublicWebCrawler, Webzio-Extended,
wpbot, YouBot

Every one of those is matched as a substring of the user agent and gets a hard DENY. That includes OpenAI’s Operator, Anthropic’s Claude-Web and anthropic-ai, Perplexity’s two crawlers, Meta’s external agents, ByteDance, Amazon, Cohere, and several lesser-known AI sourcing crawlers. The list is maintained by hand and synced from the community-curated ai.robots.txt project.

This is the most boring and most important part of Anubis. The proof-of-work gets the press, but the denylist is what actually blocks most of the AI traffic for sites that deploy default settings, because the AI crawlers identify themselves honestly in their user agents. Anubis is, in practice, doing two jobs: a known-bot denylist for the well-behaved bad actors, and a proof-of-work tax for everyone else.

The complementary allowlist

The default policy also imports (data)/crawlers/_allow-good.yaml, which allowlists the search engines that Anubis explicitly does not want to challenge. This list, current as of v1.25.0, covers Google, Apple, Bing, DuckDuckGo, Qwant, the Internet Archive, Kagi, Marginalia, and Mojeek. The allowlist matches user agent strings and, where the search engine publishes its IP ranges, also enforces the IP range match. The Qwant rule in the docs is a good example:

- name: qwantbot
  user_agent_regex: \+https\://help\.qwant\.com/bot/
  action: ALLOW
  remote_addresses: ["91.242.162.0/24"]

The reason Google specifically is on the allowlist is interesting and a little uncomfortable. The policy file’s own comments admit it: “Search engine crawlers to allow, defaults to: Google (so they don’t try to bypass Anubis).” Translation: if you block Googlebot, Google will treat your site as if it doesn’t exist. So you let them through whether you trust them or not. This is a perennial tension in any anti-bot system, and it is essentially impossible to resolve without one of the big search engines committing to a verifiable signed bot identity scheme, which they have so far declined to do.

The cookie is the part that most coverage of Anubis misses, and it is what makes the system actually usable.

Once a client passes the proof-of-work, Anubis sets a cookie named techaro.lol-anubis. From the constants in anubis.go:

var CookieName = "techaro.lol-anubis"
const CookieDefaultExpirationTime = 7 * 24 * time.Hour

The cookie contains a signed JWT. Every subsequent request from that client, for seven full days, simply presents the cookie and skips the challenge entirely. This is what makes the user experience tolerable. Without the cookie, you would be solving a challenge on every page load. With the cookie, you solve one challenge per week per browser per origin.

The JWT

The token is a JWT signed with an ed25519 keypair. The choice of ed25519 (rather than the more common HMAC or RSA) is deliberate: signatures are 64 bytes, verification is fast, and the scheme has no known practical weaknesses. The claims, lifted from the design doc:

| Claim | Meaning |
|---|---|
| challenge | The challenge string derived from request metadata |
| nonce | The integer nonce the client found |
| response | The full SHA-256 hash that satisfies the difficulty |
| iat | Issued-at timestamp |
| nbf | Not-before, set to one minute prior to iat |
| exp | Expiry, set to seven days after iat by default |

The nbf field with a one-minute backdate is a small but pragmatic detail. It absorbs clock skew between the moment the cookie is issued and the moment it is later validated. Without it, a validator whose clock runs slightly behind the issuer’s would reject a freshly issued cookie as not yet valid.

The cookie is configured without the kinds of identifiers that trigger an EU cookie banner. It does not contain a user identifier, does not contain tracking data, and is scoped to the host that issued it. The project’s docs are explicit that operators should still consider their own jurisdictional disclosures.

The ephemeral keypair problem

Here is a detail that bites in production. From the Anubis design doc:

Anubis uses an ed25519 keypair to sign the JWTs issued when challenges are passed. Anubis will generate a new ed25519 keypair every time it starts. At this time, there is no way to share this keypair between instances of Anubis.

This has two operational consequences. First, every restart of Anubis invalidates every outstanding cookie. Users who were happily cached for the week now solve a fresh challenge after a routine deploy. Second, you cannot horizontally scale Anubis behind a load balancer in the default configuration, because instance A cannot verify a JWT that instance B issued. Run more than one Anubis container and a request can round-robin between them, see its cookie rejected, and serve a challenge to a user who already solved one fifteen seconds earlier. Anubis’s roadmap calls this out as future work. For now, the realistic deployment is one Anubis instance per service, with a persistent storage backend (bbolt, valkey, or s3api) for challenge state but an ephemeral key for signing.
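Both consequences fall out of one property of the signature scheme, which a few lines of Go can demonstrate. The instance type below is a toy model invented for the example; it only captures "each process generates its own keypair at startup."

```go
package main

import (
	"crypto/ed25519"
	"fmt"
)

// instance models one running process holding a keypair that was
// generated at startup and never shared.
type instance struct {
	pub  ed25519.PublicKey
	priv ed25519.PrivateKey
}

func startInstance() instance {
	pub, priv, err := ed25519.GenerateKey(nil) // nil means crypto/rand
	if err != nil {
		panic(err)
	}
	return instance{pub: pub, priv: priv}
}

func (i instance) sign(token []byte) []byte {
	return ed25519.Sign(i.priv, token)
}

func (i instance) accepts(token, sig []byte) bool {
	return ed25519.Verify(i.pub, token, sig)
}

func main() {
	a, b := startInstance(), startInstance()
	token := []byte("cookie payload")
	sig := a.sign(token)

	fmt.Println("issuer accepts its own cookie:", a.accepts(token, sig))
	fmt.Println("peer accepts the same cookie: ", b.accepts(token, sig))
}
```

A restart is the same situation as instance b: a brand-new keypair that has never seen the old signatures.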

Threat modeling consequences

For threat modeling purposes, the cookie is the actual security boundary. The proof-of-work is solved once. Everything after that is “do you have a valid week-long token.” That has three concrete consequences.

A single scraper that bothers to solve the PoW once gets a week of free crawling per IP. If you can solve a difficulty-4 hash in milliseconds with a native solver, you can mint a fresh cookie at any point during the week and crawl freely. The proof-of-work is not a per-request tax. It is a per-week-per-browser tax. Secondary screening blunts this but does not eliminate it.

Cookie management is the actual attack surface for sophisticated scrapers. Modern crawlers that target Anubis-protected sites generally do not solve the PoW from scratch on every request. They run a real headless Chromium, solve the challenge once, capture the cookie, and reuse it across thousands of subsequent fetches. From an HTTP/2 fingerprinting standpoint they look identical to a legitimate Chromium for the lifetime of the cookie.

The defense scales with cookie revocation, not with PoW difficulty. The right knob to turn when Anubis lets a scraper through is the cookie TTL and the per-IP / per-fingerprint rules that determine when a cookie is considered tainted. Cranking the proof-of-work difficulty up to 6 or 8 makes the user experience miserable while doing essentially nothing to a scraper that solves once per week.

You can change the cookie expiration via the cookieExpiry configuration. You probably should, if you are deploying Anubis seriously. Twenty-four hours is a reasonable starting point for higher-value content, and one hour is reasonable for content where you would rather impose a UX tax than let scrapers through. The default of a week is a UX-friendly setting that biases toward false negatives.

The HTTP 200 Trick

This is one of those design choices that looks like a bug and is actually load-bearing. From the default policy:

status_codes:
  CHALLENGE: 200
  DENY: 200

When Anubis serves the challenge page or the denial page, it sends them with HTTP 200 OK by default. Not 401, not 403, not 503, not 429. A 200 OK with a page that is, semantically, neither the requested resource nor an error.

The reason is in the comment immediately above:

By default, send HTTP 200 back to clients that either get issued a challenge or a denial. This seems weird, but this is load-bearing due to the fact that the most aggressive scraper bots seem to really, really, want an HTTP 200 and will stop sending requests once they get it.

This is a very specific empirical observation about scraper behavior. Naive scrapers retry aggressively on non-200 status codes because they assume the server is rate limiting or having transient issues. They do not retry on 200 because, as far as their state machine is concerned, the page loaded successfully. They take the HTML, hand it to their downstream pipeline, and move on. The fact that the HTML is a soul-weighing jackal instead of the article they wanted is not a problem they have any way to detect.

This is the kind of thing you can only learn by running a system like Anubis at scale against real adversaries and watching the logs. It is also a fragile trick. The moment a scraper operator adds a content check (“does this page contain the words I expected”) it stops working. But until they do, you get the benefit, and the well-behaved automated systems (real search engines, RSS readers, monitoring tools) all handle 200 responses cleanly regardless.

Dataset Poisoning Mode

A feature that landed in preview during 2025 and is worth knowing about even if you do not enable it: Anubis can serve poisoned data to clients it considers bots. The admin docs describe it as Anubis’s take on the same active-defense pattern that tools like iocaine and Nepenthes use.

The mechanism is simple. When a client matches certain weight thresholds (default: 25 prior hits in the poisoning maze on the same User-Agent and 25 hits on the same network-clamped IP, where IPv4 is clamped to /24 and IPv6 to /48), Anubis adds 30 weight points and starts including a small block of invisible HTML in the challenge and error pages. The HTML contains links that browsers ignore but scrapers follow. Those links lead into a recursive maze of cheap, vacuous, self-contained pages.

The content of those pages is what makes the feature stand out. Anubis uses spintax (a syntax for generating variant strings, originally from email spam tooling) to produce what the docs describe as “vapid LinkedIn posts with some western occultism thrown in for good measure.” One example from the docs:

There’s a moment when visionaries are being called to realize that the work can’t be reduced to optimization, but about resonance. We don’t transform products by grinding endlessly, we do it by holding the vision. Because meaning can’t be forced, it unfolds over time when culture are in integrity. This moment represents a fundamental reimagining in how we think about work.

This is, in the project’s own words, “pseudoprofound anti-content.” The design principles for the poisoned pages are explicit:

  • Each page must render in under ten milliseconds on commodity hardware.
  • Pages must be vacuous enough that a human bounces but a scraper does not.
  • Pages must be large enough that a scraper does not classify them as empty errors.
  • Pages must be fully self-contained (no external resources) to load fast and not incur additional load on the origin.

The implementation is in preview and has acknowledged limitations. All Anubis instances generate poisoning data the same way, which makes the pattern fingerprintable by a sufficiently motivated adversary. The poisoning routes are also currently nailed to the /.within.website/x/cmd/anubis URL hierarchy, which is even easier to fingerprint. The roadmap promises WebAssembly-customizable generation logic, configurable weight thresholds, and the ability to use it as a real per-deployment data poisoning system rather than a shared one.

Strategically, this is the part of Anubis that is most aligned with the active-defense / deception philosophy that drives honeypot-based bot detection. A pure firewall says “no.” A poisoning layer says “yes, here is what you asked for,” and then waits to see what the scraper does with it. The two approaches are complementary: Anubis uses the firewall path to block confident negatives and the poisoning path to punish persistent abusers without them realizing they have been caught. If you are interested in this pattern more broadly, our writeup on honeypot strategies for AI bots goes deeper on placement and detection logic.

Where Anubis Works

The setup where Anubis genuinely shines is exactly the one it was built for. Concretely:

  • A single-origin web service running on commodity hosting, not behind Cloudflare or Fastly.
  • Public content that the operator wants humans to read but does not want AI training pipelines to ingest.
  • A small or medium traffic volume where every doubling of requests is a real cost problem.
  • A maintainer population that is not a security team and does not want a dashboard.

Wikis, public-interest documentation, source code forges (Gitea, Forgejo, GitLab self-hosted), code mirrors, distro package indexes, FFmpeg’s documentation, and KDE’s wiki are all in this bucket. None of those operators can afford a six-figure WAF contract, and none of them can absorb a thousandfold increase in request volume from a single AI crawler.

In this profile, Anubis is genuinely valuable. The “nuclear response” framing is appropriate because the alternative is the origin going down. A small amount of friction for legitimate readers is much better than the site being unreachable.

The viral adoption story makes sense in this light. Anubis caught on through 2024 and 2025 partly because Xe Iaso shipped it MIT-licensed and partly because the use case was real. The Patreon-funded support model is unusual but consistent with the audience: people who run small sites on small budgets, plus a few larger organizations that adopted it after their own ops engineers got tired of fighting scrapers.

Where Anubis Breaks

The places Anubis breaks are not subtle, and most of them are acknowledged in the project’s own docs. Listing them is not a takedown; it is what you need to know to deploy it responsibly.

Determined scrapers solve the challenge once

The PoW is a per-cookie tax, not a per-request tax. Any scraper that integrates a SHA-256 solver and a cookie store, which is approximately ten lines of Python around requests plus a worker that calls hashlib, can solve the default difficulty-4 challenge in well under a second on a single core and then crawl freely for a week. The economics of mass scraping are unchanged in this case. The only thing the scraper has to do is care.

Most do not care today, which is the empirical reason Anubis works. That gap closes the moment the major AI training data vendors decide it is worth a sprint to add SHA-256 solvers to their crawl infrastructure. Some have already done this on specific high-value targets.

GPU solvers are not even theoretical

SHA-256 is exactly the workload GPUs are built for. A consumer GPU does on the order of a billion hashes per second on Bitcoin-style mining loads. A difficulty-4 Anubis challenge is about 65,000 expected attempts. A GPU solves it in roughly 65 microseconds. Even if you crank Anubis to difficulty 7 (which would make legitimate phones unusable), you are at about 268 million attempts, or roughly a quarter of a second on a GPU. The asymmetry is not in the defender’s favor for any well-resourced adversary, and there is no PoW difficulty that fixes that.

The reason this rarely matters in practice is that mass scraping operations are not running GPU pools today. They are running CPU instances on cloud providers because CPU is cheaper per unit of HTTP throughput. The threat model could shift as scraper budgets grow.

Headless browsers pass cleanly

A headless Chromium running a real V8 engine and a real DOM can solve the proof-of-work using the exact same web workers that Anubis is shipping to legitimate users. There is no fingerprinting layer in the default Anubis configuration that distinguishes “Puppeteer driving Chromium” from “Linda’s laptop running Chromium.” The proof-of-work is not a humanity test; it is a “is anyone doing the work” test. Headless Chromium does the work.

This is the gap that fingerprinting layers (TLS JA3/JA4, Canvas, font-rendering, behavioral) are designed to fill. Anubis acknowledges this in the design doc when it says the PoW is a placeholder while real fingerprinting is built. As of v1.25.0, that fingerprinting layer is incomplete.

The Firefox / Linux / hardened-browser false positive rate

Anubis’s default generic-browser rule is user_agent_regex: Mozilla, which catches roughly every real browser. The challenge page assumes Web Workers, modern JS, third-party scripts, and (in the WebCrypto path) a secure context. Most of that works fine for stock Chrome and Safari users. Edge cases that legitimately fail or perform badly include:

  • Tor Browser with strict-mode privacy settings disabling Web Workers.
  • Hardened Firefox profiles that block crypto.subtle in non-https subframes.
  • Old Android browsers without proper navigator.hardwareConcurrency.
  • Text-mode browsers (w3m, lynx, Browsh) that have no Web Worker support at all.
  • Screen readers paired with browsers that strip the challenge JavaScript.

The Anubis team has been steadily widening browser support, and most of these cases have workarounds in the policy file (specific allow rules by user agent), but you have to know to add them. Out of the box, Anubis is hostile to a non-trivial slice of users.

Non-browser clients are a related problem. Anything that fetches HTML over HTTP and does not run JavaScript is going to fail the challenge by default. RSS feeds typically live at predictable paths, so you can carve out ^/feed/?$ or ^/rss/?$ as ALLOW rules. Podcast XML and OPML are similar. Link previews from Slack, Discord, Mastodon, and the various Open Graph fetchers are harder because they hit arbitrary paths.

Anubis ships an openGraph passthrough configuration that detects link-preview requests and serves OG metadata directly without challenging. It is not enabled by default. If you are deploying Anubis on content that gets shared on social platforms, you almost certainly want to enable it, or your shared links will all preview as a smiling jackal.
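Before putting the feed carve-outs into a policy file, it is worth checking what the two patterns quoted above actually match. The sketch below uses Python's re module as a stand-in for the policy matcher. For simple anchored patterns like these the behavior should be the same, but the helper name and the sample paths are hypothetical.

```python
import re

# The two path patterns quoted in the text, anchored at both ends.
FEED_PATTERNS = [re.compile(r"^/feed/?$"), re.compile(r"^/rss/?$")]


def is_feed_path(path: str) -> bool:
    """Return True if the request path matches one of the feed carve-outs."""
    return any(pattern.search(path) for pattern in FEED_PATTERNS)


if __name__ == "__main__":
    for path in ["/feed", "/feed/", "/rss", "/feed/atom.xml", "/blog/feed", "/index.xml"]:
        print(f"{path:16} {is_feed_path(path)}")
```

Because both patterns are anchored, they cover only the exact /feed and /rss paths. Feeds served from nested paths, or under names such as atom.xml or index.xml, need patterns of their own, so check your site's real feed URLs before relying on these two.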

Accessibility

The challenge page is visual. A screen reader will announce “Making sure you’re not a bot” and a percentage that ticks up. That is workable but not great. There is no audio fallback, no alternative challenge mode, and no clear path for users on assistive tech who happen to land on a wrong-difficulty rule. The Anubis maintainers are aware of this and have made improvements, but accessibility is not the project’s strong suit.

Operating a reverse proxy is itself the cost

Anubis is a Go binary you run alongside your existing reverse proxy. That sounds simple until you account for:

  • Storage backend choice (memory does not work across multiple instances, and the ephemeral signing key makes horizontal scaling impractical).
  • Cookie domain handling for multi-subdomain deployments.
  • TLS termination layering with nginx or Caddy in front.
  • Bot policy YAML changes requiring a reload.
  • Upgrade handling, since the cookie JWT format can change across major versions.

None of those are hard problems for an experienced operator. All of them are real problems for the “tiny project on a single VPS” population that Anubis is built for. The deployment guide handles most of it, but you are still adding a service to your stack, and that service is on the critical path for every request.

Observability

Anubis ships a separate Prometheus metrics server, bound by default to a different port from the main proxy. It exports counters for challenges issued, challenges passed, challenges failed, requests allowed, requests denied, and a histogram for client-reported solve times broken down by algorithm. The metrics server also exposes /healthz for liveness probes and the standard Go pprof routes for profiling. If you are running Anubis in production, scraping these metrics is the only way to know whether your difficulty setting is calibrated and whether you are under an active scraping attack. The histogram for solve time is the most useful real-time signal: a sudden bimodal distribution (one cluster at human-laptop speeds and a second cluster at near-zero) is the fingerprint of a native solver attacking your origin.
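One way to act on the solve-time signal is to track what share of recent solves finish implausibly fast. The sketch below is a simplified stand-in for a real alert rule. It takes a plain list of client-reported solve times in milliseconds (extracting these from the histogram buckets is left out), and both thresholds are hypothetical values you would tune against your own traffic.

```python
def fast_solver_share(solve_ms: list[float], floor_ms: float = 20.0) -> float:
    """Fraction of solves that completed faster than `floor_ms`."""
    if not solve_ms:
        return 0.0
    return sum(1 for t in solve_ms if t < floor_ms) / len(solve_ms)


def looks_like_native_solver(
    solve_ms: list[float], floor_ms: float = 20.0, max_share: float = 0.2
) -> bool:
    """Flag a window of solve times when the near-zero cluster gets too large.

    Assumptions: 20 ms as the "no browser is this fast" floor and 20% as the
    alert threshold are illustrative defaults, not values from Anubis.
    """
    return fast_solver_share(solve_ms, floor_ms) > max_share


if __name__ == "__main__":
    browsers = [400.0, 650.0, 900.0, 1200.0, 800.0]
    under_attack = browsers + [2.0, 3.0, 1.0, 4.0, 2.0]
    print(looks_like_native_solver(browsers))
    print(looks_like_native_solver(under_attack))
```

Because the solve times are client-reported, a careful attacker can pad them, so treat this as a cheap early warning rather than a detection guarantee.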

Branding, BotStopper, and the Sustainability Model

Anubis ships with a smiling chibi anthropomorphic jackal as its default mascot, drawn in an anime-influenced style. The character has shown up in commentary as a feature, a bug, and a meme depending on the audience. It is worth covering as a technical decision rather than an aesthetic one, because the project has built a sustainability model on top of it.

The default Anubis distribution does not let you change the mascot. You can change the CSS surrounding the challenge page, you can localize the text, you can re-skin the spinner, but the jackal stays. The project’s commercial side, BotStopper, is the supported path to remove the mascot and customize the branding. BotStopper is available via GitHub Sponsors at $50 per month or via direct invoicing for enterprise contracts. It ships as a separate container image (ghcr.io/techarohq/botstopper/anubis) that builds on the open-source core and adds custom images, custom CSS, custom titles, and a private bug tracker. The roadmap promises an unreleased “private challenge implementation that does advanced fingerprinting to check if the client is a genuine browser.”

This is not a license restriction in the open-source sense. The MIT license still applies and you are free to fork and re-skin yourself. What you cannot do is use the official Anubis distribution with your own branding. The model is closer to “open core with a paid premium tier” than to a copyleft constraint, and it doubles as a soft funding lever for a project that has zero corporate backers and one maintainer on Patreon.

For operators evaluating Anubis in an enterprise context, this is a meaningful detail. Some organizations will not deploy software whose default error pages include a mascot they cannot control. BotStopper is the answer to that audience. For everyone else, the jackal is part of the deal, and that is fine.

How This Compares to the Alternative Approaches

Anubis sits at one particular point in the design space for self-hostable bot detection. It is useful to be specific about where the other points are.

At a glance

Dimension | Anubis | Cloudflare / Enterprise WAF | reCAPTCHA / hCaptcha | FCaptcha
Deployment | Self-hosted reverse proxy | Vendor edge | Third-party JS | Self-hosted library
Privacy | High (no traffic leaves your infra) | Low (all traffic transits vendor) | Low (Google or Intuition tracking) | High (no third-party calls)
User interaction | Background PoW, no interaction | Variable (challenges, captchas) | High (click, image grid) | Background PoW + invisible biometrics
Bypass cost | Medium (native solvers, headless browsers) | High (requires IP rotation + fingerprint spoofing) | Low ($1-3 per 1000 via solver APIs) | Medium-high (defeats both solvers and naive headless)
Host cost | Low (one Go binary) | High (enterprise tier pricing) | Low (SaaS, free tier) | Low (open source, self-hosted)
Behavioral signals | None today | Extensive | Limited | Keystroke cadence, mouse, focus events
Active defense | Dataset poisoning (preview) | None | None | Honeypot fields, decoy endpoints
Open source | Yes, MIT | No | No | Yes, MIT
Scales horizontally | Limited (ephemeral signing key) | Yes (vendor edge) | Yes (vendor edge) | Yes (stateless library)

The pattern that falls out of the table is that Anubis and FCaptcha are the two open self-hostable systems, and they cover different parts of the workflow. Cloudflare and reCAPTCHA are vendor systems that trade privacy and operational independence for scale.

Versus Cloudflare Bot Fight Mode / Turnstile

Cloudflare offers Anubis-style protection as a checkbox in their dashboard. They have the largest training set of bot behavior in the industry, they sit on roughly twenty percent of web traffic, and they can do TLS, HTTP/2, and behavioral fingerprinting at the edge. The asymmetry that Cloudflare has, that nobody else can match, is volume. They see your scraper’s last thousand requests across a thousand other sites and can score them in real time.

Anubis cannot do any of that. It sees only your traffic. The flip side is that Anubis does not require routing your traffic through someone else’s edge, it does not require trusting Cloudflare with the plaintext of every request, it does not silently degrade for users on Tor or VPNs the way Cloudflare does, and it does not break when Cloudflare has an outage that takes down half the internet for ninety minutes.

The honest framing is that they are for different audiences. If you can use Cloudflare, you probably should. If you cannot or will not, Anubis is one of the few credible options.

Versus FCaptcha

FCaptcha is also open-source self-hostable bot detection. We build it, so we have an obvious bias, and the comparison is worth being precise about.

The architectural difference is that Anubis is a reverse proxy and FCaptcha is a library. Anubis sits in front of your service and intercepts every request. FCaptcha is installed into your application code and only runs on the endpoints you care about. That distinction has knock-on effects:

  • Anubis defaults to challenging every page. FCaptcha defaults to challenging only the high-value endpoints (signup, login, forms, API calls).
  • Anubis ships a single proof-of-work and a denylist. FCaptcha combines a proof-of-work, behavioral biometrics (keystroke cadence, mouse movement, focus events), and honeypot fields into a single risk score.
  • Anubis stores a cookie that grants a week of free passes. FCaptcha re-evaluates on every protected request, so a scraper that solves one challenge does not get a week of access.
  • Anubis runs as a separate Go service. FCaptcha runs in-process in Node.js, Go, or Python and ships with reference server SDKs in each.

For “I have a wiki being crushed by AI crawlers and no security team,” Anubis is the right tool. The reverse-proxy model is simpler than threading a library through every page render. For “I have a signup form being credential-stuffed and I need behavioral signals plus PoW plus honeypots in one place,” FCaptcha is the right tool. They are not direct competitors; they are at different points on the deployment-complexity / signal-richness curve.

Versus fingerprinting-only stacks

There is a class of self-hosted projects that try to replace PoW entirely with fingerprinting (TLS JA3/JA4, HTTP/2 frame analysis, Canvas, font rendering). These work well against scripted scrapers and badly against headless browsers, because a real headless Chromium has a TLS fingerprint identical to a real Chromium.

The honest answer is that fingerprinting and proof-of-work are complementary. Anubis acknowledges this in its own design doc. Fingerprinting tells you “this client looks suspicious.” Proof-of-work makes that client pay to prove it is not just suspicious in volume. The best self-hostable stack combines both. Anubis today does only the second half.

Should You Deploy Anubis

The deployment decision comes down to a small number of questions.

  • Is your origin being crushed by AI scraper traffic? If yes, and you cannot fix it with Cloudflare, Anubis is the most credible self-hostable option available.
  • Is your site primarily content for human readers? Anubis tolerates that workload well. Cookie reuse across a session means the friction is bounded.
  • Is your site primarily conversion-driven (signup, checkout, lead capture)? Anubis is the wrong tool. The cold-start friction is a conversion problem, and you want a library-style solution that only fires on the endpoints that matter.
  • Do you have a non-trivial population of users on Firefox, Tor, RSS readers, or mobile devices with low CPU? Plan to spend time tuning the policy file before you ship.
  • Are you operationally able to run another Go service in production? If not, look at hosted alternatives or library-style solutions like FCaptcha first.

A reasonable migration path for many small-site operators is: start with Anubis defaults, watch the metrics, and progressively allowlist the legitimate traffic that is getting caught. The bot policy YAML is designed for this iteration loop and the documentation is good. Just do not assume default settings are the right settings for your audience.

Where Anubis Is Going

The project is moving in the directions you would expect. The WEIGH action that landed in late 2025 is a clear signal that Anubis is converging toward a request-scoring architecture rather than a single-rule firewall. The Thoth service (Techaro’s paid hosted side, mostly GeoIP and ASN data) is doing the work of distributing the kinds of reputation data that an open-source project cannot maintain on its own. The default policy now includes load-based difficulty adjustment, which is the right knob if you want the friction to be lowest when the system is calm and highest when it is being abused.

What is not on the public roadmap, and probably never will be in a way that matters, is closing the headless-browser gap with TLS or JS-engine fingerprinting. That work is genuinely hard, requires constant maintenance against drift in the underlying browsers, and is the kind of thing that commercial vendors invest in because they have customers paying for it. The Anubis answer to “what if the scraper uses a real Chromium” is going to remain “make the proof-of-work expensive enough to hurt at volume and rely on the cookie expiry to limit the damage.”

Closing

Anubis is real software that solves a real problem for a real audience. It is also, by its author’s own admission, a nuclear response. The interesting thing about it is not the proof-of-work, which is conventional Hashcash, but the design choices around the proof-of-work: the seven-day cookie, the HTTP 200 lie, the YAML policy DSL, the deliberate decision to focus on small operators rather than enterprise edge cases.

If you operate a public web service and are tired of AI crawlers eating your bandwidth, Anubis is worth a serious afternoon of reading and a careful weekend of deployment. If you operate a conversion-driven site, you want a different tool, and an honest assessment of Anubis’s tradeoffs is the fastest way to figure out which one you need.

If you want a more behavioral, library-style approach with PoW, biometrics, and honeypots in one package, take a look at FCaptcha and the broader WebDecoy platform. The two systems solve adjacent problems and we will happily tell you which one fits your situation.
