How to Detect Browser-as-a-Service Scrapers in 2025
Technical guide to detecting Browserbase, Skyvern, and stealth AI agents. TLS fingerprinting, behavioral analysis, and honeypots beat evasion tactics.
WebDecoy Security Team
Browserbase just raised $40 million at a $300 million valuation. Their pitch to developers? Run thousands of headless browsers in the cloud with “stealth mechanisms to avoid bot detection.” They are not alone.
A new category of infrastructure has emerged: Browser-as-a-Service (BaaS). These platforms provide cloud-hosted Chromium instances specifically designed to evade detection. They rotate residential IPs, spoof user agents, strip automation markers, and patch JavaScript APIs. Their entire value proposition is making your bot detection obsolete.
The market is exploding. Browserbase has 20,000+ developer signups running 50 million browser sessions. Skyvern automates browser workflows with computer vision and LLMs. Hyperbrowser markets itself as “purpose-built for AI agents that operate on websites with advanced detection systems.” These platforms power the next generation of web scrapers, AI agents, and yes, attackers.
Here is the uncomfortable truth: traditional bot detection cannot catch them. But behavioral analysis can.
The Rise of Browser-as-a-Service
What BaaS Platforms Actually Do
Browser-as-a-Service platforms provide cloud-hosted browser infrastructure for automation at scale. Unlike traditional scraping tools that send raw HTTP requests, BaaS platforms run real Chromium browsers that execute JavaScript, render pages, and maintain sessions exactly like legitimate users.
The major players in 2025:
Browserbase - The market leader with $67.5 million in total funding. Offers managed headless browsers with session persistence, proxy support, and their Stagehand SDK for AI agent development. Used by Perplexity, Vercel, and 11x. Markets “stealth mechanisms to avoid bot detection” as a core feature.
Skyvern - Y Combinator-backed platform that combines computer vision with LLMs to automate browser workflows. Claims 64.4% accuracy on WebBench benchmarks. Specializes in form filling, login automation, and RPA tasks. Can operate on websites it has never seen before.
Hyperbrowser - Explicitly “purpose-built for AI agents that operate on websites with advanced detection systems.” Focuses on stealth, persistence, and staying undetected on sites with aggressive anti-bot measures. Integrated with their HyperAgent framework.
Kernel - Uses unikernel technology for sub-300ms cold starts. SOC2 and HIPAA compliant. Targets fraud detection and e-commerce monitoring use cases.
Browser Use - Open-source alternative gaining traction. Provides browser automation primitives that integrate with various AI frameworks.
MultiOn - AI agent platform that can browse the web autonomously to complete tasks on behalf of users.
The Business Model: Stealth as a Feature
These platforms compete on evasion capability. From Browserbase’s marketing: advanced debugging tools, session recording, proxy support, and “stealth mechanisms to avoid bot detection.” From Hyperbrowser: “engineered to stay undetected and maintain stable sessions over time, even on sites with aggressive anti-bot measures.”
This is not subtle. Stealth is the product.
The pricing models reflect the value proposition. Browserbase charges for browser sessions. Hyperbrowser charges for compute time. The implicit promise: their infrastructure investment in evasion means your scrapers will not get blocked.
For defenders, this changes the calculus. You are no longer detecting amateur scrapers with obvious tells. You are detecting professional infrastructure specifically designed to defeat you.
How BaaS Platforms Evade Traditional Detection
Understanding evasion techniques is essential for building detection that works. Here is what you are up against.
Stripping navigator.webdriver
The navigator.webdriver property is set to true when a browser is controlled by automation tools. It was designed as a standard signal for websites to detect automation. Every BaaS platform removes it:
// What detection checks for
if (navigator.webdriver === true) {
flagAsBot();
}
// How BaaS platforms evade
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
});
// Or via Chrome flags
// --disable-blink-features=AutomationControlled
This is table stakes. Any platform charging for browser automation handles this automatically.
Dynamic User-Agent Generation
BaaS platforms generate a different user agent for each session. Stytch’s research revealed an important detail: the user agents Browserbase generates sometimes align with the underlying Chromium runtime, but sometimes claim to be a different version entirely.
This creates detectable inconsistencies. The user agent claims Chrome 120, but the browser’s actual capabilities match Chrome 118. The TLS fingerprint reveals the true Chromium version. Canvas rendering behavior does not match the claimed browser.
Example of what Browserbase sessions produce:
Session 1: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
Session 2: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36
Session 3: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36
(KHTML, like Gecko) Chrome/121.0.0.0 Safari/537.36
The claimed browser version varies, but the underlying Chromium runtime is fixed. This mismatch is a detection vector.
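One lightweight way to surface this client-side is to compare the version claimed in the User-Agent string against the browser's own Client Hints. A minimal sketch, assuming a Chromium-based browser that exposes navigator.userAgentData; stealth stacks that patch both consistently will pass, so treat it as one weak signal among many:
// Compare the Chrome major version claimed in the UA string with the
// version reported by Client Hints (a Chromium-only API).
function detectVersionMismatch() {
  const uaMatch = navigator.userAgent.match(/Chrome\/(\d+)/);
  if (!uaMatch || !navigator.userAgentData) {
    return { checked: false };
  }
  const claimedMajor = uaMatch[1];
  const brands = navigator.userAgentData.brands || [];
  const chromeBrand = brands.find(b => /Chrom(e|ium)/.test(b.brand));
  if (chromeBrand && chromeBrand.version !== claimedMajor) {
    return { checked: true, mismatch: true, claimed: claimedMajor, reported: chromeBrand.version };
  }
  return { checked: true, mismatch: false };
}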
Residential IP Rotation
BaaS platforms route traffic through residential proxy networks. Instead of datacenter IPs that are easy to block, traffic originates from ISP-assigned addresses that appear to be real home users.
The economics work: residential proxies cost $5-15 per GB. For high-value scraping (competitor pricing, lead generation, content aggregation), this is negligible. The result is traffic that passes basic IP reputation checks.
Patching JavaScript APIs
Modern stealth frameworks patch dozens of browser APIs to hide automation signatures:
// Chrome object spoofing
window.chrome = {
runtime: {},
loadTimes: function() {},
csi: function() {},
app: {}
};
// Plugins array spoofing
Object.defineProperty(navigator, 'plugins', {
get: () => [
{ name: 'Chrome PDF Plugin', filename: 'internal-pdf-viewer' },
{ name: 'Chrome PDF Viewer', filename: 'mhjfbmdgcfjbbpaeojofohoefgiehjai' },
{ name: 'Native Client', filename: 'internal-nacl-plugin' }
]
});
// Languages spoofing
Object.defineProperty(navigator, 'languages', {
get: () => ['en-US', 'en']
});
// Hardware concurrency spoofing
Object.defineProperty(navigator, 'hardwareConcurrency', {
get: () => 8
});
// Device memory spoofing
Object.defineProperty(navigator, 'deviceMemory', {
get: () => 8
});
Puppeteer Stealth includes 17 separate evasion modules covering canvas fingerprinting, WebGL parameters, audio context, and more. BaaS platforms build on these techniques with proprietary improvements.
Removing Identifying Headers
BaaS platforms strip or modify headers that reveal automation:
# Headers that get removed or modified
X-Requested-With: (removed if set to automation tool name)
Sec-Ch-Ua-Platform: (spoofed to match claimed OS)
Sec-Ch-Ua: (regenerated to match claimed browser)
Accept-Language: (set to realistic values)
Header order is also normalized. Automation tools often send headers in predictable sequences that differ from real browsers.
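Checking header order on the server is straightforward in principle: keep a reference ordering observed from real Chrome traffic and score how closely an incoming request matches it. A sketch in Node.js; the reference list and the 0.8 cutoff are illustrative, not an authoritative Chrome specification:
// Compare received header order against a reference order observed from
// real Chrome traffic (example values only).
const CHROME_HEADER_ORDER = [
  'host', 'connection', 'sec-ch-ua', 'sec-ch-ua-mobile', 'sec-ch-ua-platform',
  'upgrade-insecure-requests', 'user-agent', 'accept', 'sec-fetch-site',
  'sec-fetch-mode', 'sec-fetch-dest', 'accept-encoding', 'accept-language'
];

function headerOrderScore(rawHeaderNames) {
  // Keep only headers we have a reference position for, preserving arrival order
  const observed = rawHeaderNames
    .map(h => h.toLowerCase())
    .filter(h => CHROME_HEADER_ORDER.includes(h));
  // Count adjacent pairs that arrive in the expected relative order
  let ordered = 0;
  for (let i = 1; i < observed.length; i++) {
    if (CHROME_HEADER_ORDER.indexOf(observed[i]) > CHROME_HEADER_ORDER.indexOf(observed[i - 1])) {
      ordered++;
    }
  }
  return observed.length > 1 ? ordered / (observed.length - 1) : 1;
}

// Express example: req.rawHeaders alternates [name, value, name, value, ...]
// const names = req.rawHeaders.filter((_, i) => i % 2 === 0);
// if (headerOrderScore(names) < 0.8) flagAsSuspicious('header_order');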
Using Real Browser Engines
Unlike HTTP libraries that fake browser behavior, BaaS platforms run actual Chromium instances. This means:
- JavaScript executes correctly
- CSS renders properly
- DOM APIs behave as expected
- WebSocket connections work
- Service workers function normally
This defeats detection that relies on JavaScript execution anomalies or missing browser capabilities.
Why Stealth Mode Fails Against Behavioral Analysis
BaaS platforms have solved the static fingerprinting problem. They can make a browser look legitimate across dozens of technical signals. What they cannot solve: making automation behave like humans.
Mouse Movement Entropy
Human mouse movement is chaotic. We overshoot targets, correct course, accelerate and decelerate irregularly, and move in curves rather than straight lines. Our movements have high entropy.
Automation moves efficiently. Even with randomization, the patterns are detectably different:
// Human mouse movement characteristics
{
movement_count: 147, // Many movements in 5 seconds
linear_path_ratio: 0.12, // Mostly curved paths
velocity_variance: 0.84, // Highly variable speed
grid_aligned_ratio: 0.03, // Random pixel positions
overshoots: 4, // Corrections visible
micro_movements: 23 // Small adjustments near targets
}
// BaaS automation characteristics
{
movement_count: 8, // Few movements
linear_path_ratio: 0.91, // Straight lines
velocity_variance: 0.08, // Constant speed
grid_aligned_ratio: 0.67, // Round number coordinates
overshoots: 0, // Perfect targeting
micro_movements: 0 // No adjustments
}
Even when BaaS platforms add “human-like” randomization, statistical analysis reveals the synthetic patterns. Bezier curves used to simulate natural movement have detectable mathematical signatures. Random delays follow uniform distributions instead of natural human timing distributions.
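The statistics involved are not exotic. A minimal client-side sketch: sample pointer positions, then measure how straight the path is and how uniform the speed is. The thresholds a detector would apply to these numbers are a tuning question and are not shown here:
// Collect pointer samples and compute simple path statistics.
const samples = [];
document.addEventListener('mousemove', (e) => {
  samples.push({ x: e.clientX, y: e.clientY, t: e.timeStamp });
});

function mouseStats() {
  if (samples.length < 3) return null;
  let pathLength = 0;
  const speeds = [];
  for (let i = 1; i < samples.length; i++) {
    const dx = samples[i].x - samples[i - 1].x;
    const dy = samples[i].y - samples[i - 1].y;
    const dt = Math.max(samples[i].t - samples[i - 1].t, 1);
    const dist = Math.hypot(dx, dy);
    pathLength += dist;
    speeds.push(dist / dt);
  }
  const first = samples[0];
  const last = samples[samples.length - 1];
  // Near 1.0 means the cursor travelled in an almost perfectly straight line
  const linearity = Math.hypot(last.x - first.x, last.y - first.y) / Math.max(pathLength, 1);
  // Coefficient of variation of speed; near 0 means constant velocity
  const mean = speeds.reduce((a, b) => a + b, 0) / speeds.length;
  const variance = speeds.reduce((a, b) => a + (b - mean) ** 2, 0) / speeds.length;
  const speedCv = mean > 0 ? Math.sqrt(variance) / mean : 0;
  return { sampleCount: samples.length, linearity, speedCv };
}
In practice you would compute these per movement segment (between pauses) and feed the results into a scoring model rather than making a binary call on any one number.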
Scroll Pattern Analysis
Humans scroll irregularly. We scroll fast through content we are skimming, slow down for content we are reading, and pause at interesting sections. We use scroll momentum on trackpads and mice. We overshoot and scroll back.
Automation scrolls programmatically:
// Human scroll pattern
[
{ delta: 127, timestamp: 0 },
{ delta: 89, timestamp: 43 },
{ delta: 52, timestamp: 89 }, // Momentum decay
{ delta: 23, timestamp: 142 },
{ delta: 8, timestamp: 201 },
// Pause while reading
{ delta: 234, timestamp: 4502 }, // Fast scroll past content
{ delta: 178, timestamp: 4538 },
{ delta: -45, timestamp: 4892 }, // Scroll back up
]
// BaaS automation scroll pattern
[
{ delta: 100, timestamp: 0 },
{ delta: 100, timestamp: 100 },
{ delta: 100, timestamp: 200 }, // Constant delta
{ delta: 100, timestamp: 300 }, // Constant timing
{ delta: 100, timestamp: 400 },
]
The mechanical consistency is unmistakable to statistical analysis.
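A sketch of what that analysis can look like: compute the coefficient of variation of the scroll deltas and of the gaps between events, and flag streams where both are near zero. The 0.1 cutoff is illustrative:
// Flag scroll streams whose deltas and inter-event gaps barely vary.
function scrollLooksSynthetic(events) {
  if (events.length < 5) return false;
  const cv = (values) => {
    const mean = values.reduce((a, b) => a + b, 0) / values.length;
    const variance = values.reduce((a, b) => a + (b - mean) ** 2, 0) / values.length;
    return mean !== 0 ? Math.sqrt(variance) / Math.abs(mean) : 0;
  };
  const deltas = events.map(e => e.delta);
  const gaps = events.slice(1).map((e, i) => e.timestamp - events[i].timestamp);
  // Humans produce momentum decay, pauses, and corrections; constant deltas
  // at constant intervals give coefficients of variation near zero
  return cv(deltas) < 0.1 && cv(gaps) < 0.1;
}

// The automation pattern above returns true; the human pattern returns false.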
Click Timing Distributions
Human reaction times follow specific distributions. The time between seeing a target and clicking it clusters around 200-400ms for simple targets, with a characteristic right-skewed distribution reflecting cognitive processing.
Automation clicks are either too fast (instant clicks) or artificially delayed with uniform randomness:
// Human click timing (ms from target appearing)
[247, 312, 289, 198, 267, 334, 223, 278, 301, 256]
// Mean: 271ms, Std Dev: 42ms, Skewness: 0.34
// BaaS automation click timing
[150, 180, 160, 170, 155, 175, 165, 145, 185, 158]
// Mean: 164ms, Std Dev: 13ms, Skewness: 0.02
// Too consistent, wrong distribution shape
Navigation Pattern Analysis
Humans browse chaotically. We open multiple tabs, revisit pages, take detours, and follow tangential links. Our navigation reflects attention and interest.
AI agents navigate systematically. They follow link structures methodically, rarely backtrack unnecessarily, and optimize for task completion:
Human session:
Homepage → Products → Product A → Reviews → Product A →
Homepage → About → Products → Product B → Cart →
Products → Product A → Cart → Checkout
AI agent session:
Homepage → Products → Product A → Product B → Product C →
Product D → Product E → Product F → Product G → Product H
// Systematic crawl with no backtracking
This navigation fingerprint persists regardless of how realistic the technical fingerprint appears.
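One simple way to quantify the difference is a revisit ratio over the session's page views. A sketch, using the two sessions above as reference points:
// Fraction of page views that return to an already-seen URL. Humans revisit
// hubs (homepage, cart, a product they keep coming back to); systematic
// crawls rarely request the same page twice.
function revisitRatio(pageViews) {
  const seen = new Set();
  let revisits = 0;
  for (const url of pageViews) {
    if (seen.has(url)) revisits++;
    seen.add(url);
  }
  return pageViews.length > 0 ? revisits / pageViews.length : 0;
}

// Human session above: roughly 0.4. AI agent session above: 0.
// A near-zero ratio over a long session is one more weak signal to correlate.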
Honeypot Link Effectiveness
The most reliable detection technique: invisible traps that only automation follows.
<!-- Hidden from visual users, visible in DOM -->
<a href="/admin-backup-2024"
style="position:absolute;left:-9999px;opacity:0;pointer-events:none;"
tabindex="-1"
aria-hidden="true">
Admin Backup Portal
</a>
Human users never see this link. They cannot click it. But BaaS platforms parsing HTML and following links will find it.
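On the server side, the trap URL only needs to record whoever requests it. A minimal Express-style sketch; flagSession is a placeholder for whatever session or threat store you use:
const express = require('express');
const app = express();

// Any request to the trap path is a definitive automation signal: the link
// is unreachable for human users.
app.get('/admin-backup-2024', (req, res) => {
  // flagSession is a placeholder for your own session/threat store
  flagSession(req, {
    signal: 'honeypot_interaction',
    ip: req.ip,
    userAgent: req.get('user-agent'),
    ts: Date.now()
  });
  // Return something plausible so the bot does not learn it was trapped
  res.status(200).send('<html><body>Loading…</body></html>');
});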
For AI agents using computer vision (like Skyvern), honeypots can be visually rendered but contextually inappropriate:
<!-- Appears on product page but links to trap -->
<a href="/inventory-api-v2/dump"
class="text-xs text-gray-200 absolute bottom-0">
Export All Data
</a>
An AI agent trying to extract data will follow promising-looking links. A human shopping for products will not click tiny gray text at the bottom of the page.
Detection Techniques That Actually Work
Given the sophistication of BaaS evasion, detection requires techniques that target fundamentally unfakeable signals.
TLS/JA3/JA4 Fingerprinting
Every TLS handshake reveals the true client identity. The cipher suites offered, their order, supported elliptic curves, extensions, and protocol versions create a unique fingerprint.
JA3 creates a hash from: SSL version, accepted ciphers, list of extensions, elliptic curves, and elliptic curve formats.
JA4 improves on JA3 by sorting cipher suites and extensions before hashing (defeating Chrome’s extension randomization), adding the ALPN value and an SNI-presence flag, and distinguishing TCP from QUIC.
Real Chrome 120 JA4:
t13d1517h2_8daaf6152771_b0da82dd1658
Browserbase session claiming Chrome 120:
t13d1516h2_8daaf6152771_a9f2e3c71b42
// Different hash reveals different TLS stack
Even when the user agent claims to be Chrome 120, the TLS fingerprint reveals the actual Chromium version running on BaaS infrastructure. The mismatch is a strong bot signal.
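Operationally, this check is a lookup: map the connection's JA4 hash to the client it is known to belong to, then compare with the User-Agent claim. A sketch that assumes you extract the hash at your TLS terminator and maintain your own fingerprint database; the hashes and the mappings below are illustrative:
// Illustrative mapping from JA4 hashes to the client they actually identify.
// In practice this comes from your own traffic and public fingerprint collections.
const KNOWN_JA4 = {
  't13d1517h2_8daaf6152771_b0da82dd1658': { family: 'Chrome', major: 120 },
  't13d1516h2_8daaf6152771_a9f2e3c71b42': { family: 'HeadlessChromium', major: 118 }
};

function tlsUaMismatch(ja4, userAgent) {
  const known = KNOWN_JA4[ja4];
  const uaMatch = userAgent.match(/Chrome\/(\d+)/);
  if (!known || !uaMatch) {
    return { mismatch: false, reason: 'insufficient_data' };
  }
  const claimedMajor = parseInt(uaMatch[1], 10);
  // Flag when the TLS stack does not belong to the claimed browser and version
  const mismatch = known.family !== 'Chrome' || known.major !== claimedMajor;
  return { mismatch, claimed: claimedMajor, actual: known };
}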
Cloudflare’s implementation demonstrates the power of this approach: TLS fingerprint detection is a primary mechanism for their bot management, and anti-bot systems compare incoming TLS fingerprints against databases of known patterns before any HTTP data is exchanged.
Browser Capability Verification
The claimed browser should support specific capabilities. Test for them:
// If the User-Agent claims Chrome 120, these features must all be present
const expectedFeatures = {
  'Array.prototype.toSorted': () => typeof Array.prototype.toSorted === 'function',     // Added Chrome 110
  'Array.prototype.toReversed': () => typeof Array.prototype.toReversed === 'function', // Added Chrome 110
  'Intl.NumberFormat.prototype.formatRange': () =>
    typeof Intl.NumberFormat.prototype.formatRange === 'function',                      // Added Chrome 106
  'structuredClone': () => typeof structuredClone === 'function',                       // Added Chrome 98
  'CSS accent-color support': () => CSS.supports('accent-color', 'red')                 // Added Chrome 93
};
// Test actual support
for (const [feature, isSupported] of Object.entries(expectedFeatures)) {
  if (!isSupported()) {
    flagAsInconsistent('capability_mismatch', feature);
  }
}
BaaS platforms running older Chromium versions while claiming newer user agents will fail these checks.
JavaScript Environment Consistency
Stealth patches leave traces. The way properties are defined differs from native implementations:
// Check if navigator.webdriver was patched vs naturally absent
const descriptor = Object.getOwnPropertyDescriptor(navigator, 'webdriver');
if (descriptor && descriptor.get &&
descriptor.get.toString().includes('undefined')) {
// Property was patched to return undefined
flagAsStealth();
}
// Check prototype chain integrity
const originalPlugins = Navigator.prototype.__lookupGetter__('plugins');
if (!originalPlugins) {
// Getter was removed or replaced
flagAsStealth();
}
// Check for overridden getters: native getters stringify to "[native code]"
const nativeCode = /\[native code\]/;
if (originalPlugins && !nativeCode.test(originalPlugins.toString())) {
  // The plugins getter was replaced with a plain JavaScript function
  flagAsStealth();
}
Canvas/WebGL Fingerprint Anomalies
BaaS platforms run on cloud infrastructure without GPUs. They use software rendering (SwiftShader, llvmpipe) that produces distinct fingerprints:
function detectSoftwareRendering() {
const canvas = document.createElement('canvas');
const gl = canvas.getContext('webgl');
if (!gl) {
  return { suspicious: true, reason: 'webgl_unavailable' };
}
const debugInfo = gl.getExtension('WEBGL_debug_renderer_info');
if (!debugInfo) {
  return { suspicious: true, reason: 'debug_info_blocked' };
}
const renderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL);
const softwareIndicators = [
  'SwiftShader',
  'llvmpipe',
  'Software Rasterizer',
  'Microsoft Basic Render Driver'
  // Note: plain 'ANGLE' and 'Mesa' substrings also appear in hardware-backed
  // renderer strings (ANGLE Direct3D11 on Windows, Mesa drivers on Linux),
  // so matching them alone would flag real users
];
for (const indicator of softwareIndicators) {
if (renderer.includes(indicator)) {
return {
suspicious: true,
reason: 'software_renderer',
renderer
};
}
}
return { suspicious: false, renderer };
}
Real users have real GPUs. Cloud-hosted browsers have software rendering. This is very difficult to spoof without actual GPU hardware.
Multi-Signal Correlation
No single signal is definitive. Sophisticated detection combines weak signals into strong verdicts:
class BotDetector {
constructor() {
this.signals = {};
this.weights = {
tls_mismatch: 40,
software_renderer: 35,
stealth_patches: 30,
capability_mismatch: 25,
behavioral_anomaly: 50,
honeypot_interaction: 100,
navigation_pattern: 35,
mouse_entropy_low: 40,
timing_distribution: 30
};
}
addSignal(name, detected, metadata = {}) {
if (detected) {
this.signals[name] = { detected: true, metadata };
}
}
calculateScore() {
let score = 0;
for (const [signal, data] of Object.entries(this.signals)) {
if (data.detected && this.weights[signal]) {
score += this.weights[signal];
}
}
return score;
}
getVerdict() {
const score = this.calculateScore();
if (score >= 100) return 'block';
if (score >= 60) return 'challenge';
if (score >= 30) return 'flag';
return 'allow';
}
}
A session might pass user-agent validation, have acceptable headers, and avoid obvious automation markers. But the combination of software rendering + low mouse entropy + systematic navigation + capability mismatches produces a high-confidence bot verdict.
How WebDecoy Catches BaaS Scrapers
WebDecoy’s architecture is specifically designed for the BaaS threat landscape. Here is how we catch what other solutions miss.
Server-Side Detection Layer
Before any page content loads, we analyze:
TLS Fingerprinting: We extract JA3 and JA4 fingerprints from every connection and compare against our database of known browser signatures. When Browserbase sessions claim to be Chrome 120 but present a different TLS fingerprint, we catch the mismatch immediately.
Header Analysis: We check header order, presence of expected headers, and consistency with claimed browser identity. BaaS platforms normalize headers but often miss subtle ordering details that real browsers exhibit.
IP Intelligence: While residential proxies defeat simple IP blocking, we analyze ASN patterns, geographic consistency, and IP behavior across our network. A “residential” IP that sends requests at datacenter speeds and volumes gets flagged.
Client-Side Behavioral Analysis
Our detection script collects signals that BaaS platforms cannot fake:
Mouse Movement Analysis: We track cursor position, velocity, acceleration, and path characteristics. Our statistical models distinguish human chaos from synthetic randomization with high accuracy.
Interaction Timing: We measure click timing distributions, keyboard cadence, and scroll behavior. These biometric signals persist regardless of how sophisticated the browser fingerprint spoofing becomes.
Session Behavior: We analyze navigation patterns, page engagement, and content interaction. AI agents exhibit systematic behavior that humans do not.
Honeypot Technology
We deploy multiple honeypot layers:
Hidden Link Traps: Invisible links that only crawlers following the DOM will discover. Any interaction is a definitive bot signal.
Decoy Endpoints: Fake API endpoints that appear in page source but are not used by legitimate application flows. Scrapers probing for data access will hit these traps.
Canary Content: Unique text strings that we monitor for unauthorized reproduction. When your content appears in AI training data or competitor sites, you know exactly which scraper took it.
Real-Time Threat Scoring
Every session gets a continuously updated threat score:
{
"session_id": "sess_x9y8z7",
"threat_score": 87,
"signals": {
"tls_mismatch": { "detected": true, "weight": 40 },
"mouse_entropy_low": { "detected": true, "weight": 40 },
"software_renderer": { "detected": false },
"navigation_systematic": { "detected": true, "weight": 35 },
"honeypot_proximity": { "detected": true, "weight": 25 }
},
"verdict": "challenge",
"baas_likely": true,
"suspected_platform": "browserbase"
}
Even when Browserbase rotates IPs and spoofs user agents, WebDecoy’s behavioral analysis identifies the automation. The threat score reflects our confidence, and you control the response policy.
SIEM Integration
Detection events flow to your security infrastructure in real-time:
- Splunk: Direct integration via HTTP Event Collector
- Elastic: Native integration for security operations
- CrowdStrike: Threat intelligence sharing
- Custom Webhooks: JSON payloads for any system
This enables network-level response beyond application blocking.
Implementation Recommendations
Start with Honeypots
Honeypots provide the highest confidence signals with zero false positives. A human user cannot interact with an invisible element. Any interaction is definitive proof of automation.
Deploy immediately:
- Hidden form fields that trigger on any input (see the sketch after this list)
- Invisible links to trap endpoints
- CSS-hidden content that only parsers see
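A minimal version of the hidden form field trap; the field name is arbitrary, and any non-empty submitted value is treated as automation:
<!-- Visually hidden and removed from the tab order; humans never fill it -->
<input type="text"
       name="company_website"
       value=""
       tabindex="-1"
       autocomplete="off"
       aria-hidden="true"
       style="position:absolute;left:-9999px;height:0;opacity:0;">
On submission, flag or reject any request where company_website is non-empty. Bots that auto-fill every field they can find will trip it.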
Layer Detection Methods
No single technique catches everything. Combine:
- Honeypots (zero false positives, catches 70-80%)
- TLS fingerprinting (fast, server-side, catches mismatches)
- Behavioral analysis (catches sophisticated evasion)
- Multi-signal correlation (highest accuracy)
Monitor for Adaptation
BaaS platforms update their evasion techniques. Monitor your detection effectiveness:
- Track detection rates by signal type
- Watch for new bypass patterns
- Update behavioral baselines regularly
- Share threat intelligence with the security community
Use Progressive Challenges
Do not block immediately on weak signals:
- Low confidence: Log and observe
- Medium confidence: Add rate limiting
- High confidence: Challenge with CAPTCHA
- Definitive (honeypot): Block
This approach catches bots while minimizing false positives for legitimate users.
The Arms Race Continues
Browser-as-a-Service is not going away. The market is growing, funding is flowing, and the platforms are getting more sophisticated. Stagehand has thousands of GitHub stars. Browserbase serves enterprise customers. AI agents that need to browse the web will increasingly rely on this infrastructure.
But the fundamental asymmetry favors defenders who invest in behavioral analysis. BaaS platforms can fake technical fingerprints. They cannot fake being human. The mouse movements, timing distributions, navigation patterns, and interaction behaviors that distinguish humans from automation remain detectable no matter how realistic the browser environment appears.
The question is not whether you can detect BaaS scrapers. The question is whether your current solution is designed for this threat.
Traditional bot detection built for the Selenium era will not catch Browserbase sessions. Fingerprint-only solutions will not catch stealth browsers with residential proxies. You need detection that analyzes behavior, correlates multiple signals, and adapts as evasion evolves.
WebDecoy was built for exactly this threat. Our multi-layer architecture combines server-side TLS analysis, client-side behavioral detection, and honeypot technology to catch BaaS scrapers that other solutions miss.
Ready to detect BaaS scrapers targeting your site?
Start Your Free Trial and deploy WebDecoy in under 5 minutes. See the difference in your detection rates immediately.
Have questions about catching specific AI browsers? Read our documentation or contact us directly.