I added one field and found a 4,154-IP botnet

Last week I shipped a small feature for IntrusionLabs: click any HASSH fingerprint on an actor page and see every other actor sharing it. About four hundred lines of code. It took an afternoon.

The first fingerprint I clicked returned 4,154 IPs.

That's the post.

Well, that's the headline. The actual story is more interesting, because once you start looking at the data through that lens, the disguise the operator was using to look like 4,154 independent attackers comes apart in a single SQL query. Here's what I found, and the cheap-but-load-bearing data point that made it visible.

What IntrusionLabs is

IntrusionLabs runs a small public network of honeypot sensors (Cowrie SSH and OpenCanary multi-protocol) on rented VPS in three regions: Newark, Seattle, and Singapore. Free, public, no signup. The data flows into a Django app that scores attacker behavior, corroborates it against open-source threat feeds (Spamhaus DROP, Tor exit nodes, DShield, CINS Army, BlocklistDE, abuse.ch's Feodo Tracker), and clusters the result into campaigns.

I built it because the existing options weren't sized right for me. Free feeds (AbuseIPDB, plain DShield) are noisy and lump everything together as "scanner." Commercial CTI feeds cost five figures a year and you can't see what's inside them. There's a middle that doesn't really exist: cheap, transparent, drillable, honest. That's what IntrusionLabs is trying to be.

It's a solo project. I'm not pretending it has the volume CrowdSec has. We track about 14k threat actors at any given moment, not 14M. The pitch is depth and pivot-ability, not coverage.

What HASSH is

HASSH is an MD5 hash of the algorithms an SSH client offers during the key exchange: KEX, encryption, MAC, and compression algorithm sets, in the exact order the client lists them. It was published by Salesforce engineering in 2018 and it works because clients that share an SSH library, version, and configuration produce the same algorithm offering, which produces the same hash.

Two attackers running the same SSH client on different IPs in different countries on different cloud providers will have the same HASSH. A botnet running its own custom SSH library will have one HASSH across every infected host. A scanner using ZGrab will identify itself with the same HASSH on every connection regardless of which Google Cloud VM it's currently coming from.

That's the whole insight, and it's all I'm going to say about HASSH itself, because the original whitepaper is short and worth reading directly.

The thing that matters for what I found: an attacker can rent new IPs every day, but they can't easily change the fingerprint of the SSH client they're running. That's a different shape of disguise than the one our subnet- and ASN-based clustering was built for, and it makes a different shape of operator visible.

Adding the pivot

I'd added a hassh field to our SessionProfile model six weeks ago and had been quietly populating it from every Cowrie session. Last week I shipped the actual pivot view at /tools/hassh/<fingerprint>/: given a fingerprint, list every actor whose sessions carry it. Plus a JSON API at /api/v1/fingerprints/hassh/<fingerprint> and a top-clusters feed at /api/v1/clusters/hassh/top.

The campaign detector that promotes a HASSH cluster to a tracked threat campaign is twelve lines of real logic:

candidates = (
    SessionProfile.objects
    .filter(hassh__gt="", last_seen__gte=cutoff)
    .exclude(src_ip__in=benign_ips)
    .values("hassh")
    .annotate(n=Count("src_ip", distinct=True))
    .filter(n__gte=HASSH_MIN_ACTORS)
)

The thresholds are deliberately conservative. To qualify as a campaign, a HASSH needs at least 50 distinct non-benign actors sharing it, and those actors need to span at least 3 distinct /16 subnets. The /16 dispersion check is the important one: a single /16 means a hosting farm, which our existing subnet detector already catches. Requiring geographic dispersion isolates the operators who deliberately spread their infrastructure across providers and countries to defeat conventional clustering.

The query above is essentially the whole detector. The rest of the file is bookkeeping: building the campaign record, deduping membership, lifecycle management. Twelve lines of real logic, dropped into the existing campaign-detection orchestrator that already runs every aggregation cycle.

I expected the detector to surface a few small clusters that we'd then have to investigate to see if they were real. It surfaced two on first run.

The finding

The top result was HASSH 03a80b21afa810682a776a7d42e5e6fb.

4,154 distinct source IPs all-time
662 IPs still active in the last 7 days
109,248 sessions total
All 4,154 IPs running SSH-2.0-libssh_0.11.1: same library, same patch version, same banner string, every connection

You can see the live cluster page here: /tools/hassh/03a80b21afa810682a776a7d42e5e6fb/. The numbers will have drifted slightly by the time you read this (the 7-day window slides) but the shape is stable.

The infrastructure spread is the part that should make a defender's eyes widen. Top 10 ASNs of the all-time cluster:

ASN	Org	Actor count
AS4811	China Telecom Group	254
AS135377	UCLOUD HK	193
AS14061	DigitalOcean	186
AS136052	PT Cloud Hosting Indonesia	182
AS4134	Chinanet	138
AS8075	Microsoft Azure	112
AS150436	Byteplus SG	106
AS38365	Beijing Baidu	104
AS16276	OVH SAS	98
AS137718	Beijing Volcano Engine	87

That's Chinese state telcos, Western clouds, regional Southeast Asian providers, and EU hosters, all running the same SSH client, all hitting our sensors, all part of one operator's infrastructure. The geographic distribution is just as dispersed: 74 distinct countries in the active 7-day window alone.

You can read this two ways. The optimistic read is that the operator is using a commodity SSH library (libssh) that is actually run by lots of unrelated people, and we're collapsing legitimate diversity into a fake cluster. I considered that. It doesn't hold up. Generic libssh deployments don't cluster on SSH-2.0-libssh_0.11.1 exactly. There are dozens of patch versions in use, and tools that wrap libssh almost always rewrite the version banner to something more specific. Seeing 4,154 IPs all carrying the same version banner across 10 different ASNs in 74 different countries is not coincidence. It is one operator deploying one binary on a lot of rented VPS.

The pessimistic read, and I think the correct one, is that this is one operator running one tool on roughly 4,000 IPs they've rented across the world's cloud markets, deliberately spread to look like 4,154 unrelated attackers. They picked Chinese telco IPs to look like a Chinese threat. They picked DigitalOcean to look like an opportunistic Western VPS-renter. They picked PT Cloud Indonesia to look like a regional botnet. They picked Microsoft Azure to look like a compromised enterprise tenant. The HASSH undoes that disguise in a single query because they didn't change their tooling.

The second cluster, a different shape

The second cluster the detector flagged was 53 IPs, all running SSH-2.0-ZGrab ZGrab SSH Survey. ZGrab is the open-source banner grabber from the ZMap project, a legitimate research tool used by Censys, by academic researchers, by anyone who wants to map the internet's SSH surface.

But the 53 IPs in this cluster are all on Google Cloud (AS396982, Google LLC), with no reverse DNS attribution to any known Censys / Shadowserver / Onyphe range. Real Censys IPs reverse to *.censys-scanner.com. Real Shadowserver IPs reverse to *.shadowserver.org. These don't.

So the tool is benign in the sense that ZGrab is a public, widely-used scanner. But the deployment is suspicious. Somebody is running the ZGrab binary from rented Google Cloud IPs to do their own SSH survey, without any of the attribution that legitimate research scanners publish. The detector caught it not because ZGrab is bad but because the coordination across 53 unattributed Google Cloud IPs is a behavioral signal in its own right.

This matters because it's a different shape of finding than the libssh botnet. The libssh cluster is one operator running one custom tool. The ZGrab cluster is one operator running someone else's standard tool, dressed up in cloud infrastructure that doesn't claim attribution. The same detector caught both. That's the part that made me think the architecture is right, not just the first finding.

Why this matters to defenders

Per-IP blocking is whack-a-mole when the operator owns 4,000 IPs and rents new ones daily. You'll never finish.

HASSH-based clustering targets the tooling, which the operator can't change without rewriting their software. If you can match on a HASSH, you can block the operator's entire fleet (present and future) with a single rule that triggers on a property of the connection that is much more stable than the source IP.

The concrete defender action: hit /api/v1/clusters/hassh/top, get JSON of the current top operator clusters, push the HASSH match rules to your SIEM or your edge SSH rate-limiter. It's free. It's public. It's rate-limited at 60 requests per hour per IP, which is plenty for hourly polling. There's no signup, no API key, no commercial-use-restricted free tier.

curl https://intrusionlabs.com/api/v1/clusters/hassh/top

If you write up what you do with it, send me a link. I want to know.

What we don't see

A few honest limitations, because if you're trusting this kind of clustering you should know where it's blind.

We don't catch operators who use diverse tooling spread across multiple HASSH fingerprints. A sophisticated actor running a few different SSH clients across their fleet will produce a few different fingerprints, none of which individually clear the 50-actor threshold.

We don't catch anything below the 50-actor or 3-/16-subnet thresholds. Small but real coordinated operations are invisible to this detector. (We're working on credential-fingerprint and JA4 TLS pivots that will help triangulate them; same architecture, different artifact.)

We don't have global sensor coverage. Three sensors: Newark, Seattle, and Singapore. Plenty of attackers are hitting other sensors run by Greynoise, Shadowserver, and the academic researchers, and we won't see those. Adding more sensors is on the roadmap but is real-money infrastructure cost on a solo budget.

The detector is good at finding coordinated, distributed, single-operator clusters. It is not a complete attack surface. Anyone who tells you their CTI feed is complete is selling you something.

What's next

The same architecture (capture the artifact at session level, expose it as a pivot, auto-detect campaigns) extends to a few more things in the queue:

JA4 TLS fingerprints for HTTPS traffic, mirroring HASSH for SSH
Credential-set fingerprints to cluster operators who use the same password lists
File-hash pivots for malware payloads dropped on Cowrie sessions
Cross-fingerprint clustering: actors sharing HASSH AND credential list AND JA4 are near-certainly the same operator

Each one will probably produce a finding worth its own post. I'll write those up as they come.

Try it

The cluster page: https://intrusionlabs.com/tools/hassh/03a80b21afa810682a776a7d42e5e6fb/
The top-clusters JSON feed: https://intrusionlabs.com/api/v1/clusters/hassh/top
The technical deep-dive on how the detector decides: https://intrusionlabs.com/features/operator-discovery/

The data is free and public. The source code isn't open today, but if you're a fellow cowrie or OpenCanary operator, a CTI engineer working on similar pipelines, or a researcher who wants to dig into the implementation, get in touch. We're happy to talk through how this works and share what we've learned. A practitioner-facing portal, where vetted CTI professionals can work directly on the data, is on the longer roadmap. If you'd be a candidate, that's also worth a conversation.