// Deep dive — for CTI practitioners

Operator Discovery

How session-level HASSH fingerprints surface coordinated SSH operators that subnet- and ASN-based clustering can't see — and what that's worth to your threat model.

TL;DR

We compute the HASSH MD5 of every SSH client that connects to our Cowrie sensors. When ≥50 distinct non-benign actors share a fingerprint and those actors span ≥3 distinct /16 subnets, we promote that cluster to a campaign. The /16 dispersion check ensures we surface distributed operators, not single-provider hosting farms (those are already caught by subnet/ASN clustering). The detector runs every aggregation cycle and excludes 259 benign-tagged scanners.

// What we see right now

Distinct fingerprints: 102 (across all sessions)
Sessions w/ HASSH: 403,640 (33,370 in last 7d)
Active campaigns: 2 (hassh_cluster type)
Benign suppressed: 259 (excluded from clusters)

Campaign IPs:
  • HASSH 03a80b21afa8… — SSH-2.0-libssh_0.11.1 (666 IPs, 74 countries)
  • HASSH dd9bcf093c35… — SSH-2.0-ZGrab ZGrab SSH Survey (52 IPs, 1 country)

Live data — refreshes on every aggregation cycle. Click a campaign to see members + evidence.

// Why HASSH

HASSH is an MD5 hash of the SSH client's KEX (key exchange), encryption, MAC, and compression algorithm sets, in the order the client offers them. Two clients running the same SSH library, version, and configuration produce the same HASSH — even if they're connecting from different IPs, ASNs, or countries.
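Concretely, a HASSH is built by joining the four algorithm lists from the client's KEXINIT message with semicolons and hashing the result. A minimal sketch — the algorithm strings below are illustrative, not a real capture:

```python
import hashlib

def hassh(kex: str, enc: str, mac: str, cmp_: str) -> str:
    """HASSH = MD5 over the client's KEXINIT algorithm lists,
    joined with ';' in kex;enc;mac;compression order."""
    return hashlib.md5(";".join([kex, enc, mac, cmp_]).encode()).hexdigest()

# Illustrative algorithm lists (not taken from a real client offer):
fp = hassh(
    kex="curve25519-sha256,ecdh-sha2-nistp256",
    enc="chacha20-poly1305@openssh.com,aes256-gcm@openssh.com",
    mac="hmac-sha2-256,hmac-sha2-512",
    cmp_="none,zlib@openssh.com",
)
print(fp)  # same lists in the same order => same fingerprint, whatever the source IP
```

Because only the offered algorithm sets feed the hash, renting new infrastructure changes nothing: the fingerprint moves with the tooling, not the IP.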

That's the whole insight: an attacker can rent new IPs, but they can't easily change their tooling. A botnet operator controlling 4,000 compromised hosts is going to use the same SSH client on all of them, because that's what the malware ships with. A scanner deployed across 500 cloud VMs is going to identify identically on every connection, because that's what ZGrab does. HASSH collapses the cosmetic disguise of distributed infrastructure to expose the operator behind it.

Volume-based reputation services (Censys, Shodan, AbuseIPDB, CrowdSec) cannot cluster on HASSH because they don't run Cowrie-class session capture that produces it. GreyNoise classifies SSH traffic but doesn't expose HASSH-based pivots publicly. We do both — capture the fingerprint and make it pivotable: every fingerprint is browsable at /tools/hassh/<fp>/ and queryable at /api/v1/fingerprints/hassh/<fp>, and the campaign detector promotes qualifying clusters automatically.

// How the detector decides

A HASSH fingerprint qualifies as a hassh_cluster campaign when all of these are true:

  1. ≥50 distinct non-benign actors share the fingerprint. The threshold filters out small operators, accidental tool collisions, and minor scanner fleets.
  2. Those actors span ≥3 distinct /16 subnets. A single /16 means a hosting farm — already caught by our subnet detector. We require network dispersion to surface distributed operators specifically.
  3. At least one actor was active in the last 7 days. Stale fingerprints stop being campaigns; the lifecycle manager closes them after 30 days of inactivity.
  4. Benign-tagged actors are excluded from the count. Legitimate research scanners (Censys, Shodan, Shadowserver, etc.) don't pad our campaign counts — see classification methodology.

The thresholds are intentionally conservative. Loosening them would catch smaller operators but contaminate the cluster set with false positives; tightening them further would make the "this is a real distributed operator" claim even more defensible but would miss genuine campaigns. The current values (50 actors, 3 /16s) are tuned from production data and published in the source: apps/threats/campaigns.py — constants HASSH_MIN_ACTORS and HASSH_MIN_SUBNETS_16.
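The four rules reduce to a single predicate. A sketch under stated assumptions — the actor record shape and field names here are illustrative, not the production schema in apps/threats/campaigns.py:

```python
from datetime import datetime, timedelta, timezone

HASSH_MIN_ACTORS = 50              # rule 1: distinct non-benign actors
HASSH_MIN_SUBNETS_16 = 3           # rule 2: distinct /16 subnets
ACTIVE_WINDOW = timedelta(days=7)  # rule 3: recent activity

def slash16(ip: str) -> str:
    """Collapse an IPv4 address to its /16 prefix, e.g. 203.0.113.9 -> '203.0'."""
    return ".".join(ip.split(".")[:2])

def qualifies(actors: list[dict], now: datetime) -> bool:
    """actors: [{'ip': str, 'benign': bool, 'last_seen': datetime}, ...] (assumed shape)."""
    live = [a for a in actors if not a["benign"]]              # rule 4: drop benign scanners
    if len({a["ip"] for a in live}) < HASSH_MIN_ACTORS:        # rule 1
        return False
    if len({slash16(a["ip"]) for a in live}) < HASSH_MIN_SUBNETS_16:  # rule 2
        return False
    return any(now - a["last_seen"] <= ACTIVE_WINDOW for a in live)   # rule 3
```

Note how a 60-IP fleet inside one /16 fails rule 2 even though it clears rule 1 — exactly the hosting-farm case the dispersion check is meant to hand off to the subnet detector.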

// Top fingerprints in the last 7 days

Distinct non-benign actors per HASSH, last 7 days. Click any fingerprint to drill into its actor list, geographic spread, and ASN distribution.

HASSH (truncated)     Actors (7d)
03a80b21afa81068…     666
dd9bcf093c355da7…     52
16443846184eafde…     41
084386fa7ae5039b…     33
98f63c4d9c87edbd…     29
Bulk feed available at /api/v1/clusters/hassh/top (JSON, public, rate-limited).

// What the detector won't catch

Important to be honest about the limits:

  • Operators using diverse tooling. A sophisticated actor running multiple SSH clients across their fleet will produce multiple HASSH fingerprints. Each one might fall below the 50-actor threshold individually. Credential-fingerprint and JA4 pivots (planned, GH #186) will help triangulate these.
  • Single-provider hosting farms. Intentionally excluded by the /16 dispersion check — those are caught by our subnet and ASN detectors instead.
  • Non-SSH attacks. HASSH is SSH-specific. The same architectural pattern (capture → pivot → detector) will extend to JA4 for TLS, but that's a separate feature in the queue.
  • Brand-new operators. The 7-day window means a fresh operator needs at least one sensor touch from each of 50 IPs before they surface. Slow-and-low operators that probe gradually will take longer to cluster — that's the cost of a high-confidence threshold.

// How to use it

Pivot from a known fingerprint
GET /api/v1/fingerprints/hassh/<fp>

7-day window default; ?window=all for history. Returns up to 500 actors per call.

Top operators feed
GET /api/v1/clusters/hassh/top

Top fingerprints by recent non-benign actor count. Cached 5 min. Use as an anomaly signal.
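One way to use the top feed as an anomaly signal is to diff successive snapshots and alert on fingerprints that were absent last time. A sketch — the host and the `hassh` field name are assumptions, and the diff logic is ours, not part of the service:

```python
import json
import urllib.request

TOP_URL = "https://example.net/api/v1/clusters/hassh/top"  # placeholder host

def fetch_top() -> list[dict]:
    """Pull the current top-fingerprints feed (cached 5 min server-side)."""
    with urllib.request.urlopen(TOP_URL) as resp:
        return json.load(resp)

def new_fingerprints(previous: set[str], current: list[dict]) -> set[str]:
    """Fingerprints present in this snapshot but missing from the last one.
    Assumes each row carries a 'hassh' key — the field name is a guess."""
    return {row["hassh"] for row in current} - previous
```

Persist the returned set between runs; a non-empty result means a fingerprint has newly entered the top list and may be worth a pivot.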

Browse a fingerprint visually
/tools/hassh/<fp>/

HTML pivot view: actor table, top countries, top ASNs sidebar. Linkable from any actor detail page.

Subscribe to the campaign feed

Active hassh_cluster campaigns appear in the standard threat feed with a HASSH badge.

// See also