
// Deep dive, for CTI practitioners

Scanner Classification

How we populate all four intent axes (malicious, suspicious, benign, unknown) from first-party behavioral signals and external corroboration, under a declared precedence that's auditable in the source.

TL;DR

Two writers populate ThreatActor.intent: a hostname classifier owns benign (curated scanner orgs) and tor_exit-based suspicious; a behavioral reconciler owns the rest. The reconciler reads session pattern, confidence, external corroboration, and ASN-DROP membership (all already on the actor) and promotes unknown to suspicious or malicious under a published precedence. benign is sticky; Tor-suspicious is preserved.

// What we see right now

Malicious

1,525

4.8% of actors

Suspicious

17,158

53.6% of actors

Benign

259

0.8% of actors

Unknown

13,052

40.8% of actors

31,994 actors total. The reconciler runs every aggregation cycle, so numbers drift as new behavior lands and as old actors age out. The large unknown cohort is the honest answer for actors whose observable behavior doesn't meet any promotion floor: we'd rather leave them unlabeled than guess.

What would promote an unknown actor?

Any of the following. The first rule that fires wins:

A session pattern classifier tags them as malware_dropper, data_exfiltrator, or interactive_operator at confidence ≥ 0.35 → malicious.
A session pattern classifier tags them as credential_harvester, opportunistic_bruter, proxy_abuser, or a protocol-specific bruter at confidence ≥ 0.30 → suspicious.
Two or more independent external feeds corroborate them → suspicious.
Their ASN is on Spamhaus ASN-DROP and they have ≥ 10 events → suspicious.
Their rDNS resolves to a curated scanner organization → benign.

See the full rule table below for thresholds and source references.

// Precedence

Two writers can disagree about an actor. When they do, the stronger verdict wins, in this order:

benign > malicious > suspicious > unknown

Benign is sticky. Once the hostname classifier flags an actor as a known research scanner (Censys, Shodan, Shadowserver, etc.), no behavioral signal overrides that. This matches the invariant the benign-confidence cap depends on.
Tor-suspicious is preserved. The hostname classifier owns the tor_exit reason; the reconciler sees suspicious on a Tor exit and leaves it alone, because "Tor exit" is strictly stronger than anything behavioral signals would say about the same actor.
Behavioral promotions are reversible. If an actor's behavior weakens below the floor on a later cycle, they fall back to unknown. No intent is locked in except benign.

The precedence is a module docstring in apps/threats/intent.py, not a runtime flag. Changing it requires editing the source and landing a migration-grade change.
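The precedence can be read as a small merge function. The following is a minimal sketch under stated assumptions, not the contents of apps/threats/intent.py: `Verdict`, `INTENT_RANK`, and `resolve_intent` are hypothetical names, and the Tor rule is modelled by matching a reason string equal to `tor_exit`.

```python
from typing import NamedTuple

# Rank order taken from the published precedence:
# benign > malicious > suspicious > unknown.
INTENT_RANK = {"unknown": 0, "suspicious": 1, "malicious": 2, "benign": 3}


class Verdict(NamedTuple):
    """Hypothetical container for one writer's opinion about an actor."""
    intent: str
    reason: str = ""


def resolve_intent(hostname: Verdict, behavioral: Verdict) -> Verdict:
    """Combine the hostname classifier's verdict with the reconciler's."""
    # Benign is sticky: no behavioral signal overrides it.
    if hostname.intent == "benign":
        return hostname
    # Tor-suspicious is preserved: the reconciler leaves it alone.
    if hostname.intent == "suspicious" and hostname.reason == "tor_exit":
        return hostname
    # Otherwise the stronger verdict wins. The behavioral verdict is
    # recomputed on every call, so an actor whose behavior weakens
    # falls back to unknown on the next cycle.
    return max(hostname, behavioral, key=lambda v: INTENT_RANK[v.intent])
```

Because the behavioral verdict is passed in fresh each cycle rather than stored and compared against, reversibility needs no special case: only the two hostname-owned verdicts short-circuit.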

// Promotion rules

Each rule is a module constant. First match wins; absence of a match leaves intent unchanged (usually unknown).

Axis	Trigger	Threshold
malicious	Behavioral pattern: malware_dropper, data_exfiltrator, interactive_operator	confidence ≥ 0.35
suspicious	Behavioral pattern: credential_harvester, opportunistic_bruter, proxy_abuser, plus protocol-specific bruters (mysql/ftp/telnet)	confidence ≥ 0.30
suspicious	External corroboration from independent OSINT feeds	≥ 2 distinct feeds
suspicious	ASN on Spamhaus ASN-DROP + observable activity	≥ 10 events
benign	Reverse DNS matches curated scanner-organization registry (Censys, Shodan, Shadowserver, Onyphe, Stretchoid, Internet-Measurement, Modat, &c)	FCrDNS-validated

Thresholds live in apps/threats/intent.py as module constants (MALICIOUS_CONFIDENCE_FLOOR, SUSPICIOUS_CORROBORATION_MIN, &c). The management command manage.py classify_intent --dry-run prints the projected distribution shift before you commit to a threshold change.
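To make "first match wins" concrete, here is a hedged sketch of the behavioral rows of the rule table as an ordered chain of checks. Only MALICIOUS_CONFIDENCE_FLOOR and SUSPICIOUS_CORROBORATION_MIN are names from the source. The other constant names, the `ActorSignals` fields, the per-protocol bruter labels, and the non-behavioral reason strings are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

MALICIOUS_CONFIDENCE_FLOOR = 0.35    # constant name from the source
SUSPICIOUS_CORROBORATION_MIN = 2     # constant name from the source
SUSPICIOUS_CONFIDENCE_FLOOR = 0.30   # hypothetical name, value from the table
ASN_DROP_MIN_EVENTS = 10             # hypothetical name, value from the table

MALICIOUS_PATTERNS = {"malware_dropper", "data_exfiltrator", "interactive_operator"}
SUSPICIOUS_PATTERNS = {
    "credential_harvester", "opportunistic_bruter", "proxy_abuser",
    # Assumed labels for the protocol-specific bruters (mysql/ftp/telnet).
    "mysql_bruter", "ftp_bruter", "telnet_bruter",
}


@dataclass
class ActorSignals:
    """Hypothetical view of the signals the reconciler reads."""
    pattern: Optional[str] = None
    pattern_confidence: float = 0.0
    feed_count: int = 0
    asn_on_drop: bool = False
    event_count: int = 0


def promote(s: ActorSignals) -> Tuple[str, str]:
    """Return (intent, reason); the first matching rule wins."""
    if s.pattern in MALICIOUS_PATTERNS and s.pattern_confidence >= MALICIOUS_CONFIDENCE_FLOOR:
        return "malicious", f"behavioral:{s.pattern} conf={s.pattern_confidence:.2f}"
    if s.pattern in SUSPICIOUS_PATTERNS and s.pattern_confidence >= SUSPICIOUS_CONFIDENCE_FLOOR:
        return "suspicious", f"behavioral:{s.pattern} conf={s.pattern_confidence:.2f}"
    if s.feed_count >= SUSPICIOUS_CORROBORATION_MIN:
        return "suspicious", f"corroboration:feeds={s.feed_count}"  # assumed format
    if s.asn_on_drop and s.event_count >= ASN_DROP_MIN_EVENTS:
        return "suspicious", "asn_drop"  # assumed format
    return "unknown", ""
```

In this sketch a malicious pattern that misses its floor can still be promoted to suspicious by a later rule such as feed corroboration. Whether the real reconciler behaves the same way is not stated on this page.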

// Top malicious actors right now

Highest-confidence malicious actors with the reconciler's reason string. The reason is stored in enrichment_metadata.intent for every promotion, so each verdict is auditable.

Actor	Reason	Confidence	Country
92.118.39.77	behavioral:interactive_operator conf=0.82	0.86	US
80.94.92.55	behavioral:interactive_operator conf=0.49	0.86	RO
2.57.122.209	behavioral:interactive_operator conf=0.52	0.85	RO
92.118.39.49	behavioral:interactive_operator conf=0.65	0.85	US
92.118.39.50	behavioral:interactive_operator conf=0.82	0.85	US

// Top suspicious actors right now

Actors meeting one of the suspicious-axis rules: a behavioral bruter pattern, multi-feed corroboration, ASN-DROP activity, or a Tor exit.

Actor	Reason	Confidence	Country
193.32.162.84	behavioral:credential_harvester conf=0.82	0.86	RO
85.240.193.104	behavioral:credential_harvester conf=0.61	0.85	PT
80.158.109.51	behavioral:credential_harvester conf=0.44	0.85	DE
189.217.130.86	behavioral:credential_harvester conf=0.46	0.85	MX
45.164.39.253	behavioral:credential_harvester conf=0.56	0.85	BR

// The benign cohort

Benign actors are research scanners. They don't threaten you, but they do scan you, and treating their traffic as threat-scored noise contaminates any reputation product. We tag them, cap their confidence at 0.1, exclude them from campaign clustering, and leave them in the public data so analysts can audit the call.

Domain	IPs
shadowserver.org	68
censys-scanner.com	53
onyphe.net	38
deepfield.net	16
internet-measurement.com	15
stretchoid.com	15
modat.io	15
internet-census.org	11

Full registry: /intelligence/scanners/
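The benign rule depends on forward-confirmed reverse DNS (FCrDNS): the PTR name must fall under a registry domain, and that name must resolve back to the same IP, so an attacker who controls only their own PTR record cannot claim a scanner's domain. A minimal sketch using the standard library follows. `SCANNER_DOMAINS` is an illustrative subset of the table above, the function names are hypothetical, and this is not the production classifier.

```python
import socket
from typing import Callable, Iterable, Optional, Set

# Illustrative subset of the registry domains listed above.
SCANNER_DOMAINS = ("shadowserver.org", "censys-scanner.com", "onyphe.net")


def reverse_lookup(ip: str) -> Optional[str]:
    """PTR lookup; returns None when the IP has no reverse record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname: str) -> Set[str]:
    """Forward lookup; returns every address the hostname resolves to."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    except OSError:
        return set()


def fcrdns_scanner_domain(
    ip: str,
    domains: Iterable[str] = SCANNER_DOMAINS,
    reverse: Callable[[str], Optional[str]] = reverse_lookup,
    forward: Callable[[str], Set[str]] = forward_lookup,
) -> Optional[str]:
    """Return the matching registry domain if the IP passes FCrDNS, else None."""
    hostname = reverse(ip)
    if not hostname:
        return None
    hostname = hostname.rstrip(".").lower()
    # Match on a label boundary so "evilshadowserver.org" does not pass.
    domain = next(
        (d for d in domains if hostname == d or hostname.endswith("." + d)), None
    )
    if domain is None:
        return None
    # Forward confirmation: the PTR name must resolve back to the same IP.
    return domain if ip in forward(hostname) else None
```

The resolvers are injectable so the logic can be checked without network access. Addresses are compared as strings, which is adequate for IPv4 but would need normalization for IPv6.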

// What the classifier won't catch

Honest about the limits:

Low-signal actors. An actor that only shows up once, on one sensor, with a scanner-shaped session won't meet any promotion floor. They stay unknown, which is the honest answer, not a gap.
New benign scanners not in the registry. Our scanner-organization registry is curated. A new legitimate research operator won't be flagged benign until we add their domain. Until then, their activity may trigger a behavioral-suspicious promotion if it looks like scanning.
Mixed-intent actors. An actor whose behavior genuinely straddles suspicious and malicious (the same IP dropping malware and running a credential harvester) collapses to whichever rule fires first. The primary_threat_category field captures the strongest signal; the rest is in session-level detail.
Heuristic, not learned. No ML model in the loop. Weights and floors are hand-tuned and published. If you disagree with them, the precedence is on the page and the constants are in the source, so you can rescore in your own pipeline.

// How to use it

Filter by intent on the API

GET /api/v1/threats/ips?intent=malicious

Also accepts suspicious, benign, unknown. Combinable with min_confidence, max_age_hours, category, limit.
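A small helper for building these queries might look like the sketch below. The path and parameter names come from this page. `BASE_URL` is a placeholder host, the helper name is hypothetical, and the value formats (for example, a float for min_confidence) are assumptions.

```python
from urllib.parse import urlencode

BASE_URL = "https://example.invalid"  # placeholder: substitute the real host
VALID_INTENTS = {"malicious", "suspicious", "benign", "unknown"}
ALLOWED_FILTERS = {"min_confidence", "max_age_hours", "category", "limit"}


def build_threats_url(intent: str, **filters) -> str:
    """Build a /api/v1/threats/ips query URL for one intent axis."""
    if intent not in VALID_INTENTS:
        raise ValueError(f"unknown intent: {intent!r}")
    unexpected = set(filters) - ALLOWED_FILTERS
    if unexpected:
        raise ValueError(f"unsupported filters: {sorted(unexpected)}")
    # Drop filters explicitly set to None so they are not sent as "None".
    params = {"intent": intent}
    params.update({k: v for k, v in filters.items() if v is not None})
    return f"{BASE_URL}/api/v1/threats/ips?{urlencode(params)}"


url = build_threats_url("malicious", min_confidence=0.8, limit=50)
```

Validating the intent and filter names client-side catches typos before a request is sent.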

Inspect a verdict

GET /api/v1/actor/<ip>

Includes intent, intent_reason (the exact rule that fired, e.g. behavioral:malware_dropper conf=0.72), intent_source, and intent_reconciled_at. Returns 404 for unseen IPs and 422 for private or malformed addresses.
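If you want to rescore in your own pipeline, the behavioral reason string is easy to split into its parts. This sketch assumes the `kind:label conf=N` shape seen in the examples on this page. Other reason formats are not documented here, so the parser returns None for anything it does not recognize. `IntentReason` and `parse_intent_reason` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Matches e.g. "behavioral:malware_dropper conf=0.72"; the conf part is optional.
_REASON_RE = re.compile(
    r"^(?P<kind>[a-z_]+):(?P<label>[a-z_]+)(?:\s+conf=(?P<conf>\d+(?:\.\d+)?))?$"
)


class IntentReason(NamedTuple):
    kind: str                    # e.g. "behavioral"
    label: str                   # e.g. "malware_dropper"
    confidence: Optional[float]  # pattern confidence, if present


def parse_intent_reason(reason: str) -> Optional[IntentReason]:
    """Split an intent_reason string; return None if the shape is unfamiliar."""
    match = _REASON_RE.match(reason.strip())
    if match is None:
        return None
    conf = match.group("conf")
    return IntentReason(
        match.group("kind"),
        match.group("label"),
        float(conf) if conf is not None else None,
    )
```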

Benign-only blocklist exclusion

Benign actors are already excluded from /feeds/v1/ips.txt. Safe to ingest without filtering out Censys et al.

Audit the distribution yourself

Per-axis counts and samples on this page refresh every request. The rule that promoted each actor is stored on the actor: no black box, no cached verdict.
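The per-axis shares are plain arithmetic on the counts, so they are easy to re-derive. The sketch below uses the snapshot figures from the top of this page. Live numbers will differ, and the function name is hypothetical.

```python
from typing import Dict

# Snapshot counts copied from the per-axis figures at the top of this page.
SNAPSHOT = {"malicious": 1525, "suspicious": 17158, "benign": 259, "unknown": 13052}


def intent_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each axis as a percentage of all actors, to one decimal place."""
    total = sum(counts.values())
    return {axis: round(100 * n / total, 1) for axis, n in counts.items()}


total_actors = sum(SNAPSHOT.values())  # 31994
shares = intent_shares(SNAPSHOT)
# shares == {"malicious": 4.8, "suspicious": 53.6, "benign": 0.8, "unknown": 40.8}
```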

// See also

/intelligence/ - full methodology: collection, classification, scoring, sources
/intelligence/methodology - published 6-signal confidence formula with weights
/intelligence/scanners/ - benign scanner registry (full org list)
/features/operator-discovery/ - how HASSH clustering complements the intent axes