// Deep dive — for CTI practitioners

Scanner Classification

How we populate all four intent axes — malicious, suspicious, benign, unknown — from first-party behavioral signals and external corroboration, under a declared precedence that's auditable in the source.

TL;DR

Two writers populate ThreatActor.intent: a hostname classifier owns benign (curated scanner orgs) and tor_exit-based suspicious; a behavioral reconciler owns the rest. The reconciler reads session pattern, confidence, external corroboration, and ASN-DROP membership — already on the actor — and promotes unknown to suspicious or malicious under a published precedence. benign is sticky; Tor-suspicious is preserved.

// What we see right now

Malicious: 873 (6.0% of actors)
Suspicious: 4,689 (32.3% of actors)
Benign: 259 (1.8% of actors)
Unknown: 8,703 (59.9% of actors)

14,524 actors total. The reconciler runs every aggregation cycle, so numbers drift as new behavior lands and as old actors age out. The large unknown cohort is the honest answer for actors whose observable behavior doesn't meet any promotion floor — we'd rather leave them unlabeled than guess.

// Precedence

Two writers can disagree about an actor. When they do, the stronger verdict wins, in this order:

benign  >  malicious  >  suspicious  >  unknown
  • Benign is sticky. Once the hostname classifier flags an actor as a known research scanner (Censys, Shodan, Shadowserver, etc.), no behavioral signal overrides that. This is the invariant the benign-confidence cap depends on.
  • Tor-suspicious is preserved. The hostname classifier owns the tor_exit reason; the reconciler sees suspicious on a Tor exit and leaves it alone, because "Tor exit" is strictly stronger than anything behavioral signals would say about the same actor.
  • Behavioral promotions are reversible. If an actor's behavior weakens below the floor on a later cycle, they fall back to unknown. No intent is locked-in except benign.

The precedence is a module docstring in apps/threats/intent.py, not a runtime flag. Changing it requires editing the source and landing a migration-grade change.
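A minimal sketch of how the two writers' verdicts could be reconciled under the order above. The names PRECEDENCE and resolve_intent, and the function signature, are assumptions made for illustration; they are not taken from apps/threats/intent.py.

```python
from typing import Optional

# Rank per the published order: benign > malicious > suspicious > unknown.
PRECEDENCE = {"unknown": 0, "suspicious": 1, "malicious": 2, "benign": 3}


def resolve_intent(hostname_intent: str, behavioral_intent: str,
                   hostname_reason: Optional[str] = None) -> str:
    """Pick the final intent from the two writers' verdicts (illustrative only)."""
    # Benign is sticky: no behavioral signal overrides it.
    if hostname_intent == "benign":
        return "benign"
    # Tor-suspicious is preserved: the reconciler leaves it alone.
    if hostname_intent == "suspicious" and hostname_reason == "tor_exit":
        return "suspicious"
    # Otherwise the stronger verdict wins.
    return max(hostname_intent, behavioral_intent, key=PRECEDENCE.__getitem__)
```

Because behavioral promotions are reversible, a function of this shape would be re-evaluated on each aggregation cycle rather than latched; only the benign branch short-circuits regardless of behavior.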

// Promotion rules

Each rule is a module constant. First match wins; absence of a match leaves intent unchanged (usually unknown).

Axis        Trigger
malicious   Behavioral pattern: malware_dropper, data_exfiltrator, interactive_operator
suspicious  Behavioral pattern: credential_harvester, opportunistic_bruter, proxy_abuser, plus protocol-specific bruters (mysql/ftp/telnet)
suspicious  External corroboration from independent OSINT feeds
suspicious  ASN on Spamhaus ASN-DROP + observable activity
benign      Reverse DNS matches curated scanner-organization registry (Censys, Shodan, Shadowserver, Onyphe, Stretchoid, Internet-Measurement, Modat, etc.)

Thresholds live in apps/threats/intent.py as module constants (MALICIOUS_CONFIDENCE_FLOOR, SUSPICIOUS_CORROBORATION_MIN, etc.). The management command manage.py classify_intent --dry-run prints the projected distribution shift before you commit to a threshold change.
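The first-match-wins evaluation of the behavioral rules can be sketched as follows. This is an illustration under stated assumptions, not the module itself: the threshold values are placeholders (the page names the constants but not their values), the ActorSignals fields are hypothetical, gating the malicious rule on the confidence floor is an assumption, and the reason strings for the corroboration and ASN-DROP rules are invented for the example. The protocol-specific bruter patterns are omitted because their identifiers are not given here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Placeholder values: the real thresholds are not stated on this page.
MALICIOUS_CONFIDENCE_FLOOR = 0.5
SUSPICIOUS_CORROBORATION_MIN = 2

MALICIOUS_PATTERNS = {"malware_dropper", "data_exfiltrator", "interactive_operator"}
SUSPICIOUS_PATTERNS = {"credential_harvester", "opportunistic_bruter", "proxy_abuser"}


@dataclass
class ActorSignals:
    """Hypothetical view of the signals already stored on an actor."""
    pattern: Optional[str] = None
    confidence: float = 0.0
    corroborating_feeds: int = 0
    asn_on_drop: bool = False
    session_count: int = 0


def promote(s: ActorSignals) -> Optional[Tuple[str, str]]:
    """Return (intent, reason) for the first matching rule, or None if none match."""
    if s.pattern in MALICIOUS_PATTERNS and s.confidence >= MALICIOUS_CONFIDENCE_FLOOR:
        return "malicious", f"behavioral:{s.pattern} conf={s.confidence:.2f}"
    if s.pattern in SUSPICIOUS_PATTERNS:
        return "suspicious", f"behavioral:{s.pattern} conf={s.confidence:.2f}"
    if s.corroborating_feeds >= SUSPICIOUS_CORROBORATION_MIN:
        return "suspicious", f"corroboration:feeds={s.corroborating_feeds}"
    if s.asn_on_drop and s.session_count > 0:
        return "suspicious", "asn_drop"
    return None
```

Ordering the checks from strongest to weakest axis means an actor that matches several rules receives the strongest verdict, and the returned reason records exactly which rule fired.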

// Top malicious actors right now

Highest-confidence malicious actors with the reconciler's reason string. The reason is stored in enrichment_metadata.intent for every promotion, so each verdict is auditable.

Actor            Reason
130.12.180.51    behavioral:data_exfiltrator conf=0.61
107.175.34.74    behavioral:malware_dropper conf=0.61
14.103.123.80    behavioral:malware_dropper conf=0.60
109.206.241.199  behavioral:malware_dropper conf=0.54
141.148.151.4    behavioral:malware_dropper conf=0.58
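For downstream rescoring, the reason strings shown above can be split into their parts. The sketch below assumes the format seen in these rows, source:pattern followed by conf=value; other reason formats (for example a Tor-exit reason) may look different, so the parser returns None rather than guessing. The names Reason and parse_reason are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Matches strings shaped like "behavioral:malware_dropper conf=0.61".
REASON_RE = re.compile(
    r"^(?P<source>\w+):(?P<pattern>\w+)\s+conf=(?P<conf>\d+(?:\.\d+)?)$"
)


class Reason(NamedTuple):
    source: str
    pattern: str
    confidence: float


def parse_reason(reason: str) -> Optional[Reason]:
    """Parse a reason string; return None when it does not fit the assumed format."""
    m = REASON_RE.match(reason.strip())
    if m is None:
        return None
    return Reason(m["source"], m["pattern"], float(m["conf"]))
```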

// Top suspicious actors right now

Actors meeting one of the suspicious-axis rules — a behavioral bruter pattern, multi-feed corroboration, ASN-DROP activity, or a Tor exit.

Actor            Reason
81.192.46.45     behavioral:credential_harvester conf=0.55
135.235.138.43   behavioral:credential_harvester conf=0.56
154.198.162.75   behavioral:credential_harvester conf=0.55
103.13.207.34    behavioral:credential_harvester conf=0.56
213.209.159.159  behavioral:credential_harvester conf=0.66

// The benign cohort

Benign actors are research scanners — they don't threaten you, but they do scan you, and treating their traffic as threat-scored noise contaminates any reputation product. We tag them, cap their confidence at 0.1, exclude them from campaign clustering, and leave them in the public data so analysts can audit the call.
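The benign handling described here amounts to a confidence cap plus a clustering filter. A minimal sketch, assuming actors are plain dicts with "intent" and "confidence" keys; the constant name BENIGN_CONFIDENCE_CAP and both function names are hypothetical, and only the 0.1 value comes from the text.

```python
from typing import Dict, List

BENIGN_CONFIDENCE_CAP = 0.1  # cap value from the text; the name is an assumption


def effective_confidence(intent: str, confidence: float) -> float:
    """Cap confidence for benign actors; leave every other intent untouched."""
    if intent == "benign":
        return min(confidence, BENIGN_CONFIDENCE_CAP)
    return confidence


def clustering_candidates(actors: List[Dict]) -> List[Dict]:
    """Drop benign actors before campaign clustering; they stay in the public data."""
    return [a for a in actors if a.get("intent") != "benign"]
```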

// What the classifier won't catch

The limits, stated plainly:

  • Low-signal actors. An actor that shows up only once, on one sensor, with a scanner-shaped session won't meet any promotion floor. It stays unknown, which is the honest answer, not a gap.
  • New benign scanners not in the registry. Our scanner-organization registry is curated. A new legitimate research operator won't be flagged benign until we add their domain. Until then, their activity may trigger a behavioral-suspicious promotion if it looks like scanning.
  • Mixed-intent actors. An actor whose behavior genuinely straddles suspicious and malicious (the same IP dropping malware and running a credential harvester) collapses to whichever rule fires first. The primary_threat_category field captures the strongest signal; the rest is in session-level detail.
  • Heuristic, not learned. No ML model in the loop. Weights and floors are hand-tuned and published. If you disagree with them, the precedence is on the page and the constants are in the source — rescore in your own pipeline.

// How to use it

Filter by intent on the API
GET /api/v1/threats?intent=malicious

Also accepts suspicious, benign, unknown. Combinable with other filters.
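A small helper for building the filtered request URL, sketched with the standard library only. The base URL is a placeholder you would replace with your own deployment's host, and the extra keyword filters are passed through as given; this page does not list their names, so none are assumed here.

```python
from urllib.parse import urlencode

VALID_INTENTS = ("malicious", "suspicious", "benign", "unknown")


def threats_url(base_url: str, intent: str, **other_filters: str) -> str:
    """Build a /api/v1/threats URL filtered by intent (illustrative helper)."""
    if intent not in VALID_INTENTS:
        raise ValueError(f"intent must be one of {VALID_INTENTS}, got {intent!r}")
    query = urlencode({"intent": intent, **other_filters})
    return f"{base_url.rstrip('/')}/api/v1/threats?{query}"
```

Validating the intent value client-side avoids sending a request that the API would presumably reject or silently ignore.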

Inspect a verdict
GET /api/v1/actor/<ip>

Response includes intent plus enrichment_metadata.intent.reason — the exact rule that fired.
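Pulling the verdict and its reason out of an actor response might look like the sketch below. The payload shape is an assumption inferred from the dotted path enrichment_metadata.intent.reason; the real response may carry more fields or nest them differently, so missing keys yield None instead of raising.

```python
import json
from typing import Optional, Tuple


def extract_verdict(body: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (intent, reason) from a JSON actor payload, tolerating missing keys."""
    actor = json.loads(body)
    intent = actor.get("intent")
    metadata = actor.get("enrichment_metadata") or {}
    reason = (metadata.get("intent") or {}).get("reason")
    return intent, reason


# Hypothetical payload, shaped after the fields named on this page.
sample = json.dumps({
    "intent": "malicious",
    "enrichment_metadata": {
        "intent": {"reason": "behavioral:malware_dropper conf=0.61"}
    },
})
print(extract_verdict(sample))
```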

Benign actors are excluded from the blocklist

Benign actors are already excluded from /feeds/v1/ips.txt, so the feed is safe to ingest without separately filtering out Censys et al.

Audit the distribution yourself

Per-axis counts and samples on this page refresh on every request. The rule that promoted each actor is stored on the actor: no black box, no cached verdict.

// See also