Scanner Classification
How we populate all four intent axes — malicious, suspicious, benign, unknown — from first-party behavioral signals and external corroboration, under a declared precedence that's auditable in the source.
Two writers populate ThreatActor.intent: a hostname classifier owns benign (curated scanner orgs) and tor_exit-based suspicious; a behavioral reconciler owns the rest. The reconciler reads session pattern, confidence, external corroboration, and ASN-DROP membership — already on the actor — and promotes unknown to suspicious or malicious under a published precedence. benign is sticky; Tor-suspicious is preserved.
// What we see right now
14,524 actors total. The reconciler runs every aggregation cycle, so numbers drift as new behavior lands and as old actors age out. The large unknown cohort is the honest answer for actors whose observable behavior doesn't meet any promotion floor — we'd rather leave them unlabeled than guess.
// Precedence
Two writers can disagree about an actor. When they do, the stronger verdict wins, in this order:
- Benign is sticky. Once the hostname classifier flags an actor as a known research scanner (Censys, Shodan, Shadowserver, etc.), no behavioral signal overrides that. This is the invariant the benign-confidence cap depends on.
- Tor-suspicious is preserved. The hostname classifier owns the tor_exit reason; the reconciler sees suspicious on a Tor exit and leaves it alone, because "Tor exit" is strictly stronger than anything behavioral signals would say about the same actor.
- Behavioral promotions are reversible. If an actor's behavior weakens below the floor on a later cycle, they fall back to unknown. No intent is locked-in except benign.
The precedence is a module docstring in apps/threats/intent.py, not a runtime flag. Changing it requires editing the source and landing a migration-grade change.
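The three precedence rules above can be sketched as a single function. This is a minimal illustration, not the code in apps/threats/intent.py: the `Actor` dataclass, the `intent_reason` field, and the `reconcile` name are assumptions made for the example, and only the `tor_exit` reason string comes from this page.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Actor:
    # Hypothetical stand-in for ThreatActor; only the fields the sketch needs.
    intent: str = "unknown"               # malicious | suspicious | benign | unknown
    intent_reason: Optional[str] = None   # e.g. "tor_exit"


def reconcile(actor: Actor, behavioral: Optional[Tuple[str, str]]) -> Actor:
    """Apply a behavioral verdict (intent, reason), or None if no rule fired."""
    # 1. Benign is sticky: the hostname classifier's verdict is never overridden.
    if actor.intent == "benign":
        return actor
    # 2. Tor-suspicious is preserved: the reconciler leaves tor_exit alone.
    if actor.intent == "suspicious" and actor.intent_reason == "tor_exit":
        return actor
    # 3. Behavioral promotions are reversible: no verdict means back to unknown.
    if behavioral is None:
        actor.intent, actor.intent_reason = "unknown", None
    else:
        actor.intent, actor.intent_reason = behavioral
    return actor
```

Checking the two hostname-owned cases before looking at any behavioral verdict is what makes them win regardless of which promotion rule would otherwise fire.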
// Promotion rules
Each rule is a module constant. First match wins; absence of a match leaves intent unchanged (usually unknown).
| Axis | Trigger |
|---|---|
| malicious | Behavioral pattern: malware_dropper, data_exfiltrator, interactive_operator |
| suspicious | Behavioral pattern: credential_harvester, opportunistic_bruter, proxy_abuser, plus protocol-specific bruters (mysql/ftp/telnet) |
| suspicious | External corroboration from independent OSINT feeds |
| suspicious | ASN on Spamhaus ASN-DROP + observable activity |
| benign | Reverse DNS matches curated scanner-organization registry (Censys, Shodan, Shadowserver, Onyphe, Stretchoid, Internet-Measurement, Modat, &c) |
Thresholds live at apps/threats/intent.py as module constants (MALICIOUS_CONFIDENCE_FLOOR, SUSPICIOUS_CORROBORATION_MIN, etc.). A manage.py classify_intent --dry-run management command prints the projected distribution shift before you commit to a threshold change.
// Top malicious actors right now
Highest-confidence malicious actors with the reconciler's reason string. The reason is stored in enrichment_metadata.intent for every promotion, so each verdict is auditable.
| Actor | Reason |
|---|---|
| 130.12.180.51 | behavioral:data_exfiltrator conf=0.61 |
| 107.175.34.74 | behavioral:malware_dropper conf=0.61 |
| 14.103.123.80 | behavioral:malware_dropper conf=0.60 |
| 109.206.241.199 | behavioral:malware_dropper conf=0.54 |
| 141.148.151.4 | behavioral:malware_dropper conf=0.58 |
// Top suspicious actors right now
Actors meeting one of the suspicious-axis rules — a behavioral bruter pattern, multi-feed corroboration, ASN-DROP activity, or a Tor exit.
| Actor | Reason |
|---|---|
| 81.192.46.45 | behavioral:credential_harvester conf=0.55 |
| 135.235.138.43 | behavioral:credential_harvester conf=0.56 |
| 154.198.162.75 | behavioral:credential_harvester conf=0.55 |
| 103.13.207.34 | behavioral:credential_harvester conf=0.56 |
| 213.209.159.159 | behavioral:credential_harvester conf=0.66 |
// The benign cohort
Benign actors are research scanners — they don't threaten you, but they do scan you, and treating their traffic as threat-scored noise contaminates any reputation product. We tag them, cap their confidence at 0.1, exclude them from campaign clustering, and leave them in the public data so analysts can audit the call.
| Domain | IPs |
|---|---|
| shadowserver.org | 68 |
| censys-scanner.com | 53 |
| onyphe.net | 38 |
| deepfield.net | 16 |
| internet-measurement.com | 15 |
| stretchoid.com | 15 |
| modat.io | 15 |
| internet-census.org | 11 |
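One plausible way to match a reverse-DNS name against a curated domain registry and apply the 0.1 confidence cap is sketched below. The domain set is copied from the table above; the function names and the label-boundary suffix matching are assumptions, not necessarily how the hostname classifier works. The sketch performs no DNS lookups itself.

```python
from typing import Optional

# Domains taken from the table above; the real registry is larger.
SCANNER_DOMAINS = frozenset({
    "shadowserver.org", "censys-scanner.com", "onyphe.net", "deepfield.net",
    "internet-measurement.com", "stretchoid.com", "modat.io", "internet-census.org",
})
BENIGN_CONFIDENCE_CAP = 0.1  # the cap stated on this page


def scanner_org(rdns: Optional[str]) -> Optional[str]:
    """Return the registry domain that the rDNS name falls under, if any."""
    if not rdns:
        return None
    labels = rdns.rstrip(".").lower().split(".")
    # Compare whole-label suffixes so "evilshadowserver.org" does not match.
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in SCANNER_DOMAINS:
            return candidate
    return None


def capped_confidence(confidence: float, rdns: Optional[str]) -> float:
    """Cap confidence for actors whose rDNS matches the registry."""
    if scanner_org(rdns) is not None:
        return min(confidence, BENIGN_CONFIDENCE_CAP)
    return confidence
```

Matching on label boundaries rather than with a plain string suffix check avoids treating a look-alike domain as a registered scanner.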
// What the classifier won't catch
Honest about the limits:
- Low-signal actors. An actor that shows up only once, on one sensor, with a scanner-shaped session won't meet any promotion floor. Such actors stay unknown, which is the honest answer, not a gap.
- New benign scanners not in the registry. Our scanner-organization registry is curated. A new legitimate research operator won't be flagged benign until we add their domain. Until then, their activity may trigger a behavioral-suspicious promotion if it looks like scanning.
- Mixed-intent actors. An actor whose behavior genuinely straddles suspicious and malicious (the same IP dropping malware and running a credential harvester) collapses to whichever rule fires first. The primary_threat_category field captures the strongest signal; the rest is in session-level detail.
- Heuristic, not learned. No ML model in the loop. Weights and floors are hand-tuned and published. If you disagree with them, the precedence is on the page and the constants are in the source — rescore in your own pipeline.
// How to use it
GET /api/v1/threats?intent=malicious
Also accepts suspicious, benign, unknown. Combinable with other filters.
GET /api/v1/actor/<ip>
Response includes intent plus enrichment_metadata.intent.reason — the exact rule that fired.
Benign actors are already excluded from /feeds/v1/ips.txt. Safe to ingest without filtering out Censys et al.
Per-axis counts and samples on this page refresh on every request. The rule that promoted each actor is stored on the actor: no black box, no cached verdict.
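A small client sketch for the two endpoints above, using only the Python standard library. The base URL is a placeholder, the helper names are hypothetical, and the response is assumed to be a JSON object with `intent` and `enrichment_metadata.intent.reason` as described above; any other response fields are unknown.

```python
import json
from typing import Optional, Tuple
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "https://example.invalid"  # placeholder; substitute the real host
INTENTS = {"malicious", "suspicious", "benign", "unknown"}


def threats_url(intent: str, **filters: str) -> str:
    """Build the list URL; extra keyword filters are passed through unchanged."""
    if intent not in INTENTS:
        raise ValueError(f"unsupported intent: {intent!r}")
    return f"{BASE_URL}/api/v1/threats?{urlencode({'intent': intent, **filters})}"


def actor_url(ip: str) -> str:
    """Build the per-actor URL, percent-encoding the address."""
    return f"{BASE_URL}/api/v1/actor/{quote(ip, safe='')}"


def verdict(actor_json: dict) -> Tuple[Optional[str], Optional[str]]:
    """Extract (intent, reason), tolerating missing keys."""
    reason = actor_json.get("enrichment_metadata", {}).get("intent", {}).get("reason")
    return actor_json.get("intent"), reason


def fetch_actor(ip: str) -> dict:
    """Fetch one actor record over HTTP (needs a real BASE_URL)."""
    with urlopen(actor_url(ip), timeout=10) as resp:
        return json.load(resp)
```

Keeping URL construction and response parsing separate from the network call makes the parsing logic easy to check against a saved response before pointing it at the live API.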
// See also
- /intelligence/ — full methodology: collection, classification, scoring, sources
- /intelligence/methodology — published 6-signal confidence formula with weights
- /intelligence/scanners/ — benign scanner registry (full org list)
- /features/operator-discovery/ — how HASSH clustering complements the intent axes