// Deep dive, for operators

Check our lag before you trust our verdict

A threat feed's verdict is only as current as its data. Most feeds won't tell you when they went stale. IntrusionLabs publishes every stage of its pipeline — ingestion, aggregation, enrichment, campaign detection, per-collector heartbeats — at a public endpoint. Point your monitoring at it; gate on it.

TL;DR

GET /api/v1/health/ returns JSON covering ten subsystems, each with a status (ok / degraded / down) and the timing details behind that status. Right now 3/3 edge collectors are reporting healthy. Every severity threshold is published below — nothing is hidden behind a "trust us, we're fresh" badge.

// Ten subsystems, one endpoint

Key                   What it covers
database              PostgreSQL operational DB
timeseries_db         TimescaleDB honeypot event store
cache                 Django cache layer
valkey                Valkey pub/sub bus
collectors            Edge node heartbeats
ingestion             Event ingestion rate
pipeline              Aggregation lag
enrichment            Hostname classification
campaign_freshness    Campaign detection
threats               Active threat volume
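As a rough illustration of consuming the ten keys above, this Python sketch lists every subsystem that is not ok. The key names and the three status values come from this page; the nesting (each key at the top level, mapping to an object with a status field) and the helper name unhealthy_subsystems are assumptions, so adjust them to the real response.

```python
# The ten subsystem keys listed in the table above.
SUBSYSTEMS = (
    "database", "timeseries_db", "cache", "valkey", "collectors",
    "ingestion", "pipeline", "enrichment", "campaign_freshness", "threats",
)


def unhealthy_subsystems(health: dict) -> dict:
    """Return {subsystem: status} for every subsystem that is not "ok".

    Assumption: health[key]["status"] holds "ok", "degraded", or "down".
    A missing key or status is reported as "missing" so that an absent
    block is never mistaken for a healthy one.
    """
    problems = {}
    for key in SUBSYSTEMS:
        block = health.get(key)
        status = block.get("status") if isinstance(block, dict) else None
        if status != "ok":
            problems[key] = status if status is not None else "missing"
    return problems
```

An empty result means all ten subsystems reported ok; anything else names the subsystem to look at first.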

// Severity thresholds (the ones you can alert on)

Every threshold below is defined in apps/api/health.py. If you want your monitoring to page when our data goes stale, these are the numbers to gate on.

Metric                     OK        Degraded    Down
ingestion_lag_seconds      ≤120s     120–600s    >600s
aggregation_lag_seconds    ≤600s     600–900s    >900s
collector heartbeat        ≤5 min    5–15 min    >15 min
enrichment age             ≤2h       2–6h        >6h
campaign age               ≤2h       2–6h        >6h
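If you would rather alert on raw numbers than on the status string, a pinned copy of the table can be applied locally. The sketch below is one way to do that in Python. Only ingestion_lag_seconds and aggregation_lag_seconds are field names taken from this page; the other three dictionary keys and the classify helper are hypothetical names chosen for illustration.

```python
# Pinned copy of the published thresholds, in seconds: (ok_max, degraded_max).
# The last three keys are hypothetical names for the heartbeat and age rows.
THRESHOLDS = {
    "ingestion_lag_seconds": (120, 600),
    "aggregation_lag_seconds": (600, 900),
    "collector_heartbeat_seconds": (5 * 60, 15 * 60),
    "enrichment_age_seconds": (2 * 3600, 6 * 3600),
    "campaign_age_seconds": (2 * 3600, 6 * 3600),
}


def classify(metric: str, value_seconds: float) -> str:
    """Map a raw lag or age in seconds to "ok", "degraded", or "down"."""
    ok_max, degraded_max = THRESHOLDS[metric]
    if value_seconds <= ok_max:
        return "ok"
    if value_seconds <= degraded_max:
        return "degraded"
    return "down"
```

For example, classify("aggregation_lag_seconds", 750) returns "degraded", because 750 falls between the 600s and 900s bounds.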

// Why publish this at all

Every threat feed has outages. Sensors go down, aggregation jobs hang, feeds miss a sync. The question is whether you, the consumer, find out. When a feed's pipeline goes stale and the feed keeps serving yesterday's verdicts as though they were current, you make block/allow decisions on data that's lying to you about its freshness.

IntrusionLabs' approach: publish the lag, let you gate on it. If aggregation_lag_seconds exceeds 900, skip the current query and check back when it recovers. If a specific collector stopped reporting, the collectors.nodes block names it. If the enrichment queue is backed up, the enrichment block says so.
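The gating rule described above can be sketched as a small pure function. This is a minimal Python example under stated assumptions: it guesses that aggregation_lag_seconds sits inside the pipeline block of the response, and should_defer is a hypothetical helper name. It fails closed, so a response without the field is treated as a reason to defer.

```python
from __future__ import annotations


def should_defer(health: dict, max_aggregation_lag: float = 900.0) -> tuple[bool, str]:
    """Decide whether to skip the current query, with a human-readable reason.

    Assumption: the lag is exposed as health["pipeline"]["aggregation_lag_seconds"].
    """
    lag = health.get("pipeline", {}).get("aggregation_lag_seconds")
    if lag is None:
        # Fail closed: no lag figure means freshness cannot be confirmed.
        return True, "aggregation lag missing from response"
    if lag > max_aggregation_lag:
        return True, f"aggregation lag {lag:.0f}s exceeds {max_aggregation_lag:.0f}s"
    return False, "aggregation lag within limit"
```

A caller would check the boolean before issuing a batch query and log the reason string when it defers.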

This is what "threat intelligence with receipts" means in practice — the receipts include the timestamps.

// Per-collector detail

The collectors.nodes block of the response names every active edge collector, its location code, the timestamp of its last ingest, minutes since that ingest, and the collector's git SHA. 3/3 are healthy right now. The pipeline.per_node block cross-references that against actual event arrival lag, so a collector that's heartbeating but not forwarding events is visible as "heartbeat ok, event lag elevated."
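To make the "heartbeat ok, event lag elevated" case concrete, here is a hedged Python sketch of that cross-reference done on the client side. The block names collectors.nodes and pipeline.per_node come from this page. Everything else is an assumption: that both blocks are objects keyed by node name, the field names minutes_since_ingest and event_lag_seconds, and the 120s event-lag limit, which this page does not publish for per-node lag.

```python
def silent_forwarders(
    health: dict,
    heartbeat_ok_minutes: float = 5.0,
    event_lag_limit_seconds: float = 120.0,  # assumed limit, not a published one
) -> list:
    """Return names of collectors that heartbeat on time but lag on events.

    Assumed shapes (hypothetical field names):
      health["collectors"]["nodes"][name]["minutes_since_ingest"]
      health["pipeline"]["per_node"][name]["event_lag_seconds"]
    """
    nodes = health.get("collectors", {}).get("nodes", {})
    per_node = health.get("pipeline", {}).get("per_node", {})
    flagged = []
    for name, node in nodes.items():
        minutes = node.get("minutes_since_ingest", float("inf"))
        lag = per_node.get(name, {}).get("event_lag_seconds")
        heartbeat_ok = minutes <= heartbeat_ok_minutes
        if heartbeat_ok and lag is not None and lag > event_lag_limit_seconds:
            flagged.append(name)
    return sorted(flagged)
```

A node with a stale heartbeat is deliberately not flagged here, since the heartbeat threshold already covers that case.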

The node list grows as new collector geographies come online. The version field at the top level of the health response reports the exact git SHA running in production, which helps when correlating a behavior change with a deploy.

// How to use it

  • Pre-flight every batch query. If the top-level status is not ok, defer the query or degrade gracefully.
  • Alert on your own infrastructure. Scrape the endpoint into Prometheus; alert when aggregation_lag_seconds or ingestion_lag_seconds exceed the thresholds above.
  • Correlate with deploy SHA. The version field at the root of the response is the exact commit running in production. Useful for "did something change at this time?" investigations.
  • Check before publishing conclusions. If you're about to write up a finding sourced from IL data, sanity-check freshness at the time the query ran.
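The first two items in the list above could look roughly like this in Python, using only the standard library. Treat it as a sketch: the hostname is a placeholder because this page gives only the path, the top-level key is assumed to be named status, the lag fields are assumed to sit in the ingestion and pipeline blocks, and the intrusionlabs_ metric prefix is invented for the example.

```python
import json
import urllib.request

# Placeholder host: only the path /api/v1/health/ is given on this page.
HEALTH_URL = "https://example.invalid/api/v1/health/"


def fetch_health(url: str = HEALTH_URL, timeout: float = 5.0) -> dict:
    """Fetch and decode the health JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def is_fresh(health: dict) -> bool:
    """Pre-flight check: True only when the top-level status is "ok"."""
    return health.get("status") == "ok"


def to_prometheus_text(health: dict) -> str:
    """Render the two lag gauges in Prometheus text exposition format."""
    fields = (
        ("ingestion", "ingestion_lag_seconds"),    # assumed location
        ("pipeline", "aggregation_lag_seconds"),   # assumed location
    )
    lines = []
    for block, field in fields:
        value = health.get(block, {}).get(field)
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            lines.append(f"intrusionlabs_{field} {float(value)}")
    return "".join(line + "\n" for line in lines)
```

A batch job would call is_fresh(fetch_health()) before querying and defer when it returns False. The output of to_prometheus_text can be served from a small exporter or written to a file for a textfile-style collector, with alert rules set against the published thresholds.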

// Honest about limits

The health endpoint reports only what our own code can see. If the Django process is up but a downstream dependency silently returns bad data, this endpoint won't catch it. Treat a healthy status as a necessary condition for trusting the data, not a sufficient one.

Thresholds are current as of this writing and will change as the sensor footprint grows. If we tighten them, this page updates. If you want stability for alerting, pin your own copy of the threshold table, or scrape the raw numbers rather than the status string.
