Check our lag before you trust our verdict
A threat feed's verdict is only as current as its data, and most feeds won't tell you when they've gone stale. IntrusionLabs publishes the status of every stage of its pipeline (ingestion, aggregation, enrichment, campaign detection, and per-collector heartbeats) at a public endpoint. Point your monitoring at it and gate on it.
GET /api/v1/health/ returns JSON covering ten subsystems, each with a status (ok, degraded, or down) and the timing details behind that status. At the time of writing, all three edge collectors were reporting healthy. Every severity threshold is published below; nothing is hidden behind a "trust us, we're fresh" badge.
// Ten subsystems, one endpoint
| Key | What it covers |
|---|---|
| database | PostgreSQL operational DB |
| timeseries_db | TimescaleDB honeypot event store |
| cache | Django cache layer |
| valkey | Valkey pub/sub bus |
| collectors | Edge node heartbeats |
| ingestion | Event ingestion rate |
| pipeline | Aggregation lag |
| enrichment | Hostname classification |
| campaign_freshness | Campaign detection |
| threats | Active threat volume |
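Here is a minimal Python sketch of a consumer that fetches the endpoint and lists every subsystem that isn't reporting ok. It assumes each of the ten keys above is a top-level object with a "status" field. That shape is a reading of this page, not a published schema. `fetch_health`'s `base_url` is a placeholder you supply, because the host isn't named here.

```python
import json
import urllib.request
from typing import Any

# The ten subsystem keys from the table above.
SUBSYSTEMS = (
    "database", "timeseries_db", "cache", "valkey", "collectors",
    "ingestion", "pipeline", "enrichment", "campaign_freshness", "threats",
)


def fetch_health(base_url: str, timeout: float = 10.0) -> dict[str, Any]:
    """GET /api/v1/health/ and decode the JSON body.

    base_url is a placeholder (e.g. "https://<your-il-host>"); this page
    does not name the host.
    """
    url = base_url.rstrip("/") + "/api/v1/health/"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def non_ok_subsystems(health: dict[str, Any]) -> dict[str, str]:
    """Map each subsystem whose status is not "ok" to that status.

    ASSUMPTION: every subsystem is a top-level key holding an object with a
    "status" field. A subsystem absent from the response is reported as
    "missing" instead of being silently treated as healthy.
    """
    result: dict[str, str] = {}
    for key in SUBSYSTEMS:
        block = health.get(key)
        if not isinstance(block, dict):
            result[key] = "missing"
        elif block.get("status") != "ok":
            result[key] = str(block.get("status"))
    return result
```

`non_ok_subsystems(fetch_health("https://<your-il-host>"))` returns an empty dict when everything is green. Reporting an absent key as "missing" means a schema change fails loudly instead of looking healthy.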
// Severity thresholds (the ones you can alert on)
Every threshold below is defined in apps/api/health.py. If you want your monitoring to page when our data goes stale, these are the numbers to gate on.
| Metric | OK | Degraded | Down |
|---|---|---|---|
| ingestion_lag_seconds | ≤120s | 120–600s | >600s |
| aggregation_lag_seconds | ≤600s | 600–900s | >900s |
| collector heartbeat | ≤5 min | 5–15 min | >15 min |
| enrichment age | ≤2h | 2–6h | >6h |
| campaign age | ≤2h | 2–6h | >6h |
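If you keep your own copy of these thresholds, you can transcribe the table into your alerting code, as in this Python sketch. It assumes each range includes its upper edge, so a value of exactly 120s counts as ok. That is one reading of the overlapping boundaries; check apps/api/health.py if the edge case matters. Three rows have no published field name, so `collector_heartbeat_seconds`, `enrichment_age_seconds`, and `campaign_age_seconds` are hypothetical labels. All values are converted to seconds.

```python
from typing import Literal

Status = Literal["ok", "degraded", "down"]

# (ok_max, degraded_max) in seconds, transcribed from the published table.
# Anything above degraded_max is "down". The last three keys are
# hypothetical labels; the table names those rows only in prose.
THRESHOLDS: dict[str, tuple[float, float]] = {
    "ingestion_lag_seconds": (120, 600),
    "aggregation_lag_seconds": (600, 900),
    "collector_heartbeat_seconds": (5 * 60, 15 * 60),
    "enrichment_age_seconds": (2 * 3600, 6 * 3600),
    "campaign_age_seconds": (2 * 3600, 6 * 3600),
}


def classify(metric: str, value_seconds: float) -> Status:
    """Classify a raw lag/age value. Boundary values fall into the better tier (assumption)."""
    ok_max, degraded_max = THRESHOLDS[metric]
    if value_seconds <= ok_max:
        return "ok"
    if value_seconds <= degraded_max:
        return "degraded"
    return "down"
```

Classifying the raw numbers yourself keeps your alerts stable if the published status logic changes underneath you.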
// Why publish this at all
Every threat feed has outages. Sensors go down, aggregation jobs hang, feeds miss a sync. The question is whether you, the consumer, find out. When a feed's pipeline goes stale and the feed keeps serving yesterday's verdicts as though they were current, you make block/allow decisions on data that's lying to you about its freshness.
IntrusionLabs' approach: publish the lag, let you gate on it. If aggregation_lag_seconds exceeds 900, skip the current query and check back when it recovers. If a specific collector stopped reporting, the collectors.nodes block names it. If the enrichment queue is backed up, the enrichment block says so.
This is what "threat intelligence with receipts" means in practice — the receipts include the timestamps.
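The gating rule can be sketched as a pre-flight check in Python. The location of the lag in the response is an assumption. This sketch reads the top-level "status" field and `health["pipeline"]["aggregation_lag_seconds"]`. Treat both paths as hypothetical until you've checked them against a real response.

```python
from typing import Any

AGGREGATION_DOWN_SECONDS = 900  # "down" threshold from the published table


def preflight(health: dict[str, Any]) -> tuple[bool, str]:
    """Return (go, reason) for running a query against IL data right now.

    ASSUMPTIONS (hypothetical response shape): top-level "status" holds
    ok/degraded/down; aggregation lag lives at
    health["pipeline"]["aggregation_lag_seconds"].
    """
    top = health.get("status")
    if top != "ok":
        return False, f"top-level status is {top!r}"
    lag = (health.get("pipeline") or {}).get("aggregation_lag_seconds")
    if lag is None:
        # Fail closed: no lag reading means no freshness guarantee.
        return False, "aggregation_lag_seconds not reported"
    if lag > AGGREGATION_DOWN_SECONDS:
        return False, f"aggregation lag {lag}s exceeds {AGGREGATION_DOWN_SECONDS}s"
    return True, "fresh enough"
```

The check fails closed: a missing lag field defers the query instead of letting it through. It also compares the raw lag against 900 seconds directly, so the gate still holds even if the meaning of the status string shifts.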
// Per-collector detail
The collectors.nodes block of the response names every active edge collector along with its location code, the timestamp of its last ingest, the minutes elapsed since that ingest, and the collector's git SHA. The pipeline.per_node block cross-references those heartbeats against actual event-arrival lag. A collector that is heartbeating but not forwarding events therefore shows up as "heartbeat ok, event lag elevated."
Collectors in new geographies appear in collectors.nodes as they come online. The version field at the top level of the health response is the exact git SHA running in production, which is handy for correlating a behavior change with a deploy.
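This Python sketch walks collectors.nodes and flags stale collectors using the heartbeat thresholds (≤5 min ok, 5–15 min degraded, >15 min down). The page lists what each node reports but not the JSON key names. The `"name"` and `"minutes_since_ingest"` fields below are therefore hypothetical, and `nodes` is assumed to be a list.

```python
from typing import Any


def classify_heartbeat(minutes_since_ingest: float) -> str:
    """Published heartbeat thresholds: <=5 min ok, <=15 min degraded, else down."""
    if minutes_since_ingest <= 5:
        return "ok"
    if minutes_since_ingest <= 15:
        return "degraded"
    return "down"


def stale_collectors(health: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (collector name, heartbeat status) for every node that is not ok.

    ASSUMPTION: collectors.nodes is a list of objects with hypothetical
    "name" and "minutes_since_ingest" fields; real key names may differ.
    """
    nodes = (health.get("collectors") or {}).get("nodes") or []
    stale: list[tuple[str, str]] = []
    for node in nodes:
        status = classify_heartbeat(float(node["minutes_since_ingest"]))
        if status != "ok":
            stale.append((node["name"], status))
    return stale
```

An empty result means every reported collector heartbeated within five minutes. It says nothing about event arrival, which is what pipeline.per_node covers.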
// How to use it
- Pre-flight every batch query. If the top-level status is not ok, defer the query or degrade gracefully.
- Alert on your own infrastructure. Scrape the endpoint into Prometheus (it returns JSON, so you'll need a bridge such as json_exporter), and alert when aggregation_lag_seconds or ingestion_lag_seconds exceed the thresholds above.
- Correlate with deploy SHA. The version field at the root of the response is the exact commit running in production. Useful for "did something change at this time?" investigations.
- Check before publishing conclusions. If you're about to write up a finding sourced from IL data, sanity-check freshness at the time the query ran.
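Prometheus scrapes its own text exposition format, not arbitrary JSON, so the endpoint needs a bridge. The community json_exporter is one option. The Python sketch below is a hand-rolled alternative that renders the two lag numbers plus the deploy SHA. The JSON paths it reads and the `il_*` metric names are illustrative assumptions, not anything IntrusionLabs publishes.

```python
from typing import Any


def to_prometheus(health: dict[str, Any]) -> str:
    """Render selected health fields in Prometheus text exposition format.

    ASSUMPTIONS (hypothetical paths): ingestion lag at
    health["ingestion"]["ingestion_lag_seconds"], aggregation lag at
    health["pipeline"]["aggregation_lag_seconds"], deploy SHA at
    health["version"]. Metric names are our own choice.
    """
    lines: list[str] = []
    gauges = {
        "il_ingestion_lag_seconds":
            (health.get("ingestion") or {}).get("ingestion_lag_seconds"),
        "il_aggregation_lag_seconds":
            (health.get("pipeline") or {}).get("aggregation_lag_seconds"),
    }
    for name, value in gauges.items():
        if value is None:
            continue  # omit the series rather than report a misleading zero
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {float(value)}")
    version = health.get("version")
    if version:
        # build_info pattern: constant 1, with the SHA carried in a label.
        # Escape backslash, double quote, and newline per the exposition format.
        escaped = (str(version).replace("\\", "\\\\")
                   .replace('"', '\\"').replace("\n", "\\n"))
        lines.append("# TYPE il_build_info gauge")
        lines.append(f'il_build_info{{version="{escaped}"}} 1')
    return "\n".join(lines) + "\n" if lines else ""
```

You can write the output to a `.prom` file for node_exporter's textfile collector or serve it from a small HTTP handler. Then alert on `il_aggregation_lag_seconds > 900` and `il_ingestion_lag_seconds > 600`. Exposing the SHA as a label on a constant gauge lets you overlay deploys on your lag graphs.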
The health endpoint reports only what our own code can see. If the Django process is up but a downstream dependency silently returns bad data, this endpoint won't catch it. Treat a healthy response as a necessary condition for trusting the data, not a sufficient one.
Thresholds are current as of this writing; they'll change as sensor footprint grows. If we tighten them, the page updates. If you want stability for alerting, pin your own copy of the threshold table or scrape the raw numbers rather than the status string.
- GET /api/v1/health/: the live endpoint
- Confidence scoring: the published formula for the verdicts themselves
- Provenance: every verdict traces to its raw event