Observability Quickstart
Use this guide to apply and validate the local observability stack (Prometheus, Grafana, Jaeger, Loki).
Prerequisites
- Docker and Docker Compose running
- Local stack bootstrapped (make dev)
Apply Observability Setup
./infrastructure/observability/scripts/setup-improvements.sh
The script restarts required services and performs baseline checks. It validates the currently provisioned dashboards and datasources before checking endpoints.
Validate Services
| Service | URL | Expected |
|---|---|---|
| Grafana | http://localhost:3000 | Login page available (aion/aion) |
| Prometheus | http://localhost:9090 | Targets up |
| Jaeger | http://localhost:16686 | Service list available |
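The checks in the table can also be scripted. The following is a minimal sketch, not part of the setup script: it assumes `curl` is installed and probes Grafana's `/api/health`, Prometheus's `/-/healthy`, and Jaeger's `/api/services` (the path the Jaeger UI itself queries; treat it as an assumption for your Jaeger version). The function names `verdict`, `probe`, and `check_stack` are hypothetical.

```shell
# Map an HTTP status code to a verdict. curl reports 000 when it cannot connect.
verdict() {
  case "$1" in
    2??|3??) echo "OK" ;;
    *)       echo "FAIL" ;;
  esac
}

# Print "<service> <status code> <verdict>" for one URL.
probe() {
  local name="$1" url="$2" code
  code="$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")" || code="000"
  printf '%-12s %s %s\n' "$name" "$code" "$(verdict "$code")"
}

# Probe the three services from the table above.
# The paths after the port are assumptions, not taken from this guide.
check_stack() {
  probe Grafana    "http://localhost:3000/api/health"
  probe Prometheus "http://localhost:9090/-/healthy"
  probe Jaeger     "http://localhost:16686/api/services"
}
```

After pasting the functions into a shell, run `check_stack`. An `OK` only shows that each service answers; "Targets up" still needs a look at the Prometheus targets page.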
Generate Traffic
for i in {1..50}; do curl -s http://localhost:5001/aion/api/v1/health > /dev/null; done
Then confirm traces and metrics appear in dashboards.
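To see whether the generated requests actually succeeded, the loop can record status codes instead of discarding all output. A small sketch, assuming `curl` and the standard `sort`, `uniq`, and `awk` tools; `tally_codes` and `generate_traffic` are hypothetical helper names.

```shell
# Summarize HTTP status codes (one per line on stdin) as "<count> <code>".
tally_codes() {
  sort | uniq -c | awk '{ print $1, $2 }'
}

# Send n requests (default 50) to the health endpoint and tally the responses.
generate_traffic() {
  local url="${1:-http://localhost:5001/aion/api/v1/health}" n="${2:-50}" i=1
  while [ "$i" -le "$n" ]; do
    curl -s -o /dev/null --max-time 5 -w '%{http_code}\n' "$url"
    i=$((i + 1))
  done | tally_codes
}
```

Running `generate_traffic` prints one line per distinct status code. A line with code `000` means curl could not connect, in which case no traces or metrics should be expected.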
Grafana Verification
- Open aion-api - RED Metrics Dashboard (Professional).
- Confirm latency, error-rate, and throughput panels are populated.
- Use trace links from exemplars to jump into Jaeger.
Record A Baseline
For performance-ready documentation, capture at least:
| Metric | Why it matters |
|---|---|
| request volume or smoke source | makes the observation reproducible |
| p95 latency | catches slow-tail regressions |
| error rate | separates slow from broken |
| top slow endpoints | identifies where to zoom in |
| trace exemplar or trace id | supports root-cause follow-up |
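A client-side measurement can serve as a rough cross-check for the p95 row. This sketch is a stand-in under stated assumptions, not the project's method: it times requests with curl's `%{time_total}` and takes the nearest-rank 95th percentile. It measures round-trip time as seen by the client, so expect it to differ from the server-side figures in Grafana. `p95` and `sample_latency` are hypothetical names.

```shell
# Nearest-rank 95th percentile of the numbers (one per line) on stdin.
# The rank is ceil(0.95 * N), computed with integer arithmetic.
p95() {
  LC_ALL=C sort -n | awk '
    { v[NR] = $1 }
    END {
      if (NR == 0) exit 1
      r = int((NR * 95 + 99) / 100)
      print v[r]
    }'
}

# Print the total time in seconds for each of n requests (default 50).
sample_latency() {
  local url="${1:-http://localhost:5001/aion/api/v1/health}" n="${2:-50}" i=1
  while [ "$i" -le "$n" ]; do
    curl -s -o /dev/null --max-time 5 -w '%{time_total}\n' "$url"
    i=$((i + 1))
  done
}
```

Run `sample_latency | p95` and record the result together with the request count, so that the observation stays reproducible.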
If the flow under review is projection-sensitive or realtime-sensitive, combine dashboard data with:
make outbox-diagnose
make record-projection-smoke
make realtime-record-smoke
Loki Verification
In Grafana Explore, choose the Loki datasource and run:
{container_name="/aion-api-dev"} | json
Filter by fields like trace_id and request_id to correlate logs with traces.
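The same query can be run outside Grafana through Loki's HTTP API, which is handy for pulling the log lines of a single trace. This is a sketch under two assumptions the guide does not confirm: Loki is reachable from the host on its default port 3100, and log lines carry a `trace_id` JSON field. `logql_for_trace`, `logs_for_trace`, and the `LOKI_URL` variable are hypothetical.

```shell
# Build a LogQL query that keeps only lines whose parsed trace_id matches.
logql_for_trace() {
  printf '{container_name="/aion-api-dev"} | json | trace_id="%s"' "$1"
}

# Fetch up to 20 matching lines via Loki's query_range endpoint.
# With no start/end parameters, Loki searches the last hour.
# The default URL (port 3100 on localhost) is an assumption.
logs_for_trace() {
  curl -s -G --max-time 10 "${LOKI_URL:-http://localhost:3100}/loki/api/v1/query_range" \
    --data-urlencode "query=$(logql_for_trace "$1")" \
    --data-urlencode "limit=20"
}
```

Call `logs_for_trace` with a trace id copied from Jaeger. The response is JSON; an empty `result` array suggests that the field name or the container label differs in your setup.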
Troubleshooting
- Empty dashboards: verify Prometheus scrape targets are healthy.
- No traces in Jaeger: check OTLP endpoint env vars in API container.
- Missing logs in Loki: verify Fluent Bit container status and labels.
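The checks above can be run from a terminal. A hedged sketch: the Prometheus `/api/v1/targets` path is standard, but the container name `aion-api-dev` is inferred from the Loki label used earlier, `OTEL_` is only the conventional prefix for OpenTelemetry SDK variables, and the `name=fluent` filter is a guess at the Fluent Bit container name. All function names are hypothetical.

```shell
# Count targets reported as down in a Prometheus /api/v1/targets response on stdin.
count_down_targets() {
  grep -o '"health":"down"' | wc -l | tr -d ' '
}

# Empty dashboards: how many scrape targets are down?
down_targets() {
  curl -s --max-time 5 "http://localhost:9090/api/v1/targets" | count_down_targets
}

# No traces: list OpenTelemetry variables in the API container.
# The default container name is an assumption.
otlp_env() {
  docker exec "${1:-aion-api-dev}" env | grep '^OTEL_'
}

# Missing logs: is a Fluent Bit container running?
# The name filter is an assumption.
fluent_bit_status() {
  docker ps --filter "name=fluent" --format '{{.Names}}\t{{.Status}}'
}
```

Note that `down_targets` also prints `0` when Prometheus itself is unreachable, so confirm that http://localhost:9090 responds before trusting the count.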
Next Step
Read Platform Runtime for runtime-level observability wiring and conventions. For performance-specific documentation, use Performance Readiness.