Observability Stack

A complete observability stack: metrics, logs, traces, SLOs, and alerting

The 3 Pillars of Observability
  Metrics: what?
  Logs: why?
  Traces: where?

Prometheus
  Pulls (scrapes) metrics from its targets every 15 s.

Metric Types
  Counter: only goes up (requests_total)
  Gauge: goes up and down (temperature, queue_size)
  Histogram: buckets (request_duration)
  Summary: quantiles (p50, p99)

Log Pipeline
  App logs (structured JSON) -> Fluentd (collector) -> Loki / ELK (storage)

Trace Pipeline
  OTel SDK (auto-instrumentation) -> OTel Collector (processing) -> Jaeger (trace backend)

Distributed Trace
  1. Service A (trace-id: abc)
  2. Service B (span-id: def)
  3. Service C (span-id: ghi)
  The trace context travels in a header on each hop between services.

Golden Signals
  USE (infrastructure): Utilization, Saturation, Errors
  RED (services): Rate, Errors, Duration
  Four Golden Signals: Latency, Traffic, Errors, Saturation

Service Level Objectives
  SLI: p99 latency < 200 ms (the measurement)
  SLO: 99.9% of requests < 200 ms (the objective)
  SLA: 99.9% uptime, otherwise a 10% credit (the contract)
  Error budget: the share of requests allowed to miss the SLO (100% - 99.9% = 0.1%)

Alerting Pipeline
  Alertmanager (grouping + deduplication) -> PagerDuty (on-call), Slack (warnings)
  Grafana: dashboards
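The error budget that follows from the SLO (99.9% of requests under 200 ms) is simple arithmetic, and a small sketch may make it concrete. The function names and the request counts below are hypothetical and chosen only for illustration; they are not part of any monitoring tool's API.

```python
def error_budget(slo_target: float, total_requests: int) -> int:
    """Number of requests allowed to miss the SLO in the window."""
    return round((1.0 - slo_target) * total_requests)


def budget_remaining(slo_target: float, total_requests: int, bad_requests: int) -> float:
    """Fraction of the error budget still unspent (negative once it is exhausted)."""
    budget = (1.0 - slo_target) * total_requests
    if budget == 0:
        return 0.0  # a 100% target leaves no budget to spend
    return 1.0 - bad_requests / budget


# Assumed example window: 1,000,000 requests, 250 of them slower than 200 ms.
allowed = error_budget(0.999, 1_000_000)
remaining = budget_remaining(0.999, 1_000_000, 250)
print(f"allowed slow requests: {allowed}, budget remaining: {remaining:.0%}")
```

With these assumed numbers, 1,000 slow requests are allowed in the window and three quarters of the budget is still available, which is the kind of figure an alerting rule or a release decision could be based on.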

The 3 pillars: Metrics (what), Logs (why), Traces (where), and the correlation between them
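One common way to get that correlation is to put the trace identifiers into every structured log line, so that a log entry can be matched to its trace in the trace backend. The sketch below uses only the Python standard library; `log_event` and its field names are hypothetical, and a real service would typically read the IDs from its tracing SDK instead of passing them by hand.

```python
import json
import time


def log_event(message: str, trace_id: str, span_id: str, **fields) -> str:
    """Emit one structured JSON log line that carries the trace context."""
    record = {
        "ts": time.time(),
        "message": message,
        "trace_id": trace_id,  # shared by every service handling the request
        "span_id": span_id,    # identifies this service's unit of work
        **fields,
    }
    line = json.dumps(record, sort_keys=True)
    print(line)
    return line


# Assumed values, mirroring the diagram: Service B works on trace "abc" in span "def".
log_event("payment failed", trace_id="abc", span_id="def", service="service-b", level="error")
```

Searching the log store for `trace_id` "abc" would then return the lines from every service that took part in the same request.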
