
Alert Triage Integration

LLM-powered alert triage for observability systems, with batch processing and structured output.

Turn alert noise into prioritized action by letting an LLM classify severity, surface root causes, and recommend remediation steps automatically.

Community — runs on the OSS / Community SDK edition.

This example demonstrates using LLMs to triage monitoring alerts, providing priority assessment, likely causes, and suggested remediation actions. It’s designed to work with Alertmanager-format alerts.

Difficulty: Starter 🟢 · LLM

  • Summary: Alert triage with LLM-powered analysis
  • Scenario: Classify and prioritize alerts using LLM reasoning
  • tech_tags in manifest: LLM — example id alert-triage in conformance/examples_manifest.json.
  • SDK: Use an installed SDK tree (NXUSKIT_SDK_DIR, NXUSKIT_LIB_PATH as needed); test-examples.sh resolves Go/Rust/Python deps from that tree only — see README.md, scripts/setup-sdk.sh, and scripts/test-examples.sh.
  • Languages in this example: go, rust (paths under this directory; Python may live under a sibling python/ or shared reference per Language Implementations).
  • Models: Set cloud provider API keys and/or run Ollama locally when you execute the Run steps (interactive flags like --help / --verbose are documented below).

Typical applications: SOC alert triage and IT incident management.


Language  Path   Status
Rust      rust/  Available
Go        go/    Available

Attach an installed SDK (NXUSKIT_SDK_DIR). See the repository README.md and scripts/test-examples.sh.

# From `/examples/integrations/alert-triage`:
# Each build runs in a subshell, so both commands start from the example directory.
(cd rust && cargo build)
(cd go && make build)

  • Batch alert processing for efficiency
  • Structured JSON output with priority, cause, and actions
  • Compatible with Alertmanager webhook format
  • Prioritization based on severity and context
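
The batching feature listed above can be pictured as splitting the incoming alerts into fixed-size groups, with one LLM call per group. The page does not say how the example batches internally, so the sketch below is only an assumption: `Chunk` is a hypothetical helper, and every alert name other than `HighMemoryUsage` is made up for illustration.

```go
package main

import "fmt"

// Chunk splits items into consecutive batches of at most size elements.
// Hypothetical helper, not part of the example's documented API.
func Chunk[T any](items []T, size int) [][]T {
	if size <= 0 {
		return nil // a non-positive batch size is treated as "no batches"
	}
	var batches [][]T
	for start := 0; start < len(items); start += size {
		end := start + size
		if end > len(items) {
			end = len(items)
		}
		batches = append(batches, items[start:end])
	}
	return batches
}

func main() {
	// Illustrative alert names only.
	names := []string{"HighMemoryUsage", "DiskFull", "HighLatency", "PodCrashLoop", "CertExpiry"}
	for i, batch := range Chunk(names, 2) {
		// In a real run, each batch would be sent to the model in a single request.
		fmt.Printf("batch %d: %v\n", i+1, batch)
	}
}
```

Smaller batches keep each prompt short, while larger ones reduce the number of requests; the right size depends on the model's context window.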

Input alerts use Alertmanager-style fields (alertname, severity, instance, description), shown here as a flat JSON object:

{
  "alertname": "HighMemoryUsage",
  "severity": "warning",
  "instance": "web-server-01",
  "description": "Memory usage above 85% for 5 minutes"
}
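
As a rough illustration of how such input could be loaded in Go, the sketch below decodes a JSON array of alerts into a struct. The JSON keys come from the object above and `AlertName` and `Severity` match the Go usage snippet on this page; the `Instance` and `Description` field names, the `ParseAlerts` helper, and the assumption that alerts arrive as an array are illustrative guesses, not the example's confirmed layout.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Alert mirrors the flat input object shown above.
// Instance and Description are assumed field names.
type Alert struct {
	AlertName   string `json:"alertname"`
	Severity    string `json:"severity"`
	Instance    string `json:"instance"`
	Description string `json:"description"`
}

// ParseAlerts decodes a JSON array of alerts and rejects entries without a name.
// Hypothetical helper for illustration.
func ParseAlerts(data []byte) ([]Alert, error) {
	var alerts []Alert
	if err := json.Unmarshal(data, &alerts); err != nil {
		return nil, fmt.Errorf("decode alerts: %w", err)
	}
	for i, a := range alerts {
		if a.AlertName == "" {
			return nil, fmt.Errorf("alert %d: missing alertname", i)
		}
	}
	return alerts, nil
}

func main() {
	input := []byte(`[{
		"alertname": "HighMemoryUsage",
		"severity": "warning",
		"instance": "web-server-01",
		"description": "Memory usage above 85% for 5 minutes"
	}]`)
	alerts, err := ParseAlerts(input)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d alert(s), first: %s (%s)\n", len(alerts), alerts[0].AlertName, alerts[0].Severity)
}
```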

Each alert receives a triage result:

{
  "alertname": "HighMemoryUsage",
  "priority": 3,
  "summary": "Memory pressure on web server",
  "likely_cause": "Memory leak or increased traffic",
  "suggested_actions": [
    "Check for memory leaks in application logs",
    "Review recent deployments",
    "Consider horizontal scaling"
  ]
}
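
Because the result is produced by an LLM, it is worth validating before acting on it. The following is a minimal sketch, assuming the model returns one JSON object per alert with the keys shown above and a priority on the 1 to 5 scale this page documents. `AlertName`, `Priority`, and `Summary` match the Go usage snippet; `LikelyCause`, `SuggestedActions`, and `ParseTriageResult` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TriageResult mirrors the result object shown above.
type TriageResult struct {
	AlertName        string   `json:"alertname"`
	Priority         int      `json:"priority"`
	Summary          string   `json:"summary"`
	LikelyCause      string   `json:"likely_cause"`      // assumed field name
	SuggestedActions []string `json:"suggested_actions"` // assumed field name
}

// ParseTriageResult decodes one result and checks the fields downstream code relies on.
// Hypothetical helper for illustration.
func ParseTriageResult(raw string) (TriageResult, error) {
	var r TriageResult
	if err := json.Unmarshal([]byte(raw), &r); err != nil {
		return TriageResult{}, fmt.Errorf("decode triage result: %w", err)
	}
	if r.AlertName == "" {
		return TriageResult{}, fmt.Errorf("missing alertname")
	}
	if r.Priority < 1 || r.Priority > 5 {
		return TriageResult{}, fmt.Errorf("priority %d is outside 1-5", r.Priority)
	}
	return r, nil
}

func main() {
	raw := `{
		"alertname": "HighMemoryUsage",
		"priority": 3,
		"summary": "Memory pressure on web server",
		"likely_cause": "Memory leak or increased traffic",
		"suggested_actions": ["Check for memory leaks in application logs"]
	}`
	r, err := ParseTriageResult(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: priority %d, %d suggested action(s)\n", r.AlertName, r.Priority, len(r.SuggestedActions))
}
```

Rejecting out-of-range priorities early keeps a malformed model response from being routed as if it were a valid severity.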

Priority  Meaning        Response
1         Critical       Immediate action required
2         High           Respond within 1 hour
3         Medium         Respond within 4 hours
4         Low            Respond within 24 hours
5         Informational  No action required
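
The priority table can also be expressed as a lookup that downstream tooling uses to order results and compute response deadlines. This is a sketch under stated assumptions: the `Level`, `Result`, `LookupLevel`, and `SortByPriority` names are hypothetical, "immediate" is modelled as a zero-length window, and the alert names other than `HighMemoryUsage` are illustrative.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Level describes one row of the priority table.
type Level struct {
	Meaning    string
	Window     time.Duration // time allowed before a response is due
	Actionable bool          // false when no action is required
}

// levels encodes the table above; priority 1 uses a zero window (assumption).
var levels = map[int]Level{
	1: {"Critical", 0, true},
	2: {"High", time.Hour, true},
	3: {"Medium", 4 * time.Hour, true},
	4: {"Low", 24 * time.Hour, true},
	5: {"Informational", 0, false},
}

// LookupLevel returns the table row for a priority, or an error if it is unknown.
func LookupLevel(priority int) (Level, error) {
	lvl, ok := levels[priority]
	if !ok {
		return Level{}, fmt.Errorf("unknown priority %d", priority)
	}
	return lvl, nil
}

// Result holds only the fields needed for ordering.
type Result struct {
	AlertName string
	Priority  int
}

// SortByPriority orders results from most to least urgent, keeping ties in input order.
func SortByPriority(results []Result) {
	sort.SliceStable(results, func(i, j int) bool {
		return results[i].Priority < results[j].Priority
	})
}

func main() {
	results := []Result{{"HighMemoryUsage", 3}, {"DiskFull", 1}, {"CertExpiry", 5}}
	SortByPriority(results)
	for _, r := range results {
		lvl, err := LookupLevel(r.Priority)
		if err != nil {
			panic(err)
		}
		if !lvl.Actionable {
			fmt.Printf("%s: %s, no action required\n", r.AlertName, lvl.Meaning)
			continue
		}
		fmt.Printf("%s: %s, response window %s\n", r.AlertName, lvl.Meaning, lvl.Window)
	}
}
```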

Rust:

use alert_triage::{triage_alerts, Alert};

// `provider` is an already-configured SDK provider; "llama3" is the model name.
let alerts = vec![
    Alert { alertname: "...", severity: "critical", ... }, // remaining fields elided
];
let results = triage_alerts(&provider, "llama3", &alerts).await?;
for result in results {
    println!("{}: Priority {} - {}", result.alertname, result.priority, result.summary);
}

Go:

alerts := []Alert{{AlertName: "...", Severity: "critical", ...}} // remaining fields elided
results, err := TriageAlerts(ctx, provider, "llama3", alerts)
if err != nil {
	log.Fatal(err) // handle the triage error before reading results
}
for _, result := range results {
	fmt.Printf("%s: Priority %d - %s\n", result.AlertName, result.Priority, result.Summary)
}

# Rust
(cd rust && cargo run)
# Go
(cd go && go run .)

All examples support debugging flags:

# Verbose mode - show raw HTTP request/response data
cargo run -- --verbose # Rust
go run . --verbose # Go
# Step mode - pause at each step with explanations
cargo run -- --step # Rust
go run . --step # Go
# Combined mode
cargo run -- --verbose --step # Rust
go run . --verbose --step # Go

Or use environment variables:

export NXUSKIT_VERBOSE=1
export NXUSKIT_STEP=1

A sample_alerts.json file is provided with example alerts for testing.

# Rust
(cd rust && cargo test)
# Go
(cd go && go test -v)

  1. Alertmanager webhook: Receive alerts via HTTP webhook
  2. PagerDuty integration: Update incident priority based on triage
  3. Slack notifications: Send enriched alerts to Slack channels
  4. Runbook linking: Match alerts to relevant runbooks
  5. Historical learning: Improve triage based on past incident resolutions
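
The first item in the list above, receiving alerts through an Alertmanager webhook, could look roughly like the sketch below. Alertmanager posts a JSON body whose `alerts` array carries `labels` and `annotations` maps per alert; the handler flattens firing alerts into the flat shape used on this page. The `FlattenWebhook` and `webhookHandler` names, the `/webhook` path, and the choice of the `description` annotation are assumptions for illustration, and the step where alerts would be handed to the triage call is left as a comment.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
)

// Alert is the flat shape used on this page (field names partly assumed).
type Alert struct {
	AlertName   string
	Severity    string
	Instance    string
	Description string
}

// webhookPayload covers only the parts of the webhook body used here.
type webhookPayload struct {
	Alerts []struct {
		Status      string            `json:"status"`
		Labels      map[string]string `json:"labels"`
		Annotations map[string]string `json:"annotations"`
	} `json:"alerts"`
}

// FlattenWebhook converts firing alerts from a webhook body into flat Alert values.
func FlattenWebhook(body []byte) ([]Alert, error) {
	var p webhookPayload
	if err := json.Unmarshal(body, &p); err != nil {
		return nil, fmt.Errorf("decode webhook: %w", err)
	}
	alerts := make([]Alert, 0, len(p.Alerts))
	for _, a := range p.Alerts {
		if a.Status != "firing" {
			continue // resolved alerts need no triage
		}
		alerts = append(alerts, Alert{
			AlertName:   a.Labels["alertname"],
			Severity:    a.Labels["severity"],
			Instance:    a.Labels["instance"],
			Description: a.Annotations["description"],
		})
	}
	return alerts, nil
}

// webhookHandler accepts a POSTed webhook body and reports how many alerts it took.
func webhookHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "POST only", http.StatusMethodNotAllowed)
		return
	}
	body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 1<<20))
	if err != nil {
		http.Error(w, "cannot read body", http.StatusBadRequest)
		return
	}
	alerts, err := FlattenWebhook(body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// A real handler would pass alerts to the triage call at this point.
	fmt.Fprintf(w, "accepted %d alert(s)\n", len(alerts))
}

func main() {
	payload := `{"alerts":[{"status":"firing",
		"labels":{"alertname":"HighMemoryUsage","severity":"warning","instance":"web-server-01"},
		"annotations":{"description":"Memory usage above 85% for 5 minutes"}}]}`
	// Exercise the handler in-process instead of binding a port.
	req := httptest.NewRequest(http.MethodPost, "/webhook", strings.NewReader(payload))
	rec := httptest.NewRecorder()
	webhookHandler(rec, req)
	fmt.Print(rec.Body.String())
}
```

Serving it for real would only need `http.HandleFunc("/webhook", webhookHandler)` and `http.ListenAndServe`, with the Alertmanager receiver pointed at that URL.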