Alert Triage Integration
LLM-powered alert triage for observability systems, with batch processing and structured output.
Turn alert noise into prioritized action by letting an LLM classify severity, surface root causes, and recommend remediation steps automatically.
Edition
Community: runs on the OSS / Community SDK edition.
Overview
This example demonstrates using LLMs to triage monitoring alerts, providing priority assessment, likely causes, and suggested remediation actions. It’s designed to work with Alertmanager-format alerts.
What this demonstrates
Difficulty: Starter 🟢 · LLM
- Summary: Alert triage with LLM-powered analysis
- Scenario: Classify and prioritize alerts using LLM reasoning

`tech_tags` in manifest: `LLM`. Example id `alert-triage` in `conformance/examples_manifest.json`.
Prerequisites
- SDK: Use an installed SDK tree (`NXUSKIT_SDK_DIR`, `NXUSKIT_LIB_PATH` as needed); `test-examples.sh` resolves Go/Rust/Python deps from that tree only. See `README.md`, `scripts/setup-sdk.sh`, and `scripts/test-examples.sh`.
- Languages in this example: go, rust (paths under this directory; Python may live under a sibling `python/` or shared reference per Language Implementations).
- Models: Set cloud provider API keys and/or run Ollama locally when you execute the Run steps (interactive flags like `--help` / `--verbose` are documented below).
Real-World Application
SOC alert triage, IT incident management.
Technologies
LLM
Language Implementations
| Language | Path | Status |
|---|---|---|
| Rust | rust/ | Available |
| Go | go/ | Available |
Attach an installed SDK (`NXUSKIT_SDK_DIR`). See the repository `README.md` and `scripts/test-examples.sh`.

    # From `/examples/integrations/alert-triage`:
    (cd rust && cargo build)   # Rust
    (cd go && make build)      # Go

Features
- Batch alert processing for efficiency
- Structured JSON output with priority, cause, and actions
- Compatible with Alertmanager webhook format
- Prioritization based on severity and context
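
One way to picture the batching feature is to split incoming alerts into fixed-size groups and build a single prompt per group. The Go sketch below is an illustration under stated assumptions, not the example's actual implementation: `Alert` mirrors the input fields shown on this page, while `chunkAlerts`, `buildPrompt`, the prompt wording, and the batch size of 2 are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Alert mirrors the flat input fields shown on this page.
type Alert struct {
	AlertName   string `json:"alertname"`
	Severity    string `json:"severity"`
	Instance    string `json:"instance"`
	Description string `json:"description"`
}

// chunkAlerts (hypothetical helper) splits alerts into batches of at most size items.
func chunkAlerts(alerts []Alert, size int) [][]Alert {
	if size <= 0 {
		size = 1 // guard against a non-positive batch size
	}
	var batches [][]Alert
	for start := 0; start < len(alerts); start += size {
		end := start + size
		if end > len(alerts) {
			end = len(alerts)
		}
		batches = append(batches, alerts[start:end])
	}
	return batches
}

// buildPrompt (hypothetical helper) renders one batch as a single prompt,
// so one LLM call can cover several alerts. The wording is an assumption.
func buildPrompt(batch []Alert) (string, error) {
	payload, err := json.Marshal(batch)
	if err != nil {
		return "", err
	}
	return "Triage these alerts and reply with a JSON array of results:\n" + string(payload), nil
}

func main() {
	// The second and third alerts are made-up illustrations.
	alerts := []Alert{
		{AlertName: "HighMemoryUsage", Severity: "warning", Instance: "web-server-01", Description: "Memory usage above 85% for 5 minutes"},
		{AlertName: "ExampleAlertB", Severity: "critical", Instance: "example-host-02"},
		{AlertName: "ExampleAlertC", Severity: "info", Instance: "example-host-03"},
	}
	for i, batch := range chunkAlerts(alerts, 2) {
		prompt, err := buildPrompt(batch)
		if err != nil {
			panic(err)
		}
		fmt.Printf("batch %d: %d alerts, prompt of %d bytes\n", i, len(batch), len(prompt))
	}
}
```

Sending one prompt per batch trades per-alert latency for fewer model calls; a suitable batch size depends on the model's context window.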
Alert Format
Input alerts follow the Alertmanager format:

    {
      "alertname": "HighMemoryUsage",
      "severity": "warning",
      "instance": "web-server-01",
      "description": "Memory usage above 85% for 5 minutes"
    }

Triage Output
Each alert receives a triage result:

    {
      "alertname": "HighMemoryUsage",
      "priority": 3,
      "summary": "Memory pressure on web server",
      "likely_cause": "Memory leak or increased traffic",
      "suggested_actions": [
        "Check for memory leaks in application logs",
        "Review recent deployments",
        "Consider horizontal scaling"
      ]
    }

Priority Scale
| Priority | Meaning | Response |
|---|---|---|
| 1 | Critical | Immediate action required |
| 2 | High | Respond within 1 hour |
| 3 | Medium | Respond within 4 hours |
| 4 | Low | Respond within 24 hours |
| 5 | Informational | No action required |
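
A consumer of the triage output might validate the `priority` field and map it onto the response windows in the table. The Go sketch below assumes the triage result JSON shown on this page; `TriageResult`, `level`, `scale`, and `parseResult` are illustrative names, not the example's real types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// TriageResult mirrors the triage output fields shown on this page.
type TriageResult struct {
	AlertName        string   `json:"alertname"`
	Priority         int      `json:"priority"`
	Summary          string   `json:"summary"`
	LikelyCause      string   `json:"likely_cause"`
	SuggestedActions []string `json:"suggested_actions"`
}

// level describes one row of the priority scale.
type level struct {
	Meaning        string
	Window         time.Duration // zero with ActionRequired set means "immediately"
	ActionRequired bool
}

// scale encodes the priority table from this page.
var scale = map[int]level{
	1: {Meaning: "Critical", Window: 0, ActionRequired: true},
	2: {Meaning: "High", Window: time.Hour, ActionRequired: true},
	3: {Meaning: "Medium", Window: 4 * time.Hour, ActionRequired: true},
	4: {Meaning: "Low", Window: 24 * time.Hour, ActionRequired: true},
	5: {Meaning: "Informational", Window: 0, ActionRequired: false},
}

// parseResult decodes one triage result and rejects priorities outside 1-5,
// which guards against malformed model output.
func parseResult(raw []byte) (TriageResult, level, error) {
	var r TriageResult
	if err := json.Unmarshal(raw, &r); err != nil {
		return r, level{}, fmt.Errorf("decode triage result: %w", err)
	}
	lv, ok := scale[r.Priority]
	if !ok {
		return r, level{}, fmt.Errorf("priority %d is outside the 1-5 scale", r.Priority)
	}
	return r, lv, nil
}

func main() {
	raw := []byte(`{"alertname":"HighMemoryUsage","priority":3,"summary":"Memory pressure on web server"}`)
	r, lv, err := parseResult(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s: %s (respond within %s)\n", r.AlertName, lv.Meaning, lv.Window)
}
```

Rejecting out-of-range priorities early keeps downstream routing from silently treating a bad model response as a valid level.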
Library usage
Rust:

    use alert_triage::{triage_alerts, Alert};

    let alerts = vec![
        Alert { alertname: "...", severity: "critical", ... },
    ];

    // Triage the whole batch with the given provider and model.
    let results = triage_alerts(&provider, "llama3", &alerts).await?;
    for result in results {
        println!("{}: Priority {} - {}", result.alertname, result.priority, result.summary);
    }

Go:

    alerts := []Alert{{AlertName: "...", Severity: "critical", ...}}

    // Triage the whole batch with the given provider and model.
    results, err := TriageAlerts(ctx, provider, "llama3", alerts)
    if err != nil {
        log.Fatal(err)
    }
    for _, result := range results {
        fmt.Printf("%s: Priority %d - %s\n", result.AlertName, result.Priority, result.Summary)
    }

Run the example from its language directory.

Rust:

    cd rust
    cargo run

Go:

    cd go
    go run .

Interactive Modes
All examples support debugging flags:

    # Verbose mode - show raw HTTP request/response data
    cargo run -- --verbose    # Rust
    go run . --verbose        # Go

    # Step mode - pause at each step with explanations
    cargo run -- --step       # Rust
    go run . --step           # Go

    # Combined mode
    cargo run -- --verbose --step

Or use environment variables:

    export NXUSKIT_VERBOSE=1
    export NXUSKIT_STEP=1

Sample Data
A `sample_alerts.json` file is provided with example alerts for testing.
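
For quick experiments, the sample file can be decoded into alert structs before triage. This Go sketch assumes `sample_alerts.json` holds a JSON array of alerts in the input format shown on this page; the actual file layout may differ, and `parseAlerts` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Alert mirrors the flat input fields shown on this page.
type Alert struct {
	AlertName   string `json:"alertname"`
	Severity    string `json:"severity"`
	Instance    string `json:"instance"`
	Description string `json:"description"`
}

// parseAlerts (hypothetical helper) decodes a JSON array of alerts and
// rejects entries without an alertname, since triage results are keyed by it.
func parseAlerts(data []byte) ([]Alert, error) {
	var alerts []Alert
	if err := json.Unmarshal(data, &alerts); err != nil {
		return nil, fmt.Errorf("decode alerts: %w", err)
	}
	for i, a := range alerts {
		if a.AlertName == "" {
			return nil, fmt.Errorf("alert %d has no alertname", i)
		}
	}
	return alerts, nil
}

func main() {
	// Assumes the file sits in the current working directory.
	data, err := os.ReadFile("sample_alerts.json")
	if err != nil {
		fmt.Fprintln(os.Stderr, "read sample file:", err)
		return
	}
	alerts, err := parseAlerts(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for _, a := range alerts {
		fmt.Printf("%s [%s] on %s\n", a.AlertName, a.Severity, a.Instance)
	}
}
```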
Testing

    # Rust
    cd rust && cargo test

    # Go
    cd go && go test -v

Integration Ideas
- Alertmanager webhook: Receive alerts via HTTP webhook
- PagerDuty integration: Update incident priority based on triage
- Slack notifications: Send enriched alerts to Slack channels
- Runbook linking: Match alerts to relevant runbooks
- Historical learning: Improve triage based on past incident resolutions
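
As a starting point for the Alertmanager webhook idea, the Go sketch below decodes a webhook payload and flattens each alert into the flat shape used on this page. It relies on the `alerts`, `status`, `labels`, and `annotations` fields of the Alertmanager webhook payload. The `Alert` struct, `flattenWebhook`, `handleWebhook`, the `/webhook` path, the port, and the choice to skip resolved alerts are assumptions for illustration, and the call into the triage library is left as a comment.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// Alert mirrors the flat input fields shown on this page.
type Alert struct {
	AlertName   string `json:"alertname"`
	Severity    string `json:"severity"`
	Instance    string `json:"instance"`
	Description string `json:"description"`
}

// webhookPayload covers only the parts of the webhook body used here.
type webhookPayload struct {
	Alerts []struct {
		Status      string            `json:"status"`
		Labels      map[string]string `json:"labels"`
		Annotations map[string]string `json:"annotations"`
	} `json:"alerts"`
}

// flattenWebhook (hypothetical helper) converts a webhook body into flat alerts.
func flattenWebhook(body []byte) ([]Alert, error) {
	var p webhookPayload
	if err := json.Unmarshal(body, &p); err != nil {
		return nil, fmt.Errorf("decode webhook: %w", err)
	}
	alerts := make([]Alert, 0, len(p.Alerts))
	for _, a := range p.Alerts {
		if a.Status == "resolved" {
			continue // design choice in this sketch: only triage firing alerts
		}
		alerts = append(alerts, Alert{
			AlertName:   a.Labels["alertname"],
			Severity:    a.Labels["severity"],
			Instance:    a.Labels["instance"],
			Description: a.Annotations["description"],
		})
	}
	return alerts, nil
}

// handleWebhook reads at most 1 MiB, flattens the payload, and acknowledges it.
func handleWebhook(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(http.MaxBytesReader(w, r.Body, 1<<20))
	if err != nil {
		http.Error(w, "cannot read body", http.StatusBadRequest)
		return
	}
	alerts, err := flattenWebhook(body)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// Hand alerts to the triage library here.
	fmt.Fprintf(w, "accepted %d alerts\n", len(alerts))
}

func main() {
	sample := []byte(`{"status":"firing","alerts":[{"status":"firing",
		"labels":{"alertname":"HighMemoryUsage","severity":"warning","instance":"web-server-01"},
		"annotations":{"description":"Memory usage above 85% for 5 minutes"}}]}`)
	alerts, err := flattenWebhook(sample)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", alerts)
	// To serve real webhooks, mount the handler and start a server, for example:
	//   http.HandleFunc("/webhook", handleWebhook)
	//   http.ListenAndServe(":8080", nil)
}
```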