Projects

Projects across DevOps, cloud, SRE, Kubernetes, MLOps, and AI automation.

These projects reflect the systems I have been building: release checks, support automation, incident triage, drift monitoring, SLO alerting, and Kubernetes operations. Each repository includes local tests, a CLI or API, and deployment-ready infrastructure assets.

iOS | SwiftUI | SRE | Swift Testing

OpsPulse

I built a native iOS and iPadOS SRE incident commander app with deterministic service metrics, tested SLO and error-budget calculations, a runbook-driven incident workflow, Reliability Lab simulations, and Markdown export of post-incident reviews.
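The error-budget arithmetic behind an SLO screen like this can be sketched in a few lines. The sketch is illustrative only and written in Python rather than the app's Swift; the function name and the request-based budget model are assumptions, not code from OpsPulse.

```python
def error_budget_remaining(slo_target: float, total_requests: int, failed_requests: int) -> float:
    """Return the unspent fraction of the error budget (negative once overspent).

    Hypothetical helper: assumes a request-based SLO such as 0.999 availability.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests == 0:
        return 1.0  # no traffic yet, so nothing has been spent
    # The budget is the number of failures the SLO tolerates over this traffic.
    allowed_failures = (1.0 - slo_target) * total_requests
    return 1.0 - failed_requests / allowed_failures
```

With a 99.9% target, 1,000,000 requests, and 250 failures, the budget allows about 1,000 failures, so roughly three quarters of it remains.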

Screenshots (iPhone Simulator): OpsPulse overview dashboard; service detail SLO screen; incident commander workflow screen.
Repository

Release Reliability | Python | FastAPI | Prometheus

Release Rollback Decision Service

I built a deterministic release-gate service that evaluates rollout evidence, customer impact, latency, saturation, and incident signals to recommend continue, pause, or rollback decisions.
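A deterministic gate of this kind might look like the sketch below. It is not the repository's code: the `RolloutEvidence` fields, the `decide` function, and every threshold are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RolloutEvidence:
    """Hypothetical signals collected for one rollout step."""
    error_rate: float         # fraction of failed requests, 0..1
    p95_latency_ms: float     # 95th percentile latency
    saturation: float         # busiest resource utilisation, 0..1
    affected_customers: int   # customers reporting impact
    open_incidents: int       # incidents linked to this release


def decide(e: RolloutEvidence) -> str:
    """Map evidence to 'rollback', 'pause', or 'continue' with fixed rules."""
    # Hard failures: any linked incident or clear customer harm rolls back.
    if e.open_incidents > 0 or e.error_rate >= 0.05 or e.affected_customers >= 100:
        return "rollback"
    # Soft warnings: hold the rollout until someone has looked at it.
    if e.error_rate >= 0.01 or e.p95_latency_ms >= 500 or e.saturation >= 0.85:
        return "pause"
    return "continue"
```

Because the rules are plain comparisons evaluated in a fixed order, the same evidence always yields the same recommendation, which is what makes such a gate easy to unit test.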

Repository

AI Support | Vector Search | FastAPI | Evaluation

Vector Search Support Assistant

I built a deterministic support assistant that retrieves the best incident runbook, returns cited next actions, and includes retrieval evaluation gates for support-quality checks.
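As a rough illustration of deterministic retrieval with an evaluation gate, the sketch below ranks runbooks by cosine similarity over bag-of-words counts. The runbook IDs and texts, the function names, and the 0.9 hit-rate threshold are all hypothetical, and the word-count "embedding" is a stand-in assumption rather than the vector model the project uses.

```python
import math
from collections import Counter

# Hypothetical runbook corpus: ID -> keywords describing the incident.
RUNBOOKS = {
    "rb-cert-expiry": "tls certificate expired renew rotate ingress",
    "rb-db-failover": "database connection timeout failover replica promote",
    "rb-oom-restart": "pod oomkilled memory limit restart container",
}


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str) -> dict:
    """Return the best-matching runbook with its ID as the citation."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), rid) for rid, text in RUNBOOKS.items()]
    # Sort by score descending, then by ID, so ties resolve the same way every run.
    score, rid = sorted(scored, key=lambda p: (-p[0], p[1]))[0]
    return {"runbook_id": rid, "score": score}


def passes_gate(cases: list[tuple[str, str]], min_hit_rate: float = 0.9) -> bool:
    """Evaluation gate: share of labelled queries whose top hit is the expected runbook."""
    if not cases:
        raise ValueError("need at least one labelled query")
    hits = sum(retrieve(query)["runbook_id"] == expected for query, expected in cases)
    return hits / len(cases) >= min_hit_rate
```

A gate like `passes_gate` can run in CI against a small labelled query set, so a change to the corpus or the scoring that degrades retrieval fails the build instead of reaching support staff.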

Repository

Incident Triage | Python | CLI/API | Kubernetes

Incident Log Summarizer

I built a production-support-style CLI and API that extracts evidence, classifies severity, maps likely causes, and returns structured summaries for incident handoffs.
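The extract, classify, and map steps could be sketched roughly as follows. The severity rules, the cause patterns, and the `summarize` function are assumptions made for illustration and are not taken from the repository.

```python
import re

# Hypothetical mapping from likely cause to the log phrases that suggest it.
CAUSE_PATTERNS = {
    "dependency_timeout": re.compile(r"timeout|timed out", re.IGNORECASE),
    "out_of_memory": re.compile(r"oomkilled|out of memory", re.IGNORECASE),
    "auth_failure": re.compile(r"unauthorized|forbidden", re.IGNORECASE),
}
EVIDENCE = re.compile(r"\b(ERROR|FATAL)\b")


def summarize(log_text: str) -> dict:
    """Return a structured handoff summary for a block of log lines."""
    lines = [line.strip() for line in log_text.splitlines() if line.strip()]
    # Evidence: only lines logged at ERROR or FATAL level.
    evidence = [line for line in lines if EVIDENCE.search(line)]

    # Severity: any FATAL is critical, otherwise scale with the error count.
    if any("FATAL" in line for line in evidence):
        severity = "critical"
    elif len(evidence) >= 3:
        severity = "high"
    elif evidence:
        severity = "medium"
    else:
        severity = "low"

    causes = sorted(
        name for name, pattern in CAUSE_PATTERNS.items()
        if any(pattern.search(line) for line in evidence)
    )
    return {"severity": severity, "likely_causes": causes, "evidence": evidence}
```

Returning a plain dictionary keeps the same function usable from both a CLI (printed as JSON) and an API handler (returned as the response body).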

Repository

MLOps | Drift Monitoring | FastAPI | Terraform

MLOps Model Drift Monitor

I built a drift monitor that compares baseline and current model inputs, produces JSON drift reports, and exposes both CLI and API paths with metrics and deployment scaffolding.
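One common way to compare a baseline against current inputs is the Population Stability Index (PSI). The sketch below uses it as an assumed technique; the project's actual drift statistic is not stated here, and the function names, bin count, and the 0.2 threshold (a widely used rule of thumb) are illustrative choices.

```python
import json
import math


def psi(baseline: list[float], current: list[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index over equal-width bins derived from the baseline."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # constant baseline: everything lands in edge bins

    def fractions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values
        # Floor at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    b, c = fractions(baseline), fractions(current)
    return sum((ci - bi) * math.log(ci / bi) for bi, ci in zip(b, c))


def drift_report(feature: str, baseline: list[float], current: list[float], threshold: float = 0.2) -> str:
    """Return a JSON drift report for one feature."""
    value = psi(baseline, current)
    return json.dumps({"feature": feature, "psi": round(value, 4), "drifted": value >= threshold})
```

Identical samples give a PSI of zero, and the value grows as the current distribution moves away from the baseline, so a single threshold is enough to turn the score into a yes/no flag in the report.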

Repository

SRE | Prometheus | SLO Burn Rate | API

Prometheus SLO Alert Lab

I built an SRE lab that evaluates multi-window burn rates, routes page/ticket/ok decisions, and writes incident-ready reports for platform reliability practice.
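Multi-window burn-rate routing can be sketched as below. Burn rate is the observed error rate divided by the error rate the SLO allows, and pairing a long window with a short one keeps an alert from firing on a brief spike or lingering after recovery. The window names, thresholds, and function names are illustrative assumptions, not the lab's configuration.

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """How many times faster than allowed the error budget is being spent."""
    return error_rate / (1.0 - slo_target)


def route(error_rates: dict[str, float], slo_target: float = 0.999) -> str:
    """Return 'page', 'ticket', or 'ok' from per-window error rates.

    error_rates maps hypothetical window names ("5m", "1h", "6h", "3d")
    to the fraction of failed requests observed in that window.
    """
    rates = {window: burn_rate(rate, slo_target) for window, rate in error_rates.items()}
    # Fast burn: both the long and the short window must agree before paging.
    if rates["1h"] >= 14.4 and rates["5m"] >= 14.4:
        return "page"
    # Slow burn: sustained overspend that can wait for working hours.
    if rates["3d"] >= 1.0 and rates["6h"] >= 1.0:
        return "ticket"
    return "ok"
```

For example, with a 99.9% target a steady 2% error rate is a burn rate of about 20, which crosses the paging threshold in both the 5-minute and 1-hour windows.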

Repository

Kubernetes | Log Triage | Runbooks | Metrics

Kubernetes Log Anomaly Triage Service

I built a Kubernetes-focused triage service that scores log batches, identifies likely incident classes, returns runbook steps, and exposes Prometheus metrics.
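A minimal sketch of keyword-based batch scoring with runbook lookup follows. The incident classes, keywords, runbook steps, and metric name are hypothetical, and the hand-rendered counter is a stand-in for a real Prometheus client library rather than the service's implementation.

```python
from collections import Counter

# Hypothetical incident classes: keywords to match and runbook steps to return.
INCIDENT_CLASSES = {
    "crash_loop": {
        "keywords": ("crashloopbackoff", "back-off restarting"),
        "runbook": ["kubectl describe pod <pod>", "kubectl logs <pod> --previous"],
    },
    "image_pull": {
        "keywords": ("imagepullbackoff", "errimagepull"),
        "runbook": ["Check the image name and tag", "Verify registry credentials"],
    },
    "oom": {
        "keywords": ("oomkilled",),
        "runbook": ["Compare memory limits with observed usage"],
    },
}
TRIAGE_TOTAL = Counter()  # batches triaged, keyed by incident class


def triage(batch: list[str]) -> dict:
    """Score a batch of log lines and return the most likely incident class."""
    scores = {
        name: sum(any(k in line.lower() for k in spec["keywords"]) for line in batch)
        for name, spec in INCIDENT_CLASSES.items()
    }
    # Iterate in sorted order so ties break alphabetically and stay deterministic.
    best = max(sorted(scores), key=scores.get)
    if scores[best] == 0:
        TRIAGE_TOTAL["unknown"] += 1
        return {"incident_class": "unknown", "matched_lines": 0, "runbook": []}
    TRIAGE_TOTAL[best] += 1
    return {
        "incident_class": best,
        "matched_lines": scores[best],
        "runbook": INCIDENT_CLASSES[best]["runbook"],
    }


def render_metrics() -> str:
    """Render the counter in Prometheus text exposition format."""
    lines = ["# TYPE log_triage_batches_total counter"]
    for incident_class, count in sorted(TRIAGE_TOTAL.items()):
        lines.append(f'log_triage_batches_total{{incident_class="{incident_class}"}} {count}')
    return "\n".join(lines) + "\n"
```

In a real service the counter would normally come from a metrics client and be served on a `/metrics` endpoint; rendering it by hand here only keeps the sketch dependency-free.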

Repository