memory
GPU Infrastructure & AI Model Deployment
Deployed and managed GPU SKUs across old architecture, testing MIG
profiles and multiple SKU configurations. Upgraded decommissioned GPUs with newer models,
dramatically improving ML inference performance.
speed 300 → 12 req/sec
hub
Multi-Cluster AKS Orchestration
Led comprehensive upgrades of AKS, Istio, and Terraform providers across
three distinct architectures. Deployed two new hub clusters from scratch for
3 environments and onboarded foundational code for our dev team.
query_stats
Observability & ECK Transformation
Developed comprehensive ECK resources — repos, dependent charts, service
accounts with minimal permissions, and ArgoCD applications. Implemented
Prometheus Push Gateway for ephemeral job metrics, eliminating blind spots
in short-lived processes.
smart_toy
OpenAI & AI Platform Enablement
Deployed GPT-4o models across all environments for Data Science
teams. Provisioned managed deployments for production, configured Redis cache
for ML data acceleration, and set up Grafana OpenAI dashboards.
architecture
Reference Architecture Modernization
Redesigned ServiceBus module from per-team individual modules into a
unified, reusable architecture. Created protected production branches with automated tagging
workflows and fixed critical storage class retain policies.
build
SRE Tooling & Automation
Engineered debug container image with curated troubleshooting tools,
reducing MTTR. Built helm-diff PR scripts, AKS logshipper for audit logs, Vault troubleshooting
scripts, and automated Azure Blob lifecycle management.