Vishwanath Kamble | Infrastructure Engineer

Featured Builds

Engineering projects

Custom tools built to eliminate developer friction and optimize daily workflows.

klarity

A read-only Kubernetes diagnostic CLI built with Go. Scans multiple clusters and namespaces in parallel, classifies unhealthy workloads by root cause, and renders categorized terminal tables with one-line error summaries extracted directly from pod logs.

Go Kubernetes CLI

Core Philosophy

Strictly Read-Only (Safe for Prod)

Multi-Cluster Parallel Scanning
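
The scan-and-classify flow described for klarity can be illustrated with a minimal sketch. klarity itself is written in Go; this Python version is not its implementation. It assumes pods arrive as plain dictionaries, and the `ROOT_CAUSES` mapping, the bucket names, and the `classify`, `scan_cluster`, and `scan_all` helpers are all hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical mapping from container status reasons to root-cause buckets.
ROOT_CAUSES = {
    "CrashLoopBackOff": "crash",
    "ImagePullBackOff": "image",
    "ErrImagePull": "image",
    "CreateContainerConfigError": "config",
    "OOMKilled": "memory",
}


def classify(pod):
    """Return a root-cause bucket for one pod record, or None if healthy."""
    reason = pod.get("reason")
    if reason is None:
        return None
    return ROOT_CAUSES.get(reason, "other")


def scan_cluster(cluster, pods):
    """Group one cluster's unhealthy pods by root cause (nothing is modified)."""
    buckets = defaultdict(list)
    for pod in pods:
        cause = classify(pod)
        if cause is not None:
            buckets[cause].append(f"{cluster}/{pod['namespace']}/{pod['name']}")
    return dict(buckets)


def scan_all(clusters):
    """Scan every cluster in parallel and merge the per-cause buckets."""
    merged = defaultdict(list)
    with ThreadPoolExecutor() as pool:
        results = pool.map(lambda item: scan_cluster(*item), clusters.items())
        for buckets in results:
            for cause, names in buckets.items():
                merged[cause].extend(names)
    return {cause: sorted(names) for cause, names in merged.items()}


# Assumed sample input: cluster name -> list of pod records.
SAMPLE = {
    "prod-east": [
        {"namespace": "api", "name": "web-1", "reason": "CrashLoopBackOff"},
        {"namespace": "api", "name": "web-2", "reason": None},
    ],
    "prod-west": [
        {"namespace": "jobs", "name": "etl-1", "reason": "ErrImagePull"},
    ],
}
```

Calling `scan_all(SAMPLE)` groups the two unhealthy pods under the `crash` and `image` buckets and skips the healthy one.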

skewlog

A self-hosted web app for Helm chart version diffing. Instantly visualize changes in values.yaml and templates before upgrading your cluster. Features automated breaking change detection and local SQLite caching.

Helm Kubernetes SRE Tooling

Active Development

Breaking changes parsing engine

Native CLI tool option

bucketlens

A fast, lightweight utility that provides visibility into cloud storage buckets. Analyze storage usage and permissions, and optimize lifecycle policies, from a single interface.

AWS S3 Storage SRE Tooling

hop2

A terminal navigator and command aliasing tool built with Python. Designed to create quick, memorable aliases and instantly hop between directories on the CLI, drastically reducing navigation time.

Python CLI Automation

Career

Experience in depth

Nine years building infrastructure at scale — from Wall Street trading floors to AI-powered enterprise platforms.

iManage

Site Reliability Engineer Specialist

Dec 2023 — Present

Chicago, IL

GPU Infrastructure & AI Model Deployment

Deployed and managed GPU SKUs on the older architecture, testing MIG profiles and multiple SKU configurations. Replaced decommissioned GPUs with newer models, improving ML inference performance.

Multi-Cluster AKS Orchestration

Led comprehensive upgrades of AKS, Istio, and Terraform providers across three distinct architectures. Deployed two new hub clusters from scratch for three environments and onboarded foundational code for our dev team.

Observability & ECK Transformation

Developed comprehensive ECK resources: repos, dependent charts, service accounts with minimal permissions, and ArgoCD applications. Implemented the Prometheus Pushgateway for ephemeral job metrics, eliminating blind spots in short-lived processes.

OpenAI & AI Platform Enablement

Deployed GPT-4o models across all environments for Data Science teams. Provisioned managed deployments for production, configured Redis cache for ML data acceleration, and set up Grafana OpenAI dashboards.

Reference Architecture Modernization

Redesigned the ServiceBus module, consolidating per-team individual modules into a unified, reusable architecture. Created protected production branches with automated tagging workflows and fixed critical storage class retain policies.

SRE Tooling & Automation

Engineered a debug container image with curated troubleshooting tools, reducing MTTR. Built helm-diff PR scripts, an AKS logshipper for audit logs, Vault troubleshooting scripts, and automated Azure Blob lifecycle management.

Instabase

Product / Site Reliability Engineer

Jun 2022 — Dec 2023

Remote

Multi-Cloud Platform Deployment

Orchestrated the Instabase platform deployment across AWS, Azure, and GCP for three major enterprise clients. Standardized the Terraform modules to support diverse environments (OpenShift vs. Native Cloud), ensuring a consistent delivery pipeline regardless of the underlying infrastructure provider.

3 Enterprise Clients

S3 Storage Cost Optimization

Reduced S3 storage costs by 87% by implementing lifecycle management policies and versioning rules and by archiving dead buckets. This initiative shrank the storage footprint dramatically without impacting data availability for ongoing customer operations.

235 TB → 77 TB

Real-Time Alerting & EKS Upgrades

Built a Slack bot to monitor "zombie" sandboxes, identifying and decommissioning over 15 stale environments to eliminate cloud waste. Led critical EKS upgrades (1.21 → 1.23) and migrated storage classes from EBS to EFS, ensuring zero downtime for developer workflows.

15+ Envs Pruned

Container Image & Security Cleanup

Designed a Python script to identify unused ECR images across active EKS clusters. Implemented ECR lifecycle policies for automated cleanup and addressed vulnerabilities by upgrading base images, significantly reducing the attack surface.

140K → 2K Images
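
The core of the unused-image check can be sketched as a set difference. This is an illustration, not the original script: it assumes both image lists have already been collected (for example from the registry and from the clusters' pod specs), and `find_unused_images` is a hypothetical helper name.

```python
def find_unused_images(repo_images, running_images):
    """Return repository images that no running workload references."""
    in_use = set(running_images)
    return sorted(image for image in set(repo_images) if image not in in_use)


# Assumed sample data: image references as "repository:tag" strings.
REPO_IMAGES = ["app:1", "app:2", "app:3", "worker:7"]
RUNNING_IMAGES = ["app:3", "worker:7", "worker:7"]
```

Here `find_unused_images(REPO_IMAGES, RUNNING_IMAGES)` returns `["app:1", "app:2"]`. In practice the candidate list would normally be reviewed, or filtered by image age, before anything is deleted.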

Citadel

Sr. Infrastructure Engineer & SME Linux

Jan 2020 — May 2022

Chicago, IL / London, UK

Automation at Scale

Automated ticket handling on Jira and ServiceNow — creating, updating, and closing over 10,000 tickets in 5 months. Engineered scripts to fix YAML config errors across hundreds of VMs, saving hundreds of hours of manual engineering time.

10,000+ tickets

Linux SME Program & Mentorship

Enabled the team's expansion to a 24/7 "Follow-the-Sun" schedule by standardizing global knowledge. Authored 80+ wiki pages of deep-dive troubleshooting guides and SOPs. Personally trained 7 new hires across Austin, Chicago, London, and Hong Kong, traveling internationally to ensure consistent operational standards.

Global Training Lead

Server Hardware Lifecycle

Designed a comprehensive lifecycle tracking system that interfaced with datacenter inventory APIs. This automation proactively flagged warranty expirations and end-of-life assets, eliminating manual spreadsheet toil and ensuring 100% compliance with hardware refresh policies for the global fleet.

90% time saved
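
The warranty-flagging step can be sketched as a date comparison. This is a minimal sketch, assuming inventory records have already been fetched. The record fields (`hostname`, `warranty_end`), the 90-day warning window, and the `flag_assets` helper are hypothetical and not taken from the original system.

```python
from datetime import date, timedelta


def flag_assets(assets, today, warn_days=90):
    """Split assets into expired and expiring-soon lists by warranty end date."""
    cutoff = today + timedelta(days=warn_days)
    expired, expiring = [], []
    for asset in assets:
        end = asset["warranty_end"]
        if end < today:
            expired.append(asset["hostname"])
        elif end <= cutoff:
            expiring.append(asset["hostname"])
    return {"expired": sorted(expired), "expiring": sorted(expiring)}


# Assumed sample inventory records.
ASSETS = [
    {"hostname": "srv-a", "warranty_end": date(2023, 12, 31)},
    {"hostname": "srv-b", "warranty_end": date(2024, 3, 1)},
    {"hostname": "srv-c", "warranty_end": date(2025, 1, 1)},
]
```

With `today=date(2024, 1, 1)`, `srv-a` is reported as expired, `srv-b` as expiring within the window, and `srv-c` is left alone.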

K8s Rollout & Config Hygiene

Served as the Operations liaison for the "Trailblazer" Kubernetes deployment. Developed automated sanitation scripts to enforce naming conventions and data sovereignty, programmatically detecting and moving misfiled server YAMLs (e.g., relocating JPN assets from NYC folders) to their correct datacenter paths.

Trailblazer L1 Support
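
Detecting misfiled YAMLs can be sketched as a comparison between a file's site prefix and its parent folder. This is an illustration under assumptions, not the original script: the `<root>/<site>/<site>-<host>.yaml` layout, the site codes, and the `find_misfiled` helper are hypothetical. The function only proposes moves and does not touch the filesystem.

```python
from pathlib import PurePosixPath


def find_misfiled(paths):
    """Return (current, expected) pairs for YAMLs filed under the wrong site folder."""
    moves = []
    for raw in paths:
        path = PurePosixPath(raw)
        # Assumed convention: the file name starts with the site code, e.g. "jpn-db01.yaml".
        site = path.name.split("-", 1)[0].lower()
        folder = path.parent.name.lower()
        if site != folder:
            expected = path.parent.parent / site / path.name
            moves.append((str(path), str(expected)))
    return moves


# Assumed sample repository paths.
PATHS = ["servers/nyc/jpn-db01.yaml", "servers/nyc/nyc-web01.yaml"]
```

For `PATHS`, the sketch proposes moving `servers/nyc/jpn-db01.yaml` to `servers/jpn/jpn-db01.yaml` and leaves the correctly filed NYC host untouched.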

Citadel

Infrastructure Operations Developer & Intern

Jun 2016 — Dec 2019

Chicago, IL / New York, NY

Luna — Infra Management Platform

Built the central tooling hub adopted by all infrastructure teams (Windows, Network, Storage). Features included a firm-wide "Server Lookup" (locating specific racks/cabinets) and Zabbix API integration to automate maintenance windows, preventing false alerts during patching.

100% Team Adoption

Network Monitoring & Datacenter Ops

Developed a custom Python application monitoring over 500 network devices (Cisco, Palo Alto, Arista, Meraki). The tool used REST APIs to poll device health in real time, providing a unified dashboard for internet circuits, VPN tunnels, and telecom status across global offices.

500+ Devices

Automated Incident Response

Scripted an automated BGP peering check that detected circuit outages instantly. The system auto-created ServiceNow tickets and emailed ISPs with the exact Circuit ID and downtime logs, drastically reducing Mean Time to Detect (MTTD) and eliminating manual vendor triage.

Auto-Vendor Ticketing

Night Shift Operations

Managed the entire global infrastructure solo for 12 months, ensuring uptime for critical trading systems during off-hours. Following this tenure, authored the night-shift training curriculum and mentored two junior engineers to expand the rotation into a sustainable team.

Sole Operator (1 Year)

Expertise

Technical skills

Infrastructure & Cloud

Kubernetes, Docker, Helm, Terraform, Terragrunt, AWS, Azure, GCP, OpenShift

Observability & DevOps

Prometheus, Grafana, Jaeger, Kibana, Elasticsearch, ArgoCD, HashiCorp Vault, Fluent Bit, Kiali

Languages & Frameworks

Python, Bash, Go, JavaScript, Flask, REST APIs, HTML5 / CSS3, C++

Networking & Linux

Istio, BGP / OSPF, Cisco Switches, Palo Alto Firewalls, Linux Admin, Red Hat, Splunk

AI / ML Infrastructure

GPU Management, OpenAI / GPT-4o, Document Intelligence, Redis Cache, MIG Profiles, Azure Cognitive Services

CI/CD & Automation

GitHub Actions, Jira, ServiceNow, Snyk, ECR Lifecycle, GitOps

Education

Academic background

Master of Science in Computer Science

Stevens Institute of Technology, Hoboken, NJ

GPA: 3.59 — December 2016

Bachelor of Engineering in Computer Engineering

University of Mumbai, India

August 2014

Knowledge Sharing

Community & Teaching

Technical Instructor

Launched and published highly-rated courses focused on AWS, Python, and cloud automation fundamentals. Building a global community of engineers by breaking down complex infrastructure concepts into accessible, production-ready workflows.

84,000+ Students, 900+ Reviews, 5 Published Courses
