Leadership & Architecture Highlights
- Architected multi-cluster Kubernetes platform supporting 50+ microservices and 20K+ edge devices with 99.9% uptime, establishing company-wide standards for reliability, IaC, and developer experience
- Contributed to the architectural design and PoC of air-gapped and hybrid Kubernetes solutions for regulated industries (government, defense, healthcare), helping generate $2M+ in new business opportunities
- Reduced mean time to recovery (MTTR) by 80% by defining the SRE roadmap and standardizing observability practices
- Cut infrastructure costs by 20% ($250K+ annually) through right-sizing, autoscaling with Karpenter, and Spot instance adoption
- Built a self-service developer platform that reduced operational tickets by 60% and accelerated delivery via automated provisioning
- Mentored 10+ engineers and influenced strategic architectural decisions through design reviews and platform vision alignment
Notable Projects & Initiatives
Disaster Recovery & Data Resilience Initiative (2024-Present)
Designed and implemented a comprehensive disaster recovery strategy for 5 critical data stores (PostgreSQL, Redis, Kafka, Elasticsearch, S3) with RPO < 30 min and RTO < 2 h, enabling business continuity for the production IoT platform.
Multi-Cluster Migration Project (2022-2023)
Architected and executed migration of 20+ customer clusters from legacy infrastructure to modern Kubernetes platform. Designed comprehensive testing strategy, runbooks, and rollback procedures ensuring 100% successful migrations and data consistency.
Infrastructure Load Testing Framework (2021-2022)
Designed and implemented an automated load testing framework supporting both performance benchmarking (10,000+ RPS) and chaos engineering (fuzz testing). The framework validated platform scalability and identified bottlenecks before production impact.
High-Load Live Media Streaming Platform (2019-2021)
Led cross-functional team of 12 engineers (backend, frontend, DevOps) as Solution Architect and Team Lead, designing real-time media platform handling 10,000+ RPS with GPU-accelerated live video transcoding. Engineered secure content transition architecture (private VPC -> DMZ) and implemented content consistency verification algorithms ensuring data integrity across distributed storage.
Experience
- Led architecture and evolution of multi-cluster Kubernetes platform serving 20K+ edge devices in IoT industry (10,000+ RPS, 200+ microservices, 99.9% uptime) with SOC2 and GDPR compliance
- Implemented GitOps workflows with ArgoCD and Policy as Code governance, increasing deployment frequency from weekly to 50+ deploys/day while reducing deployment failures
- Designed air-gapped Kubernetes architecture and hybrid cloud solutions for regulated industries (government, defense, healthcare), enabling $2M+ in qualified sales pipeline through secure offline deployment capabilities
- Partnered with security and compliance teams to establish security scanning pipelines with Trivy and Snyk, secrets management with HashiCorp Vault, and FinOps practices using Kubecost for cost visibility
- Optimized infrastructure spend by 20% ($250K+ annually) through t-shirt sizing methodology, Karpenter autoscaling, Spot instance adoption, and resource right-sizing
- Built self-service developer platform reducing ops tickets by 60% through automated provisioning and standardized tooling
Backend: Go, Python, PostgreSQL, Redis, Kafka
Cloud: AWS-first (EKS, EC2, S3, RDS, Lambda) with Azure and GCP experience
Security & FinOps: HashiCorp Vault, Trivy, Snyk, Kubecost, Karpenter, Spot instances
- Led cross-functional team of 12 engineers (backend, frontend, DevOps) designing real-time media streaming platform for news industry handling 10,000+ RPS with GPU-accelerated live video transcoding
- Architected secure content delivery system with private VPC to DMZ transition and content consistency verification algorithms across distributed storage
- Led migration from monolithic infrastructure to containerized microservices, reducing deployment time by 80% while maintaining full-stack visibility from frontend to data layer
- Implemented CI/CD pipelines with Jenkins and GitLab CI, establishing IaC practices managing 100+ cloud resources
CI/CD: Jenkins, GitLab CI, Bash scripting
Monitoring: Prometheus, Grafana, ELK Stack
Cloud: AWS (EC2, ECS, CloudFront, RDS)
Full-stack development and DevOps across backend, frontend, and infrastructure. Built ETL pipelines processing 1M+ records/day, monitoring systems for 200+ services, and HIPAA-compliant infrastructure with Docker, Python, and Elasticsearch.