Platform Architect with 12+ years of experience designing cloud-native and edge-ready platforms that balance scalability, security, and developer velocity. Proven success leading architecture for multi-cluster Kubernetes systems serving 20K+ edge devices with 99.9% uptime. Expert in GitOps, observability, and infrastructure automation. Collaborates across product, security, and operations teams to align platform evolution with business strategy. Focused on building adaptive platforms that reduce costs, accelerate delivery, and ensure reliability at scale.

Leadership & Architecture Highlights

  • Architected a multi-cluster Kubernetes platform supporting 50+ microservices and 20K+ edge devices with 99.9% uptime, establishing company-wide standards for reliability, IaC, and developer experience
  • Contributed to the architectural design and proof of concept (PoC) of air-gapped and hybrid Kubernetes solutions for regulated industries (government, defense, healthcare), supporting $2M+ in new business opportunities
  • Reduced mean time to recovery (MTTR) by 80% by defining the SRE roadmap and standardizing observability practices
  • Cut infrastructure costs by 20% ($250K+ annually) through right-sizing, autoscaling with Karpenter, and Spot instance adoption
  • Built a self-service developer platform that reduced operational tickets by 60% and accelerated delivery via automated provisioning
  • Mentored 10+ engineers and influenced strategic architectural decisions through design reviews and platform vision alignment

Notable Projects & Initiatives

Disaster Recovery & Data Resilience Initiative (2024-Present)

Designed and implemented a comprehensive disaster recovery strategy for 5 critical data stores (PostgreSQL, Redis, Kafka, Elasticsearch, S3) with RPO < 30 min and RTO < 2 h, ensuring business continuity for the production IoT platform.

Multi-Cluster Migration Project (2022-2023)

Architected and executed the migration of 20+ customer clusters from legacy infrastructure to a modern Kubernetes platform. Designed a comprehensive testing strategy, runbooks, and rollback procedures that ensured 100% successful migrations and data consistency.

Infrastructure Load Testing Framework (2021-2022)

Designed and implemented an automated load testing framework supporting both performance benchmarking (10,000+ RPS) and chaos engineering and fuzz testing. The framework validated platform scalability and identified bottlenecks before they affected production.

High-Load Live Media Streaming Platform (2019-2021)

Led a cross-functional team of 12 engineers (backend, frontend, DevOps) as Solution Architect and Team Lead, designing a real-time media platform that handled 10,000+ RPS with GPU-accelerated live video transcoding. Engineered a secure content transition architecture (private VPC to DMZ) and implemented content consistency verification algorithms to ensure data integrity across distributed storage.

Experience

Platform Architect / SRE Tech Lead
2021-Present
ScienceSoft
  • Led architecture and evolution of a multi-cluster Kubernetes platform serving 20K+ edge devices in the IoT industry (10,000+ RPS, 200+ microservices, 99.9% uptime) with SOC 2 and GDPR compliance
  • Implemented GitOps workflows with ArgoCD and Policy as Code governance, increasing deployment frequency from weekly to 50+ deploys/day while reducing deployment failures
  • Designed air-gapped Kubernetes architecture and hybrid cloud solutions for regulated industries (government, defense, healthcare), enabling $2M+ in qualified sales pipeline through secure offline deployment capabilities
  • Partnered with security and compliance teams to establish security scanning pipelines with Trivy and Snyk, secrets management with HashiCorp Vault, and FinOps practices using Kubecost for cost visibility
  • Optimized infrastructure spend by 20% ($250K+ annually) through a t-shirt sizing methodology, Karpenter autoscaling, Spot instance adoption, and right-sizing of resource allocations
  • Built self-service developer platform reducing ops tickets by 60% through automated provisioning and standardized tooling
Platform & SRE: Kubernetes, Helm, ArgoCD, Terraform, Prometheus, Grafana, Loki, Jaeger, Istio
Backend: Go, Python, PostgreSQL, Redis, Kafka
Cloud: AWS-first (EKS, EC2, S3, RDS, Lambda) with Azure and GCP experience
Security & FinOps: HashiCorp Vault, Trivy, Snyk, Kubecost, Karpenter, Spot instances

DevOps Lead / Platform Engineer
2019-2021
ScienceSoft
  • Led a cross-functional team of 12 engineers (backend, frontend, DevOps) designing a real-time media streaming platform for the news industry, handling 10,000+ RPS with GPU-accelerated live video transcoding
  • Architected a secure content delivery system with a private VPC to DMZ transition and content consistency verification algorithms across distributed storage
  • Led the migration from monolithic infrastructure to containerized microservices, reducing deployment time by 80% while maintaining full-stack visibility from frontend to data layer
  • Implemented CI/CD pipelines with Jenkins and GitLab CI and established IaC practices covering 100+ cloud resources
Infrastructure: Docker, Kubernetes, Terraform, Ansible, Nginx, HAProxy, Linkerd
CI/CD: Jenkins, GitLab CI, Bash scripting
Monitoring: Prometheus, Grafana, ELK Stack
Cloud: AWS (EC2, ECS, CloudFront, RDS)

DevOps / Backend Engineer
2013-2019
Various Projects

Full-stack development and DevOps across backend, frontend, and infrastructure. Built ETL pipelines processing 1M+ records/day, monitoring systems for 200+ services, and HIPAA-compliant infrastructure with Docker, Python, and Elasticsearch.

Tech: Python, PHP, Docker, Jenkins, Elasticsearch, RabbitMQ, Redis, MySQL, PostgreSQL, Ansible

Technical Expertise

Platform Architecture & Cloud

AWS-first architectures (EKS, Lambda, S3, RDS) · Azure & GCP experience · Multi-cluster Kubernetes · Hybrid cloud & air-gapped solutions · Infrastructure as Code · Service mesh architecture · SOC 2 & GDPR compliance

Platform Engineering & DevOps

Kubernetes · Docker · Helm · Terraform · Ansible · GitOps · ArgoCD · CI/CD pipelines · Service mesh (Istio, Linkerd) · Developer platforms · Self-service tooling

SRE & Observability

SLI/SLO/SLA frameworks · Error budgets · Incident management · Prometheus · Grafana · Loki · Jaeger · Distributed tracing · Chaos engineering · Disaster recovery

Architecture Patterns & Practices

GitOps · Policy as Code · Infrastructure as Code · Event-driven architecture · Microservices patterns · Cloud-native design · Zero-trust security

Security & FinOps

HashiCorp Vault · Trivy · Snyk · Container security · Secrets management · Kubecost · Karpenter · AWS Spot instances · Cost optimization · Resource rightsizing

Languages & Data

Go · Python · Bash · YAML · HCL · PostgreSQL · Redis · Kafka · RabbitMQ · Elasticsearch