Site Reliability Engineer
Our client stands as a trailblazing force in the health and wellness domain, providing an individualized and holistic approach to well-being. They have passionately pursued a mission to shape a healthier global community, embracing individuals of diverse ages, genders, and physical capabilities
Your Role:
Oversee day-to-day alerts, system checks, and issue escalation.
Contribute to a 24x7 on-call support team for critical SaaS events.
Proactively set up monitors in the EKS/K8S ecosystem.
Deploy to EKS/K8s cluster using Terraform and Helm/Flux.
Enhance infrastructure health through implementing checks and scripts.
Collaborate with teams for optimal SaaS service and customer-centric deployments.
Conduct RCA and implement corrective actions.
Your Profile:
Extensive experience in issue processing (RCA, Postmortems).
Proficient in Kubernetes, AWS (especially EKS, EC2, RDS), and Terraform.
Familiarity with Docker, CI/CD tools, monitoring tools, and networking concepts.
Scripting skills in Python, NodeJS, or Go.
Experience with configuration management tools and version control systems.
Strong problem-solving and troubleshooting skills.
Ownership, proactiveness, and passion for maintaining large-scale platforms.
Join Us:
Embark on a dynamic journey with our forward-thinking team. We value innovation, excellence, and a collaborative spirit. Join us in shaping the future of technology and delivering unparalleled service to our customers.