Full-time

Lead Site Reliability Engineer

Posted by EPAM Systems, Inc. • June 05, 2026

📍 Remote, Mexico

Description

Join our team as a **Lead Site Reliability Engineer** dedicated to providing advanced support for critical Azure-based systems.

**Responsibilities**
- Resolve complex incidents to ensure system availability
- Maintain reliability and performance of Azure-based enterprise infrastructure
- Deploy observability, monitoring, and logging tools
- Automate infrastructure management with Terraform and scripting technologies
- Improve system performance and uptime through centralized monitoring
- Collaborate with multiple teams to enhance service reliability
- Perform root cause analysis and oversee postmortems for incidents
- Configure deployment pipelines in Azure DevOps for secure workflows
- Write and maintain automation scripts for incident recovery and recurring tasks
- Enhance monitoring frameworks with platforms like Prometheus and Grafana
- Respond promptly to incidents to meet SLA expectations
- Facilitate integration of monitoring data from Azur...
