**Description**
Join our team as a **Senior Site Reliability Engineer** focused on delivering advanced support for critical Azure-based systems.
**Responsibilities**
- Troubleshoot and resolve complex incidents to maintain system uptime
- Ensure reliability and performance of Azure-based enterprise infrastructure
- Implement observability, monitoring, and logging solutions
- Automate infrastructure provisioning and deployment using Terraform and scripting
- Optimize system performance and uptime through proactive monitoring and alerting
- Collaborate with cross-functional teams to improve service reliability
- Conduct root cause analysis and postmortems for incident management
- Manage deployment pipelines in Azure DevOps for secure and scalable workflows
- Develop and maintain automation scripts for routine tasks and incident recovery
- Enhance monitoring frameworks with tools like Prometheus and Grafana
- React quickly to incidents to avoid SLA degradation
- In...
Ready to Seal the Deal?
Submit your application today and take the next step in your career with EPAM Systems, Inc.