Full-time

Site Reliability Engineer

Posted by CirrusLabs • June 05, 2026

📍 Ciudad de México, Ciudad de México, Mexico

Description

Job Title: Platform Site Reliability Engineer (SRE)

We are seeking a Platform Site Reliability Engineer (SRE) to support the reliability, observability, and day-2 operations of modern AI platform environments running performance-sensitive workloads. This role is suited for someone with hands‑on experience in production support, monitoring, alerting, incident response, Linux troubleshooting, operational automation, system software maintenance, and GPU‑enabled platform operations across infrastructure and platform layers.

The ideal candidate has experience with Prometheus, Grafana, and logging/metrics platforms, and can work across compute, platform, DevOps, storage, and network teams to improve service health, reduce alert noise, speed up incident resolution, and strengthen overall platform reliability.

Key Responsibilities

  • Support reliability and day‑2 operations for production platform environments.
  • Build and maintain monitorin...

