Full-time

Site Reliability Engineer

Posted by RCS TECH • June 04, 2026

📍 Mexico, Mexico, Mexico

Description

What You’ll Do

Reliability & Operations

- Own availability, latency, and scalability across SaaS and AI systems

- Define and enforce SLOs, SLIs, and error budgets

- Participate in a global on-call rotation (~1 week every 4 weeks)

- Lead incident response and drive blameless postmortems with systemic fixes

Platform & Infrastructure

- Architect and operate on-premises and multi-region, multi-cloud environments

- Manage large-scale Kubernetes workloads

- Build and evolve infrastructure using Terraform and Ansible

- Improve system resilience, fault isolation, and capacity planning

AI/ML & Automation

- Build and scale agentic AI systems for triage, anomaly detection, and self-healing

- Ensure reliability of model serving infrastructure

- Operate, optimize, and scale distributed systems

What You Bring ...

Ready to Seal the Deal?

Submit your application today and take the next step in your career with RCS TECH.
