Full-time

Kubernetes Reliability Engineer at Search Atlas

Posted by Search Atlas • June 05, 2026

📍 Toronto, ON, Canada

Description

Join Search Atlas to architect Kubernetes-based platforms that keep AI execution running at 99.99% reliability. The role requires expertise in Terraform, ArgoCD, and high-concurrency systems.

As a Platform Reliability Engineer, you will build and maintain the Autonomous Nervous System for Atlas Brain. You will optimize ML inference pipelines, automate infrastructure processes, and design self-healing systems. We are looking for an innovator who can raise the bar for operational excellence across our autonomous marketing systems.

Key Responsibilities:
• Architect and maintain EKS/GKE-based Kubernetes platforms
• Automate infrastructure deployment with Terraform and ArgoCD
• Optimize high-concurrency crawling systems for real-time decisions
• Establish SLOs for AI execution and agent task completion
• Implement distributed monitoring solutions with OpenTelemetry and Grafana

Req...

Ready to Seal the Deal?

Submit your application today and take the next step in your career with Search Atlas.
