Description
Locations
Remote, United States
Overview
At GitHub, we’re building the next generation of AI‑powered developer experiences. We’re looking for a Staff Applied Researcher with deep expertise in large language model (LLM) evaluation and LLM agents, strong engineering instincts, and a bias for action to help shape the future of GitHub Copilot and our AI platform.
This is a high‑impact role where you will design evaluation systems that directly influence how millions of developers experience AI every day.
Responsibilities
- Lead Model Quality & Evaluation
  - Design next‑generation evaluation frameworks for code generation, reasoning, safety, multimodal tasks, and agentic workflows.
  - Develop scalable automatic metrics, LLM‑judge systems, reward models, and human‑in‑the‑loop evaluation pipelines.
  - Establish high‑signal, repeatable methodologies that influence product decisions across GitHub AI.