Roblox is looking to solve the technical challenges of scaling its platform to achieve 1 billion daily active users, with a focus on reliability, performance, and efficiency.
Requirements
- Experience with building software and tools and getting them adopted
- Experience with systems engineering, with a focus on writing deeply reliable code
- Prior experience developing, deploying and maintaining LLM-based agents or RAG systems in production is a plus
- Experience writing code in common programming languages (Go, C, Java…)
Responsibilities
- Create software and libraries that promote fault-tolerance and resilience
- Design and develop frameworks and tools that support performance testing and chaos experimentation and that improve infrastructure resiliency
- Develop and implement performance monitoring and observability services to proactively identify and understand infrastructure issues and platform degradations
Other
- BS degree (or equivalent professional experience) in Computer Science or related engineering field with at least 3-4 years of experience
- Ability to work onsite Tuesday, Wednesday, and Thursday, with optional presence on Monday and Friday
- Self-organized and able to overcome emergent issues and contribute to long-running projects as part of the team
- Experience working in sprints, breaking down complex tasks into milestones, and reporting status to keep project scheduling accurate
- Ability to approach partners and processes with curiosity and to understand a problem deeply before starting to code