Cerebras Systems is looking for an engineer to build the core backend services and APIs that power the Inference Platform, enabling customers to seamlessly deploy, manage, and serve inference workloads on dedicated Cerebras hardware.
Requirements
- Strong proficiency in Python (C++ is a plus)
- Experience designing, building, and integrating with RESTful APIs and gRPC services (see the API sketch after this list)
- Solid understanding of distributed systems concepts such as concurrency, scalability, and fault tolerance
- Hands-on experience with containerization (Docker) and orchestration frameworks (Kubernetes)
- Experience with databases and caching systems (e.g., Postgres, Redis)
- Experience with observability, telemetry pipelines, and system monitoring best practices
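
To make the API requirement concrete, here is a minimal sketch of the kind of deployment endpoint this role might own. It assumes FastAPI, and the route, request fields, and catalog entries are hypothetical illustrations; the posting does not specify Cerebras's actual API surface or framework.

```python
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()

# Hypothetical in-memory catalog; a real service would back this with
# Postgres and cache hot entries in Redis.
MODEL_CATALOG = {"example-model": {"status": "available"}}

class DeployRequest(BaseModel):
    model_name: str
    replicas: int = 1

@app.post("/v1/deployments")
def create_deployment(req: DeployRequest):
    # Reject deployments for models that are not in the catalog.
    if req.model_name not in MODEL_CATALOG:
        raise HTTPException(status_code=404, detail="unknown model")
    # A real implementation would enqueue an orchestration job here
    # (e.g., create Kubernetes resources) and report its status.
    return {"model": req.model_name, "replicas": req.replicas, "status": "pending"}
```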
Responsibilities
- Design, build, and maintain the core APIs for the Inference Platform, handling model catalog management, deployment of ML workloads, scaling, and status monitoring
- Focus on building platform capabilities that optimize for ease of use, robustness, and self-service access to inference models and serving infrastructure
- Collaborate with infrastructure and ML engineering teams to ensure high reliability, uptime, and smooth user interactions with the inference service
- Design and implement features such as multi-tenant support, deployment automation, priority queuing, and caching strategies for user requests (see the queuing sketch after this list)
- Build robust observability features by integrating with monitoring and telemetry tools (e.g., Prometheus, Grafana) to track system health, performance metrics, and request analytics (see the instrumentation sketch below)
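
By way of illustration only, below is a minimal in-process sketch of priority queuing using Python's heapq; a production platform would more likely use a distributed queue, and all names and priorities here are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any

_seq = itertools.count()  # tie-breaker so equal-priority requests stay FIFO

@dataclass(order=True)
class QueuedRequest:
    priority: int  # lower value = served sooner
    seq: int = field(default_factory=lambda: next(_seq))
    payload: Any = field(compare=False, default=None)

queue: list[QueuedRequest] = []

def submit(payload: Any, priority: int) -> None:
    heapq.heappush(queue, QueuedRequest(priority, payload=payload))

def next_request() -> QueuedRequest:
    return heapq.heappop(queue)

# Interactive traffic dequeues ahead of background batch work.
submit({"prompt": "interactive chat"}, priority=0)
submit({"prompt": "batch eval"}, priority=10)
assert next_request().payload == {"prompt": "interactive chat"}
```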
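
Similarly, here is a minimal sketch of the observability integration using the prometheus_client library; the metric names and handler are illustrative assumptions, not the platform's actual telemetry pipeline.

```python
import time

from prometheus_client import Counter, Histogram, start_http_server

# Hypothetical metric names; real dashboards would define their own.
REQUESTS = Counter("inference_requests_total", "Inference requests served",
                   ["model", "status"])
LATENCY = Histogram("inference_request_seconds", "End-to-end request latency")

def handle_request(model: str) -> None:
    start = time.perf_counter()
    try:
        ...  # the actual inference call would go here
        REQUESTS.labels(model=model, status="ok").inc()
    except Exception:
        REQUESTS.labels(model=model, status="error").inc()
        raise
    finally:
        LATENCY.observe(time.perf_counter() - start)

if __name__ == "__main__":
    start_http_server(8000)  # expose /metrics for Prometheus to scrape
    handle_request("example-model")  # simulate one request; a real service keeps running
```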
Other
- Bachelor’s or Master’s degree in computer science or a related field, or equivalent practical experience
- 5+ years of experience in backend software development, with a focus on service APIs, orchestration platforms, or user-facing infrastructure
- Strong problem-solving and debugging abilities
- Excellent communication and cross-functional collaboration skills