Toma aims to bridge the gap between agentic innovation and real-world use, especially in underserved industries like automotive and healthcare, by providing a customer-centric platform for deploying and monitoring AI agents.
Requirements
- 3+ years of experience in platform/infrastructure engineering
- Strong background in system design, operating systems, and distributed systems
- Deep expertise with AWS services (ECS/EKS, IAM, VPC, S3, RDS)
- Experience with containerization and orchestration (Docker, Kubernetes)
- Track record of building and maintaining production ML/LLM infrastructure
- Experience with observability tools (Prometheus, Grafana, ELK stack)
- Solid understanding of TypeScript
Responsibilities
- Owning our entire infrastructure (monorepo with 10+ microservices on AWS and Porter)
- Building and maintaining our ML/LLM model deployment pipeline and serving infrastructure
- Owning our incident response process, including on-call rotations and alerting systems
- Designing and implementing observability solutions across our stack
- Partnering with other engineers to improve latency and performance in our realtime systems
- Communicating with external vendors regarding cloud offerings and APIs
- Maintaining our compliance efforts (SOC 2 + GDPR + ISO 27001 in progress)
Other
- Mentoring engineers and collaborating closely with product and design
- Competitive salary with meaningful equity
- Free health, dental, and vision insurance
- Unlimited PTO
- Weekly team outings and customer visits