Scale is looking to improve cloud efficiency, cost optimization, and hosting cost attribution for the foundational platforms and systems that power its cloud infrastructure, with a strong focus on AI infrastructure and large-scale system design.
Requirements
- Deep understanding of distributed systems, cloud cost models, and AWS services (EC2, ECS/EKS, S3, Lambda, RDS, GPU instances, etc.).
- Demonstrated experience optimizing large-scale cloud deployments for performance and cost, including autoscaling, spot instance usage, and workload scheduling.
- Proficiency with containerization and deployment technologies such as Kubernetes, Terraform, and Docker.
- Familiarity with workflow orchestration tools such as Temporal or AWS Step Functions, and with both NoSQL (e.g., MongoDB) and SQL (e.g., Postgres) databases.
- Solid grasp of software engineering best practices, CI/CD (CircleCI), and infrastructure-as-code principles.
- Experience with cloud cost management platforms (e.g., CloudHealth, Kubecost, AWS Cost Explorer).
- Experience with LLM tools/platforms and API key management (e.g., LiteLLM).
Responsibilities
- Design, implement, and optimize core platform services with a focus on reducing cloud spend and improving resource utilization across compute, storage, and network layers.
- Collaborate closely with stakeholders and internal customers to define requirements and architect scalable, cost-efficient solutions in AWS and other public cloud environments.
- Analyze system performance and usage trends to identify opportunities for cost savings through right-sizing, autoscaling, reserved instance strategies, and architectural improvements.
- Partner with product engineering and infrastructure teams to improve orchestration, observability, and automation for efficient deployment and operation of distributed systems.
- Present insights, metrics, and recommendations on cloud usage and efficiency to engineering and business stakeholders.
- Proactively drive process enhancements and introduce new tools or frameworks that improve the balance between system performance, reliability, and cost.
Other
- 3+ years of full-time engineering experience with a focus on back-end and cloud infrastructure systems.
- Proven track record of independent ownership of engineering projects that delivered measurable efficiency or cost improvements.
- Excellent communication skills with the ability to explain technical trade-offs and cost implications to both engineering and non-technical audiences.
- Familiarity with FinOps principles and cross-team cost governance.
- Experience scaling systems efficiently in high-growth startup environments.