Operationalize and scale machine learning infrastructure.
Requirements
- Hands-on experience with Terraform and/or AWS CloudFormation.
- Strong experience with AWS services, particularly those related to data and ML pipelines.
- Proficiency in Python and shell scripting.
- Familiarity with GitHub Actions and automation of CI/CD pipelines.
Responsibilities
- Design, build, and maintain cloud infrastructure using Terraform and CloudFormation.
- Work with a variety of AWS services including (but not limited to) AWS Glue, Lambda, Step Functions, ECS, EKS, EC2, and CloudWatch.
- Support integration with additional services such as IAM, SageMaker, and EMR.
- Develop and maintain Python and shell scripts to automate workflows and infrastructure tasks.
- Implement CI/CD pipelines and automation using GitHub Actions.
- Ensure monitoring, alerting, and reliability best practices are in place for ML systems.
Other
- Collaborate with application and ML teams to enable seamless deployment and operation of ML workloads.
- Follow Agile development practices and participate in sprint planning, stand-ups, and retrospectives.
- Experience working directly with application or engineering teams.
- Comfortable operating in an Agile environment.
- Job type: Contract
- Work hours: 8 a.m. to 5 p.m.