The financial firm is looking to build and enhance AI/ML infrastructure and application patterns that power mission-critical applications, ensuring high availability, durability, and resiliency.
Requirements
- Experience with GenAI, MLOps, agentic AI, and LLMs on AWS, including Bedrock and SageMaker.
- Experience with infrastructure automation tools such as Puppet, Ansible, CloudFormation, or Terraform.
- Working knowledge of pipeline-automation tools such as Jenkins, CodePipeline, Azure DevOps, or other comparable tools.
- Experience using Git for source control management.
- Ability to proficiently write code in Python, Node.js, Bash (shell), PowerShell, or other similar languages.
- Experience using Docker within container orchestration platforms such as AWS ECS, EKS, Google Anthos, or others.
- Understanding of foundational AWS services such as VPCs, EC2, S3, RDS, Auto Scaling Groups, CloudWatch Logs, etc.
Responsibilities
- Focus on optimizing existing systems, building infrastructure, and eliminating manual work through automation.
- Peer-review infrastructure-as-code (AWS CloudFormation, Python, Terraform, or similar).
- Deploy and troubleshoot infrastructure code.
- Identify opportunities to build self-service capabilities and automate infrastructure and application deployments.
- Develop tools and best practices for platform development, developer productivity, automation (MLOps, CI/CD, A/B testing), and production operations.
- Design, develop, and deliver critical components, frameworks, services, and products using AWS SageMaker, Bedrock, Lambda, and container technologies in AWS.
- Develop processes, model monitoring, and a governance framework for successful ML model operationalization.
Other
- Remote, contract-to-hire position.
- Candidate should have at least one AWS certification.
- Terraform and CloudFormation experience is a big plus.
- Experience working in an Agile/Scrum-focused organization.
- Strong verbal and written communication skills; comfortable explaining technical problems to non-technical audiences.