DPR is looking to shape the technical direction of its AI initiatives by designing and implementing scalable, cloud-native solutions that meet the growing needs of its Data and AI team.
Requirements
- Strong understanding of cloud infrastructure and experience working with at least one major cloud provider
- Excellent troubleshooting and debugging skills, with a focus on data integrity and system optimization
- Proficiency in at least one object-oriented programming language, preferably Python, with hands-on experience in ML frameworks such as TensorFlow, PyTorch or scikit-learn
- Proficiency in SQL, preferably Snowflake SQL
- Experience with Infrastructure-as-code platforms such as Terraform and Bicep
- Experience with APM and observability tools such as Azure App Insights or Datadog
- Experience with cloud infrastructure in both Azure and AWS environments
Responsibilities
- Design distributed, cloud-native, scalable architecture for data and ML pipelines
- Develop CI/CD pipelines and pipeline templates to be used across Data Engineering, AI/ML and Data Science teams
- Automate training, testing and deployment processes for machine learning models
- Develop and maintain ETL pipelines that move data in real time (streaming), on demand, and in batch, emphasizing security, reusability, and data quality
- Leverage Infrastructure-as-code platforms such as Terraform and Bicep to automate infrastructure provisioning and streamline deployments
- Implement and manage APM and observability tools such as Azure App Insights or Datadog to monitor infrastructure, focusing on ML workloads
- Manage and maintain cloud infrastructure in both Azure and AWS environments
Other
- Bachelor’s degree in Computer Science, Data Science, Information Systems, or a related field
- 3-5 years of experience in Data Engineering, DevOps, MLOps, Software Engineering or Site Reliability Engineering
- Ability to work closely with cross-functional teams, including business stakeholders, data engineers, and technical leads
- Ability to abstract complexity and create reusable, scalable patterns that accelerate development
- Ability to contribute to preventive maintenance, technical debt reduction, and the promotion of clean code principles