TAYS is seeking an MLOps Engineer to ensure that ML models can be effectively developed, deployed, managed, and monitored in Production environments for various projects within a federal agency.
Requirements
- Machine Learning
- Python
- NoSQL and Relational Databases
- DevOps
- CI/CD
- Cloud Platforms (AWS, Azure) and related ML services.
- Strong foundation in Machine Learning, including an understanding of concepts, algorithms, model training, and frameworks (TensorFlow, PyTorch, scikit-learn).
- Strong programming skills, especially Python, and relevant libraries (scikit-learn, TensorFlow, PyTorch, NumPy, Pandas).
- Strong understanding of DevOps principles and experience with CI/CD tools (Jenkins, GitHub Actions, GitLab CI/CD, etc.).
- Proficiency with cloud platforms (AWS preferred) including ML services, compute, storage (S3, EFS), and networking.
- Experience with containerization (Docker) and orchestration (Kubernetes).
- Knowledge of data engineering fundamentals, including data pipelines, data storage (PostgreSQL, MySQL, MongoDB), and data processing frameworks (Apache Spark).
- Familiarity with MLOps platforms and tools (e.g., SageMaker, MLflow, Kubeflow, DataRobot).
Responsibilities
- Ensure that ML models can be effectively developed, deployed, managed, and monitored in Production environments.
- Productionize ML models – integrate trained ML models with Production systems.
- Build and manage ML pipelines – design, build, and maintain automated pipelines including data ingestion, data preprocessing, model training, validation, and deployment utilizing CI/CD practices.
- Infrastructure management – set up and manage infrastructure for ML workloads utilizing cloud platforms and containerization technologies.
- Monitoring and alerting – implement monitoring systems to track performance of ML models in Production.
- Automation – automate various tasks within the ML workflow to improve efficiency and reproducibility.
- Performance optimization – identify ways to optimize the performance, efficiency, and scalability of ML models and their supporting infrastructure.
Other
- The candidate must be local to the DMV (D.C., Maryland, Virginia) area.
- Must be on-site five days a week in Woodlawn, MD.
- Must be able to obtain and maintain a Public Trust; this is a contract requirement.
- Strong communication, collaboration, problem-solving, analytical, and critical thinking skills.
- Prior experience with federal or state government IT projects.