The IRS Advanced Analytics Program (AAP) needs a data engineer to build, optimize, and maintain data pipelines and feature engineering workflows that support AI/ML model training, deployment, and monitoring for mission teams.
Requirements
- 5+ years of data engineering experience with AI/ML-focused projects.
- Hands-on expertise with Databricks, Spark, Delta Lake, and MLflow in the context of AI/ML pipelines.
- Proficiency in Python, SQL, and data transformation frameworks.
- Experience delivering feature engineering and data prep for model development and operationalization.
- Familiarity with ETL orchestration tools (Airflow, Databricks Workflows, or similar).
- Knowledge of CI/CD integration for data pipelines (Terraform, Git-based workflows).
- Awareness of AI/ML lifecycle data needs (training, validation, inference, retraining).
Responsibilities
- Design, build, and maintain data pipelines in Databricks (Spark, Delta Lake, MLflow) specifically tailored for AI/ML and GenAI use cases.
- Implement data ingestion, transformation, and feature engineering workflows that feed model training and inference processes.
- Collaborate with mission data scientists to ensure datasets are optimized for model development and experimentation.
- Integrate pipelines into CI/CD workflows for automated, repeatable, and compliant model operations.
- Optimize data workflows for performance, scalability, and cost-efficiency across multi-tenant workloads.
- Apply governance and security controls (Unity Catalog, IAM, audit logging) to protect sensitive IRS data.
- Support data validation, schema enforcement, and quality checks to ensure reliable model outcomes.
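For illustration of the validation and schema-enforcement responsibility above, a minimal pure-Python sketch (column names, types, and thresholds here are hypothetical placeholders, not actual IRS schemas; in Databricks this logic would typically be expressed via Delta Lake schema enforcement and constraints rather than hand-rolled checks):

```python
# Hypothetical stand-in for schema enforcement and quality checks that a
# Delta Lake pipeline would apply on write. All field names are illustrative.

EXPECTED_SCHEMA = {"record_id": str, "filing_year": int, "feature_value": float}

def validate_rows(rows):
    """Partition rows into (valid, rejected) after schema and quality checks."""
    valid, rejected = [], []
    for row in rows:
        # Schema check: every expected column present with the expected type.
        schema_ok = all(
            isinstance(row.get(col), typ) for col, typ in EXPECTED_SCHEMA.items()
        )
        # Quality check: filing_year must fall in a plausible range.
        quality_ok = schema_ok and 2000 <= row["filing_year"] <= 2100
        (valid if schema_ok and quality_ok else rejected).append(row)
    return valid, rejected

good_row = {"record_id": "A1", "filing_year": 2023, "feature_value": 0.5}
bad_row = {"record_id": "A2", "filing_year": "2023", "feature_value": 0.5}
valid, rejected = validate_rows([good_row, bad_row])
```

Rejected rows would normally be routed to a quarantine table for review rather than silently dropped, so model training only ever sees records that passed both checks.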
Other
- Must be a U.S. Citizen with the ability to obtain and maintain a Public Trust security clearance.
- Strong problem-solving skills and the ability to collaborate effectively with architects, MLOps engineers, and mission data scientists.