Advance the Department of Defense’s mission to keep our country safe and secure by designing, building, and operating data pipelines that ingest, store, and process high-volume, multi-source data, primarily in support of modern AI/ML workflows.
Requirements
- Experience with Apache Airflow for workflow orchestration.
- Strong programming skills in Python.
- Experience with Elasticsearch/OpenSearch for data indexing and search.
- Understanding of vector databases, embedding models, and vector search for AI applications (see the sketch after this list).
- Expertise in event-driven architecture and microservices development.
- Hands-on experience with cloud data storage and compute resources, including S3-compatible object stores such as MinIO.
- Strong understanding of data pipeline orchestration and workflow automation.
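In practice, the vector-search requirement usually means standing up a k-NN index and querying it by embedding. Below is a minimal sketch, assuming a local OpenSearch cluster with the k-NN plugin enabled; the index name, field names, and toy 4-dimensional vectors are illustrative only.

```python
# Minimal sketch: vector search with OpenSearch's k-NN plugin.
# Assumes a local cluster on port 9200; "docs", "embedding", and the
# 4-dim toy vectors are placeholder names/values, not project specifics.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Create an index whose "embedding" field supports approximate k-NN search.
client.indices.create(
    index="docs",
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "embedding": {"type": "knn_vector", "dimension": 4},
            }
        },
    },
)

# Index a document with a precomputed embedding (from any embedding model).
client.index(
    index="docs",
    body={"text": "hello world", "embedding": [0.1, 0.2, 0.3, 0.4]},
    refresh=True,
)

# Retrieve the nearest neighbors of a query embedding.
hits = client.search(
    index="docs",
    body={
        "size": 5,
        "query": {"knn": {"embedding": {"vector": [0.1, 0.2, 0.3, 0.4], "k": 5}}},
    },
)
for hit in hits["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["text"])
```

In a real pipeline the embeddings would come from an embedding model rather than hard-coded lists, and the field dimension would match that model's output size.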
Responsibilities
- Design, develop, and implement scalable data pipelines and ETL processes using Apache Airflow, with a focus on data for AI applications (a minimal DAG sketch follows this list).
- Develop messaging solutions using Kafka to support real-time data streaming and event-driven architectures.
- Build and maintain high-performance data retrieval solutions using Elasticsearch/OpenSearch.
- Implement and optimize Python-based data processing solutions.
- Integrate batch and streaming data processing techniques to enhance data availability and accessibility.
- Ensure adherence to security and compliance requirements when working with classified data.
- Deploy and manage cloud-based infrastructure to support scalable and resilient data solutions.
- Work closely with cross-functional teams to define data strategies and develop technical solutions aligned with mission objectives.
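To make the Airflow responsibility concrete, here is a minimal extract-transform-load DAG sketch using the TaskFlow API, assuming Airflow 2.4+; the DAG name, schedule, and task bodies are placeholders, not this program's actual pipeline.

```python
# Minimal sketch of an Airflow ETL DAG (TaskFlow API, Airflow 2.4+).
# "ingest_pipeline" and the stubbed task bodies are illustrative only.
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@hourly", start_date=datetime(2024, 1, 1), catchup=False)
def ingest_pipeline():
    @task
    def extract() -> list[dict]:
        # Pull raw records from a source system (stubbed here).
        return [{"id": 1, "text": "  raw record  "}]

    @task
    def transform(records: list[dict]) -> list[dict]:
        # Clean/enrich records, e.g., attach embeddings for downstream search.
        return [{**r, "text": r["text"].strip()} for r in records]

    @task
    def load(records: list[dict]) -> None:
        # Write into the serving store (e.g., OpenSearch, object storage).
        print(f"loaded {len(records)} records")

    # Chain the tasks; Airflow infers dependencies from the data flow.
    load(transform(extract()))


ingest_pipeline()
```

A streaming counterpart would typically consume from Kafka instead of a batch source, with Airflow orchestrating the batch side and backfills.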
Other
- An active TS/SCI security clearance is required, and candidates must hold or be willing to obtain a CI polygraph.
- Due to US government contract requirements, only US citizens are eligible for this role.
- Travel Required: Less than 10%
- 5+ years of related experience; may vary based on technical training, certification(s), or degree.