DataDirect Networks (DDN) is seeking to optimize training, inference, and Retrieval-Augmented Generation (RAG) pipelines for high-performance AI applications.
Requirements
Proven expertise in building and scaling AI/ML pipelines
Strong understanding of machine learning frameworks and libraries (TensorFlow, PyTorch, NVIDIA NeMo, vLLM, TensorRT-LLM)
Experience in deploying open-source vector databases at scale
Solid understanding of cloud infrastructure (AWS, GCP, Azure) and distributed computing
Proficiency with containerization tools (Docker, Kubernetes) and infrastructure as code
Implementation-level understanding of ML frameworks, data loaders and data formats
Experience with scaling RAG pipelines and integrating them with generative AI models
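To make the last two requirements concrete, here is a minimal, self-contained sketch of the retrieval step in a RAG pipeline: documents are ranked by cosine similarity against a query embedding, and the top hits are folded into a prompt for a generative model. The `ToyVectorStore` and `build_prompt` names are illustrative stand-ins, not DDN or product APIs; in practice the store would be an open-source vector database (e.g., Milvus or Qdrant) and the embeddings would come from a real encoder.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class ToyVectorStore:
    """In-memory stand-in for a vector database."""
    def __init__(self):
        self.items = []  # list of (embedding, document) pairs

    def add(self, embedding, document):
        self.items.append((embedding, document))

    def top_k(self, query, k=2):
        """Return the k documents whose embeddings are closest to the query."""
        ranked = sorted(self.items, key=lambda it: cosine(query, it[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]

def build_prompt(question, store, embed, k=2):
    """Retrieve context for the question and assemble an augmented prompt."""
    context = store.top_k(embed(question), k)
    return "Context:\n" + "\n".join(context) + f"\nQuestion: {question}"
```

Scaling this pattern means replacing the linear scan in `top_k` with an approximate-nearest-neighbor index and batching the embedding calls, which is where the "at scale" part of the requirement lives.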
Responsibilities
Design and implement the integration of data ingestion and streaming pipelines with open-source tools such as Ray Data, Mosaic Streaming, tf.data, and the PyTorch DataLoader
Design optimizations for training (e.g., asynchronous checkpointing) and for inference (e.g., KV caching and LoRAX)
Guide the integration of MLflow with DDN’s Infinia product for comprehensive experiment tracking, model versioning, and deployment
Drive the implementation and scaling of RAG pipelines to enhance generative model performance
Stay abreast of the latest developments in AIOps, AI frameworks, optimization, and accelerated execution
Identify and implement solutions to optimize training and inference pipeline performance, runtime, and resource utilization on Infinia
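Asynchronous checkpointing, named among the responsibilities above as a training optimization, can be sketched as follows: the state snapshot is taken synchronously (so the training loop sees a consistent copy), while the slow storage write is offloaded to a background thread. This is a simplified illustration of the pattern, not a production implementation; real systems (e.g., PyTorch distributed checkpointing) additionally handle sharding and device-to-host transfers.

```python
import copy
import pickle
import threading

def async_checkpoint(state, path):
    """Snapshot `state` now, persist it in the background.

    The deepcopy happens on the calling (training) thread so later
    mutations to `state` cannot corrupt the checkpoint; only the
    storage I/O is moved off the critical path.
    """
    snapshot = copy.deepcopy(state)

    def _write():
        with open(path, "wb") as f:
            pickle.dump(snapshot, f)

    t = threading.Thread(target=_write)
    t.start()
    return t  # caller may join() before the next checkpoint to bound in-flight writes
```

The design choice worth noting: the copy is synchronous and the write is not. Skipping the copy would be faster but would race with the optimizer step; bounding in-flight writes via `join()` keeps memory use predictable.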
Other
Bachelor’s or Master’s degree in Computer Science, Data Science, Machine Learning, or related fields
5+ years of experience in machine learning operations (MLOps) or related roles
Excellent problem-solving and troubleshooting skills, with attention to detail and performance optimization
Strong communication and collaboration skills
Participation in an on-call rotation to provide after-hours support as needed