CAE Defense & Security is looking to solve complex problems using data-driven approaches and to deploy scalable machine learning solutions in production environments. The role focuses on designing scalable NLP systems powered by state-of-the-art transformer models, optimizing inference performance, and integrating LLMs into real-world products to enhance user experience and business outcomes.
Requirements
- Proficiency in ML frameworks and tooling (PyTorch, TensorFlow, scikit-learn, CUDA).
- Good understanding of distributed systems, microservice architecture, and REST APIs.
- Strong understanding of MLOps tools and practices (MLflow, Airflow, DVC).
- Hands-on experience with Hugging Face Transformers, LangChain, and OpenAI APIs.
- Proficiency with cloud platforms (AWS, GCP, Azure), Linux, and containerization and orchestration tools (Docker, Kubernetes).
- Proven track record of deploying ML models in production environments.
- Experience working with SQL and NoSQL database systems such as MySQL, MongoDB, or Elasticsearch.
Responsibilities
- Design, develop, and deploy machine learning models for real-world applications.
- Build scalable data pipelines and model training workflows using modern tools and frameworks.
- Conduct rigorous model evaluation, validation, and performance tuning.
- Monitor and maintain deployed models, ensuring reliability and accuracy over time.
- Design, fine-tune, and deploy LLMs (LLaMA, Mistral, etc.) for various NLP tasks such as summarization, question answering, semantic search, and chatbots.
- Develop scalable and efficient model serving infrastructure using tools like ONNX, TensorRT, DeepSpeed, or vLLM.
- Implement retrieval-augmented generation (RAG) pipelines using vector databases (e.g., FAISS, Weaviate, Pinecone, Milvus).
Other
- This position is onsite in Tampa, FL; Arlington, TX; or Orlando, FL.
- Due to U.S. Government contract requirements, only U.S. citizens are eligible for this role.
- Incumbent must be eligible for a DoD Personnel Security Clearance.
- Must comply with all company security and data protection / usage policies and procedures.
- Must be able to work overtime and off-shift hours as required.