At Jaxon, we’re focused on making AI trustworthy for mission-critical environments. Our core technology is built to enable the safe deployment of AI in high-stakes settings, particularly across the Department of Defense. This role is dedicated to the defense side of that work: strengthening the U.S. government's ability to use AI reliably and advancing machine learning capabilities to meet the rigor required for national security operations.
Requirements
- Fluency in Python is essential: core and advanced Python, plus enterprise software development practices (modular design, testing frameworks such as pytest/unittest, CI/CD integration).
- NumPy, Pandas, Scikit-Learn, PyTorch, TensorFlow, Hugging Face Transformers; strong experience in data manipulation, feature engineering, and ML pipeline construction (a brief pipeline sketch follows this list).
- LangChain, LangGraph, LlamaIndex, and related libraries for building AI-assisted applications.
- Docker, Docker Compose, Kubernetes, container orchestration, cloud deployment (AWS, GCP, or Azure), monitoring/logging frameworks (Prometheus, ELK, etc.).
- Hands-on experience running local LLMs using Ollama and Hugging Face Transformers (a local-inference sketch also follows this list); building, training, and fine-tuning models for enterprise use cases.
- Strong foundation in both NLP fundamentals (tokenization, embeddings, sequence modeling) and applied techniques (question answering, summarization, RAG, fine-tuning).
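
For reference, this is the kind of ML pipeline construction the requirements above describe; a minimal scikit-learn sketch in which the column names and classifier choice are illustrative assumptions, not part of our actual stack:

```python
# Minimal scikit-learn pipeline sketch: preprocessing and model kept in one reproducible object.
# The column names and the choice of classifier are illustrative assumptions.
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

numeric_features = ["age", "score"]      # hypothetical numeric columns
categorical_features = ["category"]      # hypothetical categorical column

preprocess = ColumnTransformer([
    ("num", StandardScaler(), numeric_features),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_features),
])

pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression(max_iter=1000)),
])

# pipeline.fit(train_df, train_labels) then keeps feature engineering and the model
# versioned and deployed as a single artifact.
```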
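Similarly, a minimal local-inference sketch using the Hugging Face Transformers text-generation pipeline; the model name is only an example of a small open model, not a statement about what we run in production:

```python
# Minimal local-inference sketch with Hugging Face Transformers.
# "distilgpt2" is just an example of a small open model; any locally cached model works.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2")

result = generator(
    "Summarize the mission briefing in one sentence:",
    max_new_tokens=50,
    do_sample=False,
)
print(result[0]["generated_text"])
```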
Responsibilities
- Design and run rigorous verification, validation, and test pipelines for LLM systems, including unit/integration tests for data pipelines and models, automated evaluation suites, adversarial and regression testing, and acceptance criteria used to certify models for high-assurance/defense deployments.
- Conduct comprehensive data management, including preprocessing, feature engineering, and model evaluation, to improve model accuracy and efficiency.
- Apply solid machine learning engineering experience, particularly with NLP applications and unstructured data, in a cloud environment.
- Understand the model deployment process, the tuning of LLM parameters for specific behaviors, and LLM functionality and use cases more generally.
- Understand model metrics and measurements, and apply rigorous evaluation practices to assess performance and ensure reliability and effectiveness.
- Build reproducible ML pipelines, package and deploy models, version ML artifacts (e.g., with MLflow or DVC), and integrate with CI/CD workflows (an MLflow tracking sketch follows this list).
- Build test harnesses and CI for ML: automated model evaluation suites, reproducible test datasets, A/B/canary testing, fuzzing/adversarial test generation, and metric-based acceptance gates for deployment (a sketch of such a gate also follows this list).
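
To illustrate the artifact-tracking point above, a minimal MLflow sketch; the run name, parameters, and metric value are placeholders rather than values from a real training run:

```python
# Sketch of run/metric tracking with MLflow so training runs stay versioned and reproducible.
# The run name, parameter names, and metric value below are placeholders.
import mlflow

with mlflow.start_run(run_name="example-eval-run"):
    mlflow.log_param("model_name", "example-model")  # hypothetical parameter
    mlflow.log_param("max_iter", 1000)
    mlflow.log_metric("eval_accuracy", 0.0)          # placeholder metric value
    # In a real pipeline, trained model artifacts would also be logged here
    # so they can be promoted through CI/CD alongside their metrics.
```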
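And a sketch of what a metric-based acceptance gate can look like when run as a pytest test in CI; the exact-match metric, the hard-coded predictions, and the 0.90 threshold are illustrative, not our certification criteria:

```python
# Sketch of a metric-based acceptance gate as a pytest test: CI fails the build if the
# candidate model's score on a frozen evaluation set drops below the agreed threshold.
# The predictions, references, and 0.90 threshold are illustrative placeholders.
ACCEPTANCE_THRESHOLD = 0.90

def exact_match(predictions, references):
    """Fraction of predictions that exactly match their reference answer."""
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)

def test_model_meets_acceptance_gate():
    # In a real pipeline these would come from running the candidate model over a
    # versioned evaluation dataset, not from hard-coded strings.
    references = ["42", "Geneva", "1991"]
    predictions = ["42", "Geneva", "1991"]
    assert exact_match(predictions, references) >= ACCEPTANCE_THRESHOLD
```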
Other
- Collaborate with cross-functional, geographically distributed teams to integrate ML models into our existing systems and workflows, enhancing product capabilities.
- Previous experience with AI/LLM governance and guardrail development is a big plus.
- Experience delivering ML or software systems in defense or other highly regulated domains, including direct interaction with government customers (e.g., service labs, joint commands).
- Applicants should have a bachelor's degree in Computer Science or Software Engineering and a minimum of 2 years of experience in applicable roles.
- We are looking for somebody who thrives in an environment where day-to-day priorities and tasks may rapidly change. The role is not a fit if you value "business as usual".