
Staff Machine Learning Engineer - MLAV#01

NavitsPartners

Salary not specified
Dec 16, 2025
Palo Alto, CA, US

Build privacy-preserving large language model (LLM) capabilities that support hardware design workflows involving Verilog/SystemVerilog and RTL artifacts. Target use cases include code generation and refactoring, lint explanation, constraint translation, and spec-to-RTL assistance, all delivered within strict enterprise security and data-privacy boundaries.

Requirements

  • 3+ years of hands-on work with transformers or LLMs.
  • Deep expertise with PyTorch, Hugging Face (Transformers, PEFT, TRL), and distributed training frameworks (DeepSpeed, FSDP).
  • Experience with quantization-aware fine-tuning, constrained decoding, and evaluation of code-generation models.
  • Hands-on AWS experience, including:
      • Amazon Bedrock (model usage, customization, Guardrails, runtime APIs, VPC endpoints)
      • SageMaker (Training, Inference, Pipelines)
      • Core services: S3, EC2/EKS, IAM, KMS, VPC, CloudWatch, CloudTrail, Secrets Manager
  • Solid software engineering fundamentals: testing, CI/CD, observability, and performance optimization.
  • Solid software engineering fundamentals: testing, CI/CD, observability, and performance optimization.

Responsibilities

  • Own the technical roadmap for RTL-focused LLM capabilities, from model selection and fine-tuning through deployment and continuous improvement.
  • Fine-tune and customize transformer models using modern techniques such as LoRA/QLoRA, PEFT, instruction tuning, and preference optimization (RLAIF).
  • Design and operate HDL-aware evaluation frameworks, including:
      • Compile, lint, and simulation pass rates
      • Pass@k metrics for code generation
      • Constrained/grammar-guided decoding
      • Synthesis-readiness checks
  • Build and maintain secure, privacy-first ML pipelines on AWS, including:
      • Amazon Bedrock for managed foundation models
      • SageMaker and/or EKS for bespoke training and inference
      • Encrypted storage (S3 + KMS), private VPCs, IAM least privilege, and CloudTrail auditing
  • Deploy and operate low-latency, production inference using Bedrock and/or self-hosted stacks (vLLM, TensorRT-LLM), with autoscaling and safe rollout strategies.
  • Establish a strong evaluation and MLOps culture with automated regression testing, experiment tracking, and model documentation.
  • Drive product integration with internal developer tools, CI workflows, IDE plug-ins, retrieval-augmented generation (RAG), and safe tool-use.
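The Pass@k metric named in the evaluation bullet above has a commonly used unbiased estimator: given n generations per problem of which c pass, pass@k = 1 - C(n-c, k)/C(n, k). A minimal sketch (the sample counts in the example are hypothetical):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    Probability that at least one of k samples drawn without
    replacement from n generations passes, given that c of the
    n generations pass:  pass@k = 1 - C(n - c, k) / C(n, k).
    """
    if n - c < k:  # fewer failures than draws: a pass is guaranteed
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical example: 10 generations per RTL problem,
# 3 of which compile and simulate cleanly.
print(round(pass_at_k(10, 3, 1), 3))  # 0.3
print(round(pass_at_k(10, 3, 5), 3))  # 0.917
```

For HDL work the "pass" signal would typically be a compile + lint + simulation check rather than a unit test, but the estimator is the same.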

Other

  • This is a staff-level role that provides technical leadership.
  • Lead and mentor a small team of applied ML engineers and scientists; review designs and code, remove technical blockers, and drive execution.
  • Partner with hardware engineering, EDA, security, and legal stakeholders to ensure compliant data sourcing, anonymization, and governance.
  • Mentor engineers on LLM best practices, reproducible experimentation, and secure system design.
  • Proven experience shipping LLM-powered features to production and leading cross-functional technical initiatives.
  • Excellent communication skills and the ability to influence both technical and executive stakeholders.