AI / Machine Learning Engineer

Top Talent HQ

Salary not specified
Aug 26, 2025
Seattle, WA, US

Design, build, deploy, and maintain the robust and scalable infrastructure that powers cutting-edge artificial intelligence (AI) and machine learning (ML) initiatives.

Requirements

  • Strong programming skills in Python, C++, Go, or Rust for systems development and automation.
  • Ability to design end-to-end systems that balance performance, reliability, security, and cost.
  • Hands-on experience with ML training frameworks (PyTorch, TensorFlow, JAX) at scale.
  • Knowledge of hardware-level optimization: CUDA, ROCm, kernel bypass, FPGA/ASIC integration.
  • Experience with heterogeneous computing for AI, big data, and HPC.
  • Open-source contributions or patents in the ML systems space.

Responsibilities

  • Lead end-to-end design of scalable, reliable AI infrastructure (AI accelerators, compute clusters, storage, networking) for training and serving large ML workloads.
  • Define and implement service-oriented, containerized architectures (Kubernetes, VM frameworks, unikernels) optimized for ML performance and security.
  • Profile and optimize every layer of the ML stack: ML compiler, GPU/TPU scheduling, NCCL/RDMA networking, data preprocessing, and training/inference frameworks.
  • Develop low-overhead telemetry and benchmarking frameworks to identify and eliminate bottlenecks in distributed training and serving.
  • Build and operate large-scale deployment and orchestration systems that auto-scale across multiple data centers (on-premises and cloud).
  • Architect and implement robust ETL and data ingestion pipelines (Spark/Beam/Dask/Flume) tailored for petabyte-scale ML datasets.
  • Integrate experiment management and workflow orchestration tools (Airflow, Kubeflow, Metaflow) to streamline research-to-production.

Other

  • Master's degree (PhD preferred) in Computer Science, Engineering, or a related technical field.
  • 5+ years in roles focused on infrastructure or systems engineering, including at least 2 years on ML/AI infrastructure.
  • Excellent communication skills, with the ability to bridge research and production teams.
  • Strong problem-solving aptitude and a drive to push the state of the art in ML infrastructure.
  • Publications in top-tier ML or systems conferences such as MLSys, ICML, ICLR, KDD, or NeurIPS (preferred).