Senior AI Engineer - Foundation Model Training & Infrastructure for Power Grids (US)

Siemens Energy

Salary not specified
Sep 19, 2025
Remote, US

Siemens Energy, Inc. is looking for an engineer to build and optimize the end-to-end systems, data pipelines, and training processes for foundation models for power grid applications, enabling the rapid development and deployment of transformational AI solutions.

Requirements

  • 5 or more years as a Data & AI (Artificial Intelligence) Engineer or Machine Learning Engineer, focusing on building and optimizing infrastructure for large-scale machine learning systems.
  • Deep practical expertise with AI frameworks (PyTorch, JAX, PyTorch Lightning, etc.), large-scale multi-node GPU training, and optimization strategies for large foundation models on distributed compute infrastructure.
  • Excellent problem-solving, debugging, and performance optimization skills, with a data-driven approach to identifying and resolving technical challenges.
  • Strong communication and teamwork skills, and experience with MLOps best practices for model tracking, evaluation, and deployment.
  • Public GitHub profile with a track record of open-source contributions to data engineering or deep learning infrastructure projects.
  • Experience writing CUDA/Triton/CUTLASS kernels.
  • Proficiency with performance monitoring and profiling tools for distributed training and data pipelines.

Responsibilities

  • Designing, building, and optimizing all aspects of large-scale training and fine-tuning, from data loading to inference, to maximize Model FLOPs Utilization (MFU) on large compute clusters.
  • Working closely and proactively with research scientists to translate models and algorithms into high-performance, production-ready code, integrating and testing the latest advancements.
  • Relentlessly profiling and resolving training performance bottlenecks, optimizing the entire training stack for speed and efficiency.
  • Contributing to the technology evaluations and selection of hardware, software, and cloud services for the AI infrastructure platform.
  • Using MLOps frameworks (MLflow, W&B, etc.) to apply best practices across the model lifecycle, ensuring reproducibility, reliability, and continuous improvement.
  • Creating thorough documentation for infrastructure and training procedures, staying updated on advancements in training strategies, and driving improvements in workflows and infrastructure.

Other

  • Master's degree or higher in Computer Science, Engineering, or a related technical field.
  • Candidates with more experience can be considered for a higher level, or vice versa.
  • High-agency individual who demonstrates initiative, problem-solving, and a commitment to delivering robust and scalable solutions with rapid prototyping and turnaround.
  • Strong communication and teamwork skills
  • Supportive work culture