Sr. System Engineer

Super Micro Computer

$137,000 - $156,000
Sep 5, 2025
San Jose, CA, USA

Supermicro is looking for a Sr. System Engineer to roll out and maintain business-critical applications and services, resolve escalated service issues, and drive technical development for its advanced server, storage, and networking solutions.

Requirements

  • 8+ years of work-related experience in Deep Learning and Machine Learning
  • 8+ years of Linux/networking debugging/testing or relevant experience preferred
  • Experience with leading AI/ML frameworks such as PyTorch, TensorFlow, ONNX, etc.
  • Experience with DevOps or in cloud environments, including but not limited to Docker/Containers and Kubernetes
  • Hands-on experience with workload managers/schedulers (Slurm) for racks/clusters
  • Familiarity with MLPerf Training/Inference benchmarks, LLMs, HPL-AI, or RCCL/NCCL
  • Programming experience with Windows and Linux shell scripting

Responsibilities

  • Execute comprehensive system-level rack tests on the latest NVIDIA and AMD GPUs and on ARM-based, Intel Xeon, and AMD EPYC processors, encompassing functionality, compatibility, performance, stress, and reliability testing, leveraging proprietary in-house tools
  • Establish expertise in HPC/AI applications and benchmarks, delivering impactful training sessions to customers and partners, while addressing complex customer support issues, demonstrating innovative problem-solving skills and building robust processes and procedures for HPC/AI solutions
  • Conduct proof of concept design and testing, providing optimized benchmarks for HPC/AI applications in a timely manner. Fine-tune BIOS settings, optimize OS/network configurations, and develop diverse simulation configurations to enhance efficiency across various workloads
  • Deliver on-site deployment services, ensuring customer acceptance verification and providing post-level 1&2 support. Create and maintain technical documentation, including technical notes, blogs, and diagrams, to facilitate knowledge dissemination
  • Identify and document hardware and software quality issues and collaborate with Product Management and other Engineering teams to integrate customer feedback into future product enhancements
  • Proactively engage in HPC roadmap development, planning software and hardware upgrades to sustain exceptional HPC infrastructure performance
  • Document and analyze test plans, reports, and logs, and actively contribute to the development of test utilities and automation scripts to streamline testing processes

Other

  • As a Sr. System Engineer, you’ll be the go-to person to roll out and maintain business-critical applications and services for Supermicro.
  • You are also responsible for resolving escalated service issues, coaching other engineers to resolutions, and engineering and implementing complex projects.
  • You will work independently, show the leadership to drive technical development, and have excellent communication skills.
  • Strong sense of teamwork and strong communication skills
  • Please note that this position requires regular in-office attendance. The successful candidate is expected to be present in the office during standard working hours as determined by the company. In-office collaboration and participation in team meetings, training sessions, and other on-site activities are essential aspects of this role. Candidates should consider the commuting distance and be prepared to fulfill their responsibilities in the designated office location.