Annapurna Labs (AWS) is seeking a Senior Software Engineer to develop software infrastructure and tools for next-generation ML acceleration chips. The role focuses on improving performance and power efficiency at scale across thousands of AWS data center systems.
Requirements
- 5+ years of experience in software development using Python, C/C++, or similar languages.
- Proven experience in building tools and automation for system-level testing and data collection.
- Strong foundation in data analysis, including experience with libraries such as Pandas and NumPy and with data visualization libraries.
- Experience scaling test infrastructure or automation across large-scale environments (e.g., cloud or data centers).
- Experience developing performance or power telemetry infrastructure.
- Proficiency in dashboard tools such as Grafana, Power BI, or Tableau.
- Familiarity with machine learning workloads (e.g., PyTorch, TensorFlow).
Responsibilities
- Develop and implement software and firmware for managing power, thermal, and performance behavior.
- Develop and maintain tools to collect, process, and analyze power and performance data for machine learning workloads at scale.
- Develop low-level software interfaces to retrieve metrics from firmware, hardware, and telemetry sources.
- Design and develop scalable automation frameworks to run performance and power tests across thousands of servers inside Amazon data centers.
- Create dashboards and data pipelines to visualize trends, detect anomalies, and enable data-driven decisions.
- Collaborate with architecture, hardware, and software teams to define key metrics and guide system optimizations.
- Continuously evolve infrastructure to improve efficiency, accuracy, and test coverage.
Other
- Bachelor's or Master's degree in Computer Science, Electrical Engineering, or a related field.
- Strong debugging skills and ability to interpret system-level behavior from raw telemetry.
- Work safely and cooperatively with other employees, supervisors, and staff.
- Adhere to standards of excellence despite stressful conditions.
- Communicate effectively and respectfully with employees, supervisors, and staff to ensure exceptional customer service.