Advance the core capabilities of Microsoft's Ads serving stack, a high-scale, low-latency, geo-distributed system powering advertisements across various Microsoft services.
Requirements
- 5+ years of experience programming in native C++, including writing production-quality code.
- 5+ years of experience designing, implementing, and scaling large-scale distributed online systems, with a deep understanding of system architecture and a proven ability to profile, analyze, and optimize the performance and capacity of native C++ systems in complex, high-throughput environments.
- Proven experience in designing, implementing, and validating deep learning systems for real-time online inference.
- Solid expertise in optimizing machine learning models for GPUs, including development of custom CUDA kernels for performance-critical workloads.
- Experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
Responsibilities
- Design and develop large-scale, distributed systems—including CPU and GPU ranking platforms—to support real-time processing of millions of ad requests per second with high efficiency, extensibility, diagnosability, reliability, and maintainability.
- Lead architecture discussions, create technical design documents, and drive end-to-end solution planning—identifying system dependencies, performance optimizations, and security/compliance requirements across interconnected services.
- Implement features and enhancements with a focus on code quality, maintainability, and scalability; conduct thorough code reviews to uphold Microsoft engineering standards and ensure solutions are production-ready.
- Serve as a Designated Responsible Individual (DRI) for live-site operations on a rotational on-call basis, proactively identifying, resolving, and escalating service degradations or interruptions to maintain high availability.
- Guide testing strategies and quality assurance plans, including unit tests, automation, and telemetry-based diagnostics to validate assumptions, ensure reliability, and drive continuous improvement in service performance.
- Mentor engineers on software engineering best practices, reusable patterns, and tooling; lead efforts to improve performance through debugging, refactoring, experimentation, and instrumentation at scale.
- Drive engineering excellence through compliance with global and local regulations, investment in modern tools and trends, and close collaboration with partner teams to deliver secure, performant, and customer-aligned ad-serving solutions.
Other
- Microsoft will accept applications for the role until August 31, 2025.
- Microsoft is an equal opportunity employer.
- If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request via the Accommodation request form.