Microsoft's Azure Data engineering team is leading the transformation of analytics with products spanning databases, data integration, big data analytics, messaging and real-time analytics, and business intelligence. The products in our portfolio include Microsoft Fabric, Azure SQL DB, Azure Cosmos DB, Azure PostgreSQL, Azure Data Factory, Azure Synapse Analytics, Azure Service Bus, Azure Event Grid, and Power BI. Our mission is to build the data platform for the age of AI, powering a new class of data-first applications and driving a data culture. The Cosmos DB team is seeking a Principal Software Engineer to enhance the scalability and reliability of the Cosmos DB Control Plane, a global service responsible for provisioning, configuring, and securing Cosmos DB resources.
Requirements
- 6+ years of technical engineering experience coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python.
- 5+ years of experience designing and building solutions on distributed systems.
- Experience designing, building, and operating large-scale NoSQL cloud databases (e.g., Cosmos DB, DynamoDB, Cassandra, MongoDB) with emphasis on scalability, availability, and observability.
- Experience collaborating with partner teams to ensure seamless integration, end-to-end testing, and operational readiness, including telemetry and scalability validation.
- Experience engaging directly with customers to understand scenarios, validate requirements, and incorporate feedback into design and solution improvements.
Responsibilities
- Lead architectural design for complex, large-scale services, producing clear design documents that capture dependencies, trade-offs, and long-term scalability.
- Build and maintain high-quality, extensible, and reliable code, while defining comprehensive test strategies to ensure functionality, prevent regressions, and validate security.
- Collaborate with partner teams to ensure seamless integration, end-to-end testing, and operational readiness, including telemetry and scalability validation.
- Ensure live-site excellence by leading on-call operations; improve troubleshooting guides, telemetry, and automation to enhance on-call effectiveness; and recommend user-facing documentation and additional test coverage to reduce incidents.
- Engage directly with customers to understand scenarios, validate requirements, and incorporate feedback into design and solution improvements.
- Mentor engineers and lead by example in producing maintainable, secure, and performant code that adheres to design specifications.
- Produce technical blogs and content that showcase the scale, innovation, and engineering excellence of Cosmos DB, helping grow awareness and adoption across the developer community.
Other
- Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role.
- Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.