The Generative AI – Senior Platform Software Engineer role supports the operational stability of GenAI platforms used across the enterprise, ensuring continuity of service and performance.
Requirements
2+ years of experience with observability tools (e.g., Prometheus, Grafana, Splunk)
2+ years of experience in incident management and diagnostics in production environments
1+ year of experience supporting internal platforms or services used by engineering or ML teams
2+ years of experience working with cloud infrastructure platforms, preferably Google Cloud Platform
Experience with infrastructure-as-code tools (e.g., Terraform, Ansible)
Experience with container platforms
Experience supporting Generative AI environments
Responsibilities
Participate in incident triage and resolution across platform services
Maintain observability tooling to ensure visibility into system performance and reliability
Collaborate with infrastructure teams (e.g., Google Cloud Platform support) to resolve platform-level issues
Conduct diagnostics and contribute to root cause analysis for platform incidents
Support internal GenAI-facing platforms, including Agent Space, ensuring operational stability and performance
Contribute to automation, runbooks and service documentation to improve operational efficiency
Other
4+ years of Software Engineering experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
4+ years of experience in platform operations, SRE, or infrastructure engineering
1+ year of experience collaborating across geographically distributed teams