Global Payments is seeking to transform its operations through strategic Generative AI and Machine Learning initiatives and is hiring a senior AI leader to ensure the reliability, scalability, and performance of its production AI systems and services.
Requirements
- Strong command of AWS and GCP with experience managing AI workloads.
- Hands-on experience with AWS SageMaker, AWS Bedrock, Google Vertex AI, and Snowflake Cortex.
- Deep knowledge of LLMs, inference pipelines, vector databases, retrieval-augmented generation (RAG), and agentic architectures.
- Expertise in designing and running reliable, scalable, and observable production systems.
- Hands-on experience with observability platforms (e.g., Fiddler AI, Arize, Weights & Biases).
- Deep understanding of containerization and designing ephemeral solutions.
- Familiarity with MLOps workflows, data versioning, and model lifecycle management.
Responsibilities
- Lead the SRE and Support function for AI production systems, including LLM inference services and monitoring, vector databases, orchestration platforms, and AI agent frameworks.
- Ensure high availability, low-latency performance, and secure operation of GenAI APIs and applications.
- Build and scale observability frameworks to monitor model drift, hallucination, bias, performance degradation, latency spikes, and bottlenecks.
- Define and enforce SLAs, SLOs, and error tolerances tailored to AI/ML workloads, covering batch, real-time, and on-demand use cases.
- Lead incident management and root cause analysis across AI pipelines, including model serving, feature stores, and data flows.
- Partner with the AI Engineering, MLOps, and Platform teams to ensure reliability is built into every stage of AI development and deployment.
- Work closely with the Platform teams to implement and support auto-scaling, failover, and self-healing strategies for AI workloads in multi-cloud and hybrid environments.
Other
- Bachelor's or Master's degree in Computer Science, Math, AI, or a related area.
- At least 10 years of experience in software and support engineering for enterprise-grade, cloud-based AI systems.
- Passionate engineering leader with experience building high performance teams.
- Proficiency in stakeholder management, communicating effectively with and managing the expectations of stakeholders outside your team.
- Proficiency in project management and resource allocation to ensure timely, efficient, and successful delivery of outcomes.
- Experience in strategic planning and execution, with strong decision-making skills to align initiatives with business goals and make informed choices that benefit the organization.
- Some experience handling compliance and regulatory requirements, ensuring that engineering practices adhere to relevant laws and regulations.