Cisco IT's Cloud Operations team is looking to build scalable, efficient, and cutting-edge infrastructure powering the next generation of AI solutions by optimizing cloud infrastructure, automating routine operations, enhancing performance monitoring, and improving system resilience.
Requirements
- Proven ability to build and deploy ML models, with at least 2 years focused on infrastructure or cloud operations.
- Solid knowledge of hybrid cloud technologies (AWS, GCP, OpenStack, Kubernetes).
- Experience with Python, Jupyter, and ML libraries such as PyTorch, TensorFlow, or scikit-learn.
- Familiarity with cloud-native monitoring, logging, and automation tools (e.g., Terraform, Ansible, Prometheus, Splunk, AppDynamics).
- Comfortable working with streaming data, APIs, and telemetry systems.
- Experience with Agile and DevOps operating models, including project tracking tools (e.g., Jira), version control systems (e.g., Git), and CI/CD systems (e.g., GitLab, GitHub Actions, Jenkins).
- Proficient in general-purpose programming languages (Python, Go, Bash, and/or C/C++) and related development platforms and technologies.
Responsibilities
- Design and implement AI agents to optimize cloud resource allocation, auto-scaling, and performance tuning.
- Develop predictive models for failure detection, incident management, and system health monitoring.
- Automate operational workflows using machine learning and intelligent scripting.
- Integrate AI-driven insights with existing cloud monitoring tools.
- Collaborate with DevOps and SRE teams to deploy, monitor, and improve ML models in production environments.
- Conduct anomaly detection for security, cost optimization, and performance analytics.
- Continuously evaluate emerging AI technologies and tools for operational improvements.
Other
- Local candidates only; AI experience is required.
- An excellent collaborator who can partner, lead, guide, and communicate advanced technical concepts.
- A hardworking and passionate engineer comfortable working in high-pressure, large-scale enterprise environments.
- Strong communication and cross-functional collaboration skills.
- Deep understanding of operating systems and experience with Cisco technologies (UCS, Nexus, ThousandEyes).