Oracle Cloud Infrastructure (OCI) is applying AI to automate, optimize, and secure networks, with a focus on self-provisioning, auto-ingesting, and auto-qualifying systems, and on self-healing networks.
Requirements
- Strong proficiency in Python and ML frameworks (PyTorch, TensorFlow)
- LLMs, embeddings, vector search, RAG pipelines, and fine-tuning
- Data engineering: Spark, Kafka, Flink, OCI Streaming/Data Flow
- Distributed systems and large-scale training/inference
- Handling network telemetry (NetFlow, packet captures, streaming telemetry)
- Network automation frameworks (Terraform, Ansible, NAPALM); Batfish is a plus
Responsibilities
- Design and implement scalable orchestration for serving and training AI/ML models.
- Explore and incorporate contemporary research on AI, agents, and inference systems into the software stack for designing, deploying, monitoring, and troubleshooting networks.
- Evaluate, integrate, and optimize technologies across the stack for latency, throughput, and resource utilization in training and inference workloads.
- Lead initiatives in AI systems design, including Retrieval-Augmented Generation (RAG) and LLM fine-tuning.
- Design and develop scalable services and tools in Python/Go to support GPU-accelerated AI pipelines and observability frameworks.
Other
- BSEE, BSCS, BSCE, or equivalent. MSEE, MSCS, or MSCE is a plus.
- At least 7 years of experience building software systems, including building AI applications and training models.
- Strong problem-solving skills, attention to detail, and excellent communication skills are essential.
- US: Hiring range in USD: $96,800 to $223,400 per year. May be eligible for bonus and equity.
- Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.