Design, build, and maintain the data infrastructure that powers statistical analyses and machine learning efforts for the Core Infrastructure Availability Engineering organization at Oracle
Requirements
- Expertise in data pipeline tools such as Apache Spark, Apache Beam, or Apache Flink
- Experience with data warehousing and data lake technologies such as Oracle Object Storage, Apache Hadoop, Apache Hive, or Amazon Redshift
- Strong programming skills in languages such as Python, Java, or Scala
- Solid understanding of data structures and algorithms, applied to designing and implementing efficient, scalable data processing systems
- Experience with containerization technologies such as Docker and Kubernetes
- Experience analyzing data, generating insights, and telling stories with data
- Strong understanding of data governance, data quality, and data security principles
Responsibilities
- Design, build, and maintain large-scale data pipelines that support statistical analyses and machine learning model training, testing, and deployment
- Create a comprehensive data strategy to enable reporting, analytics, and machine learning
- Conduct research to evaluate data and answer strategic business questions
- Fine-tune and optimize algorithms and models to ensure high scalability, reliability, and performance
- Develop and maintain data architectures that support data warehousing, data lakes, and data governance
- Work with cross-functional teams to integrate data pipelines into their systems and workflows
- Ensure data quality, integrity, and security across all data pipelines and systems
Other
- BS (or equivalent experience) in Data Science, Computer Science, Engineering, or related quantitative or technical field
- 8+ years of data science or software engineering experience
- Excellent communication and collaboration skills
- Provide guidance and mentorship to junior data scientists, contributing to team knowledge and best practices