Roche is looking to harness the power of data and Artificial Intelligence (AI) to help scientists deliver more innovative and transformative medicines for patients worldwide.
Requirements
- Proficient in Python, with hands-on experience using modern frameworks for deep learning and GenAI, such as PyTorch, Hugging Face Transformers, LangChain, or LlamaIndex.
- Good understanding of machine learning algorithms, model evaluation techniques, and performance optimization, along with knowledge of deploying LLMs in data-intensive settings.
- Skilled in cloud platforms (AWS, GCP, Azure), version control and experiment tracking tools (Git, DVC, MLflow), CI/CD pipelines, and SQL for relational database management.
- Experience with deploying machine learning applications at scale, preferably in R&D or data-intensive environments.
- Knowledge of advancements in LLMs and GenAI, with a passion for applying these technologies to drive efficiencies in R&D workflows.
- Familiarity with large language models (LLMs) and their applications in scientific research.
- Experience with data search, insights, and protocol generation and review platforms.
Responsibilities
- Design, develop, and deploy cloud-first, API-driven machine learning applications for data search, insights, and protocol generation and review platforms.
- Leverage large language models (LLMs) to improve contextual search, data retrieval, and scientific research efficiency through advanced prompt engineering, retrieval-augmented generation, and fine-tuning techniques.
- Develop and refine LLMs tailored for protocol generation and review workflows, driving innovation in GenAI applications to streamline R&D processes.
- Collaborate with data engineers, software engineers, and architects to integrate ML models effectively within the internal data ecosystem.
- Monitor, validate, and optimize ML applications to ensure high-quality outputs, performance scalability, and a seamless user experience.
- Partner with research teams to identify needs, exchange insights, and deliver solutions that address evolving R&D requirements.
Other
- Hold a Bachelor's degree, Master's degree, or PhD in Computer Science, Data Science, Applied Mathematics, Bioinformatics, or a related quantitative field, with 2-4 years of experience.
- Onsite presence on our South San Francisco campus is expected for at least 3 days a week.
- Relocation benefits are available for this job posting.
- A public portfolio of projects available on GitHub/GitLab is preferred.
- A record of scientific excellence, as evidenced by at least one publication in a scientific journal or conference, is preferred.