Eurofins Scientific is seeking a Data Engineer to design, build, and maintain data pipelines and infrastructure to support Machine Learning (ML) model training, deployment, and analysis workflows for complex chemical data.
Requirements
- Expert-level proficiency in Python, including packages such as pandas and NumPy, and familiarity with data engineering libraries.
- Hands-on experience with at least one major cloud provider (AWS, Azure, or GCP; Azure preferred), including compute, storage, and serverless services (e.g., Azure Data Lake Storage (ADLS), Azure Virtual Machines, Azure Functions).
- Proven experience building and managing data workflows using an orchestration tool like Apache Airflow, Prefect, or Dagster.
- Strong knowledge of SQL and experience working with both relational (e.g., PostgreSQL) and NoSQL databases.
- Proficiency with Git and standard DevOps practices.
- Prior experience handling and processing complex, large-volume scientific data (e.g., mass spectrometry, chromatography, LIMS/ELN integration).
- Familiarity with MLOps platforms and tools such as Azure ML Studio, MLflow, Kubeflow, or Amazon SageMaker.
Responsibilities
- Design, construct, and manage scalable and reliable ETL/ELT pipelines to ingest, clean, transform, and store raw chemistry data (e.g., CSV, JSON, and proprietary instrument formats).
- Develop optimized data models and manage a data warehouse (or data lake) to support fast querying and ML feature engineering on complex datasets, including time-series and spectral data from chromatograms.
- Collaborate with ML engineers to containerize and deploy ML models and to build automated model retraining and monitoring pipelines.
- Implement robust data quality checks, validation, and monitoring to ensure the integrity and reproducibility of chemical experiment data used for ML.
- Develop internal tools and APIs that give ML engineers access to data and provide standardized interfaces for data submission from chemistry lab systems.
Other
- Authorization to work in the United States without restriction or sponsorship.
- Professional working proficiency in English, including the ability to read, write, and speak it.
- Strong analytical and problem-solving skills with a focus on delivering high-quality, reproducible data solutions.
- Excellent verbal and written communication skills, with the ability to bridge the gap between technical infrastructure, data science models, and chemical applications.