C the Signs is looking to develop and curate data specifically for fine-tuning its LLMs and machine learning models, and the Data Engineer will play a crucial role in this process, particularly with healthcare data.
Requirements
- Strong proficiency in programming languages such as Python, Scala, or Java.
- Extensive experience with data warehousing, ETL processes, and data modeling.
- Experience with major cloud providers (e.g., AWS, GCP, Azure) and their data storage and processing services.
- Hands-on experience with big data frameworks like Apache Spark for distributed processing.
- Experience with healthcare data and a good understanding of healthcare data standards (e.g., FHIR, HL7).
- Familiarity with machine learning concepts and LLM fine-tuning processes.
- Experience with data orchestration tools (e.g., Apache Airflow).
Responsibilities
- Collaborate with data scientists and machine learning engineers to understand data requirements for LLM and machine learning model fine-tuning.
- Design, build, and maintain scalable data pipelines to ingest, process, and store massive and diverse healthcare datasets.
- Implement robust data cleaning, validation, transformation, and monitoring processes to ensure the quality, integrity, accuracy, and consistency of all training datasets.
- Develop and optimize data structures and schemas for efficient access and utilization by LLMs and machine learning models.
- Monitor data pipeline performance, troubleshoot issues, and implement optimizations to improve efficiency and reliability.
- Document data engineering processes, data models, and data dictionaries.
Other
- Bachelor's degree in Computer Science, Engineering, or a related field.
- Must be a US citizen, a Green Card holder, or currently in the US with a valid H-1B visa.
- Excellent problem-solving skills and the ability to work independently and as part of a team.
- Strong communication and interpersonal skills.
- Master's degree in a related field (preferred).