Skywalker Sound Development Group is seeking an experienced Data Engineer to specialize in the creation, management, and optimization of data pipelines to support cutting-edge AI/ML research for immersive and multichannel audio applications.
Requirements
- Proficiency in Python, with expertise in data manipulation libraries such as Pandas, NumPy, and PyTorch’s data utilities.
- Hands-on experience with audio processing libraries and tools (e.g., Librosa, FFmpeg, SoX) for handling complex audio formats.
- Familiarity with scalable pipeline tools like GitLab, Apache Spark, Airflow, or Luigi, and experience with containerized workflows (Docker, Kubernetes).
- Strong understanding of data pipeline requirements for model training, retraining, and evaluation in iterative research workflows.
- Experience with immersive and multichannel audio formats.
- Knowledge of cloud-based platforms and tools for storage and processing, such as AWS S3, Redshift, or Google BigQuery.
- Experience integrating data pipelines with AI/ML workflows, including active learning and model retraining.
Responsibilities
- Design, implement, and maintain scalable, automated data pipelines for the ingestion, preprocessing, and transformation of large-scale audio datasets.
- Ensure pipelines support efficient model training and retraining workflows, enabling continuous improvement of AI/ML models.
- Collaborate with AI/ML researchers to define data requirements and integrate feedback to improve data pipeline functionality.
- Develop advanced preprocessing techniques for immersive and multichannel audio formats (e.g., Dolby Atmos, higher-order ambisonics).
- Automate data cleaning, normalization, and augmentation processes to prepare datasets for various model architectures, including foundation models and transformers.
- Integrate external datasets and APIs while ensuring compliance with legal and ethical data usage standards.
- Monitor and optimize pipeline performance to handle complex and dynamic data structures effectively.
Other
- This role is hybrid: the employee will work 2-3 days onsite at our Nicasio, CA office and occasionally from home.
- Master’s degree, with a preference for a PhD, in Data Engineering/Science, Computer Science, Signal Processing, or a related field.
- 8+ years of experience in data engineering or data science with a focus on building pipelines for AI/ML applications.
- Strong problem-solving skills, with a proactive mindset for addressing evolving data challenges.
- The hiring range for this position in Nicasio, CA is $166,800 to $223,600 per year.