Impinj is seeking a Data Engineer to manage and process high-volume IoT data for cloud-based machine learning model training, enabling real-time inference on edge devices.
Requirements
- 8+ years of data engineering experience, including work with machine learning pipelines
- Deep understanding of data pipeline design, ETL/ELT processes, and automated workflow orchestration (e.g., Apache Airflow)
- Strong programming skills in Python, especially for data workflows (e.g., pandas, NumPy), with experience building scalable, maintainable pipelines
- Strong experience with structured and unstructured data stores (e.g., SQL databases, MongoDB, DuckDB)
- Strong understanding of cloud infrastructure (AWS, Azure, or GCP), especially cloud storage, compute, and ML tools (e.g., SageMaker, Vertex AI, Azure ML)
- Experience with data lake/data warehouse technologies (e.g., S3 + Glue, BigQuery, Snowflake, Delta Lake)
- Familiarity with distributed data systems and big data tools (e.g., Spark, Kafka, Hadoop)
Responsibilities
- Design data workflows to support model training, evaluation, and retraining cycles for deployment on edge devices
- Work closely with ML engineers to align data formats, labeling standards, feature extraction for edge-compatible models, and feedback loops for model improvement
- Architect and maintain scalable data pipelines to ingest, process, store, and access large volumes of structured and semi-structured RFID time-series data from edge networks
- Develop automated systems for data versioning, labeling, augmentation, and quality assurance
- Establish and maintain data APIs and interfaces to query, consume, and update datasets
- Manage large datasets using distributed storage and compute frameworks (e.g., Apache Spark, Hadoop, or Dask)
- Implement robust ETL/ELT workflows for preparing data for cloud-based ML model training and evaluation
Other
- Bachelor’s degree in Data Engineering, Electrical Engineering, or a related field and 8 years of related experience, or an equivalent combination of education and experience
- This is a multi-functional role requiring close collaboration with ML engineers, systems engineers, cloud architects, and embedded systems teams
- Collaborate on and coordinate large-scale data collection projects
- Monitor and optimize data pipelines for performance, reliability, and cost across edge-to-cloud infrastructure
- Optimize data flow and compute for performance, cost, and latency in hybrid edge-cloud environments