The company is seeking an engineer to design and develop scalable, high-performance data pipelines for time series data.
Requirements
- Proven experience with time series databases and storage solutions (e.g., kdb+, TimescaleDB, InfluxDB).
- Strong background in building both streaming and batch data pipelines using tools like AWS Glue, Apache Kafka, Apache Flink, or Apache Spark (a minimal streaming sketch follows this list).
- Proficiency in Python, including the scientific stack (pandas, NumPy) and ML libraries such as scikit-learn and PyTorch.
- Experience designing scalable, efficient data models and partitioning strategies for time series data (see the partitioning sketch after this list).
- Knowledge of distributed systems, parallel computing, and columnar data storage.
- Familiarity with cloud-based data architectures (AWS, GCP, or Azure) and containerized environments.
- Hands-on experience with more than one time series system (e.g., both kdb+ and TimescaleDB).
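
To give a flavor of the streaming work above, here is a minimal sketch of a Kafka-based ingestion step in Python. The broker address, the `ticks` topic, and the record layout are illustrative assumptions, not a prescribed design.

```python
# Minimal streaming-ingestion sketch, assuming a local Kafka broker and a
# hypothetical "ticks" topic carrying JSON records such as
# {"symbol": "ABC", "ts": "2024-01-02T09:30:00Z", "price": 1.23}.
import json

from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "ticks",                             # hypothetical topic name
    bootstrap_servers="localhost:9092",  # assumed broker address
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    record = message.value
    # Validation, enrichment, and writes to the time series store would
    # hang off this loop (or its Flink/Spark equivalent).
    print(record["symbol"], record["ts"], record["price"])
```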
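
And a minimal sketch of the partitioning point: date-based (Hive-style) partitions written as columnar Parquet, one common layout for historical time series. The frame contents, the `ts_store` path, and the pyarrow engine are assumptions.

```python
# Date-partitioned columnar storage sketch (pandas + pyarrow); the frame
# contents and the "ts_store" path are illustrative assumptions.
import pandas as pd

frame = pd.DataFrame(
    {
        "ts": pd.date_range("2024-01-01", periods=96, freq="h"),
        "value": range(96),
    }
)
frame["date"] = frame["ts"].dt.date

# One directory per date keeps time-range scans cheap and maps naturally
# onto columnar engines such as Spark or AWS Glue.
frame.to_parquet("ts_store", partition_cols=["date"])
```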
Responsibilities
- Design, build, and optimize data pipelines to process large-scale time series data.
- Develop scalable infrastructure using tools like kdb+, TimescaleDB, or InfluxDB.
- Implement real-time and historical data ingestion and transformation workflows.
- Integrate data systems with Python-based ML pipelines to support model training and inference (a short integration sketch follows this list).
- Design data models and schemas tailored for time series data, including strategies for downsampling, indexing, and aggregation (see the resampling sketch after this list).
- Monitor and fine-tune systems for reliability, scalability, and performance.
- Implement best practices in data governance, lineage tracking, and system observability.
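
As a concrete illustration of the downsampling and aggregation work above, here is a minimal pandas sketch; the synthetic tick data and the 1-minute bar size are assumptions.

```python
# Downsampling sketch: raw synthetic ticks to 1-minute OHLCV bars.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ticks = pd.DataFrame(
    {
        "price": 100 + rng.standard_normal(10_000).cumsum() * 0.01,
        "size": rng.integers(1, 500, 10_000),
    },
    index=pd.date_range("2024-01-02 09:30", periods=10_000, freq="100ms"),
)

# Aggregate each minute of ticks into open/high/low/close plus volume.
bars = ticks["price"].resample("1min").ohlc()
bars["volume"] = ticks["size"].resample("1min").sum()
print(bars.head())
```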
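
Similarly, a minimal sketch of handing time series features to a Python ML pipeline; the synthetic series, the lagged-return features, and the ridge model are all illustrative assumptions.

```python
# Feature/target construction from a synthetic minute-bar series, fed
# into a scikit-learn pipeline; everything here is illustrative.
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
close = pd.Series(
    100 + rng.standard_normal(2_000).cumsum() * 0.1,
    index=pd.date_range("2024-01-02", periods=2_000, freq="min"),
    name="close",
)

# Lagged returns as features; the next-minute return is the target.
frame = pd.DataFrame(
    {"ret_1": close.pct_change(), "ret_5": close.pct_change(5)}
).assign(target=close.pct_change().shift(-1)).dropna()

model = make_pipeline(StandardScaler(), Ridge(alpha=1.0))
model.fit(frame[["ret_1", "ret_5"]], frame["target"])
print("in-sample R^2:", model.score(frame[["ret_1", "ret_5"]], frame["target"]))
```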
Other
- 5+ years of experience in data engineering, particularly in large-scale, high-throughput environments.
- Excellent communication skills and the ability to work in a client-facing, collaborative environment.
- Mentor junior engineers in large-scale system architecture and distributed processing.
- Work closely with product and infrastructure teams to align technical solutions with business objectives.