The company is looking to solve complex data modeling, ingestion, and query optimization problems using Apache Druid, and to apply machine learning to time-series forecasting and anomaly detection.
Requirements
- Expert-level proficiency in Python and its core data science libraries (Pandas, NumPy, Scikit-learn).
- Strong proficiency in SQL for complex data extraction and manipulation.
- Hands-on experience with modern deep learning frameworks such as TensorFlow or PyTorch.
- Deep understanding of statistical concepts and a wide range of machine learning algorithms, with proven experience in time-series forecasting and anomaly detection.
- Demonstrable experience working with large datasets using distributed computing frameworks, specifically Apache Spark.
- Experience querying and working with data from multiple relational database systems (e.g., PostgreSQL, Oracle, MS SQL Server).
- Practical experience with MLOps principles and tools for model versioning, tracking, and deployment (e.g., MLflow, Docker).
Responsibilities
- Design and implement efficient data schemas, dimensions, and metrics within Apache Druid for various analytical use cases (e.g., clickstream, IoT, application monitoring).
- Determine optimal partitioning, indexing (e.g., bitmap indexes), and rollup strategies to ensure sub-second query performance and efficient storage.
- Develop and manage real-time data ingestion pipelines into Druid from streaming sources like Apache Kafka, Amazon Kinesis, or other message queues.
- Implement batch data ingestion processes from data lakes (e.g., HDFS, Amazon S3, Azure Blob, Google Cloud Storage) or other databases.
- Write and optimize complex SQL queries (Druid SQL) for high-performance analytical workloads, including aggregations, filters, and time-series analysis.
- Analyze query plans and identify performance bottlenecks, implementing solutions such as segment optimization, query rewriting, or cluster configuration adjustments.
- Build and deploy data science solutions on a major cloud platform (AWS, GCP, or Azure).
Other
- Excellent verbal and written communication skills, with a proven ability to explain complex technical concepts to a non-technical audience through visual storytelling.