The company solves complex data science and machine learning problems, including data ingestion, query optimization, and model deployment, using technologies such as Apache Druid, Apache Spark, and major cloud platforms.
Requirements
- Expert-level proficiency in Python and its core data science libraries (Pandas, NumPy, Scikit-learn).
- Strong proficiency in SQL for complex data extraction and manipulation.
- Hands-on experience with modern deep learning frameworks such as TensorFlow or PyTorch.
- Deep understanding of statistical concepts and a wide range of machine learning algorithms, with proven experience in time-series forecasting and anomaly detection.
- Demonstrable experience working with large datasets using distributed computing frameworks, specifically Apache Spark.
- Experience querying data from multiple relational database systems (e.g., PostgreSQL, Oracle, Microsoft SQL Server).
- Experience building and deploying data science solutions on a major cloud platform (AWS, GCP, or Azure).
Responsibilities
- Design and implement efficient data schemas, dimensions, and metrics within Apache Druid for various analytical use cases (e.g., clickstream, IoT, application monitoring).
- Develop and manage real-time data ingestion pipelines into Druid from streaming sources like Apache Kafka, Amazon Kinesis, or other message queues.
- Implement batch data ingestion processes from data lakes (e.g., HDFS, Amazon S3, Azure Blob Storage, Google Cloud Storage) or other databases.
- Write and optimize complex SQL queries (Druid SQL) for high-performance analytical workloads, including aggregations, filters, and time-series analysis.
- Analyze query plans and identify performance bottlenecks, implementing solutions such as segment optimization, query rewriting, or cluster configuration adjustments.
- Determine optimal partitioning, indexing (bitmap indexes), and rollup strategies to ensure sub-second query performance and efficient storage.
- Ensure data quality, consistency, and exactly-once processing during ingestion.
Other
- Excellent verbal and written communication skills, with a proven ability to explain complex technical concepts to a non-technical audience through visual storytelling.
- Ability to work with non-technical stakeholders to understand business requirements and communicate technical solutions.
- Strong problem-solving skills and attention to detail.
- Ability to work in a team environment and collaborate with cross-functional teams.
- Ability to adapt to changing priorities and deadlines.