Hayden AI is looking to grow its data platform to enable reliable, scalable, and high-performance data systems across the organization. The role involves designing and building core data infrastructure, optimizing data pipelines, ensuring data quality and reliability, and collaborating with product and data science teams to deliver data solutions that power the product and surface business insights.
Requirements
- Strong proficiency in Python, Scala, or Java for data processing and automation.
- Hands-on experience with AWS data services and related technologies such as Kinesis/Kafka, Glue, EMR, Iceberg, Redshift, S3, and Lambda.
- Solid understanding of data modeling, ETL/ELT design, and distributed data processing (Spark or Flink).
- Experience with infrastructure-as-code (Terraform or CloudFormation) and CI/CD automation.
- Familiarity with data governance, lineage, and observability best practices.
- Working knowledge of cloud security, IAM, and cost optimization in AWS.
- AWS or data engineering certifications are a strong plus.
Responsibilities
- Evolve the data platform leveraging AWS services such as Kinesis/Kafka, Glue, EMR, Lambda, and Redshift to support streaming, batch, and analytical workloads at scale.
- Design and implement robust data ingestion and transformation pipelines, ensuring reliability, performance, and schema consistency across diverse data sources.
- Establish and enforce data quality, validation, and observability standards, building automated checks and alerts using CloudWatch, Datadog, and custom frameworks (see the first sketch after this list).
- Optimize data storage and lifecycle management through intelligent S3 partitioning, versioning, and lifecycle policies to balance performance, cost, and retention (see the lifecycle sketch after this list).
- Conduct regular performance, scalability, and cost-efficiency audits of data pipelines and warehouses, proactively identifying and resolving bottlenecks or inefficiencies.
- Ensure data reliability and resilience, implementing redundancy, checkpointing, and replay strategies for real-time and near-real-time data systems (see the streaming sketch after this list).
- Define and maintain data governance and security best practices, including IAM-based access control, encryption policies, and compliance with internal data standards.
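To ground the observability bullet above, here is a minimal Python sketch of such an automated check, publishing a custom metric and alarm to CloudWatch with boto3; the namespace, metric name, and threshold are illustrative assumptions, not an established convention.

```python
# Minimal sketch of an automated data-quality check reporting to CloudWatch.
# Assumes boto3 credentials are configured; namespace, metric name, and
# threshold are illustrative assumptions.
import boto3

cloudwatch = boto3.client("cloudwatch")

def report_null_rate(dataset: str, null_rate_pct: float) -> None:
    # Publish the observed null rate as a custom metric an alarm can watch.
    cloudwatch.put_metric_data(
        Namespace="DataPlatform/Quality",  # hypothetical namespace
        MetricData=[{
            "MetricName": "NullRate",
            "Dimensions": [{"Name": "Dataset", "Value": dataset}],
            "Value": null_rate_pct,
            "Unit": "Percent",
        }],
    )

def ensure_null_rate_alarm(dataset: str, threshold_pct: float = 5.0) -> None:
    # Alarm when the average null rate breaches the threshold for one
    # five-minute period; a missing datapoint is itself treated as a failure.
    cloudwatch.put_metric_alarm(
        AlarmName=f"{dataset}-null-rate",  # hypothetical naming scheme
        Namespace="DataPlatform/Quality",
        MetricName="NullRate",
        Dimensions=[{"Name": "Dataset", "Value": dataset}],
        Statistic="Average",
        Period=300,
        EvaluationPeriods=1,
        Threshold=threshold_pct,
        ComparisonOperator="GreaterThanThreshold",
        TreatMissingData="breaching",
    )
```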
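The lifecycle bullet can likewise be sketched as a boto3 call applying an S3 lifecycle configuration; the bucket, prefix, transition windows, and retention periods below are hypothetical.

```python
# Minimal sketch of an S3 lifecycle policy balancing cost and retention.
# Bucket, prefix, and day counts are hypothetical placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake",  # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [{
            "ID": "tier-raw-events",
            "Filter": {"Prefix": "raw/events/"},  # partitioned raw zone
            "Status": "Enabled",
            # Shift cold partitions to cheaper storage classes over time.
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},  # retention window
            # With versioning enabled, also expire stale noncurrent versions.
            "NoncurrentVersionExpiration": {"NoncurrentDays": 30},
        }]
    },
)
```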
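And for the resilience bullet, a minimal Spark Structured Streaming sketch in which the checkpoint location enables replay after failure; the broker, topic, and S3 paths are placeholders.

```python
# Minimal sketch of checkpoint-based recovery in Spark Structured Streaming.
# Broker, topic, and S3 paths are placeholders; requires the spark-sql-kafka
# connector on the classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("events-ingest").getOrCreate()

events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# The checkpoint location persists source offsets and sink state, so a
# restarted job replays from the last committed offset instead of losing
# or duplicating data.
query = (
    events.writeStream.format("parquet")
    .option("path", "s3://example-data-lake/raw/events/")
    .option("checkpointLocation", "s3://example-data-lake/checkpoints/events/")
    .start()
)
query.awaitTermination()
```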
Other
- 6+ years of experience in software or data engineering, with a focus on large-scale, low-latency data systems.
- Excellent collaboration and communication skills, with experience working cross-functionally with analytics and platform teams.
- Mentor other engineers and promote best practices in data architecture, pipeline design, and development across the engineering organization.
- Lead root-cause analysis and incident response for data platform issues, driving long-term reliability improvements and knowledge sharing across teams.
- Stay current with AWS and data engineering advancements, evaluating new tools and services to continuously improve scalability, developer velocity, and data accessibility.