
Get Jobs Tailored to Your Resume

Filtr uses AI to scan 1000+ jobs and find postings that perfectly match your resume

Senior Data Engineer (R)

Blue Coding

Salary not specified
Nov 7, 2025

Blue Coding is looking to hire a Senior Data Engineer to design and build a next-gen data platform for one of their clients, focusing on ingesting, transforming, and governing document-centric datasets to power analytics, dashboards, and AI model training.

Requirements

  • 6–10+ years in data engineering with 3+ years building production workloads on AWS; expert-level Python and SQL plus strong Spark (Glue/EMR/Databricks).
  • Proven experience designing and operating data lakes/warehouses at scale, including file formats (Parquet/Delta/Iceberg/Hudi), partitioning, and performance/cost tuning.
  • Hands-on document ETL: OCR pipelines, text/metadata extraction, schema design, and incremental processing for millions of files; a minimal sketch of this pattern follows the list.
  • Solid orchestration and DevOps chops: Airflow/MWAA or Step Functions, Docker, Terraform/CDK, and CI/CD best practices.
  • Data governance mindset: lineage, quality frameworks, IAM least privilege, KMS, VPC endpoints/private networking, secrets management, and compliance awareness (e.g., SOC 2/ISO 27001).
  • Practical ML enablement: crafting reproducible, versioned datasets; experience with embeddings/feature pipelines and at least one vector-store pattern (OpenSearch, pgvector, etc.); a short pgvector sketch also follows the list.
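
The incremental-processing requirement is the crux of document ETL at this scale. Below is a minimal Python sketch of the watermark pattern, assuming a local JSON state file and a Parquet landing zone; in production the file listing would come from S3 and the state from DynamoDB or similar, and every name here (paths, record fields) is illustrative.

```python
# Minimal sketch of incremental document processing with a watermark.
# The state file, paths, and input record shape are all illustrative.
import json
from datetime import datetime, timezone
from pathlib import Path

import pyarrow as pa
import pyarrow.parquet as pq

WATERMARK_FILE = Path("state/last_processed.json")  # hypothetical state store

def load_watermark() -> str:
    """Return the ISO timestamp of the last successful run (epoch if none)."""
    if WATERMARK_FILE.exists():
        return json.loads(WATERMARK_FILE.read_text())["last_modified"]
    return "1970-01-01T00:00:00+00:00"

def save_watermark(ts: str) -> None:
    WATERMARK_FILE.parent.mkdir(parents=True, exist_ok=True)
    WATERMARK_FILE.write_text(json.dumps({"last_modified": ts}))

def process_incrementally(files: list[dict]) -> None:
    """files: [{"path": ..., "last_modified": <ISO ts>, "text": ...}, ...]"""
    watermark = load_watermark()
    # ISO-8601 timestamps in a consistent offset format compare lexically.
    new_files = [f for f in files if f["last_modified"] > watermark]
    if not new_files:
        return
    now = datetime.now(timezone.utc).isoformat()
    table = pa.table({
        "path": [f["path"] for f in new_files],
        "text": [f["text"] for f in new_files],
        "ingested_at": [now] * len(new_files),
        # Partition column derived from the modification date.
        "dt": [f["last_modified"][:10] for f in new_files],
    })
    # Hive-style date partitioning keeps downstream scans cheap.
    pq.write_to_dataset(table, root_path="bronze/documents", partition_cols=["dt"])
    save_watermark(max(f["last_modified"] for f in new_files))
```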
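
For the vector-store bullet, a short pgvector sketch: it assumes a Postgres instance with the pgvector extension available and the psycopg driver installed; the table schema is illustrative, and embed() is a dummy stand-in for a real embedding model call (e.g., via SageMaker or Bedrock).

```python
# pgvector round trip: create a table, insert an embedded chunk, query by
# cosine distance. Assumes `CREATE EXTENSION vector` is permitted.
import psycopg

DIM = 384  # must match the embedding model's output size

def embed(text: str) -> list[float]:
    # Dummy stand-in for a real embedding model call.
    return [float(len(text) % 7)] * DIM

def to_vector_literal(vec: list[float]) -> str:
    # pgvector accepts '[x1,x2,...]' text literals.
    return "[" + ",".join(str(x) for x in vec) + "]"

with psycopg.connect("dbname=docs") as conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute(f"""
        CREATE TABLE IF NOT EXISTS doc_chunks (
            id bigserial PRIMARY KEY,
            chunk text NOT NULL,
            embedding vector({DIM})
        )
    """)
    chunk = "Example document chunk."
    cur.execute(
        "INSERT INTO doc_chunks (chunk, embedding) VALUES (%s, %s::vector)",
        (chunk, to_vector_literal(embed(chunk))),
    )
    # <=> is pgvector's cosine-distance operator.
    cur.execute(
        "SELECT chunk FROM doc_chunks ORDER BY embedding <=> %s::vector LIMIT 5",
        (to_vector_literal(embed("query text")),),
    )
    print(cur.fetchall())
```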

Responsibilities

  • Design and build an AWS-first data platform: stand up an S3-based (or equivalent) data lake, Glue Data Catalog/Lake Formation, and a performant warehouse layer (Redshift/Snowflake/Athena) using medallion (bronze/silver/gold) patterns.
  • Implement a robust ETL/ELT solution for document data, including OCR (Textract), text parsing, metadata enrichment, schema inference, incremental loads, partitioning, and optimization for large-scale semi-structured/unstructured files; a minimal Textract call is sketched after this list.
  • Make data AI-ready: create curated, versioned training datasets, embeddings/feature pipelines, and ML-friendly exports for SageMaker/Bedrock or downstream services; prepare for a future AI developer to plug in models easily.
  • Orchestrate and productionize pipelines with Airflow/MWAA or Step Functions/Lambda; containerize where necessary (ECS/EKS) and deploy with Terraform or AWS CDK, along with CI/CD (CodePipeline/GitHub Actions); see the skeletal DAG after this list.
  • Establish data quality, lineage, and governance: Great Expectations/Deequ checks, OpenLineage/Marquez lineage, fine-grained permissions with Lake Formation, and cost/performance monitoring; the quality-check sketch after this list illustrates the idea.
  • Partner with Analytics/BI to provision trusted marts powering dashboards (QuickSight/Power BI/Tableau) and self-serve queries.
  • Manage your own delivery process, including backlog grooming, sprint planning, estimates, stand-ups, reviews, and retrospectives.
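
As a concrete reference for the OCR step, a minimal synchronous Textract call via boto3 could look like this; the bucket, key, and region are placeholders, and multi-page PDFs would go through the asynchronous start_document_text_detection API instead.

```python
# Extract text lines from a single-page document in S3 with Amazon Textract.
import boto3

textract = boto3.client("textract", region_name="us-east-1")  # placeholder region

def ocr_lines(bucket: str, key: str) -> list[str]:
    """Return the detected text lines for one document image in S3."""
    resp = textract.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return [b["Text"] for b in resp["Blocks"] if b["BlockType"] == "LINE"]

print(ocr_lines("my-doc-bucket", "scans/invoice-0001.png"))  # placeholder names
```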
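
The orchestration could be wired along these lines: a skeletal Airflow DAG (assuming Airflow 2.4+ for the `schedule` parameter) with hypothetical callables standing in for the real bronze/silver/gold steps.

```python
# Skeletal daily DAG: ingest -> transform -> publish. Task bodies are stubs.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest_documents(**_):
    ...  # land raw files and metadata in the bronze layer

def transform_to_silver(**_):
    ...  # parse, dedupe, and conform document records

def publish_gold_marts(**_):
    ...  # build curated marts for BI and ML consumers

with DAG(
    dag_id="document_platform_daily",
    start_date=datetime(2025, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_bronze", python_callable=ingest_documents)
    transform = PythonOperator(task_id="transform_silver", python_callable=transform_to_silver)
    publish = PythonOperator(task_id="publish_gold", python_callable=publish_gold_marts)

    ingest >> transform >> publish
```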
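
For the data-quality point, the sketch below hand-rolls the kind of checks that Great Expectations or Deequ codify (the real libraries add suites, profiling, and reporting on top); the column names are hypothetical.

```python
# Hand-rolled batch quality checks in pandas; stand-in for a GE/Deequ suite.
import pandas as pd

def validate_silver_batch(df: pd.DataFrame) -> list[str]:
    """Return human-readable quality failures for one batch (empty = clean)."""
    failures = []
    if df["document_id"].isna().any():
        failures.append("document_id contains nulls")
    if df["document_id"].duplicated().any():
        failures.append("document_id is not unique")
    if not df["page_count"].between(1, 10_000).all():
        failures.append("page_count outside expected range [1, 10000]")
    return failures

batch = pd.DataFrame({"document_id": ["a", "b"], "page_count": [3, 12]})
assert validate_silver_batch(batch) == []  # clean batch passes
```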

Other

  • Fluency in both Spanish and English is a must.
  • This position is open exclusively to candidates based in LATAM countries.
  • Excellent stakeholder communication and leadership: comfortable being the first and only data engineer, translating client needs into clear sprint goals, and later mentoring/partnering with an AI developer as the team grows.
  • 100% Remote