Blue Coding is looking to hire a Senior Data Engineer to design and build a next-generation data platform for one of its clients, focused on ingesting, transforming, and governing document-centric datasets that power analytics, dashboards, and AI model training.
Requirements
- 6–10+ years of data engineering experience, including 3+ years building production workloads on AWS; expert-level Python and SQL, plus strong Spark (Glue/EMR/Databricks).
- Proven experience designing and operating data lakes/warehouses at scale, including file formats (Parquet/Delta/Iceberg/Hudi), partitioning, and performance/cost tuning.
- Hands-on document ETL: OCR pipelines, text/metadata extraction, schema design, and incremental processing for millions of files.
- Solid orchestration and DevOps chops: Airflow/MWAA or Step Functions, Docker, Terraform/CDK, and CI/CD best practices.
- Data governance mindset: lineage, quality frameworks, IAM least privilege, KMS, VPC endpoints/private networking, secrets management, and compliance awareness (e.g., SOC 2/ISO 27001).
- Practical ML enablement: crafting reproducible, versioned datasets, plus experience with embeddings/feature pipelines and at least one vector-store pattern (e.g., OpenSearch, pgvector).
Responsibilities
- Design and build an AWS-first data platform: stand up an S3-based (or equivalent) data lake, Glue Data Catalog/Lake Formation, and a performant warehouse layer (Redshift/Snowflake/Athena) using medallion (bronze/silver/gold) patterns.
- Implement a robust ETL/ELT solution for document data, including OCR (Amazon Textract), text parsing, metadata enrichment, schema inference, incremental loads, partitioning, and optimization for large-scale semi-structured/unstructured files (a minimal ingestion sketch follows this list).
- Make data AI-ready: create curated, versioned training datasets, embeddings/feature pipelines, and ML-friendly exports for SageMaker/Bedrock or downstream services, so a future AI developer can plug in models easily.
- Orchestrate and productionize pipelines with Airflow/MWAA or Step Functions/Lambda (see the orchestration sketch after this list); containerize where necessary (ECS/EKS) and deploy with Terraform or AWS CDK, along with CI/CD (CodePipeline/GitHub Actions).
- Establish data quality, lineage, and governance: Great Expectations/Deequ checks, OpenLineage/Marquez lineage, fine-grained permissions with Lake Formation, and ongoing cost/performance monitoring.
- Partner with Analytics/BI to provision trusted marts powering dashboards (QuickSight/Power BI/Tableau) and self-serve queries.
- Manage your own delivery process, including backlog grooming, sprint planning, estimates, stand-ups, reviews, and retrospectives.
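For illustration only, a minimal sketch of the bronze-layer ingestion step referenced above, assuming Amazon Textract's synchronous OCR API and pandas with pyarrow/s3fs; bucket names, prefixes, and helper names are hypothetical placeholders, not a prescribed design:

```python
# Minimal sketch: OCR a single-page document with Amazon Textract and land the
# extracted text as Parquet in the bronze layer of an S3 data lake.
# Bucket names and prefixes are illustrative placeholders; writing Parquet to
# s3:// paths assumes pyarrow and s3fs are installed.
import boto3
import pandas as pd

textract = boto3.client("textract")

RAW_BUCKET = "client-docs-raw"    # placeholder: landing zone for source documents
LAKE_BUCKET = "client-data-lake"  # placeholder: medallion-layered data lake

def ocr_document(key: str) -> str:
    """Run synchronous Textract OCR on a single-page document stored in S3."""
    resp = textract.detect_document_text(
        Document={"S3Object": {"Bucket": RAW_BUCKET, "Name": key}}
    )
    # Keep only LINE blocks and join them into one text body.
    return "\n".join(b["Text"] for b in resp["Blocks"] if b["BlockType"] == "LINE")

def land_bronze(key: str) -> None:
    """Write OCR output plus minimal metadata to the bronze layer as Parquet."""
    record = {"source_key": key, "text": ocr_document(key)}
    pd.DataFrame([record]).to_parquet(
        f"s3://{LAKE_BUCKET}/bronze/documents/{key}.parquet", index=False
    )
```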
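And a minimal orchestration sketch, assuming Airflow 2.4+ (e.g., on MWAA); the DAG id, schedule, and task callables are hypothetical placeholders that wire the ingestion step above to a downstream quality gate:

```python
# Minimal Airflow sketch: a daily DAG that runs document ingestion and then a
# data-quality gate. DAG id, schedule, and callables are illustrative only.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def ingest_documents():
    # Placeholder: list newly arrived S3 keys and call land_bronze() for each.
    ...

def run_quality_checks():
    # Placeholder: run Great Expectations / Deequ suites against the silver layer.
    ...

with DAG(
    dag_id="document_platform_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    ingest = PythonOperator(task_id="ingest_documents", python_callable=ingest_documents)
    checks = PythonOperator(task_id="quality_checks", python_callable=run_quality_checks)
    ingest >> checks
```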
Other
- Fluency in both Spanish and English is a must.
- This position is open exclusively to candidates based in LATAM countries.
- Excellent stakeholder communication and leadership: comfortable being the first and only data engineer, translating client needs into clear sprint goals, and later mentoring/partnering with an AI developer as the team grows.
- 100% Remote