EchoStar is seeking an engineer to build and manage data pipelines for artificial intelligence (AI) models, ensuring high-quality data is available for training and deployment and optimizing model performance.
Requirements
- Proficiency in Python, SQL, and Apache Spark
- Experience with AWS services such as EMR, Glue (serverless architecture), S3, Athena, IAM, Lambda, and CloudWatch
- Core Spark, Spark Streaming, the DataFrame, Dataset, and RDD APIs, and Spark SQL programming for processing terabytes of data
- Advanced SQL using the Hive/Impala frameworks, including SQL performance tuning
- Expertise in Hadoop and other distributed computing frameworks
- Elasticsearch (OpenSearch) and Kibana dashboards
- Resource management frameworks such as YARN or Mesos
Responsibilities
- Design and implement robust data pipelines to extract, transform, and load (ETL) data from various sources, optimizing for efficient AI model training.
- Gather and understand data requirements; create and maintain automated ETL processes with a special focus on data flow, error recovery, and exception handling and reporting
- Support data and cloud transformation initiatives
- Support our software engineers and data scientists
- Stay current with the latest technologies in a rapidly evolving marketplace
- Work independently and as part of a team with stakeholders across the organization to deliver enhanced functionality
Other
- Candidates must be willing to participate in at least one in-person interview, which may include a live whiteboarding or technical assessment session.
- Minimum 5 years of experience in Big Data Engineering/Data Analysis
- Candidates must successfully complete a pre-employment screen, which may include a drug test and DMV check.