The company is seeking a data engineer to help clients find answers in their data in support of important missions.
Requirements
- Experience using Python, SQL, and PySpark
- Experience deploying and maintaining Cloudera or Apache Spark clusters
- Experience designing and maintaining Data Lakes or Data Lakehouses
- Experience with big data tools such as Spark, NiFi, Kafka, or Flink, ideally at multi-petabyte scale
- Experience designing and maintaining ETL or ELT data pipelines using storage and serialization formats and schemas, such as Parquet and Avro
- Knowledge of cryptographic protocols and standards, including TLS, mTLS, hashing algorithms, and Public Key Infrastructure (PKI)
- Knowledge of cybersecurity concepts, including threats, vulnerabilities, security operations, encryption, boundary defense, auditing, authentication, and supply chain risk management
Responsibilities
- Develop and deploy pipelines and platforms that organize disparate data and make it meaningful
- Manage the assessment, design, building, and maintenance of scalable platforms for clients
- Design and maintain Data Lakes or Data Lakehouses
- Design and maintain ETL or ELT data pipelines using storage and serialization formats and schemas, such as Parquet and Avro
- Administer and maintain data science workspaces and tool benches for Data Scientists and Analysts
- Deploy and maintain Cloudera or Apache Spark clusters
- Apply experience in analytical exploration and data examination
Other
- Secret clearance
- HS diploma or GED
- DoD 8570 IAT Level II compliance certification, such as Security+, CCNA Security, or GSEC
- Bachelor’s Degree
- Ability to work in a fast-paced, agile environment
- Ability to work with and guide a multi-disciplinary team of analysts, data engineers, developers, and data consumers