Sev1Tech is seeking a Senior Data Lake Engineer to establish and configure an enterprise-level Databricks solution in support of its federal customer organization's data lake initiatives. The goal is to enhance the customer's data infrastructure with scalable and secure solutions.
Requirements
- Proven experience in building and configuring enterprise-level data lake solutions using Databricks in an AWS or Azure environment.
- In-depth knowledge of Databricks architecture, including workspaces, clusters, storage, notebook development, and automation capabilities.
- Strong expertise in designing and implementing data ingestion pipelines, data transformations, and data quality processes using Databricks.
- Experience with big data technologies such as Apache Spark, Apache Hive, Delta Lake, and Hadoop.
- Hands-on experience with cloud platforms such as AWS or Azure, including relevant services such as S3, EMR, Glue, and Data Factory.
- Proficiency in SQL and one or more programming languages (Python, Scala, or Java) for data manipulation and transformation.
- Knowledge of data security and privacy best practices, including data access controls, encryption, and data masking techniques.
Responsibilities
- Lead the design, implementation, and configuration of an enterprise data lake solution using Databricks, ensuring scalability, reliability, and optimal performance.
- Establish and configure Databricks workspaces, clusters, and storage components, optimizing the solution for efficient data processing, query performance, and data governance.
- Design and implement data ingestion pipelines to efficiently extract, transform, and load data from various sources into the data lake using Databricks tools and services.
- Develop and maintain data lake security frameworks, including access controls, encryption solutions, and data masking techniques to protect sensitive data.
- Monitor and tune Databricks clusters and workloads to ensure performance, reliability, and cost optimization, utilizing automated scaling and resource management techniques.
- Implement best practices for data governance, data cataloging, metadata management, and data lineage within Databricks, adhering to regulatory and compliance requirements.
- Collaborate with infrastructure teams to ensure data lake infrastructure meets scalability and availability requirements, leveraging Databricks cluster management and AWS/Azure services.
Other
- Bachelor's degree in computer science, information technology, or a related field. Equivalent experience will also be considered.
- Must be able to provide proof of U.S. Citizenship.
- This position has an on-site requirement of 2 days a week in Arlington, VA (in-office requirement subject to change based on client request).
- Excellent interpersonal and communication skills, with the ability to collaborate effectively with technical and non-technical stakeholders.
- Relevant certifications such as Databricks Certified Developer or Databricks Certified Professional are highly desirable.