The UT Data Hub improves university outcomes and advances the UT mission by increasing the usability and value of institutional data. The Principal Data Engineer will lead the data engineering team, applying data engineering best practices and emerging techniques to build complex data pipelines within UT’s cloud data ecosystem in support of academic and administrative needs, enabling advanced data-driven decision making.
Requirements
- 5+ years of experience designing, implementing, and maintaining complex, production-grade data pipelines and enterprise data platforms.
- 5+ years of hands-on experience with cloud-based data engineering, preferably in Amazon Web Services (AWS), with strong command of services such as Glue, S3, Lambda, Redshift, and EMR.
- 3+ years of experience defining cloud data architecture and data strategy in large, distributed enterprise environments.
- Deep expertise with the Databricks Lakehouse Platform, including Delta Lake, Delta Live Tables, and Unity Catalog, for scalable data ingestion, transformation, and governance.
- Proficiency in Python, PySpark, and SQL, with demonstrated experience building ETL/ELT workflows across structured and unstructured data sources (a minimal example of this kind of workflow follows this list).
- Proven ability to design and implement high-performance, AI-ready data architectures supporting analytics, machine learning, and real-time data processing.
- Experience developing and deploying Continuous Integration / Continuous Delivery (CI/CD) pipelines for data engineering using tools such as Databricks Repos, GitHub Actions, or Terraform.
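To give a concrete sense of the ETL/ELT work described above, here is a minimal PySpark sketch. The bucket paths, app name, and columns (student_id, term_code) are hypothetical placeholders, not UT systems:

```python
# A minimal extract-transform-load sketch in PySpark; paths and columns are
# hypothetical and illustrative only.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("enrollment_etl").getOrCreate()

# Extract: read raw CSV files from a (hypothetical) landing bucket.
raw = spark.read.option("header", True).csv("s3://example-landing/enrollment/")

# Transform: normalize column names, drop duplicate records, and derive
# a term_year column from the first four characters of term_code.
clean = (
    raw.toDF(*[c.strip().lower().replace(" ", "_") for c in raw.columns])
       .dropDuplicates(["student_id", "term_code"])
       .withColumn("term_year", F.substring("term_code", 1, 4).cast("int"))
)

# Load: write a curated Parquet table partitioned by term_year.
clean.write.mode("overwrite").partitionBy("term_year").parquet(
    "s3://example-curated/enrollment/"
)
```

A production version of such a pipeline would add schema enforcement, incremental loading, and observability, per the standards described under Responsibilities.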
Responsibilities
- Architect, design, and lead the development of enterprise-scale, production-grade data platforms and pipelines using Databricks and cloud-native technologies (AWS, Azure, or GCP).
- Champion the adoption of the Databricks Lakehouse architecture to unify data warehousing, data science, and machine learning workloads across the organization.
- Guide the design and deployment of AI-ready data pipelines to support predictive analytics, generative AI, and advanced decision intelligence use cases.
- Define and enforce data engineering standards, including performance optimization, scalability, data observability, and cost efficiency.
- Oversee code reviews, architecture reviews, and system design discussions to ensure technical excellence and maintainability across the engineering team.
- Lead the implementation of robust data quality, governance, and compliance frameworks, leveraging Databricks Unity Catalog and modern metadata management tools.
- Solve complex data architecture and integration challenges using advanced technologies such as Spark, Delta Live Tables, Airflow, and MLflow (see the Delta Live Tables sketch after this list).
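As an illustration of the Delta Live Tables work referenced above, here is a minimal sketch of a bronze-to-silver DLT pipeline with a data-quality expectation. The table names, landing path, and columns are hypothetical, and `spark` is the session the DLT runtime provides:

```python
# A minimal Delta Live Tables sketch; sources and columns are hypothetical.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw registration events ingested incrementally via Auto Loader.")
def registrations_bronze():
    return (
        spark.readStream.format("cloudFiles")      # Auto Loader incremental ingest
             .option("cloudFiles.format", "json")
             .load("s3://example-landing/registrations/")
    )

@dlt.table(comment="Cleaned registrations; records missing a student_id are dropped.")
@dlt.expect_or_drop("valid_student", "student_id IS NOT NULL")  # data-quality rule
def registrations_silver():
    return (
        dlt.read_stream("registrations_bronze")
           .withColumn("ingested_at", F.current_timestamp())
           .dropDuplicates(["student_id", "course_id", "term_code"])
    )
```

Expectations like `expect_or_drop` are one way DLT surfaces data-quality metrics, which feeds the governance and observability responsibilities listed above.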
Other
- This is a fixed-term position expected to last one year from the start date, with the possibility of extension.
- Flexible work arrangements are available for this position, including the ability to work 100% remotely.
- This position supports work/life balance, with a typical 40-hour work week and travel limited to training (e.g., conferences and courses).
- Must be authorized to work in the United States on a full-time basis for any employer without sponsorship.
- This position requires you to maintain internet service and a mobile phone with voice and data plans for use when needed for work.