SailPoint is seeking a Senior Data Engineer to help build a new cloud-based SaaS identity security product. This role focuses on creating scalable, reliable data pipelines on a modern stack built around Snowflake, Apache Spark, Apache Flink, and Cassandra.
Requirements
- Strong proficiency in programming languages like Python, Scala, or Java.
- Expert-level knowledge of Snowflake for data warehousing, including data modeling and query optimization.
- Hands-on experience with Apache Spark for large-scale data processing.
- Experience with Apache Flink for building real-time data streaming applications.
- Strong experience with Cassandra for managing and optimizing distributed data storage.
- Solid understanding and hands-on experience with graph databases and their query languages (e.g., Gremlin, Cypher).
- Familiarity with a major cloud provider (e.g., AWS, Azure, GCP) and its data-related services.
Responsibilities
- Data pipeline development: Design, construct, and optimize scalable ETL/ELT pipelines for both batch and real-time data using dbt, Apache Spark, and Apache Flink (see the Spark sketch after this list).
- Data warehousing: Develop and manage data schemas and warehouses within Snowflake, ensuring data is organized for efficient querying and analysis (see the Snowflake sketch below).
- Database management: Collaborate with infrastructure teams to administer and optimize data storage solutions, using Cassandra for high-velocity, wide-column data and graph databases for complex relationship-based data (see the Cassandra sketch below).
- Real-time streaming: Build and maintain data ingestion workflows for real-time applications, using Apache Flink to process data from sources such as Apache Kafka (see the Flink sketch below).
- Performance optimization: Tune complex Spark and Flink jobs and SQL/CQL queries to improve performance and reduce cost in Snowflake, Cassandra, and other data systems.
- Data quality and governance: Implement and enforce data quality standards, monitor pipelines, and establish data governance policies to ensure data integrity and security.
- Infrastructure management: Collaborate with DevOps teams to manage and automate the deployment of data applications using CI/CD pipelines.
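To make the pipeline work concrete, here is a minimal PySpark batch transform of the kind this role covers. It is an illustrative sketch only: the bucket paths, the event_id/event_timestamp columns, and the landing/curated layout are hypothetical placeholders, not an actual SailPoint pipeline.

```python
# Illustrative sketch: a minimal PySpark batch transform.
# Paths, column names, and the events schema are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("batch-identity-events").getOrCreate()

# Read raw JSON events from a (hypothetical) landing zone.
raw = spark.read.json("s3://example-bucket/landing/identity_events/")

# Deduplicate and derive a partition date before writing to the curated zone.
curated = (
    raw.dropDuplicates(["event_id"])
       .withColumn("event_date", F.to_date("event_timestamp"))
)

curated.write.mode("overwrite").partitionBy("event_date") \
       .parquet("s3://example-bucket/curated/identity_events/")

spark.stop()
```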
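For the Snowflake responsibilities, a small sketch of schema and table management through the snowflake-connector-python package; the account, credentials, and object names are placeholders, and a real deployment would use a secrets manager or key-pair auth rather than an inline password.

```python
# Illustrative sketch: managing a reporting schema in Snowflake.
# Connection details and object names are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="example_account",
    user="example_user",
    password="example_password",  # placeholder; use a secrets manager in practice
    warehouse="ANALYTICS_WH",
    database="IDENTITY_DB",
)

with conn.cursor() as cur:
    cur.execute("CREATE SCHEMA IF NOT EXISTS REPORTING")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS REPORTING.ACCESS_EVENTS (
            EVENT_ID STRING,
            USER_ID  STRING,
            EVENT_TS TIMESTAMP_NTZ,
            ACTION   STRING
        )
    """)
    cur.execute("SELECT COUNT(*) FROM REPORTING.ACCESS_EVENTS")
    print(cur.fetchone()[0])

conn.close()
```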
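For the Cassandra work, a sketch of a query-driven, wide-column table design using the DataStax cassandra-driver; contact points, keyspace, and table names are assumed for illustration. The partition key keeps each user's events on one partition, and the clustering order serves "latest first" reads cheaply.

```python
# Illustrative sketch: a wide-column access pattern in Cassandra.
# Hosts, keyspace, and table names are hypothetical placeholders.
from cassandra.cluster import Cluster

cluster = Cluster(["cassandra.example.internal"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS identity
    WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': 3}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS identity.access_by_user (
        user_id  text,
        event_ts timestamp,
        resource text,
        PRIMARY KEY (user_id, event_ts)
    ) WITH CLUSTERING ORDER BY (event_ts DESC)
""")

# Partition-key lookup: stays on a single partition, returns newest events first.
rows = session.execute(
    "SELECT resource, event_ts FROM identity.access_by_user WHERE user_id = %s LIMIT 10",
    ("alice",),
)
for row in rows:
    print(row.resource, row.event_ts)

cluster.shutdown()
```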
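Finally, for the real-time streaming responsibilities, a minimal PyFlink Table API job that reads a Kafka topic and writes filtered events back out. Broker addresses, topic names, and the event schema are hypothetical, and the Flink Kafka connector JAR must be available to the runtime.

```python
# Illustrative sketch: a PyFlink job filtering a Kafka stream.
# Topics, brokers, and the JSON schema are hypothetical placeholders.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: raw login events from Kafka.
t_env.execute_sql("""
    CREATE TABLE login_events (
        user_id  STRING,
        action   STRING,
        event_ts TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'login-events',
        'properties.bootstrap.servers' = 'kafka.example.internal:9092',
        'format' = 'json',
        'scan.startup.mode' = 'latest-offset'
    )
""")

# Sink: only failed logins, for downstream alerting.
t_env.execute_sql("""
    CREATE TABLE failed_logins (
        user_id  STRING,
        event_ts TIMESTAMP(3)
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'failed-logins',
        'properties.bootstrap.servers' = 'kafka.example.internal:9092',
        'format' = 'json'
    )
""")

# Continuous insert-select: runs as a streaming job.
t_env.execute_sql("""
    INSERT INTO failed_logins
    SELECT user_id, event_ts
    FROM login_events
    WHERE action = 'LOGIN_FAILED'
""").wait()
```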
Other
- 6+ years of professional experience in data engineering or a similar role.
- Bachelor's degree in Computer Science, Engineering, or a related technical field.
- Experience with other components of the big data ecosystem (e.g., dbt, Apache Kafka, Apache Airflow).
- Experience with containerization technologies like Docker and Kubernetes.
- Familiarity with data governance and observability tools.