Abnormal AI is investing heavily in building and supporting the world-class data pipelines that power our AI-native security platform at massive scale. As the founding member of our Data Engineering function, you will establish the technical and operational foundation for data excellence across the company. Your work will enable Abnormal to continue its steep growth trajectory while delivering enterprise-grade reliability and performance.
Requirements
- Proficiency in our stack: Python, Spark/PySpark, Airflow, SQL, dbt, Databricks, Snowflake, AWS.
- Proven track record of driving pipeline reliability to 99%+ uptime, including SLAs, observability tooling, and automated recovery patterns (a minimal Airflow sketch of these patterns follows this list).
- Strong systems-thinking skills with the ability to debug complex distributed systems, optimize for performance and cost, and make architectural decisions balancing short-term needs with long-term scalability.
- Experience building or operating AI/ML data pipelines, including data readiness for training and evaluation.
- Experience with compliance frameworks such as GDPR, SOC2, FedRAMP, plus familiarity with PII handling and anonymization.
- Knowledge of multi-region data architectures, cellular/multi-tenant systems, or related large-scale distributed design patterns.
- Background in cybersecurity, threat detection, or email security.
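In Airflow terms, the reliability patterns named above usually come down to retries with backoff, task-level SLAs, and failure callbacks that page only after automated recovery is exhausted. A minimal sketch, assuming a recent Airflow 2.x; the DAG name, schedule, and `page_oncall` hook are hypothetical, not part of our stack description:

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def page_oncall(context):
    # Hypothetical alert hook: in practice this would post to PagerDuty/Slack,
    # pulling the failed task instance out of `context`.
    print(f"ALERT: {context['task_instance']} failed after all retries")


default_args = {
    "retries": 3,                          # automated recovery before any human is paged
    "retry_delay": timedelta(minutes=5),
    "retry_exponential_backoff": True,
    "on_failure_callback": page_oncall,    # fires only once retries are exhausted
    "sla": timedelta(hours=1),             # task-level SLA; misses land in Airflow's SLA report
}

with DAG(
    dag_id="revenue_pipeline",             # hypothetical name
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    default_args=default_args,
    catchup=False,
) as dag:
    PythonOperator(
        task_id="load_messages",
        python_callable=lambda: print("extract/transform/load step goes here"),
    )
```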
Responsibilities
- Own mission-critical pipeline reliability: Take end-to-end ownership of our production data pipelines processing billions of messages weekly, ensuring 99.9% uptime for revenue-critical pipelines that directly enable sales and customer-facing AI products.
- Build self-healing pipelines: Design and implement automated monitoring, testing, and recovery systems that eliminate manual intervention and cut MTTR from hours to minutes (see the freshness-gate sketch after this list).
- Accelerate development velocity: Deploy CI/CD pipelines and self-service platforms that cut deployment time from 3-5 days to under 2 hours, letting Data Scientists ship models safely without engineering bottlenecks.
- Architect for scale: Optimize data pipelines handling data volumes that double annually, implementing cost-effective solutions that support regional expansion and compliance requirements (GDPR, FedRAMP, SOC2).
- Bridge technical and business domains: Partner with Sales, Finance, and Product teams to ensure data infrastructure aligns with business needs, making critical trade-off decisions when pipelines impact revenue.
- Establish data engineering excellence: Define best practices for dbt, Airflow, Spark usage, PII anonymization (a PySpark hashing sketch follows this list), and cross-divisional data sharing, while mentoring embedded Data Guild team members on those practices.
- Enable AI and self-serve data consumption: Design and maintain an accessible semantic layer that provides consistent, trustworthy definitions and abstractions, making it easy for stakeholders to consume data and incorporate AI-driven insights into their workflows.
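On the self-healing theme above: much of the value comes from cheap, automated data-quality gates that fail fast and hand recovery back to the orchestrator's retry machinery instead of paging a human. Below is a minimal PySpark freshness check as one sketch of the idea; the path, column name, and lag threshold are hypothetical, and the column is assumed to be a proper timestamp type:

```python
from datetime import datetime, timedelta

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("freshness_gate").getOrCreate()


def assert_fresh(path: str, ts_col: str = "processed_at",
                 max_lag: timedelta = timedelta(hours=1)) -> None:
    """Raise if the newest record at `path` is older than `max_lag`.

    Raising hands control back to the orchestrator's retry/alert machinery,
    so stale data never silently reaches downstream consumers.
    """
    newest = spark.read.parquet(path).agg(F.max(ts_col)).first()[0]
    lag = datetime.utcnow() - newest
    if lag > max_lag:
        raise RuntimeError(f"{path} is stale by {lag}; blocking downstream tasks")


# Hypothetical usage, e.g. as the first task of a downstream DAG:
assert_fresh("s3://example-bucket/silver/messages/")
```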
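For the PII-handling thread that runs through both lists, one common pattern is deterministic salted hashing of identifier columns, which masks raw values while keeping anonymized tables joinable. A minimal sketch under stated assumptions: the column names, paths, and salt source are hypothetical, and strictly speaking GDPR treats this as pseudonymization rather than anonymization, so the salt itself must be access-controlled:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pii_anonymize").getOrCreate()

# Hypothetical column list; in practice this would be driven by a data catalog.
PII_COLUMNS = ["sender_email", "recipient_email"]

# Assumption: the salt is fetched from a secrets manager, never hard-coded.
salt = "fetched-from-secrets-manager"

df = spark.read.parquet("s3://example-bucket/raw/messages/")  # hypothetical path

for col in PII_COLUMNS:
    # Salted SHA-256 keeps values join-stable across tables while masking
    # the raw identifier; the same input always maps to the same digest.
    df = df.withColumn(col, F.sha2(F.concat(F.lit(salt), F.col(col)), 256))

df.write.mode("overwrite").parquet("s3://example-bucket/anonymized/messages/")
```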
Other qualifications
- 6+ years of software engineering experience in backend, distributed systems, or data-focused roles.
- Proven experience designing and running large-scale, production-grade data pipelines.
- Demonstrated ownership mindset and ability to drive projects from conception to production independently, including on-call responsibilities for critical systems.
- Experience collaborating with Data Science, Analytics, Product, Finance, Marketing, and Sales, along with the ability to communicate technical decisions clearly to non-technical stakeholders and executives.
- Background in high-growth environments where data volume doubles annually, requiring frequent re-architecture and optimization.