CoBank is seeking an experienced data engineer to lead the integration of generative AI capabilities into its data platform, building scalable pipelines and infrastructure that power next-generation AI applications.
Requirements
- 10 years of progressive experience in data engineering, with a strong foundation in building scalable, secure, and high-performance data pipelines, including both batch and real-time architectures.
- 5 years of experience leading data architecture and engineering efforts in cloud environments (preferably AWS), with demonstrated expertise in tools such as Apache Spark, SQL, Python, Airflow, Kafka, and modern data lake/lakehouse architectures.
- 3 years of hands-on experience with GenAI/ML infrastructure, including designing and building embedding pipelines, vector databases (e.g., FAISS, Weaviate, OpenSearch), and Retrieval-Augmented Generation (RAG) architectures; a minimal pipeline sketch follows this list.
- Prior experience processing and transforming unstructured data (e.g., PDFs, reports, knowledge bases) into AI-ready formats, including semantic chunking, metadata enrichment, and content classification.
- Prior experience working with or integrating hosted LLM services via AWS Bedrock, Azure AI Foundry, or similar platforms as part of production-ready applications.
- Proven track record in building and operationalizing data solutions that integrate with LLMs, including adapter-based fine-tuning techniques (e.g., LoRA, PEFT) and/or prompt engineering strategies to align model behavior with domain-specific use cases.
- Strong familiarity with CI/CD and GitOps workflows (e.g., GitHub Actions, ArgoCD), infrastructure-as-code (e.g., Terraform), and Kubernetes-based deployments (especially EKS).
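For context, the sketch below shows the general shape of the chunk-embed-index-retrieve pipeline referenced in the requirements above. It is an illustration only, not CoBank's implementation: embed_text() is a hypothetical placeholder for a hosted embedding model (e.g., one served through AWS Bedrock), and the chunk sizes, embedding dimensionality, and FAISS index choice are assumptions made to keep the example self-contained.

```python
"""Illustrative RAG-style indexing and retrieval sketch (not a reference implementation).

Assumes: embed_text() stands in for a real embedding model call (e.g., a hosted
model behind AWS Bedrock); FAISS serves as the vector store.
"""
import hashlib

import faiss  # pip install faiss-cpu
import numpy as np

EMBED_DIM = 384  # illustrative embedding dimensionality


def embed_text(text: str) -> np.ndarray:
    """Placeholder embedding: deterministic pseudo-random vector per text.

    In production this would call a hosted embedding model; the hash-seeded
    vector here only keeps the example self-contained and runnable.
    """
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    rng = np.random.default_rng(seed)
    vec = rng.standard_normal(EMBED_DIM).astype("float32")
    return vec / np.linalg.norm(vec)  # normalize so inner product ~ cosine


def chunk(text: str, size: int = 400, overlap: int = 50) -> list[str]:
    """Naive fixed-window chunking with overlap; semantic chunking would
    instead split on headings, sentences, or layout boundaries."""
    step = size - overlap
    return [text[i : i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def build_index(documents: list[str]) -> tuple[faiss.IndexFlatIP, list[str]]:
    """Chunk and embed documents, then load the vectors into a flat FAISS index."""
    chunks = [c for doc in documents for c in chunk(doc)]
    vectors = np.stack([embed_text(c) for c in chunks])
    index = faiss.IndexFlatIP(EMBED_DIM)  # exact inner-product search
    index.add(vectors)
    return index, chunks


def retrieve(index: faiss.IndexFlatIP, chunks: list[str], query: str, k: int = 3) -> list[str]:
    """Return the top-k chunks most similar to the query embedding."""
    scores, ids = index.search(embed_text(query)[None, :], k)
    return [chunks[i] for i in ids[0] if i != -1]


if __name__ == "__main__":
    docs = ["CoBank serves rural America ...", "Loan covenant reporting guide ..."]
    idx, corpus = build_index(docs)
    print(retrieve(idx, corpus, "covenant reporting"))
```

A flat inner-product index is used here only for simplicity; at production scale an approximate-nearest-neighbor index or a managed vector database (Weaviate, OpenSearch, Azure AI Search) would typically take its place.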
Responsibilities
- Leads the design and optimization of scalable data pipelines and architecture to support GenAI use cases, including Retrieval-Augmented Generation (RAG) and adapter-based fine-tuning of LLMs using proprietary domain data.
- Builds and manages embedding workflows leveraging frontier LLMs (e.g., from OpenAI, Anthropic, Meta, Mistral) and ensures efficient storage and semantic retrieval of vectorized content using vector databases such as FAISS, Weaviate, OpenSearch, Azure AI Search, or Amazon Kendra.
- Develops ingestion and transformation workflows for unstructured content (e.g., PDFs, reports, emails), including chunking, semantic tagging, and metadata enrichment to power GenAI and semantic search applications; an illustrative workflow sketch follows this list.
- Partners with engineering, product, and data stakeholders to prototype and scale GenAI services, while establishing patterns and frameworks that lay the foundation for future ML and AI capabilities at CoBank.
- Defines and enforces data governance, quality, and compliance standards across GenAI pipelines for both structured and unstructured data, ensuring alignment with regulatory requirements (e.g., CCPA, GDPR, EU AI Act) and industry frameworks such as ISO/IEC 42001, the NIST AI Risk Management Framework (AI RMF), and the OECD AI Principles.
- Provides architectural leadership and technical mentorship, ensuring integration of GenAI systems into CoBank’s analytics and EKS-based microservices platform, and aligning with existing CI/CD, GitOps, and observability frameworks.
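To make the ingestion responsibility above concrete, the sketch below lays out one possible Airflow DAG for an unstructured-content pipeline. The extract / chunk-and-tag / embed-and-upsert callables are hypothetical placeholders, and the DAG name, schedule, and task boundaries are assumptions; only the orchestration wiring is the point.

```python
"""Illustrative Airflow DAG for an unstructured-content ingestion workflow.

The three callables are hypothetical placeholders for the real extract,
chunk/tag, and embed/upsert steps; only the DAG wiring is shown.
"""
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_documents(**context):
    """Pull new PDFs/reports from the landing zone (placeholder)."""
    print("extracting raw documents")


def chunk_and_tag(**context):
    """Split documents into semantic chunks and attach metadata (placeholder)."""
    print("chunking and enriching metadata")


def embed_and_upsert(**context):
    """Embed chunks and upsert vectors into the vector store (placeholder)."""
    print("embedding and upserting vectors")


with DAG(
    dag_id="genai_unstructured_ingestion",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",  # `schedule` requires Airflow 2.4+
    catchup=False,
    tags=["genai", "rag"],
) as dag:
    extract = PythonOperator(task_id="extract_documents", python_callable=extract_documents)
    transform = PythonOperator(task_id="chunk_and_tag", python_callable=chunk_and_tag)
    load = PythonOperator(task_id="embed_and_upsert", python_callable=embed_and_upsert)

    extract >> transform >> load
```

In practice each placeholder would hand off through durable storage (e.g., a staging bucket or table) rather than printing, so individual tasks can be retried independently.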
Other
- Hybrid work model: flexible arrangements for most positions
- Prior experience mentoring technical teams and guiding architectural decisions across cross-functional stakeholders in engineering, product, and risk/compliance functions.
- Applicants must be authorized to work for any employer in the U.S. We are unable to sponsor or take over sponsorship of an employment visa at this time.
- CoBank is an Equal Opportunity Employer.
- All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, national origin, disability, or status as a protected veteran.