This role builds privacy-preserving large language model (LLM) capabilities that support hardware design workflows involving Verilog/SystemVerilog and RTL artifacts. It enables advanced use cases such as code generation and refactoring, lint explanation, constraint translation, and spec-to-RTL assistance, all while operating within strict enterprise security and data-privacy boundaries.
Requirements
- 3+ years of hands-on experience with transformers or LLMs.
- Deep expertise with PyTorch, Hugging Face (Transformers, PEFT, TRL), and distributed training frameworks (DeepSpeed, FSDP).
- Experience with quantization-aware fine-tuning, constrained decoding, and evaluation of code-generation models.
- Hands-on AWS experience, including:
  - Amazon Bedrock (model usage, customization, Guardrails, runtime APIs, VPC endpoints)
  - SageMaker (Training, Inference, Pipelines)
  - Core services: S3, EC2/EKS, IAM, KMS, VPC, CloudWatch, CloudTrail, Secrets Manager
- Solid software engineering fundamentals: testing, CI/CD, observability, and performance optimization.
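To make the constrained-decoding requirement concrete, here is a toy sketch of grammar-guided decoding: at each step the candidate tokens are restricted to those a tiny, invented Verilog-ish grammar allows, and a greedy choice is made among them. The grammar, token names, and scorer are all illustrative, not part of any production system.

```python
# Each grammar state maps an allowed next token to the resulting state.
GRAMMAR = {
    "start": {"module": "name"},
    "name":  {"counter": "body", "adder": "body"},
    "body":  {"endmodule": "done"},
    "done":  {},  # accepting state: nothing further may be emitted
}

def constrained_decode(score_fn, max_steps=10):
    """Greedy decode that masks out any token the grammar disallows.

    `score_fn` stands in for the model's per-token logits.
    """
    state, out = "start", []
    for _ in range(max_steps):
        allowed = GRAMMAR[state]
        if not allowed:
            break
        # Pick the highest-scoring token among the allowed set only.
        token = max(allowed, key=score_fn)
        out.append(token)
        state = allowed[token]
    return out

# Stand-in scorer preferring shorter tokens (pure illustration).
result = constrained_decode(lambda t: -len(t))
```

In a real HDL pipeline the hand-written state machine would be replaced by a parser-derived automaton over the tokenizer's vocabulary, but the masking step is the same idea.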
Responsibilities
- Own the technical roadmap for RTL-focused LLM capabilities, from model selection and fine-tuning through deployment and continuous improvement.
- Fine-tune and customize transformer models using modern techniques such as LoRA/QLoRA, PEFT, instruction tuning, and preference optimization (RLAIF).
- Design and operate HDL-aware evaluation frameworks, including:
  - Compile, lint, and simulation pass rates
  - Pass@k metrics for code generation
  - Constrained/grammar-guided decoding
  - Synthesis-readiness checks
- Build and maintain secure, privacy-first ML pipelines on AWS, including:
  - Amazon Bedrock for managed foundation models
  - SageMaker and/or EKS for bespoke training and inference
  - Encrypted storage (S3 + KMS), private VPCs, IAM least privilege, CloudTrail auditing
- Deploy and operate low-latency, production inference using Bedrock and/or self-hosted stacks (vLLM, TensorRT-LLM), with autoscaling and safe rollout strategies.
- Establish a strong evaluation and MLOps culture with automated regression testing, experiment tracking, and model documentation.
- Drive product integration with internal developer tools, CI workflows, IDE plug-ins, retrieval-augmented generation (RAG), and safe tool-use.
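The pass@k metric named in the evaluation responsibilities is conventionally computed with an unbiased estimator: given n generated samples of which c pass the checks (e.g., compile and simulate correctly), it estimates the probability that at least one of k randomly drawn samples passes. A minimal sketch, with example numbers chosen purely for illustration:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: P(at least one of k samples drawn from
    n generations is correct), where c of the n pass all checks."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# e.g., 10 RTL generations per prompt, 3 pass compile + simulation:
p1 = pass_at_k(10, 3, 1)  # expected fraction passing on a single try
p5 = pass_at_k(10, 3, 5)
```

Averaging this quantity over a benchmark of prompts gives the headline pass@k score for a code-generation model.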
Other
- This is a staff-level role that provides technical leadership.
- Lead and mentor a small team of applied ML engineers and scientists; review designs and code, remove technical blockers, and drive execution.
- Partner with hardware engineering, EDA, security, and legal stakeholders to ensure compliant data sourcing, anonymization, and governance.
- Mentor engineers on LLM best practices, reproducible experimentation, and secure system design.
- Proven experience shipping LLM-powered features to production and leading cross-functional technical initiatives.
- Excellent communication skills and the ability to influence both technical and executive stakeholders.