DataDirect Networks (DDN) is seeking a QA Lustre Architect to lead system validation and test architecture efforts for petabyte-scale data and High-Performance Computing (HPC) environments, ensuring correctness, performance, and resilience of storage solutions.
Requirements
- Experience with HPC workloads and environments, including MPI, high-throughput clusters, InfiniBand, and RDMA
- Strong understanding of POSIX file systems, object storage interfaces (e.g., S3), and parallel file systems
- Proficiency in automation and scripting (Python, Bash, Rust)
- Hands-on experience with storage benchmarking and profiling tools: IOR, MDTest, FIO, Vdbench, Perf, iostat, collectl
- Familiarity with CI/CD tools and infrastructure-as-code (e.g., Jenkins, GitLab CI, Ansible, Terraform)
- Solid understanding of system-level debugging and analysis tools
- Experience with BDD frameworks such as Cucumber (Gherkin syntax) or similar
Responsibilities
- Validate distributed storage systems (e.g., Lustre, GPFS/Spectrum Scale, BeeGFS, GlusterFS)
- Architect scalable test frameworks and automation pipelines to validate storage performance, throughput, IO behavior, and system reliability at scale
- Design test plans that cover key areas such as metadata operations, object lifecycle, parallel IO, file system consistency, and failure scenarios
- Lead performance benchmarking using industry-standard tools and custom workloads (e.g., IOR, MDTest, FIO, Vdbench)
- Validate integration with HPC compute clusters, schedulers (e.g., Slurm), and storage tiers (e.g., NVMe, SSD, HDD)
- Simulate large-scale distributed environments and execute fault-injection and resilience testing
- Collaborate with product managers, architects, and DevOps teams to ensure test coverage across CI/CD pipelines and production-like environments
Other
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, or related field
- 8+ years of experience in software QA or systems testing, with 3+ years in a QA lead or technical lead role
- Strong communication skills and ability to lead cross-functional quality initiatives
- Participation in an on-call rotation to provide after-hours support as needed
- Ability to work in a dynamic and driven team with a flat organizational structure