DataDirect Networks (DDN) is seeking an engineer to drive end-to-end performance analysis and optimization for Infinia, its AI-native, highly distributed data intelligence platform, and to deliver customer value at scale.
Requirements
15+ years of experience in systems performance engineering or low-level infrastructure development.
Proven expertise in end-to-end performance debugging across distributed systems, from application logic down to I/O subsystems.
Strong background in system-level tools (e.g., perf, ftrace, bpftrace, nvme-cli, or similar).
Deep understanding of NVMe, CPU and memory hierarchy, caching, thread scheduling, and Linux kernel internals.
Experience optimizing compute- or data-intensive applications such as AI inference, search, or analytics.
Proficiency in C/C++ or Rust and scripting languages such as Python or Bash.
Familiarity with vector databases (e.g., FAISS, Milvus, Weaviate) and LLM inference pipelines.
Responsibilities
Design and implement comprehensive performance instrumentation across the Infinia platform, from AI pipelines to storage backends.
Develop and maintain end-to-end performance models that span applications (e.g., LLM inference, vector search), query engines, data pipelines, and NVMe-backed storage.
Build and execute reproducible performance tests that simulate realistic AI and data-intensive workloads.
Use advanced profiling and tracing tools (e.g., perf, eBPF, flamegraphs, custom telemetry) to identify and address latency hotspots, bandwidth bottlenecks, and concurrency inefficiencies.
Partner with component teams (I/O Path, Core Platform, Data Engine, and AI Applications) to deliver performance fixes and architectural recommendations.
Work closely with hardware and storage teams to analyze I/O patterns on NVMe drives and optimize storage usage for real-world applications.
Provide tuning guidance for AI/ML applications, vector databases, and orchestration layers to maximize system utilization and efficiency.
Other
Occasional in-person meetings or team events may be required.
Participation in an on-call rotation to provide after-hours support as needed.
Excellent written and verbal communication skills, especially around technical reporting and architectural recommendations.
We value strong communication skills in all our engineers and researchers, as they are crucial to the success of our teams and the company as a whole.