Quadric has created an innovative general-purpose neural processing unit (GPNPU) architecture. Quadric's co-optimized software and hardware are targeted to run neural network (NN) inference workloads in a wide variety of edge and endpoint devices, ranging from battery-operated smart-sensor systems to high-performance automotive or autonomous vehicle systems. Unlike other NPUs or neural network accelerators in the industry today, which can accelerate only a portion of a machine learning graph, the Quadric GPNPU executes both NN graph code and conventional C++ DSP and control code.
Requirements
- Deep grasp of fixed‑point arithmetic, quantization theory, and statistical calibration.
- Fluency in Python, PyTorch or TensorFlow, NumPy/Pandas/SciPy, and data‑viz tools (Matplotlib/Plotly).
- Hands‑on experience with at least one quantization toolkit (PyTorch FX/PTQ/QAT, TF‑Lite, ONNX‑Runtime, TVM, MLIR Quant).
- Working knowledge of CNNs, Transformers, and other DNN architectures.
Responsibilities
- Design statistically rigorous experiments to compare PTQ, QAT, pruning, and mixed‑precision schemes on vision, language, and multimodal models.
- Build calibration datasets; develop Python notebooks/dashboards to track accuracy, latency, power, and memory trade‑offs.
- Perform layer‑ and token‑level error analysis to guide numerical‐format choices.
- Partner with the compiler team to convert your findings into turnkey SDK flows and reference configs.
- Publish internal whitepapers and external benchmarks, and present results to customers and at industry events.
- Monitor academic literature in compression and efficient inference; translate promising ideas into reproducible prototypes.
Other
- M.S./Ph.D. in CS, EE, Applied Math, or similar, with 5+ years in ML model optimization or data‑science‑driven research.
- Integrity, Humility, Happiness
- Initiative, Collaboration, Completion