Responsibilities
- Profile and improve the performance of ML workloads across platforms (e.g., Nvidia, Apple, Qualcomm)
- Build highly optimized GPU kernels for our Inference Engine
- Distill highly technical project outcomes into accessible technical blog posts for our customers and developers
- Mentor junior wizards and interns
Qualifications
- Skilled in debugging, profiling, and performance tuning of GPU kernels
- Expert knowledge of parallel programming
- Familiarity with Metal and/or CUDA/Triton
- Advanced understanding of modern Deep Learning workload characteristics
- Basic understanding of fundamental Machine Learning concepts
- Android/Windows experience is a plus