Performance Engineer 5/6
Netflix
Netflix is one of the world's leading entertainment services, with over 300 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and languages. Members can play, pause and resume watching as much as they want, anytime, anywhere, and can change their plans at any time.
About the team
Netflix has built a reputation as a world leader in Performance Engineering, and the Netflix Performance Tech blog is one of the de facto resources in the industry. We are a small, seasoned team that has a massive impact across Netflix engineering by being a trusted expert to countless teams, getting involved in the hardest problems and most critical projects to ensure Netflix is always delivering the very best performance to its customers.
About the role
We are looking for a highly experienced Performance Engineer to join our team, focusing on the critical area of GPU infrastructure efficiency and the optimization of large-scale AI/ML workloads. This role is essential to managing our rapidly growing computational footprint, ensuring we deliver maximum performance while optimizing cost and resource utilization. You will be a trusted expert, working at the intersection of infrastructure, ML platforms, and core engineering to drive meaningful impact across the organization.
What you will do:
Drive efficiency and performance optimization across our large-scale infrastructure.
Collaborate with ML Platform and Data Science teams to build and enhance comprehensive profiling, tracing, and observability capabilities for GPU workloads.
Analyze and resolve complex performance bottlenecks across the entire stack, including hardware, drivers, OS, Kubernetes/scheduling, networking, storage, and application code.
Evaluate and guide the adoption of new GPU architectures, interconnects, and cloud vendor services to maximize performance and cost efficiency within Netflix's AI/ML ecosystem.
Share knowledge by documenting best practices, contributing to Netflix Tech Blogs, and presenting at industry and vendor forums.
Must-Have Skills:
10+ years of experience in systems performance analysis and optimization with a focus on large-scale distributed systems.
Deep understanding of GPU architecture, kernels, and ML frameworks.
Experience in building and using CPU and GPU profiling and other performance analysis tools.
Expertise in identifying and resolving performance bottlenecks within the AI/ML infrastructure and software stack.
Experience with container orchestration platforms such as Kubernetes.
Experience with performance analysis and optimization in a multi-tenant, cloud-native environment.
Strong programming skills in languages such as Python and Java.
Nice-to-Have Skills:
Experience with large language model (LLM) serving and training optimization techniques.
Understanding of Linux internals such as resource scheduling, memory management, and I/O for GPU-intensive workloads.
Experience with the performance analysis of high-speed networking protocols and interconnect technologies, such as InfiniBand and NVLink.
Experience with capacity engineering and cost optimization in a major public cloud environment.
Proven track record of contributing to open-source performance tools or research in the field.
*********************
Our compensation structure consists solely of an annual salary; we do not have bonuses. You choose each year how much of your compensation you want in salary versus stock options. To determine your personal top-of-market compensation, we rely on market indicators and consider your specific job family, background, skills, and experience to place you within the market range. The range for this role is $230,000 - $960,000.
Netflix provides comprehensive benefits including Health Plans, Mental Health support, a 401(k) Retirement Plan with employer match, Stock Option Program, Disability Programs, Health Savings and Flexible Spending Accounts, Family-forming benefits, and Life and Serious Injury Benefits. We also offer paid leave of absence programs. Full-time hourly employees accrue 35 days of paid time off annually, to be used for vacation, holidays, and sick time. Full-time salaried employees are immediately entitled to flexible time off. See more detail about our Benefits here.
Inclusion is a Netflix value and we strive to host a meaningful interview experience for all candidates. If you want an accommodation/adjustment for a disability or any other reason during the hiring process, please send a request to your recruiting partner.
We are an equal-opportunity employer and celebrate diversity, recognizing that diversity builds stronger teams. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.