Senior Engineering Manager - Critical Operations and Reliability Engineering
Netflix
Netflix is one of the world's leading entertainment services, with 283 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and languages. Members can play, pause and resume watching as much as they want, anytime, anywhere, and can change their plans at any time.
About Netflix
Netflix is revolutionizing entertainment by connecting people with movies and television globally through outstanding content and technological innovation. Our Infrastructure Engineering team provides the backbone for Netflix products, including Streaming Video on Demand (SVOD), Live, Ads, Games, and more, by building and operating an efficient, scalable, secure, and easy-to-use development platform and content delivery network. Join us as we push the boundaries of scale, performance, and resilience, empowering developers to create groundbreaking applications on a reliable platform.
Reliability engineering operates in a federated model at Netflix, with central teams building standard reliability practices and tooling that are leveraged across Streaming, Live, Ads, and Games teams. The federated model allows for a centralized approach to reliability while empowering domain-specific SRE teams to address unique challenges within their areas.
Role Overview
C.O.R.E is the central SRE team within Infrastructure Engineering that defines and drives reliability practices for all consumer-facing app development teams. The C.O.R.E team's mission is to improve the availability and reliability of Netflix's infrastructure while enhancing the operational readiness of its engineering culture, focusing on incident management and operational excellence.
As the Senior Manager of the C.O.R.E Site Reliability Engineering (SRE) team, you will lead the integration of Netflix's SRE model with industry-leading best practices. You will define and drive reliability practices for all consumer-facing product teams, ensuring that our services are reliable, scalable, and efficient. This role is pivotal in ensuring the reliability and performance of Netflix's services, driving innovation, and optimizing system operations to support the company's mission of revolutionizing entertainment.
Role Responsibilities
Strategic Leadership: You will lead and mentor the C.O.R.E SRE team while also setting the strategic vision and technical direction for world-class system reliability, observability, and scalability.
Reliability: Your leadership will enable consumer-facing product teams to adopt standardized strategies for meeting reliability targets (e.g., SLOs/SLIs and error budgets).
Incident Management: You will manage high-severity incidents impacting Member Experience and/or Revenue across SVOD, Live, Ads, and Games, conduct post-incident reviews, and provide ongoing incident trend analysis to prevent recurrence and improve system architecture.
Operational Excellence: You will drive down the operational cost of service ownership by optimizing system reliability and scalability via resilience experiments.
Automation and Tools: You will apply automation and tooling to simplify deployment, monitoring, alerting, incident response, and resolution.
Collaboration and Integration: You'll work closely with SREs, Dev teams, and Service owners to integrate reliability practices into the SDLC and manage shared accountability for service health.
Requirements
Proven experience in a Senior Leadership Role within Site Reliability Engineering or a related domain.
Substantial experience commanding high-pressure and large-scale incidents.
Openness to participating in an on-call rotation with 24/7 shift coverage.
Extensive experience with high-scale cloud platforms and a strong understanding of distributed systems, networking, and software engineering.
Experience working in a collaborative environment, influencing stakeholders across various levels of the organization. Ability to build strong relationships with engineering, product, and business teams.
Strong problem-solving abilities and a proactive approach to challenges.
Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, or equivalent work experience.
Our compensation structure consists solely of an annual salary; we do not have bonuses. You choose each year how much of your compensation you want in salary versus stock options. To determine your personal top of market compensation within the market range, we rely on market indicators and consider your specific job family, background, skills, and experience. The range for this role is $480,000 - $1,200,000.
Netflix provides comprehensive benefits including Health Plans, Mental Health support, a 401(k) Retirement Plan with employer match, Stock Option Program, Disability Programs, Health Savings and Flexible Spending Accounts, Family-forming benefits, and Life and Serious Injury Benefits. We also offer paid leave of absence programs. Full-time hourly employees accrue 35 days of paid time off annually, to be used for vacation, holidays, and sick time. Full-time salaried employees are immediately entitled to flexible time off. See more detail about our Benefits here.
Inclusion is a Netflix value and we strive to host a meaningful interview experience for all candidates. If you want an accommodation/adjustment for a disability or any other reason during the hiring process, please send a request to your recruiting partner.
We are an equal-opportunity employer and celebrate diversity, recognizing that diversity builds stronger teams. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.
Job is open for no less than 7 days and will be removed when the position is filled.