Software Engineer 5 - Foundational Infrastructure
Netflix
Netflix is one of the world's leading entertainment services, with over 300 million paid memberships in over 190 countries enjoying TV series, films and games across a wide variety of genres and languages. Members can play, pause and resume watching as much as they want, anytime, anywhere, and can change their plans at any time.
Observability Engineering is the organization that provides visibility and insight into Netflix engineering. It is responsible for implementing the infrastructure and tools that enable visibility into the health of the thousands of applications that power Netflix. Thanks to the critical work of its teams, Netflix is able to power a highly available system that rivals the reliability of utilities.
The Observability Engineering organization consists of three sub-groups: Telemetry, Exploration & Troubleshooting, and Monitoring & Alerting. The organization is expanding into a fourth sub-group. This role is on the newly formed Foundational Infrastructure team, a crucial part of our Observability Engineering organization. In this role, you will partner with engineering teams across Netflix to build tools and infrastructure that enable fellow engineers to gather timely and actionable insights to help the Netflix service operate reliably.
Our work impacts virtually every engineer at Netflix and we are looking for a talented engineering leader to join our team.
The Opportunity
We are looking for a Telemetry Collections Engineer familiar with Go and C++, our two primary programming languages for agents, to help design and implement the user experience for telemetry collection at Netflix. The goal is to deliver a consistent and dependable development and operations experience for users of the metrics, tracing and logs platforms. You will collaborate with both users and developers of the core telemetry libraries and collectors. This role will be part engineer, part DevOps engineer and part support engineer, with a focus on enabling the success of other engineers at Netflix when they use our tools, agents and SDKs to build and operate applications.
You should be able to understand and empathize with the customer development and operating experience, so that you can leverage these inputs in your designs. You should be able to determine and establish consistent data reporting formats across the tools, in order to support the creation of managed experiences.
You should be able to read existing library and collector code bases, so that you can submit pull requests to drive your recommendations forward. You should be able to write and maintain highly performant collectors that can run across the entire Netflix fleet. You should be comfortable prototyping new open-source telemetry libraries, in order to determine whether or not they will be useful to the Netflix ecosystem, and conducting performance testing to validate whether or not they can sustain the sampling rates that will be required. You should be ready to provide consulting and support to both data producers and consumers as a part of maintaining data quality across telemetry platforms.
One of your outputs will be documentation and training materials for the library and collector ecosystem, intended to ease on-boarding for new engineers and to provide useful references for experienced users of these tools.
You will be in charge of investigating and determining the best strategy for instrumenting third-party services both inside and outside of the protected network, and ensuring that Netflix engineers can enjoy a full observability experience across all of the services they run, wherever they are located.
If you enjoy working in a unique culture and building systems at scale that are critical to delivering the Netflix streaming experience, then come join our team!
What You Bring to the Table
Expertise building consistent and reliable client libraries and agents, with the ability to manage change responsibly across version releases.
Knowledge of Go, C++ and Java and their ecosystems. Other languages, such as Node.js or Python, are a plus.
Experience with instrumenting the collection of metrics, tracing and logs in applications and the ability to form opinions on what a streamlined developer experience should look like.
Strong Cloud/DevOps skills to help our team successfully roll out changes to our Telemetry systems for users.
eBPF knowledge and experience building eBPF programs are a strong plus.
A positive attitude and the ability to empathize with the customer experience, while finding reasonable solutions that advance the state of the practice of telemetry collection.
Sharing Is Caring
In this group, you'll have a chance to create software that is state of the art and foundational. Because of Netflix's desire to share technology and concepts, you'll be in the rare position of both working on this software and sharing what you learn with your peers outside Netflix. We believe this is unique to Netflix, and if it sounds amazing to you, we should talk.
Inclusion is a Netflix value and we strive to host a meaningful interview experience for all candidates. If you want an accommodation/adjustment for a disability or any other reason during the hiring process, please send a request to your recruiting partner.
We are an equal-opportunity employer and celebrate diversity, recognizing that diversity builds stronger teams. We approach diversity and inclusion seriously and thoughtfully. We do not discriminate on the basis of race, religion, color, ancestry, national origin, caste, sex, sexual orientation, gender, gender identity or expression, age, disability, medical condition, pregnancy, genetic makeup, marital status, or military service.