Principal Site Reliability Engineer (SRE/DevOps)

Tripadvisor

Software Engineering
Oxford, UK
Posted on Wednesday, February 21, 2024

We believe that we are better together, and at Tripadvisor we welcome you for who you are. Our workplace is for everyone, as is our people-powered platform. At Tripadvisor, we want you to bring your unique perspective and experiences, so we can collectively revolutionize travel and together find the good out there.

Tripadvisor captured the online travel market 20 years ago as a Boston-based startup before an online travel market existed. The fact that we still dominate the industry proves that we know how to operate a fast-moving technology company and hire the right people who allow us to maintain that lead throughout the many advancements in technology. As we enter the era of Large Language Models and mobile-based internet everywhere, we are poised to innovate again. As a Tripadvisor Engineer, you will work with some of the best and brightest minds that technology offers and learn best practices and engineering methodologies that will empower you for the rest of your career.

The Site Operations team at Tripadvisor maintains and enhances the core systems that power and support the Tripadvisor.com website. This includes systems in private data centers and over a hundred accounts in AWS. Our scope of responsibilities is vast, and listing them here would take an entire page. Suffice it to say that we are the go-to team for questions about the interface boundaries between these two halves of the company and the deep inner workings of our infrastructure.

As a Site Operations Engineer on the SiteOps team, you will be a force multiplier for our engineering and operations teams, delivering tooling and infrastructure that not only has a direct impact on day-to-day operations but also contributes to the future evolution of infrastructure and engineering here at Tripadvisor. You'll be part of a dynamic team responsible for ensuring the high availability, reliability, and scalability of our services. We seek passionate engineers with experience in Python, Java, Ansible, PostgreSQL, CentOS, and Alma Linux to help us optimize and automate our infrastructure and deployment processes. We are currently carrying out several kinds of systems migration, both from on-prem to AWS/cloud-native and from one on-prem data center to another. As a SiteOps Engineer, you will be involved in designing and implementing how we perform those migrations, testing them, and then performing them with a “no surprises in production” mindset.

What You'll Do:

  • Infrastructure Automation: Design, implement, and maintain automated infrastructure provisioning and configuration management using tools like Ansible to ensure consistency and scalability.
  • Monitoring and Alerting: Set up monitoring and logging systems, using tools such as on-prem Prometheus/Thanos, Grafana Cloud, and Grafana Cloud Loki, to proactively detect and address potential issues and maintain performance and reliability.
  • Database Management: Manage hundreds of on-prem PostgreSQL databases, including performance tuning, backups, and disaster recovery strategies.
  • Collaboration: Work closely with cross-functional teams, including developers and system administrators, to improve the overall development and deployment processes.
  • Troubleshooting and Incident Management: Assist in identifying and resolving operational issues and participate in on-call rotations.

Skills and Experience:

  • Bachelor's degree in Computer Science, Engineering, or related field (or equivalent experience).
  • Proven experience as a DevOps Engineer or similar role, focusing on building and maintaining scalable infrastructures.
  • Strong proficiency in Python for scripting and automation tasks.
  • Expertise in configuration management tools such as Ansible or Puppet.
  • Solid understanding of PostgreSQL and experience in managing PostgreSQL databases.
  • Hands-on experience with CI/CD tools such as Jenkins, GitLab CI, or GitHub Actions.
  • Knowledge of containerization technologies like Docker and container orchestration tools like Kubernetes is a plus.
  • Understanding of networking concepts such as load balancing and DNS.
  • Strong problem-solving skills and the ability to work in a fast-paced, agile environment.

If you need a reasonable accommodation or support during the application or recruiting process due to a medical condition or disability, please reach out to your individual recruiter or send an email to AccessibleRecruiting@Tripadvisor.com and let us know the nature of your request. Please include the job requisition number in your message.
