Senior Site Reliability Engineer

Details of the offer

Reports to: Project Lead
Experience: 5+ years
Start date: 1st August 2022

Responsibilities

Take responsibility for toil reduction, implement identified improvement opportunities, and handle minor enhancements and non-ticketed activity.
Define and monitor service level metrics, including reliability metrics such as MTTD, MTTR, MTBF, MTTF, unavailability rate, and incident count.
Create rules to optimize incident response through metrics, streamlined alert flows, and collaboration and communication across squads.
Proactively identify issues that might disrupt the service in production.
Address incoming service requests through their support groups or the Jira tool.
Create and maintain alerts.
Handle change validation and change planning requests.
Assist business stakeholders in determining SLOs or adjusting threshold limits.
Manage demand and capacity, and correct SLI/SLO threshold limits.
Gather and analyze metrics from both infrastructure and applications to assist in bug fixing.
Engage in capacity planning and performance tuning exercises.
Partner with development teams to improve services through rigorous testing and release procedures.
Participate in system design consulting, platform management, and capacity planning.
Create sustainable systems and services through automation and uplifts.
Balance feature development speed and reliability using well-defined service level objectives and indicators (SLOs, SLIs).
Debug production issues across services and levels of the stack.

Required Skills and Qualifications

Bachelor's degree in computer science or another highly technical, scientific discipline.
Experience in AEM and web services/APIs.
Experience working with public clouds (a minimum of 3 years' experience is a must).
Experience with Git or other source control systems.
Experience using tools to create and manage CI (continuous integration) and CD (continuous delivery) pipelines.
Working knowledge of service level definitions and identifying KPIs.
Working knowledge of the TCP/IP stack, internet routing, and load balancing.
Experience with distributed storage technologies such as NFS, HDFS, and Ceph.
Experience in observability strategy.

Delivery Model: Onsite
Job Type: Full Time
Job Location: Auckland


Nominal Salary: To be agreed

Source: Talent2_Ppc
