Reports to: Project Lead
Experience: 5+ years
Start date: 1st August 2022

Responsibilities
- Reduce toil, implement identified improvement opportunities, and handle minor enhancements and non-ticketed activities.
- Define and monitor service level metrics, including reliability metrics such as MTTD, MTTR, MTBF, MTTF, unavailability rate, and incident count.
- Create rules that optimize incident response through metrics, streamlined alert flows, and collaboration and communication across squads.
- Proactively identify issues that might disrupt the service in production.
- Route incoming service requests to the appropriate support groups through the Jira tool.
- Create and maintain alerts.
- Handle change-validation and change-planning requests.
- Assist business stakeholders in determining SLOs or adjusting threshold limits.
- Manage demand and capacity, and correct SLI/SLO threshold limits.
- Gather and analyze metrics from both infrastructure and applications to assist in bug fixing.
- Engage in capacity planning and performance tuning exercises.
- Partner with development teams to improve services through rigorous testing and release procedures.
- Participate in system design consulting, platform management, and capacity planning.
- Create sustainable systems and services through automation and uplifts.
- Balance feature development speed and reliability with well-defined service level objectives (SLOs) and service level indicators (SLIs).
- Debug production issues across services and levels of the stack.

Required Skills and Qualifications
- Bachelor's degree in computer science or another highly technical or scientific discipline.
- Experience with AEM and web services/APIs.
- Experience working with public clouds (a minimum of 3 years is required).
- Experience with Git or other source control systems.
- Experience using tools to create and manage CI (continuous integration) and CD (continuous delivery) pipelines.
- Working knowledge of service level definitions and of identifying KPIs.
- Working knowledge of the TCP/IP stack, internet routing, and load balancing.
- Experience with distributed storage technologies such as NFS, HDFS, and Ceph.
- Experience with observability strategy.

Delivery Model: Onsite
Job Type: Full Time
Job Location: Auckland