JOB DESCRIPTION

Reporting to the Service Reliability Engineering Lead, Systems Engineering, the Service Reliability Engineer will be responsible for ensuring system availability, performance, efficiency, change management, monitoring, emergency response, security and capacity planning. In addition, this role will be responsible for:

  • Ensuring operational excellence by proactively building and implementing services, including end-to-end monitoring, scripting and automation, modern tooling, and software maintenance.
  • Providing software-related operations support, including level two and level three incident and problem management.
  • Defining, measuring, monitoring and reporting key SRE performance indicators, and escalating breaches and violations.
  • Documenting “tribal” knowledge and continually updating playbooks and runbooks so that teams get the information they need when they need it.
  • Implementing machine learning and self-healing capabilities, and driving the organization towards a no-ops model.

RESPONSIBILITIES

  • Run the production environment by monitoring availability and taking a holistic view of system health.
  • Implement SRE frameworks and practices within the organization using the systems operations tool chain.
  • Strong familiarity with web servers and load balancing technologies.
  • Operational Excellence – ensure system availability, performance, efficiency, change management, monitoring, emergency response, security, and capacity planning.
  • Stakeholder Engagement – engage business teams and promote a culture of participation and collaboration that enables effective, informed decision making.
  • Define, measure, monitor and report key systems reliability performance indicators and escalate breaches and violations, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
  • Create sustainable systems by driving continuous improvement of applications through chaos experiments, automation, ML/AIOps and proactive alerting strategies.
  • Problem and Incident Management – ensure level two and level three support requests and incidents are resolved within SLA.
  • Continually improve skills and competencies by proactively participating in various internal and external training opportunities and stretch assignments.
  • Research new, future-fit technologies and implement the viable solutions.

QUALIFICATIONS

  • Bachelor’s degree in Computer Science, Information Systems, Software Engineering, IT, or another related field.
  • More than three years of work experience in programming and/or systems analysis using agile frameworks.
  • Experience working with agile methodologies such as Scrum, Kanban, XP, Lean Software Development (LSD), and FDD.
  • Experience using version control and collaboration tools such as Git, and containerization tools such as Docker.
  • Strong analytical and problem-solving skills.
  • Strong knowledge of software architecture principles.
  • Experience working in cloud-native environments such as AWS.
  • Experience working with multiple programming and markup languages (such as HTML, CSS, JavaScript, Java, Ruby, PHP, SQL, XML, JSON, YAML, and Python), mobile platforms such as Android and iOS, and programming paradigms such as object-oriented, event-driven, procedural, functional, and declarative programming.
  • Experience with Unix/Linux/AIX operating systems and application security technologies (e.g. SSL).
  • Professional experience and knowledge of the telecommunications industry preferred.
  • Competency in system and application administration and practices preferred.
  • Independent thinker with the ability to identify and drive new, uncharted solutions.
  • Ability and willingness to share knowledge with individuals with varying levels of experience.
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.

How to Apply
If you feel that you are up to the challenge and possess the necessary qualifications and experience, kindly update your candidate profile on the recruitment portal and then click the Apply button. Remember to attach your resume.

ABOUT US

We are the leading telecommunications company in East Africa. Our purpose is to transform lives by connecting people to people, people to opportunities and people to information. We keep over 42 million customers connected and play a critical role in society, supporting over one million jobs directly and indirectly; our total economic value was estimated at KES 362 billion ($3.2 billion) for the 12 months through March 2021. We are listed on the Nairobi Securities Exchange (NSE), with annual revenues of close to KES 298 billion ($2.5 billion) as at March 2022. We were founded in 1997 as a wholly owned subsidiary of Telkom Kenya. Vodafone Group PLC acquired a 40 percent stake in May 2000, and 25 percent of our shares were offered to the public through the NSE in 2008. Under the management of Vodafone Group PLC, we welcomed Michael Joseph as our first CEO in July 2000, a few months after the acquisition. He led the company’s growth from 20,000 subscribers to 16.71 million, largely owing to innovative products such as M-PESA, launched in 2007.
