Reporting to the Technical Product Owner – SRE Squad, the Senior Service Reliability Engineer will be responsible for ensuring system availability, performance, efficiency, change management, monitoring, emergency response, security, and capacity planning.

Responsibilities

  • Run the production environment by monitoring availability and taking a holistic view of system health
  • Create sustainable systems by driving continuous improvement of the applications through chaos experiments, automation, ML/AIOps, and proactive alerting strategies
  • Problem and Incident Management – ensure level two and level three support requests and incidents are addressed within SLA
  • Implement SRE frameworks and practices within the organization using the systems operations tool chain
  • Maintain strong working familiarity with web servers and load-balancing technologies
  • Operational Excellence – ensure systems availability, performance, efficiency, change management, monitoring, emergency response, security, and capacity planning
  • Stakeholder Engagement – engage business teams and promote a culture of participation and collaboration to support effective, informed decision-making
  • Define, measure, monitor, and report key system reliability performance indicators, and escalate breaches and violations, with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve
  • Continually improve skills and competencies by proactively participating in various internal and external training opportunities and stretch assignments
  • Research new, fit-for-future technologies and actively implement viable solutions

Qualifications

  • Bachelor’s Degree in Computer Science, Information Systems, Software Engineering, IT, or another related field
  • More than 3 years of work experience in programming and/or systems analysis, applying agile frameworks
  • Experience working with agile methodologies, such as Scrum, Kanban, XP, LSD, and FDD
  • Experience using code versioning and collaboration tools such as Git and Bitbucket
  • Strong analytical and problem-solving skills
  • Strong knowledge of software architecture principles
  • Experience working with multiple programming and markup languages, such as HTML, CSS, JavaScript, Java, PHP, SQL, XML, JSON, YAML, and Python, and with paradigms such as object-oriented, event-driven, procedural, functional, and declarative programming
  • Experience with Unix/Linux/AIX operating systems and application security technologies (e.g. SSL)
  • Professional experience and knowledge of the telecommunications industry preferred
  • Competency in system and application administration practices preferred
  • Independent thinker with the ability to identify and drive new, uncharted solutions
  • Ability and willingness to share knowledge with individuals with varying levels of experience
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
