Job Description

The Senior System Reliability Engineer will be responsible for maintaining and enhancing the performance, availability, and resilience of complex IT systems and infrastructure within UAE-based organizations. Working in high-demand environments such as finance, telecom, government, or healthcare, this role focuses on automation, incident response, system monitoring, and capacity planning. The engineer ensures that all solutions comply with UAE data security regulations and business continuity requirements while supporting scalable and reliable operations.


Responsibilities:

  • Designing and implementing system reliability strategies to ensure high availability, fault tolerance, and efficient incident management.

  • Monitoring infrastructure and application performance using observability tools and proactively resolving system anomalies.

  • Developing automated solutions for deployment, scaling, recovery, and configuration to reduce manual intervention and increase uptime.

  • Leading root cause analysis (RCA) of critical incidents and implementing long-term fixes to prevent recurrence.

  • Collaborating with DevOps, IT, and security teams to ensure system integrity, compliance, and security as per UAE data governance standards.

  • Establishing and maintaining Service Level Objectives (SLOs) and Service Level Indicators (SLIs) in alignment with business needs.

  • Conducting capacity planning and performance tuning to meet growing system demands and user loads.

  • Participating in disaster recovery planning and ensuring readiness for business continuity events in accordance with UAE regulatory expectations.


Requirements:

  • Bachelor’s degree in Computer Science, Information Technology, or a related field.

  • 5–7 years of experience in system reliability, infrastructure engineering, or DevOps, with at least 2 years in a senior role.

  • Strong expertise in Linux/Unix system administration, containerization (Docker/Kubernetes), and cloud platforms (AWS, Azure, or GCP).

  • Proficiency in automation tools such as Terraform, Ansible, and scripting languages like Python or Bash.

  • Experience with observability and monitoring tools such as Prometheus, Grafana, Datadog, or ELK stack.

  • Solid understanding of networking, load balancing, CI/CD pipelines, and high-availability architectures.

  • Knowledge of UAE-specific cybersecurity, data residency, and compliance standards is highly preferred.

  • Strong problem-solving, communication, and cross-functional collaboration skills.

  • Fluency in English is required; Arabic is a plus for local stakeholder interaction.


Job Details

Role Function: N/A
Work Type: Full-Time
Role Level: Mid-Level
Country: United Arab Emirates
City: Dubai
Number of Vacancies: 1
Job Category: Engineering
Company Website: https://www.talentmate.com/

What We Offer

  • Health Insurance
  • Visa
  • Paid Annual Leave
  • Maternity and Paternity Leave

About the Company

Searching, interviewing, and hiring are all part of professional life. The TALENTMATE portal aims to help professionals with each of them by bringing everything they need under one roof. Whether you're hunting for your next job opportunity or looking for potential employers, we're here to lend you a helping hand.


Disclaimer: talentmate.com is only a platform to bring jobseekers & employers together. Applicants are advised to research the bona fides of the prospective employer independently. We do NOT endorse any requests for money payments and strictly advise against sharing personal or bank-related information. We also recommend you visit Security Advice for more information. If you suspect any fraud or malpractice, email us at abuse@talentmate.com.