Hiring: Infrastructure Observability Engineer at Snapp
On-site
Experience required (2 years)
Negotiable salary
Full-time
Expired
Snapp, based in Tehran, invites candidates who meet the following requirements to join its team:
Infrastructure Observability Engineer

Your Impact
As an Infrastructure Observability Engineer within the Platform team, you will work across observability platforms, infrastructure monitoring, and DevOps automation to ensure comprehensive visibility and high system reliability. You will maintain and enhance monitoring and logging stacks, analyze infrastructure events, and drive proactive improvements that strengthen performance and resilience. This highly technical role emphasizes automation and continuous optimization rather than reactive support.

What You’ll Drive Forward
- Build, operate, and optimize monitoring and logging systems (Prometheus, Grafana, ELK, Zabbix, etc.).
- Ensure full observability coverage for infrastructure, networks, and services.
- Maintain alerting rules, dashboards, SLO/SLA metrics, and anomaly detection.
- Analyze logs and metrics to identify patterns and potential risks.
- Monitor infrastructure health across compute, storage, virtualization, and network layers.
- Perform root cause analysis of network-related incidents (routing/switching, load balancing, DNS, firewalls).
- Collaborate with network and datacenter teams on incident follow-ups.
- Maintain knowledge of network topologies, protocols, and traffic flows.
- Support improvement of infrastructure reliability and performance.
- Work with CI/CD pipelines to ensure reliable delivery and deployment processes.
- Develop automation for observability, monitoring, and operational workflows.
- Maintain Linux-based systems and automate routine infrastructure tasks.
- Contribute to reliability engineering initiatives (IaC, Docker, GitOps, auto-remediation, etc.).

What Powers Your Drive
- At least 2 years of experience in NOC/IOC, SRE, infrastructure operations, DevOps, or a similar technical role.
- Strong hands-on experience with monitoring and logging stacks (Prometheus, Grafana, ELK, Zabbix, etc.).
- Solid understanding of networking fundamentals (CCNA-level routing and switching, VLANs, BGP, OSPF, load balancing).
- Strong Linux administration background.
- Familiarity with CI/CD tools (GitLab CI, ArgoCD, Jenkins, GitHub Actions, etc.).
- Hands-on experience with containerization (Docker) and service mesh tools.
- Practical knowledge of automation using Bash, Python, or similar scripting languages.
- Ability to read and interpret logs, metrics, traces, and alerts.
- Strong communication and documentation skills, especially in technical reporting.

Preferred Qualifications (optional)
- Experience designing observability architecture for large-scale infrastructure.
- Experience contributing to reliability engineering initiatives (Terraform, Ansible, Docker, GitOps, auto-remediation, etc.).
- Knowledge of ITIL Incident/Problem Management practices.
- Experience with cloud infrastructure or private cloud platforms.
- Experience with Kubernetes (cluster operation, troubleshooting, manifests, Helm, etc.).
Qualified applicants can submit their résumé by clicking the link and completing the application form.