Monitoring & Incidents Analyst
Full Time
About the job
Job Title: Monitoring & Incidents Analyst
Type of contract: One year, extendable
Main Role (Overall Accountability)
Our client is looking for a Command Centre Monitoring & Incidents Analyst to operate
within the bank's 24/7 command center. The command center is used for end-to-end
monitoring and observation of all IT services, using the relevant monitoring and observability
tools implemented, or being implemented, at the bank. The candidate will be part of the
team responsible for operating the round-the-clock command center and monitoring all of the
bank's IT assets using the monitoring capabilities applied within the command center. This
will involve working shifts according to the roster prepared by the supervisors.
Overall Responsibility
The candidate is required to work shifts to perform 24×7 command center duties and ensure that all monitoring and observability tools operate without interruption.
This role entails contributing to command center activities, including monitoring the operation of all 24×7 critical systems, following up on all service requests initiated from monitoring tools, and performing emergency escalation and reporting.
Liaise with the on-site shift technical team to perform system inspections, emergency repairs and emergency management, with the aim of achieving 100% facility uptime for IT services operations.
Perform emergency escalation and reporting when system abnormalities occur.
Perform proper handover and takeover of daily duties with the next shift, clearly indicating all tasks and duties the next shift must follow up on. This includes preparing the daily 24×7 shift handover report and a summary of work activities.
Provide detailed incident information to Command Centre leadership so that the
incident report can be prepared and issued within the SLA.
Assist in preparing historical data logs from the monitoring tools when required.
Be part of the Command Centre team handling Incident and Problem Management.
Assess and validate major incidents.
Manage notifications and escalations as defined in the major incident management process.
Help coordinate recovery actions and plans for major incidents through to resolution
with the respective application owners.
Provide timely and informative updates to management, stakeholders, and users
until incident closure.
Participate in post-incident root cause analysis (RCA) as required and follow up on improvement plans.
Understand and track outstanding actions and improvement plans for incidents escalated to the Command Centre until closure.
Provide monthly incident trend updates to management.
Work in close collaboration with internal teams throughout the incident life cycle to ensure cross-team alignment.
Contact the right support and vendor team(s) promptly in case of an incident, after scrutinizing the event/alert with the subject matter experts.
Arrange triage in case of a crisis.
Monitor using existing tools and the new EPM.
Perform pre-defined recovery processes following the runbooks.
Create and maintain a knowledge library that the Command Centre team can use to operate and detect issues.
Manage and follow up on incidents created by the monitoring tools/team.
Follow up on RCAs, problem calls and implementation.
Lead crisis calls and manage the war-room.
Classify incidents based on priority (e.g., critical, high, medium, low)
Coordinate between various IT teams & vendors to ensure swift resolution of incidents.
Escalate incidents that require additional resources or senior management involvement.
Maintain detailed records of incidents, actions taken, and resolution processes.
Generate reports on incident management performance and incident trends.
Conduct post-incident reviews to assess the effectiveness of the crisis response.
Update crisis management plans based on lessons learned from the incident
Act as the primary point of contact for all IT-related major incidents and crises.
Ensure all issues are logged in the appropriate internal and vendor tracking tools.
Must make sure that regular updates on progress are conveyed to line managers.
Ensure that appropriate corrective and preventive actions are undertaken and resolve problems as soon as they arise.
Must ensure compliance with Risk Management and Audit standards.
Must contribute ideas to help the support team become more effective and seek
ideas from other team members.
Specific Responsibilities:
1. System Maintenance
Ensure that a detailed impact analysis of any issue is carried out and a viable solution is recommended with the help of the respective application custodians.
2. System Enhancements
Must ensure that the Bank’s methodology is used for any enhancements undertaken.
Come up with appropriate solutions to support line managers in making decisions
and using the right methodology during any development or issue resolution.
Research and evaluate emerging technologies and trends and suggest courses of
action to line managers.
3. Projects
Assist and advise the digital transformation and change management teams on projects
related to the Observability and Monitoring System.
Educational & Personnel Specifications
Bachelor's degree in IT or a related discipline.
At least 3 years’ experience as an Incident or Problem Manager in an IT Application
Operations environment.
3+ years’ experience in performance monitoring and observability tools like
Dynatrace, Riverbed NPM, SolarWinds, Grafana, etc.
Proven techno-functional knowledge of performance and observability tools.
Proven techno-functional knowledge of IT-related fields.
SRE certification is an added advantage.
Good understanding of market trends and current technology.
Good presentation skills and the ability to express complex technical and business topics.
Work experience in the banking industry is considered a competitive advantage.
Good level of programming knowledge in various languages.
Good level of knowledge in database management systems and SQL.
Solid knowledge of system design using structured and object-oriented
methodologies, and good knowledge of the SDLC.
Good knowledge of current technology in the IT industry.
Skills in documentation and report/MIS preparation.
Good communication and presentation skills, with a good command of written English.
Good interpersonal skills and a pleasing personality.