Measuring and Monitoring the Availability and Reliability of Electronic Infrastructures
The electronic infrastructure availability / reliability monitoring service is designed for Service Level Monitoring, for medium and large electronic infrastructures. It monitors the status, availability and reliability of services provided by electronic infrastructures with moderate to high complexity. The service allows the creation of multiple reports using profiles, which are defined by the client (e.g. for managing SLAs, operations, etc.). It supports flexible development models and modular design, allowing easy communication with external systems (such as CMDBs, Service Catalogs, etc.). When calculating reports (using big data management technologies) the service takes into account a number of external factors, such as specific service parameters, scheduled or unscheduled downtime, etc.
How it works
The service checks the availability and reliability of the services as well as groups of services by simulating the actions that the respective user would do in it: for instance, in the case of a file transfer server (FTP), the service tries to transfer, save and delete a file in server. These tests are performed at regular intervals (e.g. every hour or 15 minutes) and their results are recorded to be included in the measuring of the availability and reliability of the service. The service supports different profiles for creating the topology and grouping of services and the algorithm for calculating the availability and reliability of services. The calculations are done in big data management and processing infrastructures (use of apache flink platform) which provide results both in real time and their correction at regular intervals. The service publishes the results of its calculations in real time, thus allowing the creation of a control panel of the status of the services (through a special interface that has been created) and provides the possibility of timely notification (alerts via email as well as SMS and slack messages) of administrators in case to detect an error.