Data center growth around the world
It is estimated that there are more than 7,200 data centers worldwide, and more than half of them are located in the USA, the UK, Germany, and China.
The largest data center campus, The Citadel, is located in Tahoe Reno, Nevada, USA, and covers an area of 670,000 square meters. The energy needed to power and cool the massive amount of equipment in data centers accounts for almost 40% of their total operational costs.
Due to the COVID-19 pandemic, digital transformation has accelerated in just about every industry. Demand for colocation services and hyperscale providers is on the rise, and in 2020 nearly $17 billion was invested in public cloud infrastructure.
Data center providers are working hard to manage and control all the resources placed in their data centers, whether by the providers themselves or by their clients. All servers, routers, power supplies, UPS units, air conditioning units, and racks need to be documented and monitored to ensure smooth operations with little or no downtime.
Data center infrastructure management
Data center infrastructure management (DCIM) solutions are typically used to monitor, measure, manage, and forecast data center utilization and the energy consumption of installed resources (network, servers, power, HVAC, etc.).
A DCIM solution is one of the most important assets a data center provider has, enabling smooth operations and easy placement or removal of equipment. By storing all resource information (location, vendor, product number, serial number, nominal power consumption, heat dissipation, etc.) in one central solution, data center providers gain a better overview of floor space, electrical power, and cooling capacity requirements. The same data can be used to forecast the growth of a data center, including when and how to invest in additional power or cooling equipment.
A DCIM solution should enable inventory and lifecycle management of all physical and virtual resources and their links, dependencies, and relationships. All cabling infrastructure, including patch cables, patch panels, junction boxes, splice trays, cable trays, and shafts, should be documented and connected to other resources. This enables end-to-end signal tracing across all devices and cables.
Additionally, a DCIM solution should track data capacity, power supply, white space, and air conditioning utilization to ensure that capacity meets the needs of users or clients, that existing resources are used as fully as possible, and that resource density increases.
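As a rough illustration of the forecasting idea, the sketch below fits a straight line to monthly power-draw readings and estimates how many months remain until the trend reaches the available power capacity. The linear trend, the function name `months_until_full`, and the sample figures are assumptions made for illustration; a real DCIM forecast may use different models and data.

```python
def months_until_full(monthly_kw, capacity_kw):
    """Estimate months until a linear trend of monthly power draw reaches capacity_kw.

    Hypothetical helper: fits y = intercept + slope * month by ordinary least squares,
    where month 0 is the oldest reading. Returns None if the trend is flat or falling;
    a negative result means the trend line has already crossed capacity.
    """
    n = len(monthly_kw)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(monthly_kw) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(monthly_kw))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Month index where the trend meets capacity, counted from the latest reading.
    return (capacity_kw - intercept) / slope - (n - 1)


# Illustrative data: draw grows by 10 kW per month against a 500 kW capacity.
print(months_until_full([400, 410, 420, 430], capacity_kw=500))  # 7.0
```

A linear fit is the simplest possible choice; seasonal load or planned customer move-ins would need a richer model.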
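End-to-end signal tracing can be pictured as a walk over documented connections. The sketch below assumes a deliberately simplified model: every port joins at most one cable, and passive devices such as patch panels bridge a front port to a rear port internally. The port naming scheme and the helpers `symmetric` and `trace` are hypothetical and not taken from any DCIM product.

```python
def symmetric(pairs):
    """Build a lookup in which each end of a link points at the other end."""
    table = {}
    for a, b in pairs:
        table[a] = b
        table[b] = a
    return table


def trace(start, cables, internal):
    """Follow a signal from `start`, alternating cable hops and internal device hops.

    cables:   port -> port at the far end of the documented cable
    internal: port -> paired port inside a passive device (e.g. patch panel front/rear)
    """
    path = [start]
    visited = {start}
    port = start
    use_cable = True  # the first hop leaves the starting device over a cable
    while True:
        hop = cables if use_cable else internal
        nxt = hop.get(port)
        if nxt is None or nxt in visited:  # end of the path, or a loop in the documentation
            return path
        path.append(nxt)
        visited.add(nxt)
        port = nxt
        use_cable = not use_cable


# Hypothetical example: server -> patch panel A -> trunk -> patch panel B -> switch.
cables = symmetric([
    ("srv01:eth0", "pp-A:front1"),
    ("pp-A:rear1", "pp-B:rear1"),
    ("pp-B:front1", "sw01:ge1"),
])
internal = symmetric([("pp-A:front1", "pp-A:rear1"), ("pp-B:front1", "pp-B:rear1")])
print(trace("srv01:eth0", cables, internal))
# ['srv01:eth0', 'pp-A:front1', 'pp-A:rear1', 'pp-B:rear1', 'pp-B:front1', 'sw01:ge1']
```

Because the cable lookup is symmetric, tracing from the switch port returns the same path in reverse.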
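A minimal sketch of utilization tracking, assuming each rack record carries its space, power, and cooling capacity alongside current usage. The `Rack` fields, the 80% threshold, and the sample values are illustrative assumptions, not a real DCIM data model.

```python
from dataclasses import dataclass


@dataclass
class Rack:
    # Hypothetical record; a real DCIM model holds far more attributes.
    name: str
    units_total: int          # rack units (U)
    units_used: int
    power_capacity_kw: float
    power_draw_kw: float
    cooling_capacity_kw: float
    heat_load_kw: float


def utilization(rack):
    """Fraction of each capacity dimension that is in use."""
    return {
        "space": rack.units_used / rack.units_total,
        "power": rack.power_draw_kw / rack.power_capacity_kw,
        "cooling": rack.heat_load_kw / rack.cooling_capacity_kw,
    }


def over_threshold(racks, threshold=0.8):
    """Map each rack name to the dimensions at or above threshold; racks with none are omitted."""
    flagged = {}
    for rack in racks:
        hot = [dim for dim, used in utilization(rack).items() if used >= threshold]
        if hot:
            flagged[rack.name] = hot
    return flagged


racks = [
    Rack("R01", 42, 30, 10.0, 8.5, 12.0, 8.5),
    Rack("R02", 42, 40, 10.0, 4.0, 12.0, 4.0),
]
print(over_threshold(racks))  # {'R01': ['power'], 'R02': ['space']}
```

Reporting each dimension separately matters because a rack can be nearly full on power while still having free rack units, as R01 shows.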
The rising need for data center NOCs (network operations centers)
In a telecom operator’s environment, the network operations center (NOC) is one of the most important technical departments. It is responsible for monitoring one or more network domains for certain conditions that may require special attention to avoid degraded end customer service. NOC engineers use a set of operations and business support system (OSS and BSS) solutions that enable access to inventory, fault, and performance information collected from network devices.
In the world of data centers, there is a growing need to manage and monitor data center equipment. Data center providers have recognized this and are establishing their own NOC departments. Usually, these departments are small and rely on one or more open source or custom-made monitoring tools.
For instance, one tool is used to monitor the network, another to monitor servers, the hypervisor has its own monitoring tool, air conditioning has a completely different one, and so on. As a data center grows, managing and configuring these tools becomes more complex and takes up more of NOC engineers' time, time they could spend on other operational activities.
Another common problem is that NOC engineers have to manually view and correlate alarms from different monitoring tools. For example, high CPU utilization on a network switch raises an alarm in the network monitoring tool. The same condition increases response times from the database server, which slows the loading of an end-customer web application monitored by a custom monitoring tool. At the same time, the IT monitoring tool detects increased latency in database server access and generates its own alarm.
In this example, the NOC engineer would probably see three different alarms in three different applications or browser windows. In the worst case, three NOC engineers would each see only one alarm in one of the monitoring tools and start troubleshooting independently of each other. This increases the time to restore service and creates confusion, since one engineer could interfere with another's troubleshooting.
Similarly, it is difficult to track all the physical resources (servers, routers, switches, UPS units, etc.) that are added, removed, or changed in a data center. Most data center providers rely on Excel files and manual work to document changes in their data centers. Problems start when these files are not updated regularly while new equipment is constantly being installed or planned. This can lead to unexpected costs or activities: an Excel file shows acceptable power, air conditioning, and rack space utilization, while actual utilization is near full capacity.
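The correlation a NOC engineer does by hand in this scenario can be sketched as grouping alarms under the alarm furthest down the dependency chain that fired within a time window. Everything here is an assumption for illustration: the dependency map, the 300-second window, the tool and resource names, and the rule of one active alarm per resource. It is not how UMBOSS or any specific product implements correlation.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    source_tool: str   # which monitoring tool raised it
    resource: str      # affected resource
    timestamp: float   # seconds since some epoch


def root_cause(alarm, active, depends_on, window):
    """Return the alarm furthest down alarm's dependency chain that fired within window seconds."""
    root = alarm
    queue = deque(depends_on.get(alarm.resource, []))
    seen = set(queue)
    while queue:
        resource = queue.popleft()
        candidate = active.get(resource)
        if candidate is not None and abs(candidate.timestamp - alarm.timestamp) <= window:
            root = candidate  # breadth-first order: later matches are at least as deep
        for dep in depends_on.get(resource, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return root


def correlate(alarms, depends_on, window=300.0):
    """Group alarms under their probable root-cause alarm."""
    active = {a.resource: a for a in alarms}  # assumes one active alarm per resource
    groups = {}
    for alarm in alarms:
        groups.setdefault(root_cause(alarm, active, depends_on, window), []).append(alarm)
    return groups


# Hypothetical topology: the web application depends on the database, which depends on the switch.
depends_on = {"web-shop": ["db-01"], "db-01": ["sw-core-1"]}
alarms = [
    Alarm("network-monitor", "sw-core-1", 100.0),
    Alarm("it-monitor", "db-01", 130.0),
    Alarm("app-monitor", "web-shop", 160.0),
]
for root, members in correlate(alarms, depends_on).items():
    print(root.resource, [a.source_tool for a in members])
# sw-core-1 ['network-monitor', 'it-monitor', 'app-monitor']
```

With all three alarms grouped under the switch, one engineer works on one incident instead of three engineers chasing symptoms in parallel.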
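The gap between a spreadsheet and reality can be checked automatically once live measurements exist. This sketch assumes per-rack power figures from a manually maintained file, measured values (for example, polled from PDUs), and an 80% alert threshold; the function name `stale_entries` and all numbers are hypothetical.

```python
def stale_entries(documented_kw, measured_kw, capacity_kw, threshold=0.8):
    """Racks whose documented power looks safe but whose measured draw is at or above threshold."""
    flagged = []
    for rack, capacity in capacity_kw.items():
        measured = measured_kw.get(rack)
        if measured is None:
            continue  # no live reading for this rack
        documented = documented_kw.get(rack, 0.0)
        if documented / capacity < threshold <= measured / capacity:
            flagged.append(rack)
    return sorted(flagged)


capacity = {"A01": 10.0, "A02": 10.0, "A03": 10.0}
spreadsheet = {"A01": 5.0, "A02": 9.0, "A03": 4.0}   # last manual update
measured = {"A01": 9.2, "A02": 9.1, "A03": 4.5}      # e.g. polled from PDUs
print(stale_entries(spreadsheet, measured, capacity))  # ['A01']
```

Rack A02 is not flagged because the spreadsheet already shows it as heavily loaded; only the misleading entry is reported.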
UMBOSS in a data center environment
UMBOSS is a modular umbrella product suite that addresses critical challenges that data center providers face. Its modules include:
- Fault Management
- Performance Management
- Automatic Discovery and Reconciliation Management
- Resource Inventory Management
- “Single Pane of Glass” Portal
UMBOSS collects, stores, and visualizes all relevant information in one solution, enabling faster correlation and access to the necessary data. By integrating UMBOSS with DCIM solutions (for example, FNT Command), data center providers can plan accurately and produce forecasts and reports based on actual data collected from the equipment placed in their data centers.
UMBOSS can be used to replace existing monitoring tools and monitor the equipment directly or can be integrated with existing monitoring tools and collect all required information from them. Also, collected information can be stored in a DCIM solution and used for other calculations.
For example, UMBOSS can use the Simple Network Management Protocol (SNMP) to collect information about the power consumption of a physical server or a router from a power distribution unit (PDU), store that information internally, and use an integration point to update the actual power consumption in the DCIM solution. This can be done periodically so that the DCIM solution always has the latest power consumption information for each resource, resulting in better and more accurate reports and forecasts.
Another benefit of implementing UMBOSS in a data center environment is the ability to monitor resources or services hosted by public cloud providers (such as Microsoft Azure, Amazon Web Services, etc.). In a hybrid cloud scenario, UMBOSS can process and correlate alarms from both the equipment placed in a data center and the services located in the cloud.
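The periodic collect, store, and update cycle can be sketched as below. To keep the example self-contained and avoid guessing at a particular SNMP library or DCIM API, the SNMP read and the DCIM update are passed in as functions; `read_watts`, `push_to_dcim`, the outlet mapping, and the fake reader in the demo are all hypothetical stand-ins. In a real deployment, `read_watts` would perform an SNMP GET against the PDU's outlet power object, whose OID is vendor-specific.

```python
import time


def sync_power(outlets, read_watts, push_to_dcim, history):
    """One polling cycle: read each PDU outlet, store the reading, push it to the DCIM.

    outlets:      DCIM resource ID -> (pdu_host, outlet_index)
    read_watts:   callable(pdu_host, outlet_index) -> float (hypothetical SNMP wrapper)
    push_to_dcim: callable(resource_id, watts) (hypothetical DCIM integration point)
    history:      resource ID -> list of (unix_time, watts), the locally stored readings
    """
    pushed = {}
    for resource_id, (host, index) in outlets.items():
        try:
            watts = read_watts(host, index)
        except OSError:
            # Unreachable PDU (TimeoutError is a subclass of OSError): leave the
            # DCIM value untouched and try again on the next cycle.
            continue
        history.setdefault(resource_id, []).append((time.time(), watts))
        push_to_dcim(resource_id, watts)
        pushed[resource_id] = watts
    return pushed


def run_periodically(cycle, interval_s, cycles):
    """Call cycle a fixed number of times, sleeping interval_s seconds between calls."""
    for i in range(cycles):
        cycle()
        if i < cycles - 1:
            time.sleep(interval_s)


# Demo with a fake reader standing in for SNMP; pdu-b is unreachable.
readings = {("pdu-a", 1): 180.0, ("pdu-a", 2): 95.5}


def fake_read(host, index):
    if (host, index) not in readings:
        raise OSError(f"{host} unreachable")
    return readings[(host, index)]


dcim = {}
history = {}
outlets = {"srv-01": ("pdu-a", 1), "rtr-01": ("pdu-a", 2), "srv-02": ("pdu-b", 1)}
run_periodically(lambda: sync_power(outlets, fake_read, dcim.__setitem__, history),
                 interval_s=0, cycles=2)
print(dcim)  # {'srv-01': 180.0, 'rtr-01': 95.5}
```

Skipping a failed read, rather than writing zero, keeps a temporary PDU outage from corrupting the power figures the DCIM uses for forecasting.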
Setting up a Data Center NOC
Consolidating all alarm and performance information in one central solution such as UMBOSS is only one part of establishing a data center provider's NOC. Another important part is implementing the required business processes and operational changes.
A NOC department, i.e., a central monitoring center, should be officially established and staffed with employees who can respond both proactively and reactively to detected faults or problems. Employees in the NOC department should be aware of the resources being monitored (including network, power, cooling, access control, etc.) and follow established procedures for handling different types of problems or faults. These procedures can be written down in a simple knowledge base on a wiki or a shared disk. This way, onboarding new NOC employees is faster, since all knowledge is stored in a central place that is accessible to everyone. Documented procedures can also be easily changed or updated to keep up with changes in parts of the data center provider's offering or domains.
The central monitoring center or NOC department should have a dedicated room where all employees can work on resolving alarms or problems and share information in case of a major failure.
One of the most useful tools for a better overview of a data center provider's resource state is a central screen listing all active alarms or problems in the data center. Large telecom providers have dedicated video walls that display the whole network and active alarms, but in a data center provider environment this is overkill. Usually, a big-screen TV or a wall-mounted monitor, where everyone in the room can see active alarms and react if required, is sufficient. Another useful feature is a sound notification when a critical alarm appears on the screen.
Monetizing NOC and DCIM solutions
The goal of implementing monitoring and DCIM solutions in a data center provider environment is to enable easier tracking and monitoring of resources placed in one or more data centers or data rooms.
These solutions can also be monetized by offering end customers access to certain information. For example, an end customer leases one rack and places their own equipment in it. The data center provider can offer an extra service that gives the customer access to a multi-tenant DCIM solution to document their own resources. This way, the end customer can manage their rack space and has a better relationship with the data center provider.
Also, the end customer can have access to all the alarms that are detected for the leased rack or can even enable monitoring of their own equipment and have access to more detailed information. Access to DCIM or monitoring information can be through a self-care portal or even a mobile application.
Huge benefits for data center operators
Implementing fault, performance, and infrastructure management solutions can bring many benefits to a data center provider. These benefits can be internal to enable optimized and efficient operational activities or can be used to generate new revenue streams and enable a better relationship with the end customer.
Inceptum can help data center providers set up central monitoring, a NOC department, or a data center infrastructure management solution.