Bottom-up Energy Modeling of Data Center Components

Module 1 of the IT-for-Green project is concerned with Green IT. In this module we provide tools and a methodology for modeling the energy behavior of a data center. We have developed a graphical data center editor in which the user can drag components from a toolbox onto a canvas and connect them through their available ports. This model can later be converted into Modelica code, which can be run against data to produce simulation results for the energetic behavior of the data center.

The full model of the data center is created by instantiating the models of the single components (servers, UPSs, cooling systems, etc.) and connecting them to each other using equations that correspond to the arcs drawn in the graphical design. But where do the models of the single components come from? This article covers how these models are created in a bottom-up fashion.

Top-down modeling

This approach starts from existing models, which can be found in the literature or are provided by the manufacturers of the components. These models are then parameterized and/or verified against real measured data. For example, a conventional server model requires at least two parameters:

Pidle: Power consumption of the server in the idle state, where the server is up but not running any tasks.
Pmax: Maximum power consumption of the server when it is fully loaded.

Between these two states, the consumption of the server relates to its load in a linear or quadratic manner. The shape of the curve in this region is an additional parameter of the model: whether it is convex or concave, together with the coefficients of the load terms. Server load is measured mainly through the CPU load percentage; some models also consider other load parameters such as RAM usage, HDD bytes read and written, network IP datagrams, etc. The next step in this approach is to set these parameters and then compare the calculated output of the model to the measured energy consumption in order to estimate the model's accuracy.
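The conventional model described above can be sketched in a few lines. The idle and maximum wattages below are illustrative placeholders, not values from a real server:

```python
def server_power(load, p_idle=120.0, p_max=280.0):
    """Estimate power draw in watts from CPU load in [0, 1],
    interpolating linearly between idle and full-load consumption."""
    return p_idle + (p_max - p_idle) * load

# At half load, the linear variant predicts the midpoint between
# idle and maximum consumption:
print(server_power(0.5))  # 200.0
```

A quadratic variant would replace the linear term with a polynomial in the load, with the convexity and coefficients as additional model parameters.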

Bottom-up modeling

Instead of relying on existing models and then struggling with parameterization, with the inclusion or exclusion of input parameters, and with accuracy optimization, we have chosen an approach that starts from the last step: we begin with the measured data and try to find the model that best describes it.

We start by loading the data of the component. For a server, this is a table with the following columns: timestamp, cpu_busy, mem_busy, disk_bytes, ip_datagrams, active_power. Each record in this table represents a snapshot of these parameters at the given timestamp.
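A minimal sketch of loading such a table, using a hypothetical two-row CSV sample with the column names from the article (the values are invented for illustration):

```python
import csv
import io

# Hypothetical sample of the server measurement table.
raw = """timestamp,cpu_busy,mem_busy,disk_bytes,ip_datagrams,active_power
2014-01-01T00:00:00,12.0,30.5,1048576,420,135.2
2014-01-01T00:05:00,55.3,41.0,8388608,1800,190.7
"""

# Each record is one snapshot: the load parameters plus the measured power.
records = [
    {k: (v if k == "timestamp" else float(v)) for k, v in row.items()}
    for row in csv.DictReader(io.StringIO(raw))
]
print(records[0]["active_power"])  # 135.2
```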

After that we try to find the function that best describes these data, in the form f(cpu_busy, mem_busy, disk_bytes, ip_datagrams) = active_power. Regression analysis methods can be used for this, and we have chosen Ordinary Least Squares (OLS). OLS finds the function whose predicted power consumption has the minimal squared distance to the observed power consumption. We thereby automatically obtain the best possible function for the given parameters.
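The OLS step can be sketched with NumPy's least-squares solver. The data below are synthetic, generated from assumed coefficients purely to illustrate that the fit recovers them:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Synthetic load parameters and a power signal built from assumed
# coefficients (intercept 120, then 1.4, 0.3, 0.01, 0.05) plus noise.
cpu, mem, disk, ip = rng.uniform(0, 100, (4, n))
power = 120 + 1.4 * cpu + 0.3 * mem + 0.01 * disk + 0.05 * ip + rng.normal(0, 2, n)

# Design matrix: a constant term plus the four load parameters.
X = np.column_stack([np.ones(n), cpu, mem, disk, ip])

# Solve for the coefficient vector minimizing ||X @ beta - power||^2.
beta, *_ = np.linalg.lstsq(X, power, rcond=None)
print(np.round(beta, 2))  # close to the assumed coefficients
```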

This multiple regression might not be enough, however, since modern CPUs use dynamic voltage and frequency scaling (DVFS), which allows the CPU to lower its power consumption when it is lightly loaded and causes the relation between its load and its consumption to follow a quadratic curve. We therefore incorporate additional parameters into the regression: the squares of our original parameters. The sought-after function then looks like this: f(cpu_busy2, mem_busy2, disk_bytes2, ip_datagrams2, cpu_busy, mem_busy, disk_bytes, ip_datagrams) = active_power.

Since not all four parameters may be available to measure, or significant for determining the power consumption, we test all possible combinations of the 8 parameters and sort the resulting functions by their quality, represented by the correlation of each function's output with the measured consumption. The result is a sorted list of functions, stored as a table with the following columns: involved_params, coefficient_vector, correlation.
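The combination search can be sketched with itertools. For brevity this example uses only two base parameters and their squares (giving 15 non-empty combinations instead of 255); the data are synthetic, with only the CPU terms actually driving the power:

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(2)
n = 200
# Hypothetical data: only cpu_busy and its square influence the power.
params = {
    "cpu_busy": rng.uniform(0, 100, n),
    "mem_busy": rng.uniform(0, 100, n),
}
params["cpu_busy2"] = params["cpu_busy"] ** 2
params["mem_busy2"] = params["mem_busy"] ** 2
power = 110 + 0.8 * params["cpu_busy"] + 0.01 * params["cpu_busy2"] \
        + rng.normal(0, 2, n)

rows = []
names = list(params)
for r in range(1, len(names) + 1):
    for combo in combinations(names, r):
        # Fit an OLS model using only the parameters in this combination.
        X = np.column_stack([np.ones(n)] + [params[p] for p in combo])
        beta, *_ = np.linalg.lstsq(X, power, rcond=None)
        # Quality: correlation of the fitted output with the measurement.
        corr = np.corrcoef(X @ beta, power)[0, 1]
        rows.append({"involved_params": combo,
                     "coefficient_vector": beta,
                     "correlation": corr})

# Sort so that the most representative function comes first.
rows.sort(key=lambda row: row["correlation"], reverse=True)
print(rows[0]["involved_params"])
```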

Depending on the available parameters, the most representative function is chosen as the energy model for this server type. It may happen that even with all input parameters included, a correlation of no more than 50% is reached. In this case we declare the server load-unadaptive, which means that an energy-saving potential lies in reconfiguring or replacing it. The exact savings can be calculated later through a simulated what-if scenario.

The best functions per parameter combination are stored as energy models for the components, for later use when generating code from the data center design.