Runtime Data Center Temperature Prediction using Grammatical Evolution Techniques

Marina Zapater a,c,b,∗, José L. Risco-Martín a, Patricia Arroba c,b, José L. Ayala a, José M. Moya c,b, Román Hermida a

a DACYA, Universidad Complutense de Madrid, Madrid 28040, Spain
b CCS - Center for Computational Simulation, Campus de Montegancedo UPM, 28660, Spain
c LSI - Integrated Systems Lab., Universidad Politécnica de Madrid, Madrid 28040, Spain

Abstract

Data Centers are huge power consumers, both because of the energy required for computation and the cooling needed to keep servers below thermal redlining. The most common technique to minimize cooling costs is increasing data room temperature. However, to avoid reliability issues, and to enhance energy efficiency, there is a need to predict the temperature attained by servers under variable cooling setups. Due to the complex thermal dynamics of data rooms, accurate runtime data center temperature prediction has remained an important challenge. By using Grammatical Evolution techniques, this paper presents a methodology for the generation of temperature models for data centers and the runtime prediction of CPU and inlet temperature under variable cooling setups. As opposed to time-costly Computational Fluid Dynamics techniques, our models do not need specific knowledge about the problem, can be used in arbitrary data centers, can be re-trained if conditions change, and have negligible overhead during runtime prediction. Our models have been trained and tested using traces from real Data Center scenarios. Our results show that we can accurately predict the temperature of the servers in a data room, with prediction errors below 2°C and 0.5°C for CPU and server inlet temperature respectively.

Keywords: Temperature prediction; Data Centers; Energy efficiency

1. Introduction

Data Centers are found in every sector of the economy and provide the computational infrastructure to support a wide range of applications, from traditional applications to High-Performance Computing or Cloud services. Over the past decade, both the computational capacity of data centers and the number of these facilities have increased tremendously without a proportional increase in energy efficiency, leading to unsustainable costs [1]. In 2010, data center electricity represented 1.3% of all the electricity use in the world, and 2% of all electricity use in the US [2]. In 2012, global data center power consumption increased to 38GW, and in 2013 there was a further rise of 17% to 43GW [3].

The cooling needed to keep the servers within reliable thermal operating conditions is one of the major contributors to data center power consumption, and accounts for over 30% of the electricity bill [4] in traditional air-cooled infrastructures. In recent years, both industry and academia have devoted significant effort to decrease the cooling power, improving data center Power Usage Effectiveness (PUE), defined as the ratio between total facility power and IT power. According to a report by the Uptime Institute, average PUE improved from 2.5 in 2007 to 1.65 in 2013 [5], mainly due to more efficient cooling systems and higher data room ambient temperatures.

∗Corresponding author
Email addresses: marina.zapater@ucm.es (Marina Zapater), jlrisco@ucm.es (José L. Risco-Martín), parroba@die.upm.es (Patricia Arroba), jayala@ucm.es (José L. Ayala), josem@die.upm.es (José M. Moya), rhermida@ucm.es (Román Hermida)
However, increased room temperatures reduce the safety margins to CPU thermal redlining and may cause potential reliability problems. To avoid server shutdown, the maximum CPU temperature limits the minimum cooling. The key question of how to set the supply temperature of the cooling system to handle the worst-case scenario is still to be clearly answered [6]. Most data centers typically operate with server inlet temperatures ranging between 18°C and 24°C, but we can find some of them as cold as 13°C [7], and others as hot as 35°C [8]. These values are often chosen based on conservative suggestions provided by manufacturers, and ensure inlet temperatures within the ranges published by ASHRAE (i.e., 15°C to 32°C for enterprise servers [9]).

Data center designers have run up against the lack of accurate models for the energy-efficient real-time management of computing facilities. One modeling barrier in these scenarios is the high number of variables potentially correlated with temperature, which prevents the development of macroscopic analytical models. Nowadays, to simulate the inlet temperature of servers under certain cooling conditions, designers rely on time-consuming and very expensive Computational Fluid Dynamics (CFD) simulations. These techniques use numerical methods to solve the differential equations that drive the thermal dynamics of the data room. They need to consider a comprehensive number of parameters both from the server and the data room (e.g., specific characteristics of servers such as airflow rates, or the data room dimensions and setup). Moreover, they are not robust to changes in the data center (e.g., rack placement and layout changes, servers being turned off, the inclusion of new servers, etc.). If the simulation fails to properly incorporate a relevant parameter, or if there is a deviation between the theoretical and the real values, the simulation becomes inaccurate. Due to the high economic and computational cost of CFD simulation, models cannot be re-run each time there is a change in the data room.

To minimize cooling costs, the development of models that accurately predict the CPU temperature of the servers under variable environmental conditions is a major challenge. These models need to work at runtime, adapting to the changing conditions of the data room, automatically re-training if data center conditions change dramatically, and enabling data center operators to increase room temperature safely.

The nature of the problem suggests the usage of meta-heuristics instead of analytical solutions. Meta-heuristics make few assumptions about the problem, providing good solutions even when they have fragmented information. Some meta-heuristics such as Genetic Programming (GP) perform Feature Engineering (FE), a particularly useful technique to select the set of features and combination of variables that best describe a model. Grammatical Evolution (GE) is an evolutionary computation technique based on GP used to perform symbolic regression [10].
This technique is particularly useful to provide solutions that include non-linear terms, offering Feature Engineering capabilities and removing analytical modeling barriers. Also, designer expertise is not required to process a high volume of data, as GE is an automatic method. However, GE provides a vast space of solutions that may need to be bounded to achieve algorithm efficiency.

This paper develops a data center room thermal modeling methodology based on GE to predict at runtime, and with sufficient anticipation, the critical variables that drive reliability and cooling power consumption in data centers. In particular, the main contributions of our work are the following:

• The development of multi-variable models that incorporate time dependence based on Grammatical Evolution to predict CPU and inlet temperature of the servers in a data room during runtime. Due to the feature engineering and symbolic regression performed by GE, our models incorporate the optimum selection of representative features that best describe the thermal behavior.

• We prevent premature convergence by means of Social Disaster Techniques and Random Off-Spring Generation, dramatically reducing the number of generations needed to obtain accurate solutions. We tune the models by selecting the optimum parameters and fitness function using a reduced experimental setup, consisting of real measurements taken from a single server isolated in a fully sensorized data room.

• We offer a comparison with other techniques commonly used in the literature to solve temperature modeling problems, such as autoregressive moving average (ARMA) models, linear model identification methods (N4SID), and dynamic neural networks (NARX).

• The proposal of an automatic data room thermal modeling methodology that scales our solution to a realistic Data Center scenario. As a case study, we model CPU and inlet temperatures using real traces from a production data center.

Figure 1: Typical raised-floor air-cooled Data Center layout

Our work allows the generation of accurate temperature models able to work at runtime and adapt to the ever-changing conditions of these scenarios, while achieving very low average errors of 2°C for CPU temperature and 0.5°C for inlet temperature.

The remainder of the paper is organized as follows: Section 2 describes the modeling problem in detail, whereas Section 3 provides an overview of the current solutions. Section 4 describes our proposed solution, whereas Section 5 presents the experimental methodology. Section 6 shows the results obtained and Section 7 discusses them. Finally, Section 8 concludes the paper.

2. Problem description

2.1. Data room thermal dynamics

To ensure the safe operation of a traditional raised-floor air-cooled Data Center, data rooms are equipped with chilled-water Computer Room Air Conditioning (CRAC) units that use conventional air-cooling methods. Servers are mounted in racks on a raised floor. Racks are arranged in alternating cold/hot aisles, with server inlets facing cold air and outlets creating hot aisles. CRAC units supply air at a certain temperature and air flow rate to the Data Center through the floor plenum. The floor has some perforated tiles through which the blown air comes out.
Cold air refrigerates the servers and heated exhaust air is returned to the CRAC units via the ceiling, as shown in Figure 1.

Even though this solution is very inefficient in terms of energy consumption, the majority of data centers use this mechanism. In fact, despite the recent advances in high-density cooling techniques, according to a survey by the Uptime Institute, in 2012 only 19% of large-scale data centers had incorporated other cooling mechanisms [5]. In some scenarios, the control knob of the cooling subsystem is the cold air supply temperature, whereas in others, it is the return temperature of the heated exhaust air to the CRAC unit.

The maximum IT power density that can be deployed in the Data Center is limited by the perforated tile airflow. Because the plenum is usually obstructed (e.g., blocked with cables in some areas), a non-uniform airflow distribution is generated and each tile exhibits a different pressure drop. Moreover, in data centers where the hot and cold aisles are not isolated, which is the most common scenario, the heated exhaust air recirculates to the cold aisle, mixing with the cold air.

2.2. Temperature-energy tradeoffs

The factor limiting minimum data room cooling is maximum server CPU temperature. Temperatures higher than 85°C can cause permanent reliability failures [11]. At temperatures above 95°C, servers usually turn off to prevent thermal redlining. Previous work on server power and thermal modeling [12] shows how CPU temperature is dominated by: i) power consumption, which is dependent on workload execution, ii) fan speed, which changes the cooling capacity of the server, and iii) server cold air supply (inlet temperature).

Thus, to keep all the equipment under normal operation, CRAC units have to supply the air at an adequately low temperature to ensure that all CPUs are below the critical threshold. However, inlet temperature is also not uniform across servers. The cold air temperature at the server inlet depends on several parameters: i) the CRAC cold air supply, ii) the airflow rate through the perforated tiles, and iii) the outlet temperature of adjacent servers due to air recirculation.

Setting the cooling air supply temperature to a low value, even though it ensures safe operation, implies increased power consumption due to a larger burden on the chiller system. The goal of energy-efficient cooling strategies is to increase the cold air supply temperature without reaching thermal redlining. Due to the non-linear efficiency of cooling systems, raising the air supply temperature can yield important energy savings. A widely used rule of thumb is that each degree of increase in air supply temperature yields 4% energy savings in the cooling subsystem [13]. To increase air supply temperature safely, however, we need to predict not only the inlet temperature to the servers, but also the CPU temperature that each server attains under the current workload.

Due to the temperature gradients between hot and cold aisles and the data room layout and geometry, the air inside a data center behaves like a turbulent fluid. Thus, obtaining an analytical relation between cold air supply and server inlet temperature is not trivial, making inlet and CPU temperature prediction a challenging problem.
Besides, data centers are composed of thousands of CPU cores, whose temperatures need to be modeled independently. This prevents the usage of classical regression techniques that need human interaction to train and validate the models.

3. Literature overview

Data center room thermal modeling enables both thermal emergency management and energy optimization, and enhances reliability. Because of the turbulent behavior of the air in the Data Center room, Computational Fluid Dynamics (CFD) simulation has traditionally been the most commonly used solution in both industry and academia [14].

CFD is used to model the inlet and outlet temperature of servers, given cold air supply parameters, room layout, server configuration and utilization, in order to either optimize cooling costs or detect hot spots [15]. CFD solvers perform a three-dimensional numerical analysis of the thermodynamic equations that govern the data room. Their main drawbacks are that they require an expert to configure the simulation, and that they are computationally costly both in the modeling stage (i.e., modeling a small-sized data room may take from hours to days) and in the evaluation phase, thus preventing their online usage. Moreover, CFD simulation is not robust to changes in the layout of the Data Center, e.g., server placement, open tiles, the workload running or the cold air supply setting.

To solve these issues, Chen et al. [16] use CFD together with sensor information to calibrate the simulation and reduce computational complexity. Their work achieves a prediction error below 2°C when predicting temperature 10 minutes in advance. Other work [17] presents the Data Center as a distributed Cyber-Physical System (CPS) in which both computational and physical parameters can be measured with the goal of minimizing energy consumption. Our work leverages this concept by using a monitoring system [18] capable of collecting environmental (i.e., cold air supply and server inlet temperature, airflow, etc.) and server data (i.e., temperature, power, fan speed, etc.) from a real data center.

A common alternative to CFD is the use of abstract heat flow models. These models characterize the steady state of hot air recirculation inside the data center. Recirculation is described by a cross-interference coefficient matrix which denotes how much of every node's outlet heat contributes to the inlet of every other node. This matrix is obtained in an offline profiling stage using CFD [19]. Even though profiling is still costly, model evaluation can be performed online.

Machine learning techniques have also been used in Data Center modeling. The Weatherman [20] tool uses neural networks to predict the inlet temperature of servers, obtaining prediction errors below 1°C in over 90% of their traces. However, they use simulation traces obtained with CFD simulation for their training and test sets, instead of real data. The problem behind time series prediction can be explained as a problem of extracting a manageable set of adequate features, followed by a regression mechanism. Careful selection of features and their horizon is therefore of much greater importance compared with the static-data prediction problem.
Neural network-based approaches require previous knowledge of the parameters that drive thermal modeling, obtaining them using pseudo-exhaustive algorithms. Our work, on the contrary, relies on the benefits of feature engineering in Symbolic Regression problems to obtain the relevant features and construct the models in an automated way.

The work by De Silva et al. [21] is the one most similar to ours regarding the modeling methodology. The authors use Grammatical Evolution for electricity load prediction. As opposed to our work, this paper is focused on predicting the trend, momentum and volatility indicators of a time series, not on obtaining a physical model, i.e., they do not solve a multi-variable problem.

In summary, the main issues in all previous approaches are: i) they monitor and predict inlet temperature instead of CPU temperature, ii) modeling is performed for only certain hand-picked cooling and workload configurations, iii) they use CPU utilization as a proxy for server power, iv) they assume data centers with homogeneous servers, v) server fan speed is considered constant, vi) results are not validated with real traces, and vii) model construction requires specific knowledge of the problem and classical feature selection, which prevents the usage of automated techniques.

Our work, on the contrary, first predicts inlet temperature and then uses this result to predict CPU temperature, which is the factor limiting cooling. Both in our training and test sets, we use real traces obtained from enterprise servers in a data center. Moreover, as shown in previous work [12], in highly multi-threaded enterprise servers, utilization is not a good proxy for power for arbitrary workloads.

Enterprise servers come with automatic temperature-driven variable fan control policies. When fan speed changes, so do the airflow and the server cooling capacity [22]. Our methodology also considers the contribution of variable fan speed, allowing us to predict temperature in heterogeneous data center setups running arbitrary workloads.

At the server level, Heather et al. [23] propose a server temperature prediction model based on simplified thermodynamic equations, obtaining results within 1°C of accuracy. Even though this approach predicts CPU temperature and takes into account inlet temperature, it does not predict the inlet and needs specific knowledge about several server parameters. Our approach only uses data from the generic sensors deployed in the server and data room.

Another common approach to CPU temperature modeling is the usage of autoregressive moving average (ARMA) modeling to estimate future temperature accurately based on previous measurements [24]. Their main drawbacks are that, because they only use past temperature samples, the prediction horizon is usually below one second. Moreover, they do not provide a physical model, disregarding the effect of power or airflow, and need to be retrained often.

As opposed to others, our work achieves prediction horizons of 1 minute for CPU temperature and 10 minutes for inlet temperature with high accuracy. This enables data center operators to take action before thermal events occur, by changing either the workload or the cooling in the data center.
4. Grammatical evolution techniques

Evolutionary algorithms use the principles of evolution to turn one population of solutions into another, by means of selection, crossover and mutation. Among them, Genetic Programming (GP) has proven to be effective in a number of Symbolic Regression (SR) problems [25]. However, GP presents some limitations, like bloating of the evolution (excessive growth of computer memory structures), often produced in the phenotype of the individual. In recent years, variants of GP like Grammatical Evolution (GE) have appeared as a simpler optimization process [26]. GE is inspired by the biological process of generating a protein given the DNA of an organism. GE evolves computer programs given a set of rules, adopting a bio-inspired genotype-phenotype mapping process.

Figure 2: CPU temperature prediction diagram (a 2-minute data window of real measurements and a 1-minute prediction window of previously predicted samples are used to predict the sample ahead of the current measurement).

In this section, we describe how we perform feature selection and provide a brief insight into the grammars and mapping process, as well as into several model parameters.

4.1. Feature selection and model definition

In this work we use Feature Engineering (FE) and Grammatical Evolution to obtain a mathematical expression that models CPU and server inlet temperature. This expression is derived from experimental measurements in real server and data room scenarios, gathering data that have an impact on temperature, according to previous work in the area [12, 18]. To predict CPU temperature, we gather server power, fan speed, inlet temperature and previous CPU temperature measurements. For inlet temperature, we gather the CRAC air supply and return temperature, humidity, pressure difference through the perforated tiles (which is a measurement that provides information about airflow) and previous inlet temperature measurements. Our goal is to predict temperature a certain time (samples) in advance, by using past data of the available magnitudes within a window. We may use past samples from the magnitude we need to predict, or even previously predicted data.

For illustration purposes, in Figure 2 we show a diagram in which CPU temperature is predicted 1 minute ahead given: i) 2 minutes of past measurements (data window) for fan speed, server power, inlet and CPU temperature and, ii) the previous CPU temperature predictions (prediction window).

Formally, we claim that CPU temperature prediction for a certain time instant α samples into the future is a function of past data measurements within a window of size i = {0..W_cpu}, and previously predicted values within a window of size j = {1..α}, as expressed in Eq. (1):

\hat{T}_{CPU}(k+\alpha) = f\big( T_{inlet}(k-i),\ FS(k-i),\ P(k-i),\ T_{CPU}(k-i),\ \hat{T}_{CPU}(k+j) \big) \quad (1)

where T_{inlet} is a short form for the previous inlet temperature values in a window, {T_{inlet}(k-i) | i ∈ {0, ..., W_cpu}}; T_{CPU}, FS and P are past CPU temperature, fan speed, and server power consumption values respectively, which are defined similarly; and \hat{T}_{CPU} are previous temperature predictions.
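To make the windowed formulation concrete, the following minimal sketch (not part of the original methodology) builds the inputs of Eq. (1) from equally sampled monitoring traces and feeds earlier predictions back into later ones; the array names and the model callback are illustrative assumptions.

```python
# Minimal sketch of the windowed formulation in Eq. (1): the inputs for predicting
# T_CPU at time k+alpha combine past measurements (data window of size W) with
# previously predicted samples that fall before the target (prediction window).
# Array names and the model() callback are illustrative, not from the paper.
import numpy as np

def cpu_temp_features(k, alpha, W, t_inlet, fan, power, t_cpu, t_cpu_pred):
    """Collect the arguments of f(...) in Eq. (1) for one prediction at k+alpha."""
    past = lambda x: x[k - W:k + 1]              # x[k-i], i = 0..W
    return {
        "T_inlet":    past(t_inlet),
        "FS":         past(fan),
        "P":          past(power),
        "T_CPU":      past(t_cpu),
        "T_CPU_pred": t_cpu_pred[k + 1:k + alpha],   # earlier predictions before k+alpha
    }

def rolling_prediction(model, alpha, W, t_inlet, fan, power, t_cpu):
    """Predict T_CPU(k+alpha) for every k, feeding back earlier predictions."""
    n = len(t_cpu)
    t_cpu_pred = np.copy(t_cpu)                  # seed the prediction buffer
    for k in range(W, n - alpha):
        feats = cpu_temp_features(k, alpha, W, t_inlet, fan, power, t_cpu, t_cpu_pred)
        t_cpu_pred[k + alpha] = model(feats)     # model = phenotype evolved by GE
    return t_cpu_pred
```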
For inlet temperature, our claim is that the inlet temperature T_inlet of a certain server is driven by the room thermal dynamics and can be expressed as a function of the cold air supply (or return) temperature T_CRAC, the differential pressure across the perforated tiles γ (measured in inches of water, inH2O) and the data room humidity h (in percentage), as in Eq. (2):

\hat{T}_{inlet}(k+\beta) = f\big( T_{CRAC}(k-i),\ \gamma(k-i),\ h(k-i),\ FS_{p-m}(k-i),\ \hat{T}_{inlet}(k+j) \big) \quad (2)

where the data window can be defined in the range i = {0..W_inlet} and the prediction window is j = {1..β}.

Note that, in general, α and β are not equal, as the room dynamics are much slower than the CPU temperature dynamics of the servers, i.e., in a real data room we might need hours to appreciate substantial differences in ambient temperature, whereas CPU temperature changes within seconds. The selection of relevant features among all data measurements is a Symbolic Regression (SR) problem. In our approach, GE allows the generation of mathematical models applying SR.

Regarding both the structure and the internal operators, GE works like a classic Genetic Algorithm [27]. GE evolves a population formed by a set of individuals, each one constituted by a chromosome and a fitness value. In SR, the fitness value is usually a regression metric like the Mean Squared Error (MSE). In GE, a chromosome is a string of integers. In the optimization process, GA operators are iteratively applied to improve the fitness value of each individual. In order to compute the fitness function for every iteration and extract the mathematical expression given by an individual (phenotype), a mapping process is applied to the chromosome (genotype). This mapping process is achieved by defining a set of rules to obtain the mathematical expression, using grammars in Backus Naur Form (BNF) [26].

The process does not only perform parameter identification. In conjunction with a well-defined fitness function, the evolutionary algorithm is also computing mathematical expressions with the set of features that best fit the target system. Thus, GE is also defining the optimal set of features that derive into the most accurate model.

Moreover, this methodology can be used to predict magnitudes with memory, such as temperature, where the current observation depends on past values. To incorporate time dependence, the data used for model creation needs to be a time series. In addition, we need to tune our grammars so that they can produce models where past temperature values can be used to predict temperature a certain number of samples into the future. Grammar 1 shows an example where variable x may take values in the current time step k, i.e., x[k-0], or in previous samples like x[k-1] or x[k-2]. Moreover, a new variable xpred[k-idx] can be included, that accounts for previously predicted values of variable x.

Including time dependence into a grammar has some drawbacks. First, we are substantially increasing the search space of our algorithms, as now the GE needs to search for the best solution among all variables within the specified window. As a consequence, the number of generations needed to obtain a good fitness value increases. Second, as we show in the results section, depending on the prediction horizon the models tend to fall into a local optimum, in which the best phenotype is the last available observation of the predicted variable. To address the latter challenge, we propose the use of the premature convergence prevention techniques that are explained next, and that also benefit the convergence time of our algorithms. Despite the drawbacks, introducing time dependence in our modeling is a must, as temperature transients (both at the server and the data center level) are not negligible and need to be accurately predicted.

For a more detailed explanation of the principles of the mapping process, and how the BNF grammars are used to incorporate time dependence, the reader is referred to the Appendix.

Grammar 1 Example of a grammar in BNF that generates phenotypes with time dependence
(I) 〈expr〉 ::= 〈expr〉〈op〉〈expr〉 | 〈preop〉(〈expr〉) | 〈var〉
(II) 〈op〉 ::= + | - | * | /
(III) 〈preop〉 ::= sin | cos | log
(IV) 〈var〉 ::= x[k-〈idx〉] | xpred[k-〈idx〉] | y | z | 〈num〉
(V) 〈num〉 ::= 〈dig〉.〈dig〉 | 〈dig〉
(VI) 〈dig〉 ::= 0 | 1 | 2 | 3 | 4 | 5
(VII) 〈idx〉 ::= 0 | 1 | 2
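As an illustration of the mapping process referred to above, the sketch below applies the standard GE genotype-to-phenotype mapping (codon value modulo number of rule choices, with chromosome wrapping) to the rules of Grammar 1. It is a generic, textbook-style implementation, not the authors' code.

```python
# Sketch of the standard GE genotype-to-phenotype mapping applied to Grammar 1:
# each codon selects one production of the leftmost non-terminal via modulo, and
# the chromosome is re-read ("wrapped") if it runs out of codons.
GRAMMAR = {
    "<expr>":  ["<expr><op><expr>", "<preop>(<expr>)", "<var>"],
    "<op>":    ["+", "-", "*", "/"],
    "<preop>": ["sin", "cos", "log"],
    "<var>":   ["x[k-<idx>]", "xpred[k-<idx>]", "y", "z", "<num>"],
    "<num>":   ["<dig>.<dig>", "<dig>"],
    "<dig>":   ["0", "1", "2", "3", "4", "5"],
    "<idx>":   ["0", "1", "2"],
}

def map_genotype(codons, start="<expr>", max_wraps=3):
    expr, i, wraps = start, 0, 0
    while "<" in expr and wraps <= max_wraps:
        if i == len(codons):                              # wrap the chromosome
            i, wraps = 0, wraps + 1
        nt = expr[expr.index("<"):expr.index(">") + 1]    # leftmost non-terminal
        rules = GRAMMAR[nt]
        choice = rules[codons[i] % len(rules)]            # codon mod number of choices
        expr = expr.replace(nt, choice, 1)
        i += 1
    # if non-terminals remain after max_wraps, the individual is treated as invalid
    return expr

print(map_genotype([42, 17, 203, 8, 1, 90, 6, 33]))
```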
4.2. Preventing premature convergence

Premature convergence of a genetic algorithm arises when the chromosomes of some highly rated individuals quickly dominate the population, reducing diversity, and constraining it to converge to a local optimum. Premature convergence is one of the major shortcomings when trying to model low-variability magnitudes by using GE techniques.

To overcome the lack of variety in the population, work by Kureichick et al. [28] proposes the usage of Social Disaster Techniques (SDT). This technique is based on monitoring the population to find local optima and then applying one of two operators:

1. Packing: all individuals having the same fitness value except one are fully randomized.

2. Judgment day: only the fittest individual survives while the remaining ones are fully randomized.

Work by Rocha et al. [29] proposes the usage of Random Off-spring Generation (ROG) to prevent the crossover of two individuals with equal genotype, as this would result in the off-spring being equal to the parents. Individuals are tested before crossover and, if equal, then one off-spring (1-RO) or both of them (2-RO) are randomly generated.

Both previous solutions have shown important benefits in classical Genetic Algorithm problems. In our work, we use these techniques to improve the convergence time of our solutions, as we show in Section 6.
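A compact sketch of how the two mechanisms can be wired into a generic GE loop is shown below; the population representation, the random_chromosome() helper and the keep_fraction parameter (used later in the paper as the "5%" variant of Packing) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the two diversity-preserving mechanisms described above.
# Individuals are dictionaries with "chromosome" and "fitness" entries;
# random_chromosome() and the crossover() operator are assumed to exist elsewhere.
from collections import defaultdict

def sdt_packing(population, random_chromosome, keep_fraction=0.0):
    """Packing: within each group of individuals sharing a fitness value, keep a
    small fraction (at least one) and fully randomize the rest."""
    by_fitness = defaultdict(list)
    for ind in population:
        by_fitness[ind["fitness"]].append(ind)
    for group in by_fitness.values():
        keep = max(1, int(len(group) * keep_fraction))
        for ind in group[keep:]:
            ind["chromosome"] = random_chromosome()
            ind["fitness"] = None                 # must be re-evaluated
    return population

def rog_crossover(parent_a, parent_b, crossover, random_chromosome):
    """2-RO Random Off-spring Generation: if both parents share the same genotype,
    return two random individuals instead of cloning the parents."""
    if parent_a["chromosome"] == parent_b["chromosome"]:
        return ({"chromosome": random_chromosome(), "fitness": None},
                {"chromosome": random_chromosome(), "fitness": None})
    return crossover(parent_a, parent_b)
```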
4.3. Fitness

The goal of using GE for data room thermal modeling is to obtain accurate models. Thus, our fitness function needs to express the error resulting from the estimation process. To measure the accuracy of our prediction, we would preferably use the Mean Absolute Error (MAE). However, because temperature is a magnitude that varies slowly and might remain constant during large time intervals, we need to give higher weight to large errors. To this end, the fitness function f presented in Eq. (3) tries to reduce the variance of the model, leading the evolution to obtain solutions that minimize the Root Mean Square Error (RMSE):

f = \sqrt{\frac{1}{N} \cdot \sum_{i} e_i^2} \quad (3)

where the estimation error e_i represents the deviation between the real temperature samples (both for CPU and inlet temperature modeling) obtained by the monitoring system, T, and the estimation obtained by the model, \hat{T}; i represents each sample of the entire set of N samples used to train the algorithms.

4.4. Problem constraints

As we are modeling the behavior of physical magnitudes for optimization purposes, we need to obtain a solution with physical meaning. To this end, we constrain the general problem of temperature modeling in several ways that are subsequently presented, while still being able to address heterogeneous workloads, architectures and topologies. In the results section we evaluate the impact of these constraints on the model generation stage.

4.4.1. Constraining the grammar

The mathematical expressions can be constrained to a limited number of functions with physical meaning. Because temperature exhibits exponential transients, we can include the exponential function in our grammar, whereas we do not find a physical basis to include other mathematical functions such as sines or cosines.

4.4.2. Fitness biasing

Some parameters drive the variables being modeled. For instance, power consumption drives CPU temperature. As we want to obtain models able to capture the physical phenomena that drive temperature, this magnitude should be present in the final model. Thus, CPU temperature models that do not include power in their phenotype are expected to provide good results in the training phase, but to perform poorly in the test, as they are not capturing the physical phenomena. To solve this issue, we can force the appearance of some parameters by biasing the fitness, giving higher weights (i.e., worse fitness) to expressions that miss a parameter. By biasing the fitness we speed up convergence and we ensure that our models incorporate all parameters directly correlated with temperature, but we could obtain less accurate results.

4.4.3. Real vs. mixed models

Purely real models only use real temperature data measurements to predict future samples. Purely predictive models do not use previous temperature measurements, but may use previous predictions. Mixed models may use both real and predicted data. Adding the predicted samples as a variable increases the size of the search space but may provide higher accuracy.

5. Experimental Methodology

In this section we describe the experimental methodology followed in this paper to model server and environmental parameters in Data Centers. First, we describe a scenario consisting only of the temperature prediction of one server in a small air-cooled data room. We use this scenario to tune the model parameters, testing those that generate better models and studying the convergence of the solutions. Then, we apply the best algorithm configuration to a real data center. As a case study, we use real traces from the CeSViMa Data Center, a High Performance Computing cluster at Universidad Politécnica de Madrid in Spain.
5.1. Reduced scenario

This scenario consists of an Intel Xeon RX-300 S6 server equipped with 1 quad-core CPU and 16GB of RAM. The server is installed in a rack with another 4 servers, 2 switches and 2 UPS units, in an air-cooled data room of approximately 30 m², with the rack inlet facing the cold air supply and the outlet facing the heat exhaust. The cooling infrastructure consists of a Daikin FTXS30 split unit that pumps cold air from the ceiling, and there is no floor plenum. The cold air supply ranges from 16°C to 26°C. The data room is fully monitored, and both the cooling and the server workload are controllable.

5.1.1. Monitoring

Both the server and the data room are fully monitored using the internal server sensors and a wireless sensor network, as described in [18]. In particular, server CPU temperature and fan speed values are obtained via the server internal sensors, collected through the Intelligent Platform Management Interface (IPMI) tool (http://ipmitool.sourceforge.net/). IPMI allows polling the internal sensors of enterprise servers with negligible overhead. As the server is not shipped with power consumption sensors, we use non-intrusive current clamps connected to the power cord of the server to gather total server power consumption. Wireless sensors monitor the inlet temperature of the server, the cold air supply temperature of the split unit and the data room humidity. Data are sent to the monitoring server, stored and aligned to ensure a common timestamp.

Figure 3: Training samples used for CPU temperature modeling — a) inlet temperature (°C), b) server power consumption (W), c) server fan speed (rpm), d) measured CPU temperature (°C), over time (s).

5.1.2. Training and test set generation

We generate the training and test set by assigning a wide variety of workloads that exhibit various stress levels in the CPU and memory subsystems of the server, while we modify the cold air supply temperature of the split unit in a range from 16°C to 26°C. All workloads used are a representative set, in terms of stress to the server subsystems and power consumption, of the ones that can be found in High-Performance Computing data centers. Also, the temperatures selected for the cold air supply are within the allowable ranges in current data centers.

The workloads used are the following: i) Lookbusy (http://www.devin.com/lookbusy/), a synthetic workload that stresses the CPU to a customizable utilization value, avoiding the stress of memory and disk; ii) a modified version of the synthetic benchmark RandMem (http://www.roylongbottom.org.uk), that allows us to stress random memory regions of a given size with a given access pattern, and iii) HPC workloads belonging to the SPEC CPU 2006 benchmark suite [30].

During training, we launch Lookbusy and RandMem at various utilization values, plus a subset of the SPEC CPU 2006 benchmarks that exhibit a distinctive set of characteristics according to Phansalkar et al. [31]. Both the arrival time and task duration are randomly selected. During execution, the cold air supply temperature is also randomly changed. For the test set, we randomly launch a SPEC CPU benchmark, with random waiting intervals, while changing the cold air supply temperature.

Our monitoring system collects all data with a 10 second sampling interval for a total time of 5 hours for the training set and 10 hours for the test set. Figure 3 shows part of the training set used for modeling.
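As a concrete illustration of the IPMI-based monitoring described in Section 5.1.1, the sketch below polls the server's internal sensors through ipmitool and appends timestamped readings to a trace file. The sensor names ("CPU1 Temp", "FAN1 Speed") and output field positions are vendor-dependent assumptions, as is the 10-second loop; this is not the authors' collector.

```python
# Hedged sketch of IPMI polling: read all sensors with "ipmitool sensor" (whose
# output is pipe-separated) and log the CPU temperature and fan speed readings.
# Sensor names and field layout vary across vendors and are only illustrative.
import subprocess, time, csv

def read_ipmi_sensors():
    out = subprocess.run(["ipmitool", "sensor"], capture_output=True, text=True).stdout
    readings = {}
    for line in out.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 2 and fields[1] not in ("", "na"):
            readings[fields[0]] = fields[1]      # sensor name -> current value
    return readings

with open("server_trace.csv", "a", newline="") as f:
    writer = csv.writer(f)
    while True:
        s = read_ipmi_sensors()
        writer.writerow([time.time(), s.get("CPU1 Temp"), s.get("FAN1 Speed")])
        f.flush()
        time.sleep(10)                           # 10-second sampling interval
```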
5.2. Case study: CeSViMa Data Center

To show how our solution can be applied to a real data center scenario, this paper presents a case study for a real High-Performance Computing Data Center at the Madrid Supercomputing and Visualization Center (CeSViMa, http://www.cesvima.upm.es/). CeSViMa hosts the Magerit Supercomputer, a cluster consisting of 286 computer nodes in 11 racks, providing 4,160 processors to execute High-Performance jobs on demand. 245 of the 286 nodes are IBM PS702 2S blade servers with 2 Power7 CPUs, each with 8 cores running at 3.3GHz, and 32GB of RAM. The other 41 nodes are HS22 2U servers with 2 Intel Xeon processors of 8 cores each at 2.5GHz and 64GB of RAM.

The CeSViMa data room has a cold-hot aisle layout and is cooled by means of 6 CRAC units arranged along the walls that push air through the floor plenum. To control data room cooling, the air return temperature of each CRAC unit can be set independently. The room has a total size of 190 square meters. Figure 4a shows the layout of the data center. Rack 0 is a control rack that runs no HPC computation. Racks 1-9 are filled with Power7 blade servers, whereas rack 10 contains Intel Xeon servers. Each Power7 node is installed in a blade center. Each blade center contains up to 7 blades, and each rack contains 4 chassis (C01 to C04), as shown in Figure 4b.

Figure 4: CeSViMa data room layout — a) data center room layout, b) Power7 rack front view. Models are developed for Power7 nodes 1, 4 and 7 at chassis c02 in racks 1 and 4.

To run our models, we have deployed the same sensor network as in the reduced scenario. In particular, to model inlet temperature we gather inlet temperature, humidity, CRAC air return temperature and the differential pressure through the floor plenum. Because we have placed pressure sensors in the tiles in front of racks 1 and 4, we model the Power7 nodes in these racks. To model CPU temperature, we also collect the CPU temperature and fan speed of all servers via IPMI. CeSViMa Power7 chassis do not have per-server power sensors and we are not able to deploy current clamps. Thus, we use per-server utilization as a proxy for the power consumption of the node. As stated before, utilization is not an accurate metric for arbitrary workloads. However, because of the nature of the workloads in CeSViMa and only for thermal modeling purposes, utilization can be used as a proxy variable for power consumption, as we show in Section 6.

In this work we show the modeling results for the servers highlighted in red in Figure 4, i.e., nodes 1, 4 and 7 at chassis c02 of both rack 1 and rack 4.
These nodes are the ones that exhibit the most variable workload and the most extreme temperature conditions, and constitute the worst-case scenario for modeling.

5.3. Modeling framework

Because of the large number of servers in CeSViMa, to enable cooling optimization we need a framework that allows us to automatically model and predict the CPU and inlet temperature of all servers. Even though CeSViMa is a small-sized data room, it has a very high density in terms of IT equipment. For instance, the amount of data gathered that needs to be processed to enable full environmental modeling and prediction, for a period of 1 year, is above 100GB. Thus, modeling the whole data center with traditional approaches that require human interaction is not feasible.

Our work uses the proposed GE techniques to automatically model all the parameters involved in data center optimization, by running the training of the algorithms and testing them during runtime without human intervention.

6. Results

In this section we present the experimental results obtained when applying Grammatical Evolution to model CPU and inlet temperature. First, we show the results for the controlled scenario, describing the best algorithm configuration, and compare our method with state-of-the-art solutions. Then, we apply the best configuration to train and test the models in a real data center scenario.

6.1. Algorithm setup and performance

First, we use GE to obtain a set of candidate solutions with low error when compared to the temperature measurements in our controlled experimental setup, under different constraints. After evaluating the performance of our model with several setups, we select the following one for each model in this paper:

• Population size: 200 individuals
• Chromosome length: 100 codons
• Mutation probability: inversely proportional to the number of rules
• Crossover probability: 0.9
• Maximum wraps: 3
• Codon size: 8 bits (values from 0 to 255)
• Tournament size: 2 (binary)

For CPU temperature prediction, we use a data window of W_cpu = 20 samples (corresponding to 200 seconds) and a prediction window of α = 6 samples (corresponding to 60 seconds). The data window is heuristically chosen with respect to the largest observed temperature transient, and its size is a trade-off between model accuracy and convergence time. In this sense, the data window needs to incorporate enough samples to capture the trends of thermal transients, but we also need to consider that the larger the data window, the larger the exploration space. The prediction window needs to be selected with respect to the time it takes to actuate on the system and observe a response. In our case, the data window size W_cpu has been chosen in accordance with our previous work on server power modeling, where we analytically modeled temperature transients in enterprise servers, observing that the largest transients, i.e., the worst-case modeling scenario, occur for the lower fan speed values [12]. The prediction window is chosen given the physical constraints of the problem: a 1-minute prediction is sufficient time to change the workload assignment of a server, as canceling the workload of a server in case of thermal redlining takes a few seconds.

For inlet temperature prediction, we also use a data window of W_inlet = 20 samples but a prediction window of β = 5 samples. Inlet temperature dynamics are much slower than CPU temperature dynamics. Because of this, a sampling rate of 2 minutes for inlet temperature is sufficient to get accurate results. Given the size of the prediction window, we are able to obtain inlet temperature samples 10 minutes in advance, which is sufficient time to act upon data room cooling.
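For reference, the setup above can be summarized as a single configuration object. The sketch below simply collects the listed values; the key names are our own and do not correspond to any particular GE library.

```python
# The algorithm setup above, gathered into one configuration object for a generic
# GE implementation. Keys are illustrative; only the values come from the paper.
GE_CONFIG = {
    "population_size": 200,
    "chromosome_length": 100,        # codons per individual
    "codon_size_bits": 8,            # codon values in 0..255
    "crossover_probability": 0.9,
    "mutation_probability": None,    # set at run time: inversely proportional to rule count
    "max_wraps": 3,
    "tournament_size": 2,            # binary tournament selection
    "fitness": "RMSE",
    # prediction setup (10 s samples for CPU, 2 min samples for inlet temperature)
    "cpu":   {"data_window": 20, "prediction_window": 6},   # 200 s of history, 60 s ahead
    "inlet": {"data_window": 20, "prediction_window": 5},   # 10 min ahead
}

def mutation_probability(number_of_rules):
    # "inversely proportional to the number of rules"; proportionality constant of 1 assumed
    return 1.0 / number_of_rules
```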
Next, we present the comparison among several configurations in terms of grammar expressions and rules, premature convergence prevention and fitness biasing. We detail our results for CPU temperature modeling. The procedure to tune the inlet temperature models is completely equivalent.

6.1.1. Data preprocessing and model simplification

Because the power measurements of the Intel Xeon server are taken with a current clamp, the power values obtained exhibit some noise. We preprocess the data to eliminate high-frequency noise, smoothing the power consumption trace by means of a low-pass filter. The remaining traces did not exhibit noise, so no preprocessing was needed.

Moreover, we perform variable standardization for every feature (into the range [1, 2]) to assure the same probability of appearance for all the variables and to enhance the GE symbolic regression.
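A minimal sketch of these two preprocessing steps is shown below, assuming SciPy is available; the filter order and cutoff frequency are illustrative choices not specified in the paper.

```python
# Sketch of the preprocessing in Section 6.1.1: smooth the clamp-measured power
# trace with a low-pass filter and rescale every feature to the range [1, 2].
# Filter order and cutoff are illustrative, not taken from the paper.
import numpy as np
from scipy.signal import butter, filtfilt

def smooth_power(power, fs=0.1, cutoff=0.01, order=2):
    """Zero-phase low-pass filtering; fs = 0.1 Hz is the 10 s sampling rate."""
    b, a = butter(order, cutoff / (0.5 * fs))
    return filtfilt(b, a, power)

def standardize(feature):
    """Map a feature to [1, 2] so all variables have a comparable range."""
    fmin, fmax = np.min(feature), np.max(feature)
    return 1.0 + (feature - fmin) / (fmax - fmin)
```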
6.1.2. Grammars used

To model CPU temperature we have tested three different grammars:

• The first is shown in Grammar 2 and contains a wide set of operands and preoperands (rules II and III) that do not necessarily yield models with a physical meaning.

• The second grammar is a variation of Grammar 2 in which the number of preoperands (rule III) is reduced to exponentials only, i.e., 〈preop〉 ::= exp

• The last grammar is the one presented in Grammar 3 and also reduces the set of possible expressions (rule I).

From the previous three grammars, the one with the fastest convergence to a low error is Grammar 3. Constraining the grammar improves convergence time and provides phenotypes that have physical meaning, without an increase in the modeling error obtained. Thus, for the remainder of the paper we work with the simplified Grammar 3 when modeling CPU temperature.

Grammar 2 Grammar used for CPU temperature modeling in BNF, that uses inlet temperature (TIN), fan speed (FS), power consumption (PS), past CPU temperature (TS) and past predicted CPU temperature (TpS)
(I) 〈expr〉 ::= 〈expr〉〈op〉〈expr〉 | (〈expr〉〈op〉〈expr〉) | 〈preop〉(〈expr〉) | 〈var〉 | 〈cte〉
(II) 〈op〉 ::= + | - | * | /
(III) 〈preop〉 ::= exp | sin | cos | tan
(IV) 〈var〉 ::= TS[k-〈idx〉] | TIN[k-〈idx〉] | PS[k-〈idx〉] | FS[k-〈idx〉]
(V) 〈idx〉 ::= 〈dgt2〉〈dgt〉
(VI) 〈cte〉 ::= 〈dgt〉.〈dgt〉
(VII) 〈dgt〉 ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
(VIII) 〈dgt2〉 ::= 0 | 1

Grammar 3 Simplified grammar in BNF format used for CPU temperature modeling
(I) 〈expr〉 ::= 〈expr〉〈op〉〈expr〉 | (〈expr〉〈op〉〈expr〉) | 〈preop〉(〈exponent〉) | 〈var〉 | 〈cte〉
(II) 〈op〉 ::= + | - | * | /
(III) 〈preop〉 ::= exp
(IV) 〈exponent〉 ::= 〈sign〉〈cte〉*〈var〉 | 〈sign〉〈cte〉*(〈var〉〈op〉〈var〉)
(V) 〈sign〉 ::= + | -
(VI) 〈var〉 ::= TpS[k-〈idx〉] | TS[k-〈idx〉] | TIN[k-〈idx〉] | PS[k-〈idx〉] | FS[k-〈idx〉]
(VII) 〈idx〉 ::= 〈dgt〉
(VIII) 〈cte〉 ::= 〈dgt2〉.〈dgt2〉
(IX) 〈dgt〉 ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
(X) 〈dgt2〉 ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

We test two variations of this grammar: i) one that searches for a mixed model (i.e., uses past temperature predictions; it is the one shown in Grammar 3), and ii) one that provides a real model (i.e., only uses CPU temperature measurements). The only difference between the mixed and the real grammars is the presence of the parameter TpS.

6.1.3. Tested configurations

With respect to premature convergence, we test three different techniques:

• No premature convergence technique applied
• Random Off-Spring Generation (2-RO) plus Packing, keeping no more than 5% of equal individuals.
• Random Off-Spring Generation (2-RO) plus Packing, leaving no more than 1 individual with an equal phenotype.

For each of the previous configurations, we run both real and mixed models, with the goal of comparing the convergence time and the fitness evolution of each configuration. Because of the random evolution of the algorithms, for comparison purposes, we run the same model training 5 times and average the RMSE obtained for different numbers of generations. Figure 5 shows the RMSE evolution for the three configurations, with both real and mixed models.

Figure 5: CPU temperature error evolution for real and mixed models under different premature convergence prevention techniques: i) no technique applied, ii) ROG + SDT keeping 5% of equal individuals and iii) ROG + SDT randomizing all equal individuals.

When we do not apply any technique, error decay is much slower, as the population loses diversity and improves only due to mutation in the individuals. The impact is higher for the mixed models, where the search space is larger. When we apply ROG and SDT, we need fewer generations to obtain good fitness values.
However, keeping only 1 individual with the same phenotype and randomizing the remaining population is too aggressive, while keeping a higher percentage of equal individuals, i.e., 5%, yields better results. As shown, using 30,000 generations is enough to obtain low RMSE values.

Regarding fitness biasing, Figure 6 shows the differences in terms of RMSE for different numbers of generations for real and mixed models when we bias the fitness to force all parameters and when we do not bias it. Convergence is similar, being slightly better for the non-biased models. In fact, all variables in the grammar tend to appear in non-biased models, backing up the hypothesis that all those magnitudes are correlated with temperature.

Figure 6: CPU temperature error evolution for real and mixed models under ROG + SDT 5% when the fitness is biased vs. not biased.

Finally, we show results for both the training and test set when modeling CPU temperature with the best configuration, i.e., a mixed model obtained with Grammar 3, using ROG and Packing techniques leaving 5% of equal individuals, and not biasing the fitness. Table 1 shows the 5 best phenotypes obtained and their corresponding RMSE and MAE values for the test set after simplification. To avoid overfitting, we use the five best models to compute the samples of the test set, i.e., we predict the next temperature sample with all five equations, obtaining 5 different results, and we average them to obtain the prediction value. By applying this methodology we obtain an RMSE of 2.48°C and an MAE of 1.77°C. Because CPU temperature sensors usually have a resolution of 1°C, we consider these results to be accurate enough for our purposes. Figure 7 shows a zoom-in of the real CPU temperature trace and its prediction, for both the training and the test set. As can be seen, the prediction accurately matches the measured values in both the training and test sets.

Table 1: Phenotype, RMSE and MAE (°C) for the test set in the CPU temperature modeling reduced scenario

Phenotype 1: TS[k-3] + PS[k-1] - PS[k-4] + 1.1·TIN[k-7]/(e^(-0.1·(TpS[k-8]-TS[k-3]))·(PS[k-20]/9.9) + 9.9) - (e^(-4.5·(TpS[k-5]/TS[k-19]))·TS[k-7]) - 9.6 — RMSE 2.8, MAE 2.08

Phenotype 2: TS[k-5] + PS[k-1] - PS[k-5] + (TS[k-5]/5.7 - e^(-1.3·(TS[k-5]/TIN[k-12]))·TS[k-5]) + PS[k-11]/(3.7 - TS[k-5]·(e^(-0.3·(PS[k-18]+FS[k-7])) - e^(-4.9·(PS[k-12]/PS[k-16])))) - 9.8 — RMSE 2.59, MAE 1.86

Phenotype 3: TS[k-5] + PS[k-1] - PS[k-4] + (e^(+6.1·(TIN[k-10]/TS[k-5])) - 5.2) - TIN[k-14]·(0.06·(TIN[k-7]/TS[k-5])) + (PS[k-1]/3.1·TIN[k-7])·TS[k-5] - (PS[k-20]/TIN[k-7])·e^(-6.4·(TIN[k-15]/TS[k-11])) - 8.0 — RMSE 2.77, MAE 2.01

Phenotype 4: TS[k-3] + PS[k-1] - PS[k-4] - e^(-4.1·(TpS[k-5]/TIN[k-19]))/TS[k-3] + (TpS[k-1]/(e^(-0.1·(TpS[k-8]-TS[k-3]))·(PS[k-20]/9.9) + 9.2))/1.6 - 9.6 — RMSE 2.55, MAE 1.77

Phenotype 5: TS[k-1] + 0.73·(PS[k-1]·4.4 - PS[k-2]·4.4 - TS[k-1] + TpS[k-8] + e^(-0.2·TIN[k-8]) + (e^(-3.4·(PS[k-20]/PS[k-10])) - e^(-4.0·(PS[k-6]/PS[k-19])))·TpS[k-12]/9.0 - 0.9) — RMSE 2.5, MAE 1.75

Figure 7: Training and test set CPU temperature prediction vs. real measurements over time (s) — a) 1-minute training set prediction for the mixed model, b) 1-minute test set prediction for the mixed model.

6.2. Comparison to other approaches

We compare our results with three common techniques for CPU temperature modeling in the state of the art: autoregressive moving average models, linear subspace identification techniques and dynamic neural networks. We first briefly describe these three modeling techniques and then we show the results obtained and compare them with our proposed technique.

6.2.1. ARMA models

ARMA models are mathematical models of autocorrelation in a time series that use past values alone to forecast future values of a magnitude. ARMA models assume that the underlying process is stationary and that there is serial correlation in the data, assumptions that temperature modeling satisfies.
In a general way, an ARMA model can be described as in Equation 4:

y_t = \sum_{i=1}^{p} a_i \cdot y_{t-i} + e_t + \sum_{j=1}^{q} c_j \cdot e_{t-j} \quad (4)

where y_t is the value of the time series (CPU temperature in our case) at time t, the a_i are the lag-i autoregressive coefficients, the c_j are the moving average coefficients and e_t is the error. The error is assumed to be random and normally distributed. p and q are the orders of the autoregressive (AR) and the moving average (MA) parts of the model, respectively.

The ARMA modeling methodology consists of two different steps: i) identification and ii) estimation. In our work we use an automated methodology similar to the one proposed by Coskun et al. [24]. During the identification phase, the model order is computed, i.e., we find the optimum values for p and q of the ARMA(p, q) process. To perform model identification we use an automated strategy that computes the goodness of fit for a range of p and q values, starting with the simplest model (i.e., an ARMA(1,0)). The goodness of fit is computed using the Final Prediction Error (FPE), and the best model is the one with the lowest FPE value, given by Equation 5:

FPE = \frac{1 + n/N}{1 - n/N} \cdot V \quad (5)

where n = p + q, N is the length of the time series and V is the variance of the model residuals. For a fair comparison with our proposed methodology, the model obtained needs to forecast α = 6 samples.
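The following sketch illustrates this FPE-based order selection over a small grid of (p, q) values; the use of statsmodels and the search ranges are assumptions made for illustration, not details taken from the paper.

```python
# Sketch of the automated ARMA(p, q) identification described above: fit models
# over a grid of orders and keep the one with the lowest Final Prediction Error
# (Eq. 5). Library choice and search ranges are illustrative assumptions.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def select_arma_by_fpe(y, max_p=9, max_q=8):
    best = (None, np.inf)
    N = len(y)
    for p in range(1, max_p + 1):
        for q in range(0, max_q + 1):
            try:
                res = ARIMA(y, order=(p, 0, q)).fit()
            except Exception:
                continue                       # skip orders that fail to converge
            n = p + q
            V = np.var(res.resid)              # variance of the model residuals
            fpe = (1 + n / N) / (1 - n / N) * V
            if fpe < best[1]:
                best = ((p, q), fpe)
    return best                                # ((p, q), FPE) of the chosen model
```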
They have been very814 successful for the identification of linear multivariable dynamic815 systems.816 To be constructed, certain parameters need to be fed into817 the model, such as the number of forward predictions (r), the818 number of past inputs (su), and the number of past outputs(sy).819 Again, for a fair comparison with our proposed methodology,820 we need a model in the form N4S ID[r, su, sy] where r = 6, su =821 20 and sy = 20.822 6.2.3. NARX823 A Nonlinear Autoregressive eXogenous (NARX) model is a824 nonlinear autoregressive model with exogenous inputs. In this825 case, the current value of a time series is computed in function826 of a) past values of the same series, and b) current and past val-827 ues of the exogenous series. Additionally, the model contains828 an error term, since knowledge of the other terms will not en-829 able the current value of the time series to be predicted with830 precision. This is model is described as follows:831 y(t) = F(y(t−1).y(t−2), y(t−3), . . . , u(t), u(t−1), u(t−2), u(t−3), . . .)+e(t) (8) where y(t) is the output, u(t) is the exogenous variable and e(t)832 is the error term. F is a nonlinear function, and in our case is833 defined through a neural network. The NARX model is based834 on the linear ARX model, which is commonly used in time-835 series modeling.836 6.2.4. Model comparison837 Finally, we compare the results for CPU temperature mod-838 eling between our proposed approach, ARMA, N4SID, and839 NARX models, all of them with a prediction window of 6 sam-840 ples (1 minute). To perform a statistical comparison, we have841 Model Training set Test set RMSE MAE RMSE MAE ARMA1,4 3.38 ± 0.23 1.62 ± 0.13 3.23 ± 1.02 1.62 ± 0.57 ARMA9,8 3.35 ± 0.24 1.70 ± 0.14 3.24 ± 1.01 1.74 ± 0.64 N4SID 2.55 ± 0.28 1.69 ± 0.07 3.98 ± 1.11 2.93 ± 0.58 NARX 2.53 ± 0.10 1.64 ± 0.05 3.88 ± 1.16 2.35 ± 0.69 GE 2.32 ± 0.19 1.50 ± 0.08 2.56 ± 0.90 1.66 ± 0.45 Table 2: RMSE and MAE in CPU temperature prediction for each model (ARMA, N4SID, NARX and GE) Figure 8: Zoomed-in averaged CPU temperature modeling comparison be- tween ARMA(1,4), N4SID, NARX and GE conducted a non-exhaustive 5-fold cross-validation. The com- 842 plete data set is obtained from more than 10 hours of tempera- 843 ture traces gathered from an Intel Xeon RX-300 S6 server run- 844 ning a wide range of workloads under various cooling setups. 845 This data set has been partitioned into 5 equal sized subsamples. 846 A single subsample is retained as the validation data for test- 847 ing the model, and the remaining 4 subsets are used as training 848 data. This process is repeated 5 times with each of the 5 sub- 849 samples used exactly once as the validation data. RMSE and 850 MAE are averaged to perform a comparison between ARMA, 851 N4SID, NARX and GE. Each resultant set of five models will 852 be averaged to produce the final estimation. 853 Table 2 shows the RMSE and MAE errors obtained for our 854 proposed modeling technique based on GE, ARMA, N4SID 855 and NARX models, and Figure 8 shows a zoom-in into the 856 11 Time (hours) 4 8 12 16 20 24 T e m p (° C ) 18 20 22 24 Real temperature Predicted temperature Figure 9: 10-minute inlet temperature prediction in the reduced scenario for a mixed model with SDT Packing 5% and simplified grammar CPU temperature curve for the actual measurements and the857 averaged prediction of the five models obtained in the cross858 validation over the whole data set. As can be seen, GE mod-859 els are the ones with both lower RMSE and MAE. 
Model | Training RMSE | Training MAE | Test RMSE | Test MAE
ARMA(1,4) | 3.38 ± 0.23 | 1.62 ± 0.13 | 3.23 ± 1.02 | 1.62 ± 0.57
ARMA(9,8) | 3.35 ± 0.24 | 1.70 ± 0.14 | 3.24 ± 1.01 | 1.74 ± 0.64
N4SID | 2.55 ± 0.28 | 1.69 ± 0.07 | 3.98 ± 1.11 | 2.93 ± 0.58
NARX | 2.53 ± 0.10 | 1.64 ± 0.05 | 3.88 ± 1.16 | 2.35 ± 0.69
GE | 2.32 ± 0.19 | 1.50 ± 0.08 | 2.56 ± 0.90 | 1.66 ± 0.45
Table 2: RMSE and MAE in CPU temperature prediction for each model (ARMA, N4SID, NARX and GE)

Figure 8: Zoomed-in averaged CPU temperature modeling comparison between ARMA(1,4), N4SID, NARX and GE

Table 2 shows the RMSE and MAE errors obtained for our proposed modeling technique based on GE and for the ARMA, N4SID and NARX models, and Figure 8 shows a zoom-in into the CPU temperature curve for the actual measurements and the averaged prediction of the five models obtained in the cross-validation over the whole data set. As can be seen, GE models are the ones with both the lowest RMSE and the lowest MAE. Moreover, the CPU temperature trend is accurately predicted. This does not happen for ARMA models, which, even though they keep the maximum error low, provide values that always lag behind the real trend, yielding poor forecasting capabilities. This issue cannot be solved by increasing the model order, as shown in Table 2. N4SID models, even though they are very accurate in the training set, perform poorly in the test set and have an important bias error. Even if the bias error is corrected (which has been done in Figure 8), the prediction still lags behind the measurements and the model is unable to capture the system dynamics. NARX models clearly show an overfitting effect in the training data. The GE prediction, even though it has more oscillations (due to the smoothed noise of the power consumption signal), is the only one that captures the temperature trend, anticipating the real measurements.

6.3. Inlet temperature modeling

For inlet temperature modeling we perform the same study as for CPU temperature in terms of grammars, premature convergence and fitness biasing. As expected, the best model configuration turns out to be very similar. Thus, for inlet temperature modeling we use the same configuration: i) a mixed model using a simplified version of the grammar that only allows exponentials, ii) SDT with 5% packing, and iii) the RMSE fitness function without biasing.

The BNF grammar used is very similar to Grammar 3, where instead of rule VI we use the following rule:

〈var〉 ::= TIN[k-〈idx〉] | TpIN[k-〈idx〉] | TSUP[k-〈idx〉] | HUM[k-〈idx〉]

where TSUP is the cold air supply temperature, HUM is the humidity, TIN are past inlet temperatures and TpIN are past inlet temperature predictions. Figure 9 shows the prediction for the test set. The RMSE of the prediction is 0.33°C and the MAE is 0.27°C for a prediction window of 10 minutes on the test set. Again, the model includes all the available variables, i.e., TSUP, TIN and HUM appear in the final model.

Figure 9: 10-minute inlet temperature prediction in the reduced scenario for a mixed model with SDT packing 5% and simplified grammar

6.4. Data Center modeling

We use the previous model with the same configuration to predict the CPU and inlet temperature of the blade servers at the CeSViMa data center. Because CeSViMa is a production environment, when it comes to server data we are subject to the data sampling rates provided by the data center. CeSViMa collects all data from servers every 2 minutes, and environmental data (i.e., from coolers) every 15 minutes. Thus, for both CPU and inlet temperatures, we need to change our prediction windows. For CPU temperature we use a prediction window α′ = 1 sample, which means that we predict CPU temperature two minutes into the future. For inlet temperature we use a prediction window β′ = 1 sample, i.e., we predict temperature 15 minutes ahead.

Because in CeSViMa we cannot control the workload being executed, nor modify the cooling setup, we need to select longer training and test sets to ensure that they exhibit high variability in the magnitudes of interest.
For CPU temperature, we select 2 days of execution for the training set and 4 days for the test set. For inlet temperature, which varies very slowly in a real data center setup, we use 14 days of execution for both the training and the test set.

Figure 10: Inlet temperature modeling for various racks. a) Rack 1, Chassis 02; b) Rack 4, Chassis 02. Inlet temperature prediction for 1 week (time in days)

Figure 10 shows a zoomed-in plot of the measured and predicted inlet temperature at chassis c02 of racks 1 and 4 in the CeSViMa data center for a period of 8 days. Figure 11 shows the measured and predicted CPU temperature traces for blades b01, b04 and b07 in chassis c02 of both racks, for the first two days of the same period, as well as the prediction error (i.e., the difference between the real measurements and the prediction). To generate these last models, instead of using the real inlet temperature measurements, we use the predicted inlet temperature. This way, we are able to accurately predict all variables needed for optimization.

Figure 11: Data Center CPU temperature modeling and prediction error for various servers in different racks for 2 days of traces (time in hours). a) Rack 1, Chassis 02, Blade 01; b) Rack 1, Chassis 02, Blade 04; c) Rack 1, Chassis 02, Blade 07; d) Rack 4, Chassis 02, Blade 01; e) Rack 4, Chassis 02, Blade 04; f) Rack 4, Chassis 02, Blade 07

Table 3 shows the phenotypes obtained for CPU and inlet temperature modeling of the servers in the CeSViMa data center. We also report the MAE for both training and test sets. We observe that all phenotypes that model CPU temperature incorporate the parameters of interest (inlet temperature, power and fan speed), and we obtain errors below 1°C in all cases. The average RMSE across models is 1.52°C and 1.57°C for the training and test set, respectively. As for inlet temperature, the phenotypes incorporate both the differential pressure and the CRAC return temperature. Moreover, depending on the rack placement, the influence of the CRAC units varies. Here we can observe the benefits of the feature selection performed by GE. Rack 1, which is the leftmost rack in the data center, is affected only by CRAC2, whereas Rack 4, situated in the middle of the row, is affected both by CRAC2 and CRAC3. The model automatically incorporates the most relevant features, discarding the irrelevant ones. For inlet temperature prediction, our error is below 0.5°C, which is enough for our purposes and below other state-of-the-art approaches.

Model | Phenotype | Train. MAE | Test MAE
Inlet Rack1 | TIN[k-1] + e^(-4.3·(TRET2[k-3]/TRET2[k-11]))/TIN[k-1] · (e^(+1.5·TIN[k-5]) - e^(-3.8·(PDIF[k-20]-TIN[k-1]))) | 0.32 | 0.4
Inlet Rack4 | TIN[k-1] + 3.1/e^(+5.0·TIN[k-1]) · e^(+2.2·TIN[k-2]) - e^(-2.9·TpIN[k-3]) · TRET3[k-12] · (TRET3[k-5] + e^(-4.9·(HUM[k-6]/TRET2[k-1]))/TIN[k-2]/e^(+7.2·(PDIF[k-20]-TIN[k-1]))) | 0.18 | 0.44
CPU Rack1, C02, B01 | TS[k-1] + (PS[k-20]·FS[k-1] - 9.4·(TpS[k-5]·TpS[k-2]))/(e^(+2.0·FS[k-7])/e^(-4.1·TpS[k-5])/(2.3 + e^(+1.7·(TpS[k-10]·TpS[k-10])) + e^(+1.5·FS[k-1]) - e^(+1.6·PS[k-7]))) | 0.68 | 0.76
CPU Rack1, C02, B04 | TS[k-1] + e^(-7.2·(TS[k-6]/PS[k-1]))/(e^(-6.1·(TS[k-10]-PS[k-15]))/5.6/FS[k-20] - 1.7 + TpS[k-20] - e^(-3.0·(TIN[k-4]/TS[k-15]))) | 0.51 | 0.85
CPU Rack1, C02, B07 | TS[k-1] + e^(-6.2·(TS[k-2]·TIN[k-11]))/((e^(-9.8·TS[k-8]) + e^(-5.2·(TS[k-9]·PS[k-19])))·e^(+5.8·FS[k-9])) | 0.55 | 0.75
CPU Rack4, C02, B01 | TS[k-1] + e^(-3.1·TS[k-2])/(((TS[k-3]/PS[k-3]) + (FS[k-18] - FS[k-8])/e^(-9.4·(FS[k-1]-TpS[k-9]))) - TpS[k-4] + (TpS[k-6] + TS[k-3]) - (TIN[k-3]/TpS[k-9])) | 0.29 | 0.46
CPU Rack4, C02, B04 | TS[k-1] + e^(-5.7·(TpS[k-6]·TpS[k-10]))/e^(+9.9·(TpS[k-9]-TIN[k-10])) | 0.26 | 0.73
CPU Rack4, C02, B07 | TS[k-1] + TIN[k-11]·e^(-9.5·(TpS[k-10]/TIN[k-5]))/e^(-9.9·(TIN[k-10]-TS[k-5])) | 0.43 | 0.87
Table 3: Phenotype and average error (in Celsius) in training and test set for CPU and inlet temperature modeling in a production Data Center

7. Discussion

In this section we briefly discuss the applicability of our models, and the computational effort needed to model a full data center scenario, to validate the feasibility of our approach.
7.1. Applicability

The goal of our modeling is to predict server CPU temperature under variable cooling setups, so that cooling-associated costs can be reduced without incurring reliability issues. To this end, we first predict the inlet temperature of the servers given the data room conditions and the cooling setup, and use this result to predict server CPU temperature.

Having analyzed the spatio-temporal variability of the inlet temperature traces in the CeSViMa data center, we find that it is sufficient to predict inlet temperature at 3 different heights (at the bottom, middle and top of the rack), in one out of every two racks. This way, we need to generate 30 inlet temperature models at most. Because the maximum CPU temperature in the data center is the one limiting the cooling, at most we need to predict the CPU temperature of each server in the data room, i.e., we need as many models as there are servers in the data center. However, if by analyzing the traces we find that there is a subset of CPUs that limit the maximum cooling of the overall data center, our problem can be reduced to modeling those that always exhibit higher temperatures. For the particular case of the CeSViMa traces, if we examine 6 months of CPU temperature traces, we find that the CPUs limiting the cooling are blades b04 and b07 placed in the second chassis (c02) of all racks. In this sense, for energy optimization purposes our problem reduces to generating 10 different models. These models allow us to predict the maximum server temperature attained in the data center and, thus, detect any potential thermal redlining and act before it occurs. Moreover, to leverage energy optimization, our results can be used to set the cooling dynamically during runtime, by predicting the maximum data center CPU temperature under various cooling conditions and increasing the CRAC air supply temperature without incurring reliability issues.

Even though in this paper we have applied our modeling methodology to a raised-floor air-cooled data center scenario, the proposed technique is also valid for data centers equipped with other state-of-the-art cooling mechanisms, such as the in-row or in-rack cooling used in high-density racks.
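To illustrate the runtime use described above (a hypothetical sketch only, with made-up model stand-ins, server names derived from the discussion, and assumed redline and margin values rather than the evolved phenotypes), the predictions of the limiting servers can be swept over candidate CRAC supply temperatures, keeping the warmest setpoint whose worst-case CPU temperature stays safely below the redline:

# Hypothetical runtime loop: sweep candidate supply temperatures, predict the
# inlet and CPU temperature of the servers that limit the cooling, and keep
# the warmest setpoint whose worst-case CPU temperature stays below redline.
def pick_supply_setpoint(candidates, limiting_servers, predict_inlet, predict_cpu,
                         redline=85.0, margin=2.0):
    best = None
    for t_sup in sorted(candidates):
        worst_cpu = max(predict_cpu(s, predict_inlet(s, t_sup))
                        for s in limiting_servers)
        if worst_cpu <= redline - margin:
            best = t_sup  # highest safe setpoint seen so far
    return best

# Toy stand-ins: in a deployment these would evaluate the evolved models on
# each server's recent history; the redline and margin are also assumptions.
servers = ["r1-c02-b04", "r1-c02-b07", "r4-c02-b04", "r4-c02-b07"]
inlet = lambda srv, t_sup: t_sup + 4.0      # inlet roughly follows the supply
cpu = lambda srv, t_in: 45.0 + 1.2 * t_in   # CPU temperature rises with inlet
print(pick_supply_setpoint(range(16, 28), servers, inlet, cpu))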
7.2. Computational effort

Our approach is computationally intensive in the model training stage. The GE model needs to evolve a random initial population for 30,000 generations to obtain accurate results. In our experiments, running 30,000 generations of 4 different models in parallel takes 28 hours on a computer equipped with a quad-core Intel i7 CPU at 3.4 GHz and 8 GB of RAM. This computational cost is much larger than the cost of training ARMA and N4SID models. However, to obtain accurate results ARMA needs to be manually tuned, and N4SID requires a manual feature selection step that greatly impacts accuracy, whereas GE models can be developed automatically.

However, as the models obtained for homogeneous servers are very similar, it is possible to reduce the training overhead by using already evolved populations to fine-tune the models instead of using a new random population every time. This way, we can reduce the training time significantly.

As for model testing, in the worst-case scenario the model needs to be evaluated every 10 seconds. The overhead of evaluating one model is found to be negligible. In this sense, it is feasible to compute all temperatures to find the maximum. Moreover, because of the temperature imbalances of the servers in the data room, we can reduce the number of models run to those that are limiting the cooling, i.e., the servers with the highest CPU temperature values. The overhead incurred by testing is of the same order of magnitude as the overhead of ARMA and N4SID, but provides better results in terms of error.

8. Conclusions

In this paper we have presented a methodology for the unsupervised generation of models to predict at runtime the thermal behavior of production data centers running arbitrary workloads and equipped with heterogeneous servers.

Our approach leverages Grammatical Evolution to automatically generate models of the data room by using real data center traces. Our solution allows us to predict the CPU temperature and inlet temperature of servers, with average errors below 2°C and 0.5°C, respectively.
These errors are within the margin obtained by other off-line supervised approaches in the state of the art. Our solution generates the models in an unsupervised way, is able to work at runtime, is trained and tested in a real scenario, and does not require the usage of CFD software. To the best of our knowledge, our work is the first to propose data center temperature forecasting using evolutionary techniques, allowing predictive model generation for runtime optimization.

Appendix

In this Appendix we provide further information on the mapping process used by our grammar. For a more detailed explanation of the principles of GE, the reader is referred to [32]. A BNF specification is a set of derivation rules, expressed in the form:

〈symbol〉 ::= 〈expression〉

Rules are composed of sequences of terminals, which are items that can appear in the language, and non-terminals, which can be expanded into one or more terminals and non-terminals. A grammar is represented by the tuple {N, T, P, S}, where N is the non-terminal set, T the terminal set, P the set of production rules that assign elements of N and T, and S a start symbol that must appear in N. The options within a production rule are separated by a "|" symbol.

Grammar 4 represents an example grammar in BNF. The final expression consists of elements of the set of terminals T, which have been combined through the rules of the grammar.

Grammar 4: Example of a grammar in BNF designed for symbolic regression
N = {expr, op, preop, var, num, dig}
T = {+, -, *, /, sin, cos, exp, x, y, z, 0, 1, 2, 3, 4, 5, (, ), .}
S = {expr}
P = {I, II, III, IV, V, VI}
(I) 〈expr〉 ::= 〈expr〉〈op〉〈expr〉 | 〈preop〉(〈expr〉) | 〈var〉
(II) 〈op〉 ::= + | - | * | /
(III) 〈preop〉 ::= sin | cos | exp
(IV) 〈var〉 ::= x | y | z
(V) 〈num〉 ::= 〈dig〉.〈dig〉 | 〈dig〉
(VI) 〈dig〉 ::= 0 | 1 | 2 | 3 | 4 | 5

The chromosome is used to map the start symbol onto terminals by reading genes (or codons) of 8 bits, each generating a corresponding integer value from which one of the options of a production rule is selected by using the modulus operator:

Rule = Codon Value % Number of Rule Choices        (9)

Example: In this example, we explain the mapping process performed in GE to obtain a phenotype (a mathematical function) given a genotype (a chromosome). Let us suppose we have the BNF grammar provided in Grammar 4 and the following 7-gene chromosome:

21-64-17-62-38-254-2

According to Grammar 4, the start symbol is S = 〈expr〉, hence the decoded expression begins with the non-terminal:

Solution = 〈expr〉

Now, we use the first gene of the chromosome (i.e., 21) in rule I of the grammar. The number of choices in that rule is 3. Hence, the mapping function is applied: 21 MOD 3 = 0, and the first option, 〈expr〉〈op〉〈expr〉, is selected. The selected option substitutes the decoded non-terminal, giving the following expression:

Solution = 〈expr〉〈op〉〈expr〉

The process continues with codon 64, used to decode the first non-terminal of the current expression, 〈expr〉. Again, the mapping function is applied to the rule: 64 MOD 3 = 1, and the second option, 〈preop〉(〈expr〉), is selected. So far, the current expression is:

Solution = 〈preop〉(〈expr〉)〈op〉〈expr〉

The next codons (17, 62, 38, 254 and 2) are decoded in the same way. After codon 2 has been decoded, the expression is:

Solution = exp(x) * 〈var〉

At this point, the decoding process has run out of codons, and we need to reuse codons starting from the first one. This technique is known as wrapping and mimics the gene-overlapping phenomenon present in many organisms [33]. Applying wrapping, we use gene 21 to decode 〈var〉 with rule IV. This gives the final expression of the phenotype:

Solution = exp(x) * y
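The mapping just illustrated can be written down compactly as follows (a minimal sketch of the general technique, not the implementation used in this work). The grammar mirrors the expression rules of Grammar 4, but the option ordering and the leftmost-first rewriting convention are assumptions, so a given chromosome may decode differently from the walkthrough above.

# Minimal genotype-to-phenotype mapping: rewrite the leftmost non-terminal,
# select each option with the modulus rule of Equation 9, and wrap around the
# codon list if it runs out.
GRAMMAR = {
    "<expr>":  [["<expr>", "<op>", "<expr>"], ["<preop>", "(", "<expr>", ")"], ["<var>"]],
    "<op>":    [["+"], ["-"], ["*"], ["/"]],
    "<preop>": [["sin"], ["cos"], ["exp"]],
    "<var>":   [["x"], ["y"], ["z"]],
}

def map_genotype(codons, start="<expr>", max_wraps=3):
    symbols = [start]
    used = 0
    while any(s in GRAMMAR for s in symbols):
        if used >= len(codons) * max_wraps:
            raise ValueError("mapping did not terminate")
        pos = next(i for i, s in enumerate(symbols) if s in GRAMMAR)  # leftmost NT
        options = GRAMMAR[symbols[pos]]
        choice = codons[used % len(codons)] % len(options)  # Equation 9, with wrapping
        symbols[pos:pos + 1] = options[choice]
        used += 1
    return "".join(symbols)

print(map_genotype([0, 2, 0, 2, 2, 1]))  # decodes to "x*y" under these assumptions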
Apart from performing parameter identification, in conjunction with a well-defined fitness function, the evolutionary algorithm is also constructing the mathematical expressions, over the set of features, that best fit the target system. Thus, GE also determines the optimal set of features that leads to the most accurate model.

Adding time dependence: The grammars shown previously allow us to obtain phenotypes that depend on a certain number of variables (e.g., x, y, z). We could use the previous method to predict variables that depend only on the current observation of other magnitudes, such as server power [34]. Models created this way can be used to predict magnitudes without memory, and the data used for model creation consist of independent samples. Temperature, however, is a magnitude with memory, i.e., the current temperature depends on past temperature values. Thus, the data used for model creation need to be a time series. By properly tuning our grammars, we can add time dependence to the variables in the phenotype, so that past values can be used to predict the variable a certain number of samples ahead.
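As a sketch of this idea outside the grammar itself (an illustration with assumed column names, window sizes and synthetic traces, not the authors' tooling), past samples of each magnitude can be materialized as lagged columns, with the target shifted a prediction window into the future:

# Build a lagged dataset: features are past samples of every magnitude, the
# target is the series a fixed prediction window ahead.
import numpy as np
import pandas as pd

def make_lagged_dataset(df, target="TIN", max_lag=20, horizon=2):
    feats = {f"{col}[k-{lag}]": df[col].shift(lag)
             for col in df.columns for lag in range(1, max_lag + 1)}
    X = pd.DataFrame(feats)
    y = df[target].shift(-horizon)                 # value `horizon` samples ahead
    valid = X.notna().all(axis=1) & y.notna()      # drop rows with missing lags
    return X[valid], y[valid]

# Placeholder traces for inlet temperature, supply temperature and humidity.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "TIN":  22 + np.cumsum(rng.normal(0, 0.05, 500)),
    "TSUP": 18 + np.cumsum(rng.normal(0, 0.02, 500)),
    "HUM":  40 + np.cumsum(rng.normal(0, 0.10, 500)),
})
X, y = make_lagged_dataset(df)
print(X.shape, y.shape)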
Acknowledgments

Research by Marina Zapater has been partly supported by a PICATA predoctoral fellowship of the Moncloa Campus of International Excellence (UCM-UPM). This project has been partially supported by the Spanish Ministry of Economy and Competitiveness, under contracts TEC2012-33892, IPT-2012-1041-430000 and RTC-2014-2717-3. The authors thankfully acknowledge the computer resources, technical expertise and assistance provided by the Centro de Supercomputación y Visualización de Madrid (CeSViMa).

References

[1] J. Kaplan, W. Forrest, N. Kindler, Revolutionizing data center energy efficiency, Tech. Rep. July, McKinsey & Company (2008).
[2] J. Koomey, Growth in data center electricity use 2005 to 2010, Tech. rep., Analytics Press, Oakland, CA (2011).
[3] Archana Venkatraman, ComputerWeekly.com, Global census shows datacentre power demand grew 63% in 2012, http://www.computerweekly.com/news/2240164589/Datacentre-power-demand-grew-63-in-2012-Global-datacentre-census (October 2012).
[4] T. Breen, et al., From chip to cooling tower data center modeling: Part I influence of server inlet temperature and temperature rise across cabinet, in: ITherm, 2010, pp. 1–10.
[5] J. K. Matt Stansberry, Uptime Institute 2013 data center industry survey, Tech. rep., Uptime Institute (2013).
[6] N. El-Sayed, et al., Temperature management in data centers: why some (might) like it hot, in: SIGMETRICS, 2012, pp. 163–174.
[7] J. Brandon, Going green in the data center: Practical steps for your SME to become more environmentally friendly, Processor (29).
[8] R. Miller, Too hot for humans, but Google servers keep humming, http://www.datacenterknowledge.com/archives/2012/03/23/too-hot-for-humans-but-google-servers-keep-humming/ (March 2012).
[9] ASHRAE Technical Committee (TC) 9.9, 2011 Thermal Guidelines for Data Processing Environments, Tech. rep., American Society of Heating, Refrigerating and Air-Conditioning Engineers, Inc. (2011).
[10] C. Ryan, J. Collins, M. O'Neill, Grammatical evolution: Evolving programs for an arbitrary language, in: Genetic Programming, Vol. 1391 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, 1998, pp. 83–96.
[11] D. Atienza, et al., Reliability-aware design for nanometer-scale devices, in: Proceedings of the 2008 Asia and South Pacific Design Automation Conference, IEEE Computer Society Press, 2008, pp. 549–554.
[12] M. Zapater, et al., Leakage-aware cooling management for improving server energy efficiency, IEEE Transactions on Parallel and Distributed Systems (TPDS).
[13] R. Miller, Data center cooling set points debated, http://www.datacenterknowledge.com/archives/2007/09/24/data-center-cooling-set-points-debated/ (September 2007).
[14] P. B. Liz Marshall, Using CFD for data center design and analysis, Tech. rep., Applied Math Modeling White Paper (2011).
[15] Z. Abbasi, G. Varsamopoulos, S. K. S. Gupta, Thermal aware server provisioning and workload distribution for internet data centers, in: HPDC, ACM, New York, NY, USA, 2010, pp. 130–141. doi:10.1145/1851476.1851493.
[16] J. Chen, et al., A high-fidelity temperature distribution forecasting system for data centers, in: Proceedings of the 2012 IEEE 33rd Real-Time Systems Symposium, RTSS '12, IEEE Computer Society, Washington, DC, USA, 2012, pp. 215–224.
[17] Z. Abbasi, M. Jonas, A. Banerjee, S. Gupta, G. Varsamopoulos, Evolutionary green computing solutions for distributed cyber physical systems, in: Evolutionary Based Solutions for Green Computing, Springer Berlin Heidelberg, 2013, pp. 1–28.
[18] J. Pagán, M. Zapater, O. Cubo, P. Arroba, V. Martín, J. M. Moya, A Cyber-Physical approach to combined HW-SW monitoring for improving energy efficiency in data centers, in: Conference on Design of Circuits and Integrated Systems, 2013, pp. 140–145.
[19] G. Varsamopoulos, A. Banerjee, S. Gupta, Energy efficiency of thermal-aware job scheduling algorithms under various cooling models, in: Contemporary Computing, Vol. 40 of Communications in Computer and Information Science, 2009, pp. 568–580.
[20] J. Moore, J. Chase, P. Ranganathan, Weatherman: Automated, online and predictive thermal mapping and management for data centers, in: IEEE International Conference on Autonomic Computing, ICAC'06, 2006, pp. 155–164. doi:10.1109/ICAC.2006.1662394.
[21] A. M. D. Silva, F. Noorian, R. I. A. Davis, P. H. W. Leong, A hybrid feature selection and generation algorithm for electricity load prediction using grammatical evolution, in: Proceedings of the 2013 12th International Conference on Machine Learning and Applications, ICMLA '13, IEEE Computer Society, Washington, DC, USA, 2013, pp. 211–217.
[22] M. Patterson, The effect of data center temperature on energy efficiency, in: Thermal and Thermomechanical Phenomena in Electronic Systems, ITHERM'08, 2008, pp. 1167–1174.
[23] T. Heath, A. P. Centeno, P. George, L. Ramos, Y. Jaluria, R. Bianchini, Mercury and Freon: Temperature emulation and management for server systems, in: ASPLOS, New York, NY, USA, 2006, pp. 106–116.
[24] A. Coskun, T. Rosing, K. Gross, Utilizing predictors for efficient thermal management in multiprocessor SoCs, TCAD 28 (10) (2009) 1503–1516.
[25] E. Vladislavleva, G. Smits, D. den Hertog, Order of nonlinearity as a complexity measure for models generated by symbolic regression via Pareto genetic programming, IEEE Transactions on Evolutionary Computation 13 (2) (2009) 333–349. doi:10.1109/TEVC.2008.926486.
[26] M. O'Neill, C. Ryan, Grammatical evolution, IEEE Transactions on Evolutionary Computation 5 (4) (2001) 349–358.
[27] T. Back, U. Hammel, H.-P. Schwefel, Evolutionary computation: comments on the history and current state, IEEE Transactions on Evolutionary Computation 1 (1) (1997) 3–17. doi:10.1109/4235.585888.
[28] K. Melikhov, V. M. Kureichick, A. N. Melikhov, V. V. Miagkikh, O. V. Savelev, A. P. Topchy, Some new features in genetic solution of the traveling salesman problem, in: Adaptive Computing in Engineering Design and Control (ACEDC), 1996.
[29] M. Rocha, J. Neves, Preventing premature convergence to local optima in genetic algorithms via random offspring generation, in: International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, IEA/AIE'99, Secaucus, NJ, USA, 1999, pp. 127–136.
[30] SPEC CPU Subcommittee and John L. Henning, SPEC CPU 2006 benchmark descriptions, http://www.spec.org/cpu2006/.
[31] A. Phansalkar, A. Joshi, L. K. John, Subsetting the SPEC CPU2006 benchmark suite, SIGARCH Computer Architecture News 35 (1) (2007) 69–76.
[32] C. Ryan, M. O'Neill, Grammatical evolution: A steady state approach, in: Late Breaking Papers, Genetic Programming, 1998, pp. 180–185.
[33] E. Hemberg, L. Ho, M. O'Neill, H. Claussen, A comparison of grammatical genetic programming grammars for controlling femtocell network coverage, Genetic Programming and Evolvable Machines 14 (1) (2013) 65–93. doi:10.1007/s10710-012-9171-8.
[34] P. Arroba, J. L. Risco-Martín, M. Zapater, J. M. Moya, J. L. Ayala, Enhancing regression models for complex systems using evolutionary techniques for feature engineering, Journal of Grid Computing.