Runtime Data Center Temperature Prediction using Grammatical Evolution Techniques

Marina Zapater a,c,b,∗, José L. Risco-Martín a, Patricia Arroba c,b, José L. Ayala a, José M. Moya c,b, Román Hermida a

a DACYA, Universidad Complutense de Madrid, Madrid 28040, Spain
b CCS - Center for Computational Simulation, Campus de Montegancedo UPM, 28660, Spain
c LSI - Integrated Systems Lab., Universidad Politécnica de Madrid, Madrid 28040, Spain

Abstract

Data Centers are huge power consumers, both because of the energy required for computation and the cooling needed to keep servers below thermal redlining. The most common technique to minimize cooling costs is increasing data room temperature. However, to avoid reliability issues, and to enhance energy efficiency, there is a need to predict the temperature attained by servers under variable cooling setups. Due to the complex thermal dynamics of data rooms, accurate runtime data center temperature prediction has remained an important challenge. By using Grammatical Evolution techniques, this paper presents a methodology for the generation of temperature models for data centers and the runtime prediction of CPU and inlet temperature under variable cooling setups. As opposed to time-costly Computational Fluid Dynamics techniques, our models do not need specific knowledge about the problem, can be used in arbitrary data centers, can be re-trained if conditions change, and have negligible overhead during runtime prediction. Our models have been trained and tested using traces from real Data Center scenarios. Our results show that we can accurately predict the temperature of the servers in a data room, with prediction errors below 2°C and 0.5°C for CPU and server inlet temperature respectively.

Keywords: Temperature prediction; Data Centers; Energy efficiency

1. Introduction

Data Centers are found in every sector of the economy and provide the computational infrastructure to support a wide range of applications, from traditional applications to High-Performance Computing or Cloud services. Over the past decade, both the computational capacity of data centers and the number of these facilities have increased tremendously without a proportional increase in energy efficiency, leading to unsustainable costs [1]. In 2010, data center electricity represented 1.3% of all the electricity use in the world, and 2% of all electricity use in the US [2]. In 2012, global data center power consumption increased to 38GW, and in 2013 there was a further rise of 17% to 43GW [3].

The cooling needed to keep the servers within reliable thermal operating conditions is one of the major contributors to data center power consumption, and accounts for over 30% of the electricity bill [4] in traditional air-cooled infrastructures. In recent years, both industry and academia have devoted significant effort to decrease the cooling power, improving data center Power Usage Effectiveness (PUE), defined as the ratio between total facility power and IT power. According to a report by the Uptime Institute, average PUE improved from 2.5 in 2007 to 1.65 in 2013 [5], mainly due to more efficient cooling systems and higher data room ambient temperatures.

∗Corresponding author
Email addresses: marina.zapater@ucm.es (Marina Zapater), jlrisco@ucm.es (José L. Risco-Martín), parroba@die.upm.es (Patricia Arroba), jayala@ucm.es (José L. Ayala), josem@die.upm.es (José M. Moya), rhermida@ucm.es (Román Hermida)
However, increased room temperatures reduce the safety margins to CPU thermal redlining and may cause potential reliability problems. To avoid server shutdown, the maximum CPU temperature limits the minimum cooling. The key question of how to set the supply temperature of the cooling system to handle the worst-case scenario is still to be clearly answered [6]. Most data centers typically operate with server inlet temperatures ranging between 18°C and 24°C, but we can find some of them as cold as 13°C [7], and others as hot as 35°C [8]. These values are often chosen based on conservative suggestions provided by manufacturers, and ensure inlet temperatures within the ranges published by ASHRAE (i.e., 15°C to 32°C for enterprise servers [9]).

Data center designers have run up against the lack of accurate models for the energy-efficient real-time management of computing facilities. One modeling barrier in these scenarios is the high number of variables potentially correlated with temperature, which prevents the development of macroscopic analytical models. Nowadays, to simulate the inlet temperature of servers under certain cooling conditions, designers rely on time-consuming and very expensive Computational Fluid Dynamics (CFD) simulations. These techniques use numerical methods to solve the differential equations that drive the thermal dynamics of the data room. They need to consider a comprehensive number of parameters both from the server and the data room (e.g., specific characteristics of servers such as airflow rates, or the data room dimensions and setup). Moreover, they are not robust to changes in the data center (e.g., rack placement and layout changes, servers being turned off, the inclusion of new servers, etc.). If the simulation fails to properly incorporate a relevant parameter, or if there is a deviation between the theoretical and the real values, the simulation becomes inaccurate. Due to the high economic and computational cost of CFD simulation, models cannot be re-run each time there is a change in the data room.

To minimize cooling costs, the development of models that accurately predict the CPU temperature of the servers under variable environmental conditions is a major challenge. These models need to work at runtime, adapting to the changing conditions of the data room, automatically re-training if data center conditions change dramatically, and enabling data center operators to increase room temperature safely.

The nature of the problem suggests the usage of meta-heuristics instead of analytical solutions. Meta-heuristics make few assumptions about the problem, providing good solutions even when they have fragmented information. Some meta-heuristics such as Genetic Programming (GP) perform Feature Engineering (FE), a particularly useful technique to select the set of features and combination of variables that best describe a model. Grammatical Evolution (GE) is an evolutionary computation technique based on GP used to perform symbolic regression [10].
This technique is particularly useful to provide solutions that include non-linear terms, offering Feature Engineering capabilities and removing analytical modeling barriers. Also, designer expertise is not required to process a high volume of data, as GE is an automatic method. However, GE provides a vast space of solutions that may need to be bounded to achieve algorithm efficiency.

This paper develops a data center room thermal modeling methodology based on GE to predict at runtime, and with sufficient anticipation, the critical variables that drive reliability and cooling power consumption in data centers. In particular, the main contributions of our work are the following:

• The development of multi-variable models that incorporate time dependence based on Grammatical Evolution to predict CPU and inlet temperature of the servers in a data room during runtime. Due to the feature engineering and symbolic regression performed by GE, our models incorporate the optimum selection of representative features that best describe the thermal behavior.

• We prevent premature convergence by means of Social Disaster Techniques and Random Off-Spring Generation, dramatically reducing the number of generations needed to obtain accurate solutions. We tune the models by selecting the optimum parameters and fitness function using a reduced experimental setup, consisting of real measurements taken from a single server isolated in a fully sensorized data room.

• We offer a comparison with other techniques commonly used in the literature to solve temperature modeling problems, such as autoregressive moving average (ARMA) models, linear model identification methods (N4SID), and dynamic neural networks (NARX).

• The proposal of an automatic data room thermal modeling methodology that scales our solution to a realistic Data Center scenario. As a case study, we model CPU and inlet temperatures using real traces from a production data center.

Figure 1: Typical raised-floor air-cooled Data Center layout

Our work allows the generation of accurate temperature models able to work at runtime and adapt to the ever-changing conditions of these scenarios, while achieving very low average errors of 2°C for CPU temperature and 0.5°C for inlet temperature.

The remainder of the paper is organized as follows: Section 2 describes the modeling problem in detail, whereas Section 3 provides an overview of the current solutions. Section 4 describes our proposed solution, whereas Section 5 presents the experimental methodology. Section 6 shows the results obtained and Section 7 discusses them. Finally, Section 8 concludes the paper.

2. Problem description

2.1. Data room thermal dynamics

To ensure the safe operation of a traditional raised-floor air-cooled Data Center, data rooms are equipped with chilled-water Computer Room Air Conditioning (CRAC) units that use conventional air-cooling methods. Servers are mounted in racks on a raised floor. Racks are arranged in alternating cold/hot aisles, with server inlets facing cold air and outlets creating hot aisles. CRAC units supply air at a certain temperature and air flow rate to the Data Center through the floor plenum. The floor has some perforated tiles through which the blown air comes out.
Cold air refrigerates the servers and heated exhaust air is returned to the CRAC units via the ceiling, as shown in Figure 1.

Even though this solution is very inefficient in terms of energy consumption, the majority of data centers use this mechanism. In fact, despite the recent advances in high-density cooling techniques, according to a survey by the Uptime Institute, in 2012 only 19% of large-scale data centers had incorporated other cooling mechanisms [5]. In some scenarios, the control knob of the cooling subsystem is the cold air supply temperature, whereas in others, it is the return temperature of the heated exhaust air to the CRAC unit.

The maximum IT power density that can be deployed in the Data Center is limited by the perforated tile airflow. Because the plenum is usually obstructed (e.g., blocked with cables in some areas), a non-uniform airflow distribution is generated and each tile exhibits a different pressure drop. Moreover, in data centers where the hot and cold aisles are not isolated, which is the most common scenario, the heated exhaust air recirculates to the cold aisle, mixing with the cold air.

2.2. Temperature-energy tradeoffs

The factor limiting minimum data room cooling is maximum server CPU temperature. Temperatures higher than 85°C can cause permanent reliability failures [11]. At temperatures above 95°C, servers usually turn off to prevent thermal redlining. Previous work on server power and thermal modeling [12] shows how CPU temperature is dominated by: i) power consumption, which is dependent on workload execution, ii) fan speed, which changes the cooling capacity of the server, and iii) server cold air supply (inlet temperature).

Thus, to keep all the equipment under normal operation, CRAC units have to supply the air at an adequately low temperature to ensure that all CPUs are below the critical threshold. However, inlet temperature is also not uniform across servers. The cold air temperature at the server inlet depends on several parameters: i) the CRAC cold air supply, ii) the airflow rate through the perforated tiles, and iii) the outlet temperature of adjacent servers due to air recirculation.

Setting the cooling air supply temperature to a low value, even though it ensures safe operation, implies increased power consumption due to a larger burden on the chiller system. The goal of energy-efficient cooling strategies is to increase the cold air supply temperature without reaching thermal redlining. Due to the non-linear efficiency of cooling systems, raising the air supply temperature can yield important energy savings. A widely used rule of thumb is that each degree of increase in air supply temperature yields 4% energy savings in the cooling subsystem [13]. To increase air supply temperature safely, however, we need to predict not only the inlet temperature to the servers, but also the CPU temperature that each server attains under the current workload.

Due to the temperature gradients between hot and cold aisles and the data room layout and geometry, the air inside a data center behaves like a turbulent fluid. Thus, obtaining an analytical relation between cold air supply and server inlet temperature is not trivial, making inlet and CPU temperature prediction a challenging problem.
Besides, data centers are composed of thousands of CPU cores, whose temperatures need to be modeled independently. This prevents the usage of classical regression techniques that need human interaction to train and validate the models.

3. Literature overview

Data center room thermal modeling enables both thermal emergency management and energy optimization, and enhances reliability. Because of the turbulent behavior of the air in the Data Center room, Computational Fluid Dynamics (CFD) simulation has traditionally been the most commonly used solution in both industry and academia [14].

CFD is used to model the inlet and outlet temperature of servers, given cold air supply parameters, room layout, server configuration and utilization, in order to either optimize cooling costs or detect hot spots [15]. CFD solvers perform a three-dimensional numerical analysis of the thermodynamic equations that govern the data room. Their main drawbacks are that they require an expert to configure the simulation, and that they are computationally costly both in the modeling stage (i.e., modeling a small-sized data room may take from hours to days) and in the evaluation phase, thus preventing their online usage. Moreover, CFD simulation is not robust to changes in the layout of the Data Center, e.g., server placement, open tiles, the workload running or the cold air supply setting.

To solve these issues, Chen et al. [16] use CFD together with sensor information to calibrate the simulation and reduce computational complexity. Their work achieves a prediction error below 2°C when predicting temperature 10 minutes in advance. Other work [17] presents the Data Center as a distributed Cyber-Physical System (CPS) in which both computational and physical parameters can be measured with the goal of minimizing energy consumption. Our work leverages this concept by using a monitoring system [18] capable of collecting environmental (i.e., cold air supply and server inlet temperature, airflow, etc.) and server data (i.e., temperature, power, fan speed, etc.) from a real data center.

A common alternative to CFD is the use of abstract heat flow models. These models characterize the steady state of hot air recirculation inside the data center. Recirculation is described by a cross-interference coefficient matrix which denotes how much of every node's outlet heat contributes to the inlet of every other node. This matrix is obtained in an offline profiling stage using CFD [19]. Even though profiling is still costly, model evaluation can be performed online.

Machine learning techniques have also been used in Data Center modeling. The Weatherman [20] tool uses neural networks to predict the inlet temperature of servers, obtaining prediction errors below 1°C in over 90% of their traces. However, they use simulation traces obtained with CFD simulation for their training and test sets, instead of real data. The problem behind time series prediction can be explained as a problem of extracting a manageable set of adequate features, followed by a regression mechanism. Careful selection of features and their horizon is therefore of much greater importance compared with the static-data prediction problem.
Neural network-based approaches require previous knowledge of the parameters that drive thermal modeling, obtaining them using pseudo-exhaustive algorithms. Our work, on the contrary, relies on the benefits of feature engineering in Symbolic Regression problems to obtain the relevant features and construct the models in an automated way.

The work by De Silva et al. [21] is the one most similar to ours regarding the modeling methodology. The authors use Grammatical Evolution for electricity load prediction. As opposed to our work, this paper is focused on predicting the trend, momentum and volatility indicators of a time series, not on obtaining a physical model, i.e., they do not solve a multi-variable problem.

In summary, the main issues in all previous approaches are: i) they monitor and predict inlet temperature instead of CPU temperature, ii) modeling is performed for only certain hand-picked cooling and workload configurations, iii) they use CPU utilization as a proxy for server power, iv) they assume data centers with homogeneous servers, v) server fan speed is considered constant, vi) results are not validated with real traces, and vii) model construction requires specific knowledge of the problem and classical feature selection, which prevents the usage of automated techniques.

Our work, on the contrary, first predicts inlet temperature and then uses this result to predict CPU temperature, which is the factor limiting cooling. Both in our training and test sets, we use real traces obtained from enterprise servers in a data center. Moreover, as shown in previous work [12], in highly multi-threaded enterprise servers, utilization is not a good proxy for power for arbitrary workloads.

Enterprise servers come with automatic temperature-driven variable fan control policies. When fan speed changes, so do the airflow and the server cooling capacity [22]. Our methodology also considers the contribution of variable fan speed, allowing us to predict temperature in heterogeneous data center setups running arbitrary workloads.

At the server level, Heather et al. [23] propose a server temperature prediction model based on simplified thermodynamic equations, obtaining results within 1°C of accuracy. Even though this approach predicts CPU temperature and takes into account inlet temperature, it does not predict the inlet and needs specific knowledge about several server parameters. Our approach only uses data from the generic sensors deployed in the server and data room.

Another common approach to CPU temperature modeling is the usage of autoregressive moving average (ARMA) modeling to estimate future temperature accurately based on previous measurements [24]. Their main drawbacks are that, because they only use past temperature samples, the prediction horizon is usually below one second. Moreover, they do not provide a physical model, disregarding the effect of power or airflow, and need to be retrained often.

As opposed to others, our work achieves prediction horizons of 1 minute for CPU temperature and 10 minutes for inlet temperature with high accuracy. This enables data center operators to take action before thermal events occur, by changing either the workload or the cooling in the data center.
4. Grammatical evolution techniques

Evolutionary algorithms use the principles of evolution to turn one population of solutions into another, by means of selection, crossover and mutation. Among them, Genetic Programming (GP) has proven to be effective in a number of Symbolic Regression (SR) problems [25]. However, GP presents some limitations, like bloating of the evolution (excessive growth of computer memory structures), often produced in the phenotype of the individual. In recent years, variants of GP like Grammatical Evolution (GE) have appeared as a simpler optimization process [26]. GE is inspired by the biological process of generating a protein given the DNA of an organism. GE evolves computer programs given a set of rules, adopting a bio-inspired genotype-phenotype mapping process.

Figure 2: CPU temperature prediction diagram (a 2-minute data window of real measurements and a 1-minute prediction window of previously predicted samples are used to predict the sample ahead of the current measurement).

In this section, we describe how we perform feature selection and provide a brief insight into the grammars and mapping process, as well as into several model parameters.

4.1. Feature selection and model definition

In this work we use Feature Engineering (FE) and Grammatical Evolution to obtain a mathematical expression that models CPU and server inlet temperature. This expression is derived from experimental measurements in real server and data room scenarios, gathering data that have an impact on temperature, according to previous work in the area [12, 18]. To predict CPU temperature, we gather server power, fan speed, inlet temperature and previous CPU temperature measurements. For inlet temperature, we gather the CRAC air supply and return temperature, humidity, pressure difference through the perforated tiles (which is a measurement that provides information about airflow) and previous inlet temperature measurements. Our goal is to predict temperature a certain time (samples) in advance, by using past data of the available magnitudes within a window. We may use past samples from the magnitude we need to predict, or even previously predicted data.

For illustration purposes, in Figure 2 we show a diagram in which CPU temperature is predicted 1 minute ahead given: i) 2 minutes of past measurements (data window) for fan speed, server power, inlet and CPU temperature and, ii) the previous CPU temperature predictions (prediction window).

Formally, we claim that CPU temperature prediction for a certain time instant α samples into the future is a function of past data measurements within a window of size i = {0..W_cpu}, and previously predicted values within a window of size j = {1..α}, as expressed in Eq. (1):

\hat{T}_{CPU}(k+\alpha) = f\big( T_{inlet}(k-i),\ FS(k-i),\ P(k-i),\ T_{CPU}(k-i),\ \hat{T}_{CPU}(k+j) \big) \quad (1)

where T_{inlet} is a short form for the previous inlet temperature values in a window, {T_{inlet}(k-i) | i ∈ {0, ..., W_cpu}}; T_{CPU}, FS and P are past CPU temperature, fan speed, and server power consumption values respectively, which are defined similarly; and \hat{T}_{CPU} are previous temperature predictions.
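To make the windowed formulation concrete, the following minimal sketch (not part of the original methodology) builds the inputs of Eq. (1) from equally sampled monitoring traces and feeds earlier predictions back into later ones; the array names and the model callback are illustrative assumptions.

```python
# Minimal sketch of the windowed formulation in Eq. (1): the inputs for predicting
# T_CPU at time k+alpha combine past measurements (data window of size W) with
# previously predicted samples that fall before the target (prediction window).
# Array names and the model() callback are illustrative, not from the paper.
import numpy as np

def cpu_temp_features(k, alpha, W, t_inlet, fan, power, t_cpu, t_cpu_pred):
    """Collect the arguments of f(...) in Eq. (1) for one prediction at k+alpha."""
    past = lambda x: x[k - W:k + 1]              # x[k-i], i = 0..W
    return {
        "T_inlet":    past(t_inlet),
        "FS":         past(fan),
        "P":          past(power),
        "T_CPU":      past(t_cpu),
        "T_CPU_pred": t_cpu_pred[k + 1:k + alpha],   # earlier predictions before k+alpha
    }

def rolling_prediction(model, alpha, W, t_inlet, fan, power, t_cpu):
    """Predict T_CPU(k+alpha) for every k, feeding back earlier predictions."""
    n = len(t_cpu)
    t_cpu_pred = np.copy(t_cpu)                  # seed the prediction buffer
    for k in range(W, n - alpha):
        feats = cpu_temp_features(k, alpha, W, t_inlet, fan, power, t_cpu, t_cpu_pred)
        t_cpu_pred[k + alpha] = model(feats)     # model = phenotype evolved by GE
    return t_cpu_pred
```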
For inlet temperature, our claim is that the inlet temperature T_inlet of a certain server is driven by the room thermal dynamics and can be expressed as a function of the cold air supply (or return) temperature T_CRAC, the differential pressure across the perforated tiles γ (measured in inches of water, inH2O) and the data room humidity h (in percentage), as in Eq. (2):

\hat{T}_{inlet}(k+\beta) = f\big( T_{CRAC}(k-i),\ \gamma(k-i),\ h(k-i),\ FS_{p-m}(k-i),\ \hat{T}_{inlet}(k+j) \big) \quad (2)

where the data window can be defined in the range i = {0..W_inlet} and the prediction window is j = {1..β}.

Note that, in general, α and β are not equal, as the room dynamics are much slower than the CPU temperature dynamics of the servers, i.e., in a real data room we might need hours to appreciate substantial differences in ambient temperature, whereas CPU temperature changes within seconds. The selection of relevant features among all data measurements is a Symbolic Regression (SR) problem. In our approach, GE allows the generation of mathematical models applying SR.

Regarding both the structure and the internal operators, GE works like a classic Genetic Algorithm [27]. GE evolves a population formed by a set of individuals, each one constituted by a chromosome and a fitness value. In SR, the fitness value is usually a regression metric like the Mean Squared Error (MSE). In GE, a chromosome is a string of integers. In the optimization process, GA operators are iteratively applied to improve the fitness value of each individual. In order to compute the fitness function for every iteration and extract the mathematical expression given by an individual (phenotype), a mapping process is applied to the chromosome (genotype). This mapping process is achieved by defining a set of rules to obtain the mathematical expression, using grammars in Backus Naur Form (BNF) [26].

The process does not only perform parameter identification. In conjunction with a well-defined fitness function, the evolutionary algorithm is also computing mathematical expressions with the set of features that best fit the target system. Thus, GE is also defining the optimal set of features that derive into the most accurate model.

Moreover, this methodology can be used to predict magnitudes with memory, such as temperature, where the current observation depends on past values. To incorporate time dependence, the data used for model creation needs to be a time series. In addition, we need to tune our grammars so that they can produce models where past temperature values can be used to predict temperature a certain number of samples into the future. Grammar 1 shows an example where variable x may take values in the current time step k, i.e., x[k-0], or in previous samples like x[k-1] or x[k-2]. Moreover, a new variable xpred[k-idx] can be included, that accounts for previously predicted values of variable x.

Including time dependence into a grammar has some drawbacks. First, we are substantially increasing the search space of our algorithms, as now the GE needs to search for the best solution among all variables within the specified window. As a consequence, the number of generations needed to obtain a good fitness value increases. Second, as we show in the results section, depending on the prediction horizon the models tend to fall into a local optimum, in which the best phenotype is the last available observation of the predicted variable. To address the latter challenge, we propose the use of the premature convergence prevention techniques that are explained next, and that also benefit the convergence time of our algorithms. Despite the drawbacks, introducing time dependence in our modeling is a must, as temperature transients (both at the server and the data center level) are not negligible and need to be accurately predicted.

For a more detailed explanation of the principles of the mapping process, and how the BNF grammars are used to incorporate time dependence, the reader is referred to the Appendix.

Grammar 1 Example of a grammar in BNF that generates phenotypes with time dependence
(I) 〈expr〉 ::= 〈expr〉〈op〉〈expr〉 | 〈preop〉(〈expr〉) | 〈var〉
(II) 〈op〉 ::= + | - | * | /
(III) 〈preop〉 ::= sin | cos | log
(IV) 〈var〉 ::= x[k-〈idx〉] | xpred[k-〈idx〉] | y | z | 〈num〉
(V) 〈num〉 ::= 〈dig〉.〈dig〉 | 〈dig〉
(VI) 〈dig〉 ::= 0 | 1 | 2 | 3 | 4 | 5
(VII) 〈idx〉 ::= 0 | 1 | 2
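As an illustration of the mapping process referred to above, the sketch below applies the standard GE genotype-to-phenotype mapping (codon value modulo number of rule choices, with chromosome wrapping) to the rules of Grammar 1. It is a generic, textbook-style implementation, not the authors' code.

```python
# Sketch of the standard GE genotype-to-phenotype mapping applied to Grammar 1:
# each codon selects one production of the leftmost non-terminal via modulo, and
# the chromosome is re-read ("wrapped") if it runs out of codons.
GRAMMAR = {
    "<expr>":  ["<expr><op><expr>", "<preop>(<expr>)", "<var>"],
    "<op>":    ["+", "-", "*", "/"],
    "<preop>": ["sin", "cos", "log"],
    "<var>":   ["x[k-<idx>]", "xpred[k-<idx>]", "y", "z", "<num>"],
    "<num>":   ["<dig>.<dig>", "<dig>"],
    "<dig>":   ["0", "1", "2", "3", "4", "5"],
    "<idx>":   ["0", "1", "2"],
}

def map_genotype(codons, start="<expr>", max_wraps=3):
    expr, i, wraps = start, 0, 0
    while "<" in expr and wraps <= max_wraps:
        if i == len(codons):                              # wrap the chromosome
            i, wraps = 0, wraps + 1
        nt = expr[expr.index("<"):expr.index(">") + 1]    # leftmost non-terminal
        rules = GRAMMAR[nt]
        choice = rules[codons[i] % len(rules)]            # codon mod number of choices
        expr = expr.replace(nt, choice, 1)
        i += 1
    # if non-terminals remain after max_wraps, the individual is treated as invalid
    return expr

print(map_genotype([42, 17, 203, 8, 1, 90, 6, 33]))
```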
4.2. Preventing premature convergence

Premature convergence of a genetic algorithm arises when the chromosomes of some highly rated individuals quickly dominate the population, reducing diversity, and constraining it to converge to a local optimum. Premature convergence is one of the major shortcomings when trying to model low-variability magnitudes by using GE techniques.

To overcome the lack of variety in the population, work by Kureichick et al. [28] proposes the usage of Social Disaster Techniques (SDT). This technique is based on monitoring the population to find local optima and then applying one of two operators:

1. Packing: all individuals having the same fitness value except one are fully randomized.

2. Judgment day: only the fittest individual survives while the remaining ones are fully randomized.

Work by Rocha et al. [29] proposes the usage of Random Off-spring Generation (ROG) to prevent the crossover of two individuals with equal genotype, as this would result in the off-spring being equal to the parents. Individuals are tested before crossover and, if equal, then one off-spring (1-RO) or both of them (2-RO) are randomly generated.

Both previous solutions have shown important benefits in classical Genetic Algorithm problems. In our work, we use these techniques to improve the convergence time of our solutions, as we show in Section 6.
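A compact sketch of how the two mechanisms can be wired into a generic GE loop is shown below; the population representation, the random_chromosome() helper and the keep_fraction parameter (used later in the paper as the "5%" variant of Packing) are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of the two diversity-preserving mechanisms described above.
# Individuals are dictionaries with "chromosome" and "fitness" entries;
# random_chromosome() and the crossover() operator are assumed to exist elsewhere.
from collections import defaultdict

def sdt_packing(population, random_chromosome, keep_fraction=0.0):
    """Packing: within each group of individuals sharing a fitness value, keep a
    small fraction (at least one) and fully randomize the rest."""
    by_fitness = defaultdict(list)
    for ind in population:
        by_fitness[ind["fitness"]].append(ind)
    for group in by_fitness.values():
        keep = max(1, int(len(group) * keep_fraction))
        for ind in group[keep:]:
            ind["chromosome"] = random_chromosome()
            ind["fitness"] = None                 # must be re-evaluated
    return population

def rog_crossover(parent_a, parent_b, crossover, random_chromosome):
    """2-RO Random Off-spring Generation: if both parents share the same genotype,
    return two random individuals instead of cloning the parents."""
    if parent_a["chromosome"] == parent_b["chromosome"]:
        return ({"chromosome": random_chromosome(), "fitness": None},
                {"chromosome": random_chromosome(), "fitness": None})
    return crossover(parent_a, parent_b)
```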
4.3. Fitness

The goal of using GE for data room thermal modeling is to obtain accurate models. Thus, our fitness function needs to express the error resulting from the estimation process. To measure the accuracy of our prediction, we would preferably use the Mean Absolute Error (MAE). However, because temperature is a magnitude that varies slowly and might remain constant during large time intervals, we need to give higher weight to large errors. To this end, the fitness function f presented in Eq. (3) tries to reduce the variance of the model, leading the evolution to obtain solutions that minimize the Root Mean Square Error (RMSE):

f = \sqrt{\frac{1}{N} \cdot \sum_{i} e_i^2} \quad (3)

where the estimation error e_i represents the deviation between the real temperature samples (both for CPU and inlet temperature modeling) obtained by the monitoring system, T, and the estimation obtained by the model, \hat{T}; i represents each sample of the entire set of N samples used to train the algorithms.

4.4. Problem constraints

As we are modeling the behavior of physical magnitudes for optimization purposes, we need to obtain a solution with physical meaning. To this end, we constrain the general problem of temperature modeling in several ways that are subsequently presented, while still being able to address heterogeneous workloads, architectures and topologies. In the results section we evaluate the impact of these constraints on the model generation stage.

4.4.1. Constraining the grammar

The mathematical expressions can be constrained to a limited number of functions with physical meaning. Because temperature exhibits exponential transients, we can include the exponential function in our grammar, whereas we do not find a physical basis to include other mathematical functions such as sines or cosines.

4.4.2. Fitness biasing

Some parameters drive the variables being modeled. For instance, power consumption drives CPU temperature. As we want to obtain models able to capture the physical phenomena that drive temperature, this magnitude should be present in the final model. Thus, CPU temperature models that do not include power in their phenotype are expected to provide good results in the training phase, but to perform poorly in the test, as they are not capturing the physical phenomena. To solve this issue, we can force the appearance of some parameters by biasing the fitness, giving higher weights (i.e., worse fitness) to expressions that miss a parameter. By biasing the fitness we speed up convergence and we ensure that our models incorporate all parameters directly correlated with temperature, but we could obtain less accurate results.

4.4.3. Real vs. mixed models

Purely real models only use real temperature data measurements to predict future samples. Purely predictive models do not use previous temperature measurements, but may use previous predictions. Mixed models may use both real and predicted data. Adding the predicted samples as a variable increases the size of the search space but may provide higher accuracy.

5. Experimental Methodology

In this section we describe the experimental methodology followed in this paper to model server and environmental parameters in Data Centers. First, we describe a scenario consisting only of the temperature prediction of one server in a small air-cooled data room. We use this scenario to tune the model parameters, testing those that generate better models and studying the convergence of the solutions. Then, we apply the best algorithm configuration to a real data center. As a case study, we use real traces from the CeSViMa Data Center, a High Performance Computing cluster at Universidad Politécnica de Madrid in Spain.
5.1. Reduced scenario

This scenario consists of an Intel Xeon RX-300 S6 server equipped with 1 quad-core CPU and 16GB of RAM. The server is installed in a rack with another 4 servers, 2 switches and 2 UPS units, in an air-cooled data room of approximately 30 m², with the rack inlet facing the cold air supply and the outlet facing the heat exhaust. The cooling infrastructure consists of a Daikin FTXS30 split unit that pumps cold air from the ceiling, and there is no floor plenum. The cold air supply ranges from 16°C to 26°C. The data room is fully monitored, and both the cooling and the server workload are controllable.

5.1.1. Monitoring

Both the server and the data room are fully monitored using the internal server sensors and a wireless sensor network, as described in [18]. In particular, server CPU temperature and fan speed values are obtained via the server internal sensors, collected through the Intelligent Platform Management Interface (IPMI) tool (http://ipmitool.sourceforge.net/). IPMI allows polling the internal sensors of enterprise servers with negligible overhead. As the server is not shipped with power consumption sensors, we use non-intrusive current clamps connected to the power cord of the server to gather total server power consumption. Wireless sensors monitor the inlet temperature of the server, the cold air supply temperature of the split unit and the data room humidity. Data are sent to the monitoring server, stored and aligned to ensure a common timestamp.

Figure 3: Training samples used for CPU temperature modeling — a) inlet temperature (°C), b) server power consumption (W), c) server fan speed (rpm), d) measured CPU temperature (°C), over time (s).

5.1.2. Training and test set generation

We generate the training and test set by assigning a wide variety of workloads that exhibit various stress levels in the CPU and memory subsystems of the server, while we modify the cold air supply temperature of the split unit in a range from 16°C to 26°C. All workloads used are a representative set, in terms of stress to the server subsystems and power consumption, of the ones that can be found in High-Performance Computing data centers. Also, the temperatures selected for the cold air supply are within the allowable ranges in current data centers.

The workloads used are the following: i) Lookbusy (http://www.devin.com/lookbusy/), a synthetic workload that stresses the CPU to a customizable utilization value, avoiding the stress of memory and disk; ii) a modified version of the synthetic benchmark RandMem (http://www.roylongbottom.org.uk), that allows us to stress random memory regions of a given size with a given access pattern, and iii) HPC workloads belonging to the SPEC CPU 2006 benchmark suite [30].

During training, we launch Lookbusy and RandMem at various utilization values, plus a subset of the SPEC CPU 2006 benchmarks that exhibit a distinctive set of characteristics according to Phansalkar et al. [31]. Both the arrival time and task duration are randomly selected. During execution, the cold air supply temperature is also randomly changed. For the test set, we randomly launch a SPEC CPU benchmark, with random waiting intervals, while changing the cold air supply temperature.

Our monitoring system collects all data with a 10 second sampling interval for a total time of 5 hours for the training set and 10 hours for the test set. Figure 3 shows part of the training set used for modeling.
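As a concrete illustration of the IPMI-based monitoring described in Section 5.1.1, the sketch below polls the server's internal sensors through ipmitool and appends timestamped readings to a trace file. The sensor names ("CPU1 Temp", "FAN1 Speed") and output field positions are vendor-dependent assumptions, as is the 10-second loop; this is not the authors' collector.

```python
# Hedged sketch of IPMI polling: read all sensors with "ipmitool sensor" (whose
# output is pipe-separated) and log the CPU temperature and fan speed readings.
# Sensor names and field layout vary across vendors and are only illustrative.
import subprocess, time, csv

def read_ipmi_sensors():
    out = subprocess.run(["ipmitool", "sensor"], capture_output=True, text=True).stdout
    readings = {}
    for line in out.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 2 and fields[1] not in ("", "na"):
            readings[fields[0]] = fields[1]      # sensor name -> current value
    return readings

with open("server_trace.csv", "a", newline="") as f:
    writer = csv.writer(f)
    while True:
        s = read_ipmi_sensors()
        writer.writerow([time.time(), s.get("CPU1 Temp"), s.get("FAN1 Speed")])
        f.flush()
        time.sleep(10)                           # 10-second sampling interval
```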
5.2. Case study: CeSViMa Data Center

To show how our solution can be applied to a real data center scenario, this paper presents a case study for a real High-Performance Computing Data Center at the Madrid Supercomputing and Visualization Center (CeSViMa, http://www.cesvima.upm.es/). CeSViMa hosts the Magerit Supercomputer, a cluster consisting of 286 computer nodes in 11 racks, providing 4,160 processors to execute High-Performance jobs on demand. 245 of the 286 nodes are IBM PS702 2S blade servers with 2 Power7 CPUs, each with 8 cores running at 3.3GHz, and 32GB of RAM. The other 41 nodes are HS22 2U servers with 2 Intel Xeon processors of 8 cores each at 2.5GHz and 64GB of RAM.

The CeSViMa data room has a cold-hot aisle layout and is cooled by means of 6 CRAC units arranged along the walls that push air through the floor plenum. To control data room cooling, the air return temperature of each CRAC unit can be set independently. The room has a total size of 190 square meters. Figure 4a shows the layout of the data center. Rack 0 is a control rack that runs no HPC computation. Racks 1-9 are filled with Power7 blade servers, whereas rack 10 contains Intel Xeon servers. Each Power7 node is installed in a blade center. Each blade center contains up to 7 blades, and each rack contains 4 chassis (C01 to C04), as shown in Figure 4b.

Figure 4: CeSViMa data room layout — a) data center room layout, b) Power7 rack front view. Models are developed for Power7 nodes 1, 4 and 7 at chassis c02 in racks 1 and 4.

To run our models, we have deployed the same sensor network as in the reduced scenario. In particular, to model inlet temperature we gather inlet temperature, humidity, CRAC air return temperature and the differential pressure through the floor plenum. Because we have placed pressure sensors in the tiles in front of racks 1 and 4, we model the Power7 nodes in these racks. To model CPU temperature, we also collect the CPU temperature and fan speed of all servers via IPMI. CeSViMa Power7 chassis do not have per-server power sensors and we are not able to deploy current clamps. Thus, we use per-server utilization as a proxy for the power consumption of the node. As stated before, utilization is not an accurate metric for arbitrary workloads. However, because of the nature of the workloads in CeSViMa and only for thermal modeling purposes, utilization can be used as a proxy variable for power consumption, as we show in Section 6.

In this work we show the modeling results for the servers highlighted in red in Figure 4, i.e., nodes 1, 4 and 7 at chassis c02 of both rack 1 and rack 4.
These nodes are the ones that exhibit the most variable workload and the most extreme temperature conditions, and constitute the worst-case scenario for modeling.

5.3. Modeling framework

Because of the large number of servers in CeSViMa, to enable cooling optimization we need a framework that allows us to automatically model and predict the CPU and inlet temperature of all servers. Even though CeSViMa is a small-sized data room, it has a very high density in terms of IT equipment. For instance, the amount of data gathered that needs to be processed to enable full environmental modeling and prediction, for a period of 1 year, is above 100GB. Thus, modeling the whole data center with traditional approaches that require human interaction is not feasible.

Our work uses the proposed GE techniques to automatically model all the parameters involved in data center optimization, by running the training of the algorithms and testing them during runtime without human intervention.

6. Results

In this section we present the experimental results obtained when applying Grammatical Evolution to model CPU and inlet temperature. First, we show the results for the controlled scenario, describing the best algorithm configuration, and compare our method with state-of-the-art solutions. Then, we apply the best configuration to train and test the models in a real data center scenario.

6.1. Algorithm setup and performance

First, we use GE to obtain a set of candidate solutions with low error when compared to the temperature measurements in our controlled experimental setup, under different constraints. After evaluating the performance of our model with several setups, we select the following one for each model in this paper:

• Population size: 200 individuals
• Chromosome length: 100 codons
• Mutation probability: inversely proportional to the number of rules
• Crossover probability: 0.9
• Maximum wraps: 3
• Codon size: 8 bits (values from 0 to 255)
• Tournament size: 2 (binary)

For CPU temperature prediction, we use a data window of W_cpu = 20 samples (corresponding to 200 seconds) and a prediction window of α = 6 samples (corresponding to 60 seconds). The data window is heuristically chosen with respect to the largest observed temperature transient, and its size is a trade-off between model accuracy and convergence time. In this sense, the data window needs to incorporate enough samples to capture the trends of thermal transients, but we also need to consider that the larger the data window, the larger the exploration space. The prediction window needs to be selected with respect to the time it takes to actuate on the system and observe a response. In our case, the data window size W_cpu has been chosen in accordance with our previous work on server power modeling, where we analytically modeled temperature transients in enterprise servers, observing that the largest transients, i.e., the worst-case modeling scenario, occur for the lower fan speed values [12]. The prediction window is chosen given the physical constraints of the problem: a 1-minute prediction is sufficient time to change the workload assignment of a server, as canceling the workload of a server in case of thermal redlining takes a few seconds.

For inlet temperature prediction, we also use a data window of W_inlet = 20 samples but a prediction window of β = 5 samples. Inlet temperature dynamics are much slower than CPU temperature dynamics. Because of this, a sampling rate of 2 minutes for inlet temperature is sufficient to get accurate results. Given the size of the prediction window, we are able to obtain inlet temperature samples 10 minutes in advance, which is sufficient time to act upon data room cooling.
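For reference, the setup above can be summarized as a single configuration object. The sketch below simply collects the listed values; the key names are our own and do not correspond to any particular GE library.

```python
# The algorithm setup above, gathered into one configuration object for a generic
# GE implementation. Keys are illustrative; only the values come from the paper.
GE_CONFIG = {
    "population_size": 200,
    "chromosome_length": 100,        # codons per individual
    "codon_size_bits": 8,            # codon values in 0..255
    "crossover_probability": 0.9,
    "mutation_probability": None,    # set at run time: inversely proportional to rule count
    "max_wraps": 3,
    "tournament_size": 2,            # binary tournament selection
    "fitness": "RMSE",
    # prediction setup (10 s samples for CPU, 2 min samples for inlet temperature)
    "cpu":   {"data_window": 20, "prediction_window": 6},   # 200 s of history, 60 s ahead
    "inlet": {"data_window": 20, "prediction_window": 5},   # 10 min ahead
}

def mutation_probability(number_of_rules):
    # "inversely proportional to the number of rules"; proportionality constant of 1 assumed
    return 1.0 / number_of_rules
```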
Next, we present the comparison among several configurations in terms of grammar expressions and rules, premature convergence prevention and fitness biasing. We detail our results for CPU temperature modeling. The procedure to tune the inlet temperature models is completely equivalent.

6.1.1. Data preprocessing and model simplification

Because the power measurements of the Intel Xeon server are taken with a current clamp, the power values obtained exhibit some noise. We preprocess the data to eliminate high-frequency noise, smoothing the power consumption trace by means of a low-pass filter. The remaining traces did not exhibit noise, so no preprocessing was needed.

Moreover, we perform variable standardization for every feature (into the range [1, 2]) to assure the same probability of appearance for all the variables and to enhance the GE symbolic regression.
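A minimal sketch of these two preprocessing steps is shown below, assuming SciPy is available; the filter order and cutoff frequency are illustrative choices not specified in the paper.

```python
# Sketch of the preprocessing in Section 6.1.1: smooth the clamp-measured power
# trace with a low-pass filter and rescale every feature to the range [1, 2].
# Filter order and cutoff are illustrative, not taken from the paper.
import numpy as np
from scipy.signal import butter, filtfilt

def smooth_power(power, fs=0.1, cutoff=0.01, order=2):
    """Zero-phase low-pass filtering; fs = 0.1 Hz is the 10 s sampling rate."""
    b, a = butter(order, cutoff / (0.5 * fs))
    return filtfilt(b, a, power)

def standardize(feature):
    """Map a feature to [1, 2] so all variables have a comparable range."""
    fmin, fmax = np.min(feature), np.max(feature)
    return 1.0 + (feature - fmin) / (fmax - fmin)
```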
6.1.2. Grammars used

To model CPU temperature we have tested three different grammars:

• The first is shown in Grammar 2 and contains a wide set of operands and preoperands (rules II and III) that do not necessarily yield models with a physical meaning.

• The second grammar is a variation of Grammar 2 in which the number of preoperands (rule III) is reduced to exponentials only, i.e., 〈preop〉 ::= exp

• The last grammar is the one presented in Grammar 3 and also reduces the set of possible expressions (rule I).

From the previous three grammars, the one with the fastest convergence to a low error is Grammar 3. Constraining the grammar improves convergence time and provides phenotypes that have physical meaning, without an increase in the modeling error obtained. Thus, for the remainder of the paper we work with the simplified Grammar 3 when modeling CPU temperature.

Grammar 2 Grammar used for CPU temperature modeling in BNF, that uses inlet temperature (TIN), fan speed (FS), power consumption (PS), past CPU temperature (TS) and past predicted CPU temperature (TpS)
(I) 〈expr〉 ::= 〈expr〉〈op〉〈expr〉 | (〈expr〉〈op〉〈expr〉) | 〈preop〉(〈expr〉) | 〈var〉 | 〈cte〉
(II) 〈op〉 ::= + | - | * | /
(III) 〈preop〉 ::= exp | sin | cos | tan
(IV) 〈var〉 ::= TS[k-〈idx〉] | TIN[k-〈idx〉] | PS[k-〈idx〉] | FS[k-〈idx〉]
(V) 〈idx〉 ::= 〈dgt2〉〈dgt〉
(VI) 〈cte〉 ::= 〈dgt〉.〈dgt〉
(VII) 〈dgt〉 ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
(VIII) 〈dgt2〉 ::= 0 | 1

Grammar 3 Simplified grammar in BNF format used for CPU temperature modeling
(I) 〈expr〉 ::= 〈expr〉〈op〉〈expr〉 | (〈expr〉〈op〉〈expr〉) | 〈preop〉(〈exponent〉) | 〈var〉 | 〈cte〉
(II) 〈op〉 ::= + | - | * | /
(III) 〈preop〉 ::= exp
(IV) 〈exponent〉 ::= 〈sign〉〈cte〉*〈var〉 | 〈sign〉〈cte〉*(〈var〉〈op〉〈var〉)
(V) 〈sign〉 ::= + | -
(VI) 〈var〉 ::= TpS[k-〈idx〉] | TS[k-〈idx〉] | TIN[k-〈idx〉] | PS[k-〈idx〉] | FS[k-〈idx〉]
(VII) 〈idx〉 ::= 〈dgt〉
(VIII) 〈cte〉 ::= 〈dgt2〉.〈dgt2〉
(IX) 〈dgt〉 ::= 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20
(X) 〈dgt2〉 ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9

We test two variations of this grammar: i) one that searches for a mixed model (i.e., uses past temperature predictions; it is the one shown in Grammar 3), and ii) one that provides a real model (i.e., only uses CPU temperature measurements). The only difference between the mixed and the real grammars is the presence of the parameter TpS.

6.1.3. Tested configurations

With respect to premature convergence, we test three different techniques:

• No premature convergence technique applied
• Random Off-Spring Generation (2-RO) plus Packing, keeping no more than 5% of equal individuals.
• Random Off-Spring Generation (2-RO) plus Packing, leaving no more than 1 individual with an equal phenotype.

For each of the previous configurations, we run both real and mixed models, with the goal of comparing the convergence time and the fitness evolution of each configuration. Because of the random evolution of the algorithms, for comparison purposes, we run the same model training 5 times and average the RMSE obtained for different numbers of generations. Figure 5 shows the RMSE evolution for the three configurations, with both real and mixed models.

Figure 5: CPU temperature error evolution for real and mixed models under different premature convergence prevention techniques: i) no technique applied, ii) ROG + SDT keeping 5% of equal individuals and iii) ROG + SDT randomizing all equal individuals.

When we do not apply any technique, error decay is much slower, as the population loses diversity and improves only due to mutation in the individuals. The impact is higher for the mixed models, where the search space is larger. When we apply ROG and SDT, we need fewer generations to obtain good fitness values.
However, keeping only 1 individual with the same phenotype and randomizing the remaining population is too aggressive, while keeping a higher percentage of equal individuals, i.e., 5%, yields better results. As shown, using 30,000 generations is enough to obtain low RMSE values.

Regarding fitness biasing, Figure 6 shows the differences in terms of RMSE for different numbers of generations for real and mixed models when we bias the fitness to force all parameters and when we do not bias it. Convergence is similar, being slightly better for the non-biased models. In fact, all variables in the grammar tend to appear in non-biased models, backing up the hypothesis that all those magnitudes are correlated with temperature.

Figure 6: CPU temperature error evolution for real and mixed models under ROG + SDT 5% when the fitness is biased vs. not biased.

Finally, we show results for both the training and test set when modeling CPU temperature with the best configuration, i.e., a mixed model obtained with Grammar 3, using ROG and Packing techniques leaving 5% of equal individuals, and not biasing the fitness. Table 1 shows the 5 best phenotypes obtained and their corresponding RMSE and MAE values for the test set after simplification. To avoid overfitting, we use the five best models to compute the samples of the test set, i.e., we predict the next temperature sample with all five equations, obtaining 5 different results, and we average them to obtain the prediction value. By applying this methodology we obtain an RMSE of 2.48°C and an MAE of 1.77°C. Because CPU temperature sensors usually have a resolution of 1°C, we consider these results to be accurate enough for our purposes. Figure 7 shows a zoom-in of the real CPU temperature trace and its prediction, for both the training and the test set. As can be seen, the prediction accurately matches the measured values in both the training and test sets.

Table 1: Phenotype, RMSE and MAE (°C) for the test set in the CPU temperature modeling reduced scenario

Phenotype 1: TS[k-3] + PS[k-1] - PS[k-4] + 1.1·TIN[k-7]/(e^(-0.1·(TpS[k-8]-TS[k-3]))·(PS[k-20]/9.9) + 9.9) - (e^(-4.5·(TpS[k-5]/TS[k-19]))·TS[k-7]) - 9.6 — RMSE 2.8, MAE 2.08

Phenotype 2: TS[k-5] + PS[k-1] - PS[k-5] + (TS[k-5]/5.7 - e^(-1.3·(TS[k-5]/TIN[k-12]))·TS[k-5]) + PS[k-11]/(3.7 - TS[k-5]·(e^(-0.3·(PS[k-18]+FS[k-7])) - e^(-4.9·(PS[k-12]/PS[k-16])))) - 9.8 — RMSE 2.59, MAE 1.86

Phenotype 3: TS[k-5] + PS[k-1] - PS[k-4] + (e^(+6.1·(TIN[k-10]/TS[k-5])) - 5.2) - TIN[k-14]·(0.06·(TIN[k-7]/TS[k-5])) + (PS[k-1]/3.1·TIN[k-7])·TS[k-5] - (PS[k-20]/TIN[k-7])·e^(-6.4·(TIN[k-15]/TS[k-11])) - 8.0 — RMSE 2.77, MAE 2.01

Phenotype 4: TS[k-3] + PS[k-1] - PS[k-4] - e^(-4.1·(TpS[k-5]/TIN[k-19]))/TS[k-3] + (TpS[k-1]/(e^(-0.1·(TpS[k-8]-TS[k-3]))·(PS[k-20]/9.9) + 9.2))/1.6 - 9.6 — RMSE 2.55, MAE 1.77

Phenotype 5: TS[k-1] + 0.73·(PS[k-1]·4.4 - PS[k-2]·4.4 - TS[k-1] + TpS[k-8] + e^(-0.2·TIN[k-8]) + (e^(-3.4·(PS[k-20]/PS[k-10])) - e^(-4.0·(PS[k-6]/PS[k-19])))·TpS[k-12]/9.0 - 0.9) — RMSE 2.5, MAE 1.75

Figure 7: Training and test set CPU temperature prediction vs. real measurements over time (s) — a) 1-minute training set prediction for the mixed model, b) 1-minute test set prediction for the mixed model.

6.2. Comparison to other approaches

We compare our results with three common techniques for CPU temperature modeling in the state of the art: autoregressive moving average models, linear subspace identification techniques and dynamic neural networks. We first briefly describe these three modeling techniques and then we show the results obtained and compare them with our proposed technique.

6.2.1. ARMA models

ARMA models are mathematical models of autocorrelation in a time series that use past values alone to forecast future values of a magnitude. ARMA models assume that the underlying process is stationary and that there is serial correlation in the data, assumptions that temperature modeling satisfies.
In a general way, an ARMA model can be described as in Equation 4:

y_t = \sum_{i=1}^{p} a_i \cdot y_{t-i} + e_t + \sum_{j=1}^{q} c_j \cdot e_{t-j} \quad (4)

where y_t is the value of the time series (CPU temperature in our case) at time t, the a_i are the lag-i autoregressive coefficients, the c_j are the moving average coefficients and e_t is the error. The error is assumed to be random and normally distributed. p and q are the orders of the autoregressive (AR) and the moving average (MA) parts of the model, respectively.

The ARMA modeling methodology consists of two different steps: i) identification and ii) estimation. In our work we use an automated methodology similar to the one proposed by Coskun et al. [24]. During the identification phase, the model order is computed, i.e., we find the optimum values for p and q of the ARMA(p, q) process. To perform model identification we use an automated strategy that computes the goodness of fit for a range of p and q values, starting with the simplest model (i.e., an ARMA(1,0)). The goodness of fit is computed using the Final Prediction Error (FPE), and the best model is the one with the lowest FPE value, given by Equation 5:

FPE = \frac{1 + n/N}{1 - n/N} \cdot V \quad (5)

where n = p + q, N is the length of the time series and V is the variance of the model residuals. For a fair comparison with our proposed methodology, the model obtained needs to forecast α = 6 samples.
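The following sketch illustrates this FPE-based order selection over a small grid of (p, q) values; the use of statsmodels and the search ranges are assumptions made for illustration, not details taken from the paper.

```python
# Sketch of the automated ARMA(p, q) identification described above: fit models
# over a grid of orders and keep the one with the lowest Final Prediction Error
# (Eq. 5). Library choice and search ranges are illustrative assumptions.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def select_arma_by_fpe(y, max_p=9, max_q=8):
    best = (None, np.inf)
    N = len(y)
    for p in range(1, max_p + 1):
        for q in range(0, max_q + 1):
            try:
                res = ARIMA(y, order=(p, 0, q)).fit()
            except Exception:
                continue                       # skip orders that fail to converge
            n = p + q
            V = np.var(res.resid)              # variance of the model residuals
            fpe = (1 + n / N) / (1 - n / N) * V
            if fpe < best[1]:
                best = ((p, q), fpe)
    return best                                # ((p, q), FPE) of the chosen model
```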
They have been very814 successful for the identification of linear multivariable dynamic815 systems.816 To be constructed, certain parameters need to be fed into817 the model, such as the number of forward predictions (r), the818 number of past inputs (su), and the number of past outputs(sy).819 Again, for a fair comparison with our proposed methodology,820 we need a model in the form N4S ID[r, su, sy] where r = 6, su =821 20 and sy = 20.822 6.2.3. NARX823 A Nonlinear Autoregressive eXogenous (NARX) model is a824 nonlinear autoregressive model with exogenous inputs. In this825 case, the current value of a time series is computed in function826 of a) past values of the same series, and b) current and past val-827 ues of the exogenous series. Additionally, the model contains828 an error term, since knowledge of the other terms will not en-829 able the current value of the time series to be predicted with830 precision. This is model is described as follows:831 y(t) = F(y(t−1).y(t−2), y(t−3), . . . , u(t), u(t−1), u(t−2), u(t−3), . . .)+e(t) (8) where y(t) is the output, u(t) is the exogenous variable and e(t)832 is the error term. F is a nonlinear function, and in our case is833 defined through a neural network. The NARX model is based834 on the linear ARX model, which is commonly used in time-835 series modeling.836 6.2.4. Model comparison837 Finally, we compare the results for CPU temperature mod-838 eling between our proposed approach, ARMA, N4SID, and839 NARX models, all of them with a prediction window of 6 sam-840 ples (1 minute). To perform a statistical comparison, we have841 Model Training set Test set RMSE MAE RMSE MAE ARMA1,4 3.38 ± 0.23 1.62 ± 0.13 3.23 ± 1.02 1.62 ± 0.57 ARMA9,8 3.35 ± 0.24 1.70 ± 0.14 3.24 ± 1.01 1.74 ± 0.64 N4SID 2.55 ± 0.28 1.69 ± 0.07 3.98 ± 1.11 2.93 ± 0.58 NARX 2.53 ± 0.10 1.64 ± 0.05 3.88 ± 1.16 2.35 ± 0.69 GE 2.32 ± 0.19 1.50 ± 0.08 2.56 ± 0.90 1.66 ± 0.45 Table 2: RMSE and MAE in CPU temperature prediction for each model (ARMA, N4SID, NARX and GE) Figure 8: Zoomed-in averaged CPU temperature modeling comparison be- tween ARMA(1,4), N4SID, NARX and GE conducted a non-exhaustive 5-fold cross-validation. The com- 842 plete data set is obtained from more than 10 hours of tempera- 843 ture traces gathered from an Intel Xeon RX-300 S6 server run- 844 ning a wide range of workloads under various cooling setups. 845 This data set has been partitioned into 5 equal sized subsamples. 846 A single subsample is retained as the validation data for test- 847 ing the model, and the remaining 4 subsets are used as training 848 data. This process is repeated 5 times with each of the 5 sub- 849 samples used exactly once as the validation data. RMSE and 850 MAE are averaged to perform a comparison between ARMA, 851 N4SID, NARX and GE. Each resultant set of five models will 852 be averaged to produce the final estimation. 853 Table 2 shows the RMSE and MAE errors obtained for our 854 proposed modeling technique based on GE, ARMA, N4SID 855 and NARX models, and Figure 8 shows a zoom-in into the 856 11 Time (hours) 4 8 12 16 20 24 T e m p (° C ) 18 20 22 24 Real temperature Predicted temperature Figure 9: 10-minute inlet temperature prediction in the reduced scenario for a mixed model with SDT Packing 5% and simplified grammar CPU temperature curve for the actual measurements and the857 averaged prediction of the five models obtained in the cross858 validation over the whole data set. As can be seen, GE mod-859 els are the ones with both lower RMSE and MAE. 
Model | Training RMSE | Training MAE | Test RMSE | Test MAE
ARMA(1,4) | 3.38 ± 0.23 | 1.62 ± 0.13 | 3.23 ± 1.02 | 1.62 ± 0.57
ARMA(9,8) | 3.35 ± 0.24 | 1.70 ± 0.14 | 3.24 ± 1.01 | 1.74 ± 0.64
N4SID | 2.55 ± 0.28 | 1.69 ± 0.07 | 3.98 ± 1.11 | 2.93 ± 0.58
NARX | 2.53 ± 0.10 | 1.64 ± 0.05 | 3.88 ± 1.16 | 2.35 ± 0.69
GE | 2.32 ± 0.19 | 1.50 ± 0.08 | 2.56 ± 0.90 | 1.66 ± 0.45
Table 2: RMSE and MAE in CPU temperature prediction for each model (ARMA, N4SID, NARX and GE)

Figure 8: Zoomed-in averaged CPU temperature modeling comparison between ARMA(1,4), N4SID, NARX and GE

Table 2 shows the RMSE and MAE errors obtained for our proposed modeling technique based on GE and for the ARMA, N4SID and NARX models, and Figure 8 shows a zoom-in into the CPU temperature curve for the actual measurements and the averaged prediction of the five models obtained in the cross-validation over the whole data set. As can be seen, GE models are the ones with both the lowest RMSE and the lowest MAE. Moreover, the CPU temperature trend is accurately predicted. This does not happen for ARMA models, which, even though they keep the maximum error low, provide values that always lag behind the real trend, yielding poor forecasting capabilities. This issue cannot be solved by increasing the model order, as shown in Table 2. N4SID models, even though they are very accurate in the training set, perform poorly in the test set and have an important bias error. Even if the bias error is corrected (which has been done in Figure 8), the prediction still lags behind the measurements and the model is unable to capture the system dynamics. NARX models clearly show an overfitting effect in the training data. The GE prediction, even though it has more oscillations (due to the smoothed noise of the power consumption signal), is the only one that captures the temperature trend, anticipating the real measurements.

6.3. Inlet temperature modeling

For inlet temperature modeling we perform the same study as for CPU temperature in terms of grammars, premature convergence and fitness biasing. As expected, the best model configuration turns out to be very similar. Thus, for inlet temperature modeling we use the same configuration: i) a mixed model using a simplified version of the grammar that only allows exponentials, ii) SDT with 5% packing, and iii) the RMSE fitness function without biasing.

The BNF grammar used is very similar to Grammar 3, where instead of rule VI we use the following rule:

〈var〉 ::= TIN[k-〈idx〉] | TpIN[k-〈idx〉] | TSUP[k-〈idx〉] | HUM[k-〈idx〉]

where TSUP is the cold air supply temperature, HUM is the humidity, TIN are past inlet temperatures and TpIN are past inlet temperature predictions. Figure 9 shows the prediction for the test set. The RMSE of the prediction is 0.33°C and the MAE is 0.27°C for a prediction window of 10 minutes on the test set. Again, the model includes all the available variables, i.e., TSUP, TIN and HUM appear in the final model.

Figure 9: 10-minute inlet temperature prediction in the reduced scenario for a mixed model with SDT packing 5% and simplified grammar

6.4. Data Center modeling

We use the previous model with the same configuration to predict the CPU and inlet temperature of the blade servers at the CeSViMa data center. Because CeSViMa is a production environment, when it comes to server data we are subject to the data sampling rates provided by the data center. CeSViMa collects all data from servers every 2 minutes, and environmental data (i.e., from coolers) every 15 minutes. Thus, for both CPU and inlet temperatures, we need to change our prediction windows. For CPU temperature we use a prediction window α′ = 1 sample, which means that we predict CPU temperature two minutes into the future. For inlet temperature we use a prediction window β′ = 1 sample, i.e., we predict temperature 15 minutes ahead.

Because in CeSViMa we cannot control the workload being executed, nor modify the cooling setup, we need to select longer training and test sets to ensure that they exhibit high variability in the magnitudes of interest.
For CPU temperature, we select 2 days of execution for the training set and 4 days for the test set. For inlet temperature, which varies very slowly in a real data center setup, we use 14 days of execution for both the training and the test set.

Figure 10: Inlet temperature modeling for various racks. a) Rack 1, Chassis 02; b) Rack 4, Chassis 02. Inlet temperature prediction for 1 week (time in days)

Figure 10 shows a zoomed-in plot of the measured and predicted inlet temperature at chassis c02 of racks 1 and 4 in the CeSViMa data center for a period of 8 days. Figure 11 shows the measured and predicted CPU temperature traces for blades b01, b04 and b07 in chassis c02 of both racks, for the first two days of the same period, as well as the prediction error (i.e., the difference between the real measurements and the prediction). To generate these last models, instead of using the real inlet temperature measurements, we use the predicted inlet temperature. This way, we are able to accurately predict all variables needed for optimization.

Figure 11: Data Center CPU temperature modeling and prediction error for various servers in different racks for 2 days of traces (time in hours). a) Rack 1, Chassis 02, Blade 01; b) Rack 1, Chassis 02, Blade 04; c) Rack 1, Chassis 02, Blade 07; d) Rack 4, Chassis 02, Blade 01; e) Rack 4, Chassis 02, Blade 04; f) Rack 4, Chassis 02, Blade 07

Table 3 shows the phenotypes obtained for CPU and inlet temperature modeling of the servers in the CeSViMa data center. We also report the MAE for both training and test sets. We observe that all phenotypes that model CPU temperature incorporate the parameters of interest (inlet temperature, power and fan speed), and we obtain errors below 1°C in all cases. The average RMSE across models is 1.52°C and 1.57°C for the training and test set, respectively. As for inlet temperature, the phenotypes incorporate both the differential pressure and the CRAC return temperature. Moreover, depending on the rack placement, the influence of the CRAC units varies. Here we can observe the benefits of the feature selection performed by GE. Rack 1, which is the leftmost rack in the data center, is affected only by CRAC2, whereas Rack 4, situated in the middle of the row, is affected both by CRAC2 and CRAC3. The model automatically incorporates the most relevant features, discarding the irrelevant ones. For inlet temperature prediction, our error is below 0.5°C, which is enough for our purposes and below other state-of-the-art approaches.

Model | Phenotype | Train. MAE | Test MAE
Inlet Rack1 | TIN[k-1] + e^(-4.3·(TRET2[k-3]/TRET2[k-11]))/TIN[k-1] · (e^(+1.5·TIN[k-5]) - e^(-3.8·(PDIF[k-20]-TIN[k-1]))) | 0.32 | 0.4
Inlet Rack4 | TIN[k-1] + 3.1/e^(+5.0·TIN[k-1]) · e^(+2.2·TIN[k-2]) - e^(-2.9·TpIN[k-3]) · TRET3[k-12] · (TRET3[k-5] + e^(-4.9·(HUM[k-6]/TRET2[k-1]))/TIN[k-2]/e^(+7.2·(PDIF[k-20]-TIN[k-1]))) | 0.18 | 0.44
CPU Rack1, C02, B01 | TS[k-1] + (PS[k-20]·FS[k-1] - 9.4·(TpS[k-5]·TpS[k-2]))/(e^(+2.0·FS[k-7])/e^(-4.1·TpS[k-5])/(2.3 + e^(+1.7·(TpS[k-10]·TpS[k-10])) + e^(+1.5·FS[k-1]) - e^(+1.6·PS[k-7]))) | 0.68 | 0.76
CPU Rack1, C02, B04 | TS[k-1] + e^(-7.2·(TS[k-6]/PS[k-1]))/(e^(-6.1·(TS[k-10]-PS[k-15]))/5.6/FS[k-20] - 1.7 + TpS[k-20] - e^(-3.0·(TIN[k-4]/TS[k-15]))) | 0.51 | 0.85
CPU Rack1, C02, B07 | TS[k-1] + e^(-6.2·(TS[k-2]·TIN[k-11]))/((e^(-9.8·TS[k-8]) + e^(-5.2·(TS[k-9]·PS[k-19])))·e^(+5.8·FS[k-9])) | 0.55 | 0.75
CPU Rack4, C02, B01 | TS[k-1] + e^(-3.1·TS[k-2])/(((TS[k-3]/PS[k-3]) + (FS[k-18] - FS[k-8])/e^(-9.4·(FS[k-1]-TpS[k-9]))) - TpS[k-4] + (TpS[k-6] + TS[k-3]) - (TIN[k-3]/TpS[k-9])) | 0.29 | 0.46
CPU Rack4, C02, B04 | TS[k-1] + e^(-5.7·(TpS[k-6]·TpS[k-10]))/e^(+9.9·(TpS[k-9]-TIN[k-10])) | 0.26 | 0.73
CPU Rack4, C02, B07 | TS[k-1] + TIN[k-11]·e^(-9.5·(TpS[k-10]/TIN[k-5]))/e^(-9.9·(TIN[k-10]-TS[k-5])) | 0.43 | 0.87
Table 3: Phenotype and average error (in Celsius) in training and test set for CPU and inlet temperature modeling in a production Data Center

7. Discussion

In this section we briefly discuss the applicability of our models, and the computational effort needed to model a full data center scenario, to validate the feasibility of our approach.
7.1. Applicability

The goal of our modeling is to predict server CPU temperature under variable cooling setups, so that cooling-associated costs can be reduced without incurring reliability issues. To this end, we first predict the inlet temperature of the servers given the data room conditions and the cooling setup, and use this result to predict server CPU temperature.

Having analyzed the spatio-temporal variability of the inlet temperature traces in the CeSViMa data center, we find that it is sufficient to predict inlet temperature at 3 different heights (at the bottom, middle and top of the rack), in one out of every two racks. This way, we need to generate 30 inlet temperature models at most. Because the maximum CPU temperature in the data center is the one limiting the cooling, at most we need to predict the CPU temperature of each server in the data room, i.e., we need as many models as there are servers in the data center. However, if by analyzing the traces we find that there is a subset of CPUs that limit the maximum cooling of the overall data center, our problem can be reduced to modeling those that always exhibit higher temperatures. For the particular case of the CeSViMa traces, if we examine 6 months of CPU temperature traces, we find that the CPUs limiting the cooling are blades b04 and b07 placed in the second chassis (c02) of all racks. In this sense, for energy optimization purposes our problem reduces to generating 10 different models. These models allow us to predict the maximum server temperature attained in the data center and, thus, detect any potential thermal redlining and act before it occurs. Moreover, to leverage energy optimization, our results can be used to set the cooling dynamically during runtime, by predicting the maximum data center CPU temperature under various cooling conditions and increasing the CRAC air supply temperature without incurring reliability issues.

Even though in this paper we have applied our modeling methodology to a raised-floor air-cooled data center scenario, the proposed technique is also valid for data centers equipped with other state-of-the-art cooling mechanisms, such as the in-row or in-rack cooling used in high-density racks.
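To illustrate the runtime use described above (a hypothetical sketch only, with made-up model stand-ins, server names derived from the discussion, and assumed redline and margin values rather than the evolved phenotypes), the predictions of the limiting servers can be swept over candidate CRAC supply temperatures, keeping the warmest setpoint whose worst-case CPU temperature stays safely below the redline:

# Hypothetical runtime loop: sweep candidate supply temperatures, predict the
# inlet and CPU temperature of the servers that limit the cooling, and keep
# the warmest setpoint whose worst-case CPU temperature stays below redline.
def pick_supply_setpoint(candidates, limiting_servers, predict_inlet, predict_cpu,
                         redline=85.0, margin=2.0):
    best = None
    for t_sup in sorted(candidates):
        worst_cpu = max(predict_cpu(s, predict_inlet(s, t_sup))
                        for s in limiting_servers)
        if worst_cpu <= redline - margin:
            best = t_sup  # highest safe setpoint seen so far
    return best

# Toy stand-ins: in a deployment these would evaluate the evolved models on
# each server's recent history; the redline and margin are also assumptions.
servers = ["r1-c02-b04", "r1-c02-b07", "r4-c02-b04", "r4-c02-b07"]
inlet = lambda srv, t_sup: t_sup + 4.0      # inlet roughly follows the supply
cpu = lambda srv, t_in: 45.0 + 1.2 * t_in   # CPU temperature rises with inlet
print(pick_supply_setpoint(range(16, 28), servers, inlet, cpu))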
7.2. Computational effort

Our approach is computationally intensive in the model training stage. The GE model needs to evolve a random initial population for 30,000 generations to obtain accurate results. In our experiments, running 30,000 generations of 4 different models in parallel takes 28 hours on a computer equipped with a quad-core Intel i7 CPU at 3.4 GHz and 8 GB of RAM. This computational cost is much larger than the cost of training ARMA and N4SID models. However, to obtain accurate results ARMA needs to be manually tuned, and N4SID requires a manual feature selection step that greatly impacts accuracy, whereas GE models can be developed automatically.

However, as the models obtained for homogeneous servers are very similar, it is possible to reduce the training overhead by using already evolved populations to fine-tune the models instead of using a new random population every time. This way, we can reduce the training time significantly.

As for model testing, in the worst-case scenario the model needs to be evaluated every 10 seconds. The overhead of evaluating one model is found to be negligible. In this sense, it is feasible to compute all temperatures to find the maximum. Moreover, because of the temperature imbalances of the servers in the data room, we can reduce the number of models run to those that are limiting the cooling, i.e., the servers with the highest CPU temperature values. The overhead incurred by testing is of the same order of magnitude as the overhead of ARMA and N4SID, but provides better results in terms of error.

8. Conclusions

In this paper we have presented a methodology for the unsupervised generation of models to predict at runtime the thermal behavior of production data centers running arbitrary workloads and equipped with heterogeneous servers.

Our approach leverages Grammatical Evolution to automatically generate models of the data room by using real data center traces. Our solution allows us to predict the CPU temperature and inlet temperature of servers, with average errors below 2°C and 0.5°C, respectively.
These errors are within the margin obtained by other off-line supervised approaches in the state of the art. Our solution generates the models in an unsupervised way, is able to work at runtime, is trained and tested in a real scenario, and does not require the usage of CFD software. To the best of our knowledge, our work is the first to propose data center temperature forecasting using evolutionary techniques, allowing predictive model generation for runtime optimization.

Appendix

In this Appendix we provide further information on the mapping process used by our grammar. For a more detailed explanation of the principles of GE, the reader is referred to [32]. A BNF specification is a set of derivation rules, expressed in the form:

〈symbol〉 ::= 〈expression〉

Rules are composed of sequences of terminals, which are items that can appear in the language, and non-terminals, which can be expanded into one or more terminals and non-terminals. A grammar is represented by the tuple {N, T, P, S}, where N is the non-terminal set, T the terminal set, P the set of production rules that assign elements of N and T, and S a start symbol that must appear in N. The options within a production rule are separated by a "|" symbol.

Grammar 4 represents an example grammar in BNF. The final expression consists of elements of the set of terminals T, which have been combined through the rules of the grammar.

Grammar 4: Example of a grammar in BNF designed for symbolic regression
N = {expr, op, preop, var, num, dig}
T = {+, -, *, /, sin, cos, exp, x, y, z, 0, 1, 2, 3, 4, 5, (, ), .}
S = {expr}
P = {I, II, III, IV, V, VI}
(I) 〈expr〉 ::= 〈expr〉〈op〉〈expr〉 | 〈preop〉(〈expr〉) | 〈var〉
(II) 〈op〉 ::= + | - | * | /
(III) 〈preop〉 ::= sin | cos | exp
(IV) 〈var〉 ::= x | y | z
(V) 〈num〉 ::= 〈dig〉.〈dig〉 | 〈dig〉
(VI) 〈dig〉 ::= 0 | 1 | 2 | 3 | 4 | 5

The chromosome is used to map the start symbol onto terminals by reading genes (or codons) of 8 bits, each generating a corresponding integer value from which one of the options of a production rule is selected by using the modulus operator:

Rule = Codon Value % Number of Rule Choices        (9)

Example: In this example, we explain the mapping process performed in GE to obtain a phenotype (a mathematical function) given a genotype (a chromosome). Let us suppose we have the BNF grammar provided in Grammar 4 and the following 7-gene chromosome:

21-64-17-62-38-254-2

According to Grammar 4, the start symbol is S = 〈expr〉, hence the decoded expression begins with the non-terminal:

Solution = 〈expr〉

Now, we use the first gene of the chromosome (i.e., 21) in rule I of the grammar. The number of choices in that rule is 3. Hence, the mapping function is applied: 21 MOD 3 = 0, and the first option, 〈expr〉〈op〉〈expr〉, is selected. The selected option substitutes the decoded non-terminal, giving the following expression:

Solution = 〈expr〉〈op〉〈expr〉

The process continues with codon 64, used to decode the first non-terminal of the current expression, 〈expr〉. Again, the mapping function is applied to the rule: 64 MOD 3 = 1, and the second option, 〈preop〉(〈expr〉), is selected. So far, the current expression is:

Solution = 〈preop〉(〈expr〉)〈op〉〈expr〉

The next codons (17, 62, 38, 254 and 2) are decoded in the same way. After codon 2 has been decoded, the expression is:

Solution = exp(x) * 〈var〉

At this point, the decoding process has run out of codons, and we need to reuse codons starting from the first one. This technique is known as wrapping and mimics the gene-overlapping phenomenon present in many organisms [33]. Applying wrapping, we use gene 21 to decode 〈var〉 with rule IV. This gives the final expression of the phenotype:

Solution = exp(x) * y
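The mapping just illustrated can be written down compactly as follows (a minimal sketch of the general technique, not the implementation used in this work). The grammar mirrors the expression rules of Grammar 4, but the option ordering and the leftmost-first rewriting convention are assumptions, so a given chromosome may decode differently from the walkthrough above.

# Minimal genotype-to-phenotype mapping: rewrite the leftmost non-terminal,
# select each option with the modulus rule of Equation 9, and wrap around the
# codon list if it runs out.
GRAMMAR = {
    "<expr>":  [["<expr>", "<op>", "<expr>"], ["<preop>", "(", "<expr>", ")"], ["<var>"]],
    "<op>":    [["+"], ["-"], ["*"], ["/"]],
    "<preop>": [["sin"], ["cos"], ["exp"]],
    "<var>":   [["x"], ["y"], ["z"]],
}

def map_genotype(codons, start="<expr>", max_wraps=3):
    symbols = [start]
    used = 0
    while any(s in GRAMMAR for s in symbols):
        if used >= len(codons) * max_wraps:
            raise ValueError("mapping did not terminate")
        pos = next(i for i, s in enumerate(symbols) if s in GRAMMAR)  # leftmost NT
        options = GRAMMAR[symbols[pos]]
        choice = codons[used % len(codons)] % len(options)  # Equation 9, with wrapping
        symbols[pos:pos + 1] = options[choice]
        used += 1
    return "".join(symbols)

print(map_genotype([0, 2, 0, 2, 2, 1]))  # decodes to "x*y" under these assumptions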
Apart from performing parameter identification, in conjunction with a well-defined fitness function, the evolutionary algorithm is also constructing the mathematical expressions, over the set of features, that best fit the target system. Thus, GE also determines the optimal set of features that leads to the most accurate model.

Adding time dependence: The grammars shown previously allow us to obtain phenotypes that depend on a certain number of variables (e.g., x, y, z). We could use the previous method to predict variables that depend only on the current observation of other magnitudes, such as server power [34]. Models created this way can be used to predict magnitudes without memory, and the data used for model creation consist of independent samples. Temperature, however, is a magnitude with memory, i.e., the current temperature depends on past temperature values. Thus, the data used for model creation need to be a time series. By properly tuning our grammars, we can add time dependence to the variables in the phenotype, so that past values can be used to predict the variable a certain number of samples ahead.
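As a sketch of this idea outside the grammar itself (an illustration with assumed column names, window sizes and synthetic traces, not the authors' tooling), past samples of each magnitude can be materialized as lagged columns, with the target shifted a prediction window into the future:

# Build a lagged dataset: features are past samples of every magnitude, the
# target is the series a fixed prediction window ahead.
import numpy as np
import pandas as pd

def make_lagged_dataset(df, target="TIN", max_lag=20, horizon=2):
    feats = {f"{col}[k-{lag}]": df[col].shift(lag)
             for col in df.columns for lag in range(1, max_lag + 1)}
    X = pd.DataFrame(feats)
    y = df[target].shift(-horizon)                 # value `horizon` samples ahead
    valid = X.notna().all(axis=1) & y.notna()      # drop rows with missing lags
    return X[valid], y[valid]

# Placeholder traces for inlet temperature, supply temperature and humidity.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "TIN":  22 + np.cumsum(rng.normal(0, 0.05, 500)),
    "TSUP": 18 + np.cumsum(rng.normal(0, 0.02, 500)),
    "HUM":  40 + np.cumsum(rng.normal(0, 0.10, 500)),
})
X, y = make_lagged_dataset(df)
print(X.shape, y.shape)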
Acknowledgments

Research by Marina Zapater has been partly supported by a PICATA predoctoral fellowship of the Moncloa Campus of International Excellence (UCM-UPM). This project has been partially supported by the Spanish Ministry of Economy and Competitiveness, under contracts TEC2012-33892, IPT-2012-1041-430000 and RTC-2014-2717-3. The authors thankfully acknowledge the computer resources, technical expertise and assistance provided by the Centro de Supercomputación y Visualización de Madrid (CeSViMa).

References

[1] J. Kaplan, W. Forrest, N. Kindler, Revolutionizing data center energy efficiency, Tech. Rep. July, McKinsey & Company (2008).
[2] J. Koomey, Growth in data center electricity use 2005 to 2010, Tech. rep., Analytics Press, Oakland, CA (2011).
[3] Archana Venkatraman, ComputerWeekly.com, Global census shows datacentre power demand grew 63% in 2012, http://www.computerweekly.com/news/2240164589/Datacentre-power-demand-grew-63-in-2012-Global-datacentre-census (October 2012).
[4] T. Breen, et al., From chip to cooling tower data center modeling: Part I influence of server inlet temperature and temperature rise across cabinet, in: ITherm, 2010, pp. 1–10.
[5] J. K. Matt Stansberry, Uptime Institute 2013 data center industry survey, Tech. rep., Uptime Institute (2013).
[6] N. El-Sayed, et al., Temperature management in data centers: why some (might) like it hot, in: SIGMETRICS, 2012, pp. 163–174.
[7] J. Brandon, Going green in the data center: Practical steps for your SME to become more environmentally friendly, Processor (29).
[8] R. Miller, Too hot for humans, but Google servers keep humming, http://www.datacenterknowledge.com/archives/2012/03/23/too-hot-for-humans-but-google-servers-keep-humming/ (March 2012).
[9] ASHRAE Technical Committee (TC) 9.9, 2011 Thermal Guidelines for Data Processing Environments, Tech. rep., American Society of Heating, Refrigerating and Air-Conditioning Engineers, Inc. (2011).
[10] C. Ryan, J. Collins, M. O'Neill, Grammatical evolution: Evolving programs for an arbitrary language, in: Genetic Programming, Vol. 1391 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, 1998, pp. 83–96.
[11] D. Atienza, et al., Reliability-aware design for nanometer-scale devices, in: Proceedings of the 2008 Asia and South Pacific Design Automation Conference, IEEE Computer Society Press, 2008, pp. 549–554.
[12] M. Zapater, et al., Leakage-aware cooling management for improving server energy efficiency, IEEE Transactions on Parallel and Distributed Systems (TPDS).
[13] R. Miller, Data center cooling set points debated, http://www.datacenterknowledge.com/archives/2007/09/24/data-center-cooling-set-points-debated/ (September 2007).
[14] P. B. Liz Marshall, Using CFD for data center design and analysis, Tech. rep., Applied Math Modeling White Paper (2011).
[15] Z. Abbasi, G. Varsamopoulos, S. K. S. Gupta, Thermal aware server provisioning and workload distribution for internet data centers, in: HPDC, ACM, New York, NY, USA, 2010, pp. 130–141. doi:10.1145/1851476.1851493.
[16] J. Chen, et al., A high-fidelity temperature distribution forecasting system for data centers, in: Proceedings of the 2012 IEEE 33rd Real-Time Systems Symposium, RTSS '12, IEEE Computer Society, Washington, DC, USA, 2012, pp. 215–224.
[17] Z. Abbasi, M. Jonas, A. Banerjee, S. Gupta, G. Varsamopoulos, Evolutionary green computing solutions for distributed cyber physical systems, in: Evolutionary Based Solutions for Green Computing, Springer Berlin Heidelberg, 2013, pp. 1–28.
[18] J. Pagán, M. Zapater, O. Cubo, P. Arroba, V. Martín, J. M. Moya, A Cyber-Physical approach to combined HW-SW monitoring for improving energy efficiency in data centers, in: Conference on Design of Circuits and Integrated Systems, 2013, pp. 140–145.
[19] G. Varsamopoulos, A. Banerjee, S. Gupta, Energy efficiency of thermal-aware job scheduling algorithms under various cooling models, in: Contemporary Computing, Vol. 40 of Communications in Computer and Information Science, 2009, pp. 568–580.
[20] J. Moore, J. Chase, P. Ranganathan, Weatherman: Automated, online and predictive thermal mapping and management for data centers, in: IEEE International Conference on Autonomic Computing, ICAC'06, 2006, pp. 155–164. doi:10.1109/ICAC.2006.1662394.
[21] A. M. D. Silva, F. Noorian, R. I. A. Davis, P. H. W. Leong, A hybrid feature selection and generation algorithm for electricity load prediction using grammatical evolution, in: Proceedings of the 2013 12th International Conference on Machine Learning and Applications, ICMLA '13, IEEE Computer Society, Washington, DC, USA, 2013, pp. 211–217.
[22] M. Patterson, The effect of data center temperature on energy efficiency, in: Thermal and Thermomechanical Phenomena in Electronic Systems, ITHERM'08, 2008, pp. 1167–1174.
[23] T. Heath, A. P. Centeno, P. George, L. Ramos, Y. Jaluria, R. Bianchini, Mercury and Freon: Temperature emulation and management for server systems, in: ASPLOS, New York, NY, USA, 2006, pp. 106–116.
[24] A. Coskun, T. Rosing, K. Gross, Utilizing predictors for efficient thermal management in multiprocessor SoCs, TCAD 28 (10) (2009) 1503–1516.
[25] E. Vladislavleva, G. Smits, D. den Hertog, Order of nonlinearity as a complexity measure for models generated by symbolic regression via Pareto genetic programming, IEEE Transactions on Evolutionary Computation 13 (2) (2009) 333–349. doi:10.1109/TEVC.2008.926486.
[26] M. O'Neill, C. Ryan, Grammatical evolution, IEEE Transactions on Evolutionary Computation 5 (4) (2001) 349–358.
[27] T. Back, U. Hammel, H.-P. Schwefel, Evolutionary computation: comments on the history and current state, IEEE Transactions on Evolutionary Computation 1 (1) (1997) 3–17. doi:10.1109/4235.585888.
[28] K. Melikhov, V. M. Kureichick, A. N. Melikhov, V. V. Miagkikh, O. V. Savelev, A. P. Topchy, Some new features in genetic solution of the traveling salesman problem, in: Adaptive Computing in Engineering Design and Control (ACEDC), 1996.
[29] M. Rocha, J. Neves, Preventing premature convergence to local optima in genetic algorithms via random offspring generation, in: International Conference on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, IEA/AIE'99, Secaucus, NJ, USA, 1999, pp. 127–136.
[30] SPEC CPU Subcommittee and John L. Henning, SPEC CPU 2006 benchmark descriptions, http://www.spec.org/cpu2006/.
[31] A. Phansalkar, A. Joshi, L. K. John, Subsetting the SPEC CPU2006 benchmark suite, SIGARCH Computer Architecture News 35 (1) (2007) 69–76.
[32] C. Ryan, M. O'Neill, Grammatical evolution: A steady state approach, in: Late Breaking Papers, Genetic Programming, 1998, pp. 180–185.
[33] E. Hemberg, L. Ho, M. O'Neill, H. Claussen, A comparison of grammatical genetic programming grammars for controlling femtocell network coverage, Genetic Programming and Evolvable Machines 14 (1) (2013) 65–93. doi:10.1007/s10710-012-9171-8.
[34] P. Arroba, J. L. Risco-Martín, M. Zapater, J. M. Moya, J. L. Ayala, Enhancing regression models for complex systems using evolutionary techniques for feature engineering, Journal of Grid Computing.