Floating Wind Turbine Dynamics Identification Identificación de la dinámica de una turbina eólica flotante Juan Tecedor Roa Bachelor's Degree in Software Engineering FACULTY OF COMPUTER SCIENCE Supervised by Matilde Santos Peñas Carlos Luis Serrano Barreto MADRID, 2021-2022 Floating Wind Turbine Dynamics Identification Identificación de la dinámica de una turbina eólica flotante Bachelor's Degree in Software Engineering Final Project Juan Tecedor Roa Supervised by Matilde Santos Peñas Carlos Luis Serrano Barreto Department of Computer Architecture and Automation Faculty of Computer Science Universidad Complutense de Madrid MADRID, 2021-2022

Acknowledgements

To my supervisors, Matilde Santos Peñas and Carlos Luis Serrano Barreto from the Complutense University of Madrid, for their guidance throughout the work, and to Enrique Sierra García from the University of Burgos for his suggestions and guidance early in the project. Without their input and knowledge this work would have been a car without wheels. To the staff of the Biblioteca Complutense for their style guide on the formal aspects of writing a final bachelor's project [14]. To the developers, creators and maintainers of the free and open-source software that enabled the creation of this work, as well as the companies that grant educational users partial or complete use of their licensed tools.

Abstract

Climate change is one of the biggest and most worrying problems in the current world. Renewable energy sources are one of the main tools that will allow humanity to fight it. More precisely, floating wind turbines offer unprecedented amounts of generated power compared to their onshore or bottom-fixed offshore counterparts. The technology is, however, at an early stage of development, with many improvements still to be made and countless fields of study. This project studies the behavior of a scale model of a floating wind turbine by building several statistical models that can predict some of its most important statistical metrics. These models depend on the wind speed and on the blade pitch angle of the turbine. Additionally, a periodicity analysis of the wind turbine is carried out in order to determine whether there are frequencies associated with it at different wind speeds and pitch angles. In this work, a data preprocessing phase is first performed with the aid of statistics and graphical representations. Then, two studies are made: a periodicity analysis using several Fourier Transforms, and multiple supervised regression models. The supervised models used were: Linear Regression, Polynomial Regression, Ridge Regressor, Huber Regressor, Gaussian Regressor and a Neural Network (MLP Regressor). Most of the supervised models were very successful and could be used to create a virtual model of a wind turbine. The periodicity analysis was also successful and was consistent with the physical analysis of the wind turbine.

Keywords — Wind turbine, floating, dynamics, identification, regression, neural networks, statistical analysis, data preprocessing, data representation, Fast Fourier Transforms.

Resumen

El cambio climático es uno de los problemas más importantes y preocupantes actualmente en el mundo. Las energías renovables son una de las herramientas que tenemos disponibles para poder luchar contra él.
En concreto, las turbinas eólicas flotantes pueden ofrecer una proporción de enerǵıa eléctrica generada sin precedentes, especialmente si las comparamos con las turbinas emplazadas en tierra o en el mar a baja profundidad cerca de la costa. Sin embargo, la tecnoloǵıa de las turbinas eólicas flotantes está en sus comienzos, con incontables mejoras por implementar y diversos campos de estudio. Este proyecto estudia el comportamiento de un modelo a escala de una turbina flotante, mediante la elaboración de varios modelos estad́ısticos que puedan predecir las métricas estad́ısticas más relevantes de la turbina. Estos modelos estad́ısticos dependen de la velocidad del viento y del ángulo de ataque de las palas de la turbina. Adicional- mente, se ha realizado también un análisis de la periodicidad de la turbina de viento para determinar qué frecuencias están asociadas con ella y a qué velocidades de viento y a qué ángulos de ataque. En este trabajo se realiza una primera fase de preprocesado de datos, con ayuda de he- rramientas estad́ısticas y representaciones gráficas. Posteriormente, se realizan dos estu- dios: un análisis de la periodicidad mediante varias Transformadas Rápidas de Fourier, y múltiples modelos supervisados de regresión. Los modelos supervisados fueron los siguientes: Regresión lineal, Regresión Polinómica, Regresión de Ridge, Regresión de Huber, Regresión Gaussiana y una red neuronal (regresor MLP). La mayoŕıa de los modelos supervisados obtuvieron resultados muy satisfactorios y podŕıan ser usados para crear un modelo virtual de la turbina de viento. El análisis de periodicidad fue también exitoso y consistente con el análisis f́ısico de la turbina de viento. Palabras clave — Turbina eólica, flotante, dinámica, identificación, regresión, redes neuronales, análisis estad́ıstico, preprocesamiento de datos, representación de datos, FFT. v Contents 1 Introduction 1 1.1 Climate change . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1.1 Main polluters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 How a turbine generator works . . . . . . . . . . . . . . . . . . . . . . . 3 1.2.1 Parts of a wind turbine . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2.2 Physics of power generation . . . . . . . . . . . . . . . . . . . . . 4 1.2.3 Blade pitch angle . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.3 Types of wind turbines . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.4 Goals and specific objectives . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.4.1 Main objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.4.2 Objectives breakdown . . . . . . . . . . . . . . . . . . . . . . . . 8 1.5 Relation of the work with the completed bachelor courses . . . . . . . . . 8 1.6 Work plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.7 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.7.1 Text editor software . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.7.2 Programming software . . . . . . . . . . . . . . . . . . . . . . . . 10 1.8 Repository . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.9 Structure of the project . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2 State Of The Art 13 2.1 Current floating wind turbines wind farms . . . . . . . . . . . . . . . . . 13 2.2 Evolution of wind turbines power output . . . . . . . . . . . . . . . . . . 13 2.3 Types of platforms for floating wind turbines . . . 
. . . . . . . . . . . . . 15 2.4 Types of floating wind turbines anchors . . . . . . . . . . . . . . . . . . . 16 2.5 Fields of study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.5.1 Overlap between floating wind turbines and offshore turbines . . . 16 2.5.2 Overlap with the oil industry . . . . . . . . . . . . . . . . . . . . 16 2.6 Current floating wind turbines study fields . . . . . . . . . . . . . . . . . 17 3 Materials and Methods 19 3.1 Introduction to the experimental setup . . . . . . . . . . . . . . . . . . . 19 3.2 Experiment setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.3 Materials: Data from experiments . . . . . . . . . . . . . . . . . . . . . . 23 3.4 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.4.1 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.4.2 Machine learning workflow . . . . . . . . . . . . . . . . . . . . . . 25 3.4.3 Areas of machine learning . . . . . . . . . . . . . . . . . . . . . . 26 3.4.4 Feature scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 vii CONTENTS viii 3.4.5 Supervised learners . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.4.6 Linear Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.4.7 Evaluation of model performance . . . . . . . . . . . . . . . . . . 30 3.4.8 Ovefitting in Machine Learning . . . . . . . . . . . . . . . . . . . 31 3.4.9 Hyperparameter tuning . . . . . . . . . . . . . . . . . . . . . . . . 31 3.4.10 Pipelining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.4.11 Periodicity Study . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.4.12 Data organization . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4 Statistical Analysis 35 4.1 Data visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.2 Distribution study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.2.1 Histograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.2.2 Examples of normal distributions . . . . . . . . . . . . . . . . . . 39 4.2.3 Normality test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4.3 Creating new variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4.4 Statistical functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 4.5 3D Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 4.6 Correlation matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 5 Periodicity analysis 53 5.1 Test setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 5.2 Testing the library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 5.3 Plotting the data set Fast Fourier Transforms . . . . . . . . . . . . . . . 54 5.3.1 DC Component . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 5.3.2 Comparison of the FFTs at different wind speeds . . . . . . . . . 58 5.3.3 Periodicity analysis results . . . . . . . . . . . . . . . . . . . . . . 65 6 Supervised Models 67 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 6.2 Data preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 6.3 Models used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 6.4 Scaler selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 6.5 Hyperparameters tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . 
68 6.6 Models parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 6.7 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 6.7.1 R2 score . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 6.7.2 Overall scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 6.7.3 Predictions representations . . . . . . . . . . . . . . . . . . . . . . 72 6.7.4 Predictions for inputs outside the experiment . . . . . . . . . . . 85 6.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 7 Conclusions and Future works 89 7.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 7.2 Future works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 CONTENTS ix Appendices 92 Appendix A 92 Bibliography 94 List of Figures 1.1 Global Historial CO2 Emissions by Sector [11]. . . . . . . . . . . . . . . . 2 1.2 Renewable Energy Generation [24] [9]. . . . . . . . . . . . . . . . . . . . 3 1.3 Parts of a wind turbine as seen from the back and to the left side. [31]. . 4 1.4 Three types of turbine, with the types labeled. Adapted from [50]. . . . . 6 1.5 Global Wind Speed in January and July year 2001 [30]. . . . . . . . . . 7 2.1 Average rated output of a wind turbine for selected years [21]. . . . . . . 14 2.2 Four types of floating wind turbines. Adapted from [23]. . . . . . . . . . 15 3.1 Front view of the wind turbine with labeled axes. . . . . . . . . . . . . . 20 3.2 Front view of the turbine with labeled axes, simplified. . . . . . . . . . . 21 3.3 Top view of the turbine with labeled axes, simplified. . . . . . . . . . . . 21 3.4 IMU with its axes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 3.5 Polarity of rotation and orientation of the IMU axes. . . . . . . . . . . . 22 3.6 Neural network example. Illustration made thanks to NN-SVG [3]. . . . . 30 3.7 Overfitted data [18]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.8 FFT of a Cosine Summation Function resonating at 10, 20, 30, 40, and 50 Hz. [2]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4.1 2D plots of the data corresponding to a wind speed of 8.5ms-1 and a pitch angle of 10◦. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.2 2D plots of the data corresponding to a wind speed of 13.8ms-1 and a pitch angle of 10◦. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 4.3 Histogram of the data corresponding to a wind speed of 8.5ms-1 and a pitch angle of 10◦. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.4 Histogram of the data corresponding to a wind speed of 8.5ms-1 and a pitch angle of 10◦. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 4.5 Histogram of the data corresponding to a wind speed of 13.8ms-1 and a pitch angle of 30◦. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 4.6 Histogram of the data corresponding to a wind speed of 13.8ms-1 and a pitch angle of 30◦. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.7 Examples of a normal distributions with different means and variances [26]. 40 4.8 Median of the magnetometer measurements in the z axis. . . . . . . . . . 43 4.9 Standard deviation of the acceleration in the z axis. . . . . . . . . . . . . 43 4.10 Standard deviation of the angular velocity in the y axis. . . . . . . . . . 44 4.11 Median of the magnetometer measurements in the z axis. . . . . . . . . . 
44 4.12 Median absolute value of the magnetometer in the z axis. . . . . . . . . . 45 4.13 Median value of the modulus of the magnetometer vector in the z axis. . 46 x LIST OF FIGURES xi 4.14 Median absolute value of the measurements of the magnetometer in the y axis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.15 Median angular velocity measurements in the z axis. . . . . . . . . . . . 47 4.16 Average modulus of the magnetometer vector. . . . . . . . . . . . . . . . 49 4.17 Average modulus of the magnetometer vector. . . . . . . . . . . . . . . . 50 5.1 FFT Test with a sin function at 2Hz, with a clear peak at x = 2. . . . . 54 5.2 FFT with wind speed of 8.5 and an angle of 30◦, acceleration of y. . . . . 55 5.3 FFT with wind speed of 8.5 and an angle of 30◦, acceleration of y. DC component removed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 5.4 FFT with wind speed of 8.5 and an angle of 30◦, acceleration of y. . . . . 57 5.5 FFT with wind speed of 8.5 and an angle of 30◦, acceleration of y. DC component removed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 5.6 Common naming of an aircraft in flight principal axes [7]. . . . . . . . . . 58 5.7 Front view of the turbine with labeled axes, simplified. . . . . . . . . . . 59 5.8 FFT with wind speed of 8.5 and an angle of 30◦, acceleration of z. DC component not removed. . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 5.9 FFT with wind speed of 8.5 and an angle of 30◦, acceleration of z. DC component removed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 5.10 FFT with wind speed of 11.6 and an angle of 30◦, acceleration of z. DC component removed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 5.11 FFT with wind speed of 13.8 and an angle of 30◦, acceleration of z. DC component removed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 5.12 FFT with wind speed of 8.5 and an angle of 30◦, angular velocity on x. DC component not removed. . . . . . . . . . . . . . . . . . . . . . . . . . 63 5.13 FFT with wind speed of 10.1 and an angle of 30◦, angular velocity on x. DC component not removed. . . . . . . . . . . . . . . . . . . . . . . . . . 63 5.14 FFT with wind speed of 11.6 and an angle of 30◦, angular velocity on x. DC component not removed. . . . . . . . . . . . . . . . . . . . . . . . . . 64 5.15 FFT with wind speed of 13.8 and an angle of 30◦, angular velocity on x. DC component not removed. . . . . . . . . . . . . . . . . . . . . . . . . . 64 6.1 Representation of the (2,4,2) Neural Network used. . . . . . . . . . . . . 70 6.2 Predictions of the Gaussian model of the median magnetometer values in the z axis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 6.3 Predictions of the Linear model of the length of the magnetometer vector. 74 6.4 Predictions of the Polynomial model of the length of the magnetometer vector. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 6.5 Predictions of the Huber model of the standard deviation of the gyroscope in the y axis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 6.6 Predictions of the Ridge model of the standard deviation of the gyroscope in the y axis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 6.7 Predictions of the MLPR model of the range of the gyroscope in the z axis. 78 6.8 Predictions of the Polynomial model of variance of the gyroscope in the y axis. . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . 79 6.9 Predictions of the Linear model of the mean of the acceleration in the z axis. 80 6.10 Predictions of the Polynomial model of the mean of the acceleration in the z axis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 6.11 Predictions of the Ridge model of the mean of the acceleration in the z axis. 82 LIST OF FIGURES xii 6.12 Predictions of the Huber model of the mean of the length of the acceleration vector. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 6.13 Predictions of the MLPR model of the standard deviation of the length of the acceleration vector. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 List of Tables 1.1 Work plan table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.2 Official deadlines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.3 Main project tasks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.4 Python packages used. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.1 Equivalence between voltages and wind speeds. . . . . . . . . . . . . . . 19 3.2 Sample of data at 130V (8.5ms-1) and 1◦. . . . . . . . . . . . . . . . . . . 23 3.3 Sample of data at 200V (13.8ms-1) and 1◦. . . . . . . . . . . . . . . . . . 24 3.4 Data at a wind speed of 8.5ms−1 and 1◦pitch angle. . . . . . . . . . . . . 24 3.5 Data at a wind speed of 13.8ms−1 and 1◦pitch angle. . . . . . . . . . . . 24 3.6 Grouped data set columns. . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4.1 Data set columns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4.2 Data set columns with length vectors. . . . . . . . . . . . . . . . . . . . . 41 4.3 Statistical functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 4.4 Grouped data set columns. . . . . . . . . . . . . . . . . . . . . . . . . . . 42 4.5 Grouped data set columns. . . . . . . . . . . . . . . . . . . . . . . . . . . 48 5.1 Data statistics at a wind speed of 8.5ms−1 and 30◦pitch angle. . . . . . . 54 5.2 Data statistics at a wind speed of 8.5ms−1 and 30◦pitch angle. . . . . . . 65 6.1 Data set columns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 6.2 Model scores breakdown. Run 1. . . . . . . . . . . . . . . . . . . . . . . 71 6.3 Model scores breakdown. Run 2. . . . . . . . . . . . . . . . . . . . . . . 71 6.4 Model scores breakdown. Run 3. . . . . . . . . . . . . . . . . . . . . . . 71 6.5 Model scores breakdown. Run 4. . . . . . . . . . . . . . . . . . . . . . . 72 6.6 Predictions for inputs outside the experiment, only models with scores over 0.90. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 xiii Chapter 1 Introduction 1.1 Climate change Climate change is one of the most important and worrying problems in the current world. Labeled as ”the biggest health threat facing humanity” by the World Health Organization [33], with expectations of 250 000 additional deaths per year between 2030 and 2050 due to climate change alone. The causes of this death toll are from: malnutrition, malaria, diarrhea and heat stress. Direct damage costs to health are estimated to reach between USD 2-4 billion per year by 2030 [32]. The main contributor to climate change is air pollution, and more concretely CO2 emissions. Therefore, we must push for reducing air pollution drastically. 
1.1.1 Main polluters

In order to reduce air pollution we must first identify which elements contribute the most to CO2 emissions. The following graph breaks down the main polluters:

Figure 1.1: Global Historical CO2 Emissions by Sector [11].

We can see that about three quarters of CO2 emissions are due to energy generation. Energy generation is, therefore, the first polluter, and it is where most of the work must be done.

In the generation mix, the main non-renewable sources represent the following percentages: oil (31.2%), coal (27.2%) and gas (24.7%). The rest of the generation comes from renewables (5.7%), hydroelectric (6.9%) and other low-carbon generators such as nuclear (4.3%) [9]. The room for growth in renewable sources is therefore large: they must replace the main polluters such as coal, oil and gas. The following graph illustrates the accelerating growth of wind as a renewable energy source in recent years:

Figure 1.2: Renewable Energy Generation [24] [9].

In conclusion, wind as a renewable energy source is fundamental to reducing global CO2 emissions, and it is currently in a period of rapid adoption. Wind technology is, however, still evolving, and there are still problems to be solved and optimizations to be made.

1.2 How a turbine generator works

Wind power is generated by the use of generators inside wind turbines. A turbine extracts the kinetic energy of the fluid passing through its blades (in this case, wind) and converts it into rotary motion. This mechanical motion is fed into an electrical generator, which converts the rotary mechanical energy into electrical energy (normally alternating current (AC)).

1.2.1 Parts of a wind turbine

To further explain how a turbine works we present a diagram:

Figure 1.3: Parts of a wind turbine as seen from the back and left side [31].

From figure 1.3 we must first focus on the three main components: the tower (the column that supports the nacelle), the nacelle (the streamlined body that houses the mechanical and electrical components) and the blades. The components can be decomposed further; we highlight the generator and the shafts that connect the rotor to the generator. Finally, note the pitch arrows, indicating that the angle at which the turbine blades encounter the wind can be changed.

1.2.2 Physics of power generation

The amount of energy generated by a wind turbine depends on several factors [20]. First, we take into account the kinetic energy of the wind:

E_k = \frac{1}{2} m v^2 \quad (1.1)

Equation 1.1 is the standard kinetic energy equation. Knowing that the density ρ is expressed in [kg m^-3], velocity in [m s^-1], time in [s] and the area A in [m^2], we can substitute the mass m [kg] in the previous equation:

m = \left(\frac{kg}{m^3}\right)\left(\frac{m}{s}\right)(s)(m^2) = \rho v t A \;\; [kg] \quad (1.2)

E = \frac{1}{2} m v^2 = \frac{1}{2} (A v t \rho) v^2 = \frac{1}{2} A t \rho v^3 \quad (1.3)

Power is expressed in [W] (watts), which is the same as [J s^-1] (joules per second). From equation 1.3:

P = \frac{E}{t} = \frac{1}{2} A \rho v^3 \quad (1.4)

From equation 1.4 we conclude that wind power is proportional to the area swept by the blades of the turbine, to the density of the air and to the wind speed (the last one cubed, and therefore much more important). However, it must be noted that equation 1.4 is a simplification for ideal conditions, in which a turbine could extract all the energy from the wind. In real life, the power generated will be much lower [20].
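As a quick illustration of equation 1.4 and its cubic dependence on wind speed, the short Python sketch below evaluates the ideal power for an example rotor. The rotor radius and air density are assumed illustrative values, not parameters of the scale model studied in this work.

```python
# Illustrative only: ideal wind power from equation 1.4, P = (1/2) * rho * A * v^3.
# The rotor radius and air density are assumed example values.
import math

def ideal_wind_power(radius_m: float, wind_speed_ms: float, air_density: float = 1.225) -> float:
    """Ideal power in watts for a rotor of the given radius at the given wind speed."""
    swept_area = math.pi * radius_m ** 2  # A = pi * r^2
    return 0.5 * air_density * swept_area * wind_speed_ms ** 3

# Doubling the wind speed multiplies the ideal power by 2^3 = 8.
for v in (5.0, 10.0):
    print(f"v = {v:4.1f} m/s -> P = {ideal_wind_power(40.0, v) / 1e6:.2f} MW")
```

Real turbines extract only a fraction of this ideal figure, as noted above.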
Nonetheless, equation 1.4 is still useful as it introduces us to the concepts and variables involved.

In conclusion, in order to maximize the power output of a turbine we have to take the previous variables into account. From equation 1.4, the A term, the swept area, can be increased within reason by building larger turbines with longer blades that sweep a bigger area. The second term, ρ, the air density, can hardly be changed, as it depends on several factors such as temperature, humidity or pressure. The last and most important term, v, is cubed; we therefore have to carefully consider where to build wind turbines (each geographical location has its own average wind speed), and they should be built in areas with high wind speeds.

1.2.3 Blade pitch angle

In the previous section we concluded that wind speed is the determining factor in how much power is generated. There is another factor to consider: the blade pitch angle. Pitch-controlled turbines allow the controller to change the angle at which the blade meets the wind, instead of having a fixed, predetermined angle. At a pitch of 0° the blade is considered to be parallel to the wind, with no rotary motion induced. Positive angles increase the pitch from this 0° reference, and negative values indicate the same motion in the opposite direction, resulting in a rotation contrary to that of positive values. The range of rotation of the blades in a wind turbine is usually [0, 90] degrees (from parallel to the wind to perpendicular to it). Pitch control can therefore be used to regulate the power generated by the turbine and to protect it, by keeping its rotor speed within a controllable limit as well as by optimizing the energy generation [28].

1.3 Types of wind turbines

There are three types of turbines depending on their physical location: onshore (placed inland), offshore (installed on a platform with a solid foundation fixed to the seabed) and floating (mounted on a floating platform that is prevented from drifting by cables and anchoring devices). Figure 1.4 illustrates this:

Figure 1.4: Three types of turbine, with the types labeled. Adapted from [50].

This third type of turbine is especially important at depths beyond 60 m, where a platform that runs down to the seabed is no longer economically viable [19]. These turbines are normally placed much farther from the coast than their offshore counterparts. However, this geographical location poses several problems that increase cost and the difficulty of deployment. Some of these problems are:

• Harsher weather compared to inland or bottom-fixed offshore counterparts (with some remarks in the list of advantages below).
• Unstable, material-fatigue-prone behavior due to constant oscillation and vibrations.
• Greatly increased maintenance costs, due to the difficulty for maintenance personnel of reaching the platform and due to its non-static behavior.
• Power lines have to be resistant to the underwater environment, resulting in additional expense.

What, then, are the motivations for using floating offshore technology?

• Higher wind speeds compared to onshore installations, leading to increased power generation.

Figure 1.5: Global Wind Speed in January and July of the year 2001 [30].

From figure 1.5, note that the light shaded areas are located over the sea and, in some cases, close to the coast, but very rarely inland. As a general rule, average wind speed increases with distance from the coast [23].
• Large areas without obstructions or restrictions on where to build the turbine, allowing large projects. This property is known as fetch [23].
• More stable and predictable wind, with reduced turbulence and wind shear [19] [8].

1.4 Goals and specific objectives

1.4.1 Main objective

The main objective of this work is to create a virtual model of a floating wind turbine from the experimental data. The model should be able to reproduce the behavior of the turbine, outputting variables whose values are very close to those of the real model. Additionally, it will extrapolate predictions for values that are not present in the data, which can later be verified experimentally.

This virtual model would allow us to know, up to a certain degree, how a turbine will behave under different wind conditions or blade pitch angle configurations, without actually needing to measure, build or be physically near a real wind turbine. More precisely, the virtual models studied in this work depend on the wind speed and on the pitch angle of the blades of the turbine. Given these two variables, the models output a metric of the wind turbine such as the median acceleration.

Another approach taken in this work is to study the data by performing a periodicity analysis. This method further complements the virtual model by detecting whether the turbine is vibrating at a certain frequency.

1.4.2 Objectives breakdown

This work aims to identify the behavior of a floating wind turbine, given a data set of experiments made on a scale model of a floating wind turbine. More precisely:

• Preprocess the data automatically. Select and extract the correct variables that are informative for the models.
• Perform a statistical analysis of the data in order to determine which type of distribution it most closely follows. Using this information, select the correct algorithms designed for that type of data.
• Visualize the data. Plot the necessary graphs that allow an easy visualization of the data.
• Using machine learning and other techniques (supervised models), identify the system, i.e., find the function (line, plane, curve, etc.) that best fits the data for the given set of inputs: wind speed and pitch angle.
• Study whether there is periodic behavior in the turbine: find out if it is vibrating and, if so, at which frequencies and whether those frequencies change with different wind speeds.
• Extrapolate the models to new wind speeds, pitch angles and parameters, verify whether the trained models successfully predict these unseen situations, and evaluate the results.

1.5 Relation of the work with the completed bachelor courses

The following list relates parts of the work to the courses taken throughout the bachelor's degree:

• How a turbine works. Fundamentals of Electricity and Electronics provided the basic electrical knowledge and helped in understanding how power is generated electromagnetically by the generator.
• Data preprocessing and supervised models. The most important courses for understanding how to preprocess the data and how the models work were Cloud and Big Data, Machine Learning and Big Data, and Linear Algebra.
• Statistical analysis. Applied Statistics provided the foundation for the statistical analysis section.
• Periodicity analysis. Calculus provided an understanding of how a Fast Fourier Transform works.

Additionally, all the subjects related to programming and data structures were fundamental. CHAPTER 1.
INTRODUCTION 9 1.6 Work plan The following tables detail the work plan and the official deadlines. Dates are in YYYY- MM-DD format (ISO 8601): Table 1.1: Work plan table. Task Start date End date Duration [days] Project 2021-10-07 2022-07-06 272 Data preparation, filtering and processing* 2021-10-10 2022-02-07 120 Make LATEXtemplate, write introduction 2022-02-08 2022-02-22 14 Present preliminary results, discuss them 2022-02-22 2022-02-25 3 Periodicity analysis 2022-02-26 2022-03-10 12 Investigate and program prediction models 2022-03-10 2022-03-25 15 Present results, discuss them 2022-03-25 2022-03-26 1 Model refinement, tuning and discussion 2022-03-26 2022-03-30 5 Code refactor 2022-03-30 2022-03-31 1 Work on data representation** 2022-04-01 2022-04-15 14 Work on writing the main document for draft 2022-04-16 2022-04-25 9 Code refactor 2022-04-25 2022-05-02 7 Prepare for draft, resolve erratas 2022-05-02 2022-05-16 14 Submit draft 2022-05-17 2022-05-17 1 Improvements after draft feedback 2022-05-18 2022-05-29 11 Final submission 2022-05-30 2022-05-30 1 Prepare presentation 2022-05-30 2022-06-05 7 Presentations 2022-06-06 2022-06-09 4 *Low-amount of work expected to be done because of an overload of ECTS credits amount. **Downtime contemplated due to vacation period. Table 1.2: Official deadlines. Task Date Draft submission 2022-05-17 Publication of accepted/rejected submissions 2022-05-25 Final submission 2022-05-30 Public presentation of the projects 2022-06-06 Tables 1.1 and 1.2 offer a detailed overview of the initially planed work plan. A more simplified overview is shown below. The main parts of the project were the following: CHAPTER 1. INTRODUCTION 10 Table 1.3: Main project tasks. Manual data inspection Work plan elaboration Data preprocessing Data representation Statistical analysis Supervised models Periodicity study Write report Refinements Presentation 1.7 Software 1.7.1 Text editor software This document is written in LATEXusing the TeXstudio 4.2.2 editor. 1.7.2 Programming software All of the work was done in Python version 3.8.10 inside the PyCharm 2022.1 (Community Edition) IDE (Integrated Development Environment). PyCharm was chosen because it enables the common debugging features (breakpoints, watches, etc), DataFrame visualization with just one click and it is free to use. Python was selected for its numerous packages and the ease of use it provides for data analysis. A creation of a virtual environment is highly recommended 1. The following table details the Python packages used and their versions: Table 1.4: Python packages used. Package Version Motivation or description numpy 1.22.3 Ease of use of arrays and matrices, among other mathematical and statistical functions such as average, median, variance, etc. pandas 1.4.2 Data analysis and manipulation. Used to load, store and manipulate the data throughout the work. matplotlib 3.5.1 Graphs, graphical data representation. scipy 1.8.0 Fast Fourier Transforms. scikit-learn 1.0.2 Supervised Models seaborn 0.11.2 Correlation matrices. xlrd 2.0.1 Reading Excel files. 1.8 Repository The code used to process all the data is publicly and freely available on my GitHub repository: https://github.com/JuanTecedor/FloatingWindTurbinesDynamicsIdentificationPublic. 1See Appendix A. https://github.com/JuanTecedor/FloatingWindTurbinesDynamicsIdentificationPublic CHAPTER 1. INTRODUCTION 11 1.9 Structure of the project The following list details the contents of each chapter: • Chapter 1. Introduction. 
Offers an introduction to wind turbines and to what floating wind turbines are, how they generate power, what affects the power generation, what can be modified to alter it and what types of turbines there are. The goals, work plan, repository and the structure of the project are also detailed.
• Chapter 2. State of the Art. An overview of the current technology related to floating wind turbines is given. A breakdown of multiple fields of study related to floating wind turbines is also offered.
• Chapter 3. Materials and Methods. This chapter introduces the experiment performed and the sources of the data. The different techniques used throughout the work are also discussed.
• Chapter 4. Statistical Analysis. The processing of the data starts in this chapter, with the aid of several plots and statistical methods to study the type of data and its distribution.
• Chapter 5. Periodicity Analysis. The frequency analysis of the turbine metrics is detailed in this chapter with several plots, along with the conclusions reached.
• Chapter 6. Supervised Models. This is the most important chapter of the work. In this part, the supervised models are presented along with the reasons for their selection and tuning. The results and predictions from these models are also shown.
• Chapter 7. Conclusions and Future work. This chapter concludes the work with an overview of the conclusions reached throughout the chapters and offers future extensions of the project in the Future work section.

Additionally, an appendix is present:

• Appendix A. Describes the importance of using a virtual environment and offers resources on how to create one. This is fundamental for reproducing and running the code that does the data processing, model training, Fourier analysis, etc.

Finally, a bibliography with all the citations and references is located in the last pages of the work.

Chapter 2 State Of The Art

2.1 Current floating wind turbines wind farms

As we learned in the introduction, wind is stronger over the ocean than on land [23]. Floating wind turbines are themselves an evolution of the offshore turbine, allowing the installation of turbines in deeper locations. We may ask ourselves: what types of offshore wind turbine farms are there?

The average capacity of current and projected floating wind farms is approximately 25 MW, with some projected farms as small as 1 MW and others of up to 88 MW [15]. To put this into perspective, Spain's peak electricity consumption is approximately in the 30 GW range [51], equivalent to 1200 wind farms of 25 MW or 15000 wind turbines of 2 MW. As we can see, the power generated by a single turbine is adequate but proportionally low compared to other sources. If we were to use only wind turbines, we would need thousands of them, with the footprint and economic costs associated with that. This is one of the reasons why the efficiency of a wind turbine is such a big focus.

2.2 Evolution of wind turbines power output

In the previous section we made an estimate assuming the average wind turbine has a power output of 2 MW. We may ask ourselves whether the typical turbine has increased in power throughout the decades. The following figure illustrates the average rated output of a wind turbine for a given year:

Figure 2.1: Average rated output of a wind turbine for selected years [21].

From figure 2.1 we can clearly see the jump in rated output from 1990 to 2000, which multiplied the output by a factor of 33.
Later on, between 2000 and 2010, the output grew by a factor of 1.2, and between 2010 and 2016 by a factor of 1.42, which is still a very good increase. The total increase in power from 1990 to 2016 is a factor of 56.96 (2848 kW / 50 kW). The trend in average rated output from the year 2000 onward looks faster than linear, but it will very likely slow down due to manufacturing limitations, transport logistics (after manufacture) and practical constraints.

We can also conclude that modern turbines are larger and more efficient. These improvements in power are obviously not free, as the height of the turbine and the radius of its rotor are also increasing. Nonetheless, the most recent, larger turbines are more efficient. This graph further supports equation 1.4 (from the introduction), where one of the determining factors in the power output of a turbine is the area: by increasing the area swept by the blades, we increase the power output.

2.3 Types of platforms for floating wind turbines

One of the components with the most influence on the turbine dynamics is the platform that supports it. Different designs trade off stability for cost or ease of manufacture. The following subsection gives an overview of the main types. There are at least four types of platforms that can support a turbine [23]:

Figure 2.2: Four types of floating wind turbines. Adapted from [23].

• Barge. This design maximizes the surface area in contact with the water. The platform is wider than it is tall, which increases stability. This type resembles the design philosophy of a ship and results in a low amount of draft1.
• Semi-Submersible. This model minimizes the surface area that touches the water while maximizing volume. Due to manufacturing limitations, a sphere, which would be the ideal shape, is not feasible. The platform is instead composed of a set of cylinders or simpler shapes, placed a certain distance apart. The separation between these buoyant elements provides stability.
• Spar. This concept is designed around densities. A large mass is placed at the deepest part, while a hollow structure completes the platform up to the top. This density distribution makes the model float upright. This design has the largest draft of the four. Buoyancy is provided by the lightest materials, while the dense mass at the bottom gives stability and prevents rotatory motions.
• Tension Leg Platform (TLP). This structure is the most complex of the four and the newest development. The platform has excess buoyancy and has to be kept under the waterline by a set of cables connected to weights on the seabed at one end and to the arms of the platform at the other. The distribution and separation of the ends of these arms stabilize the platform [10]. This design seeks to minimize manufacturing costs.

1The draft of a ship is the vertical distance between the waterline and the deepest point of the hull of the ship (the keel).

2.4 Types of floating wind turbines anchors

Another component that influences the stability of the turbine and its behavior is the anchoring system. Several studies have been made on the types of anchors for floating turbines, as well as on their placement [29] or even on a shared-anchor concept intended to reduce cost [16]. Depending on the seabed, there are at least four types of anchoring systems.
The following list offers an overview of the current state-of-the-art systems [23]:

• Dragging anchors. This type is similar to a boat anchor, supporting tension in only one direction.
• Suction buckets. Only appropriate for sandy seabeds. The counter-force is provided by suction forces that keep the bucket in place.
• Drilled piles. Same principle as in fixed foundations: large metal cylinders are forced into the seabed, or drilled in the case of harder bottoms such as rock.
• Gravity anchors. A simple, massive and heavy mass (such as a concrete block) is deposited on the seabed. This type of anchor has a very large footprint but the lowest complexity in terms of installation procedures.

2.5 Fields of study

2.5.1 Overlap between floating wind turbines and offshore turbines

As we explained in the introduction, floating turbines allow the placement of generators in places where constructing a foundation is no longer economically feasible. In the process, floating wind turbines additionally benefit from stronger winds thanks to their distance from the coast.

However, thanks to recent developments, floating wind turbines can also be placed in shallow waters. This is especially useful over shallow seabeds whose depth would allow a fixed-foundation turbine but whose seabed type prevents it [23]. This advancement blurs the line between the usual depths at which each type of turbine is placed, but allows more options and flexibility.

2.5.2 Overlap with the oil industry

Before floating wind turbine designs were adopted, the oil industry had already developed floating platforms that enabled it to extract oil over deeper seabeds. Not all of that knowledge and design work can be transferred, because the economic feasibility calculation is very different for a single oil platform than for dozens of turbines in a wind farm.

Nonetheless, when studying floating wind turbines, prior knowledge from the oil industry can be leveraged, especially in anchoring and platform studies [23] [10]. The oil industry can therefore also be considered, in some respects, state of the art in relation to platform and anchoring technology.

2.6 Current floating wind turbines study fields

Several studies have been made on the interactions of the turbine with its environment. These are the main fields of study:

• Platform design. As we learned in previous sections, several types of platforms are available and under study; they have a significant impact on the turbine stability and thus affect cost and efficiency. It is also imperative in these designs that the turbine does not flip or fall, i.e., that it is safe and stable enough.
• Mooring design (anchoring also under study). This field solves problems related to fixing the turbine anchors to the seabed. As we learned previously, the chosen design is heavily dependent on the seabed type.
• Turbine control and strategy planning. In order to achieve maximum power generation and safe operation, some parameters of the turbine can be tweaked, such as the pitch angle. Studies in this area optimize these parameters.
• Aerodynamic modeling. This area studies the aerodynamic properties of the turbine, with a special focus on the shape of the blades. An efficient blade design that extracts the most power is key to wind generation.
• Hydrodynamic modeling. This area studies the influence of hydrodynamic factors, mainly the water, on floating turbines.
• Wind and wave modeling and forecasting. This field focuses on the prediction of wind and waves. With the information and models extracted from these studies, we can find the ideal geographical spots for the turbines.

Chapter 3 Materials and Methods

3.1 Introduction to the experimental setup

The experimental data was measured using a scale model of the wind turbine, which will be described shortly. The turbine was subjected to different wind speeds at various blade pitch angles. The wind speed was controlled by regulating the voltage of a wind generator, and the blade pitch was changed through its own adjustment mechanism. The distance from the wind generator to the turbine was kept constant during the experiments in order to have consistent measurements.

The experimental data was stored with the wind speed expressed as a voltage. As voltage alone is not even a measure of power (watts, for example, would be much more informative), a transformation into a more useful metric is needed. Thankfully, the data included an equivalence. The following table shows the equivalence between voltages and wind speeds1:

Table 3.1: Equivalence between voltages and wind speeds.
Voltage [V]   Wind speed [ms-1]
130           8.5
140           9.3
150           10.1
160           11.0
170           11.6
180           12.4
190           13.1
200           13.8

The values from table 3.1 will be used later on to convert the voltages of the generator into wind speeds at the turbine.

1Interestingly enough, the relation between voltage and wind speed is almost linear: wind ≈ 0.07571 × voltage − 1.34285. In any case, it is more precise to use the wind speed.

3.2 Experiment setup

Instead of having the turbine float on water, a set of springs was placed around the perimeter of the platform to support it and imitate a turbine floating on the water. This configuration should provide behavior similar to that of a real floating platform. The turbine was placed inside a wind tunnel calibrated for the set of wind speeds from table 3.1. The following figure illustrates the wind turbine setup inside the wind tunnel:

Figure 3.1: Front view of the wind turbine with labeled axes. All axes are perpendicular. The Z axis is perpendicular to the ground plane, the Y axis is parallel to the axis of the turbine, pointing towards the back, and the X axis points right, perpendicular to the Y and Z axes. Note some of the springs visible at the right of the photograph.

In figure 3.1, a set of springs is visible at the right of the photograph, with blue circles marking the contact points on the platform. On the left, next to the turbine, the Arduino microcontroller (a blue PCB2 with a black square chip on it and a USB type B port) can be seen. The Arduino is responsible for registering the measurements from the IMU3 and sending them to a computer. Further to the left and to the back, with a green cable connected to it, almost at the edge of the platform, a second blue PCB can partially be seen. This blue PCB is the IMU. The IMU is responsible for measuring acceleration, angular velocity and magnetic flux density.

2Printed Circuit Board.
3Inertial Measurement Unit, a type of sensor. Explained in more depth later.

Note that the choice of axis representation and configuration is the same throughout the whole work. This configuration is also exactly the same as the one given by the sensor documentation and by the experiment setup, i.e., the axes are labeled in the same configuration that the sensor uses for its measurements.
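Before moving on to the simplified diagrams, and returning briefly to Table 3.1, the sketch below shows one way the voltage-to-wind-speed equivalence could be applied in Python. It is only an illustration: the dictionary values come from the table, but the DataFrame column names are assumptions, not the exact headers of the project's files.

```python
# Minimal sketch (not the project's actual code) of applying the Table 3.1
# voltage-to-wind-speed equivalence to a loaded DataFrame.
# The column names "Voltage" and "Wind speed" are illustrative assumptions.
import pandas as pd

VOLTAGE_TO_WIND_SPEED = {  # [V] -> [m/s], values taken from Table 3.1
    130: 8.5, 140: 9.3, 150: 10.1, 160: 11.0,
    170: 11.6, 180: 12.4, 190: 13.1, 200: 13.8,
}

def add_wind_speed(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'Wind speed' column derived from 'Voltage'."""
    out = df.copy()
    out["Wind speed"] = out["Voltage"].map(VOLTAGE_TO_WIND_SPEED)
    return out
```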
A simplified diagram of the turbine configuration is shown in figures 3.2 and 3.3:

Figure 3.2: Front view of the turbine with labeled axes, simplified. Note the springs on the bottom of the platform.

Figure 3.3: Top view of the turbine with labeled axes, simplified.

The IMU is the sensor responsible for the measurements. It will be detailed later; for now, it is enough to know that it measures quantities in relation to a direction, and therefore we need axes. Two diagrams of the IMU axes and their polarities are also shown, for illustration and axis labeling purposes:

Figure 3.4: IMU with its axes.

Figure 3.5: Polarity of rotation and orientation of the IMU axes. Adapted from [25].

Again, the turbine axes, diagram axes and IMU axes are all the same throughout the work. This is on purpose, to avoid confusion.

Wind turbine components

• Microcontroller: Arduino Mega.
• Inertial Measurement Unit: MPU6050.
• Electric motor: Brushless motor 22M-1000 GPMG4500.
• Wind turbine blade dimensions: 10×2 centimeters.
• Pitch angle range of the turbine blades: -30 to 30 degrees.
• Platform dimensions: 16×24 centimeters.

3.3 Materials: Data from experiments

A total of 48 experiments were carried out, each lasting 10 seconds. The 48 experiments correspond to the combinations of the following voltages: 130, 140, 150, 160, 170, 180, 190 and 200 [V] and the following blade pitch angles: -30, -20, -10, 10, 20 and 30 [degrees]. Instantaneous measurements were taken at a frequency of 10 Hz, i.e., every 100 ms, for a total of 99 measurements per experiment. Each of these measurements consisted of a reading of the acceleration [g]4, angular velocity [°s−1] and magnetic flux density [µT]5 for each axis (x, y, z).

In total, 8 voltages (or wind speeds) and 6 pitch angles were used, with 99 measurements for each combination, totaling 99 × 8 × 6 = 4752 rows of data. This, in combination with the three quantities measured (acceleration, angular velocity and magnetic flux density) along each of the three axes, equates to 4752 × 3 × 3 = 42768 samples.

The following tables illustrate what the data looks like after it is loaded and before any preprocessing. Acc x is the acceleration along the x axis, Gyro y is the angular velocity around the y axis, Mag z is the magnetic flux density along the z axis, and Time [0.1s] is the timestamp at which the row was stored (in tenths of a second). Two tables are shown; both have the same pitch angle but different wind speeds.

Table 3.2: Sample of data at 130V (8.5ms-1) and 1°.
Time [0.1s]  Acc x [g]     Acc y [g]     Acc z [g]    Gyro x [°s−1]  Gyro y [°s−1]  Gyro z [°s−1]  Mag x [µT]    Mag y [µT]  Mag z [µT]
1            -0.001062012  0.000982666   0.978607178  -0.002361081   -0.010571808   -0.010266725   -26.67796875  -6.1425     -9.7734375
2            -0.010552979  -0.00043335   1.0246521    0.041690331    0.010730982    -0.008038288   -26.16773438  -5.338125   -9.7734375
...          ...           ...           ...          ...            ...            ...            ...           ...         ...
99           -0.00824585   -0.018615723  1.048773193  0.004443608    -0.017681582   -0.026210657   -26.31351563  -5.630625   -9.7734375

4 1 g = 9.8 ms−2.
5 The data sheet from [25] did not specify units for the magnetometer. However, as the Earth's magnetic field ranges from 25 to 65 µT and the data from the experiment was in this range, we will consider it to be in µT.

Table 3.3: Sample of data at 200V (13.8ms-1) and 1°.
Time [0.1s] Acc x [g] Acc y [g] Acc z [g] Gyro x [◦s−1] Gyro y [◦s−1] Gyro z [◦s−1] Mag x [µT ] Mag y [µT ] Mag z [µT ] 1 -0.062823 0.010199 1.154712 0.018504 -0.092069 -0.017973 -25.803281 -5.630625 -8.085938 2 0.322821 0.267554 1.694971 0.024261 0.116064 0.197668 -26.386406 -4.680000 -7.875000 ... ... ... ... ... ... ... ... ... ... 99 -0.085931 -0.320795 0.890576 -0.100731 0.017615 -0.048097 -25.220156 -5.265000 -7.382812 Tables 3.4 and 3.5 show some statistical metrics from pandas.describe() function in order to visualize the ranges, maximums, minimums, etc of the data6: Table 3.4: Data at a wind speed of 8.5ms−1 and 1◦pitch angle. Time [0.1s] Acc x [g] Acc y [g] Acc z [g] Gyro x [◦s−1] Gyro y [◦s−1] Gyro z [◦s−1] Mag x [µT ] Mag y [µT ] Mag z [µT ] len Acc len Gyro len Magn count 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 mean 50.000000 0.001532 -0.025455 1.031984 -0.023991 -0.000034 0.002218 -26.532924 -5.578182 -9.600142 1.033129 0.089261 28.766090 std 28.722813 0.018780 0.037758 0.045014 0.075233 0.026225 0.055913 0.268019 0.355799 0.307522 0.045542 0.044851 0.278839 min 1.000000 -0.048370 -0.131812 0.938586 -0.185862 -0.062211 -0.144265 -27.188203 -6.435000 -10.546875 0.940113 0.014925 28.027332 25% 25.500000 -0.009393 -0.048160 1.003128 -0.075707 -0.016375 -0.030296 -26.750859 -5.850000 -9.773438 1.003812 0.055962 28.603734 50% 50.000000 0.001868 -0.020959 1.034302 -0.018928 -0.001645 -0.004550 -26.532187 -5.557500 -9.562500 1.035235 0.086649 28.745159 75% 74.500000 0.014850 0.000546 1.060159 0.026682 0.016899 0.039144 -26.386406 -5.338125 -9.421875 1.062211 0.110022 28.952389 max 99.000000 0.044537 0.061096 1.144220 0.127100 0.063192 0.124381 -25.657500 -3.948750 -8.789062 1.151195 0.226626 29.424713 Table 3.5: Data at a wind speed of 13.8ms−1 and 1◦pitch angle. Time [0.1s] Acc x [g] Acc y [g] Acc z [g] Gyro x [◦s−1] Gyro y [◦s−1] Gyro z [◦s−1] Mag x [µT ] Mag y [µT ] Mag z [µT ] len Acc len Gyro len Magn count 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 mean 50.000000 0.027833 -0.030619 1.100040 -0.041565 0.004803 0.001589 -25.751742 -5.556761 -8.289062 1.113743 0.159670 27.623939 std 28.722813 0.093319 0.140280 0.263003 0.105575 0.062644 0.114191 0.316227 0.396620 0.426263 0.261994 0.064237 0.303083 min 1.000000 -0.203320 -0.346960 0.684369 -0.306463 -0.131929 -0.215256 -26.459297 -7.458750 -10.195312 0.698817 0.038939 26.800792 25% 25.500000 -0.031522 -0.118387 0.926773 -0.125084 -0.040072 -0.089940 -25.985508 -5.703750 -8.507812 0.941567 0.115168 27.418520 50% 50.000000 0.022394 -0.037952 1.013049 -0.046399 0.009802 0.004165 -25.803281 -5.557500 -8.296875 1.025440 0.152568 27.638048 75% 74.500000 0.078992 0.059320 1.198538 0.037200 0.051533 0.079461 -25.548164 -5.338125 -8.015625 1.208403 0.194613 27.837288 max 99.000000 0.322821 0.334290 1.694971 0.210176 0.129700 0.341136 -24.782813 -4.680000 -7.101562 1.746060 0.347568 28.295480 From tables 3.4 and 3.5 note count = 99.0 (the number of measurements). The std rows also gives an idea of the standard deviation of each column of the data. We can also see the mean, max and min changing with the two wind speeds. The 25%, 50% and 75% rows are percentiles. As expected, the 50% percentile from the acceleration on the z axis is approximately 1, this is due to the gravity. 
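As a side note before moving on, summaries like those in Tables 3.4 and 3.5 come from pandas; the sketch below shows one way such a summary could be reproduced once a sheet has been loaded. The file name, sheet name and engine argument are illustrative assumptions, not the project's exact setup.

```python
# Illustrative sketch (not the project's exact code): load one experiment sheet
# from an .xls file and print descriptive statistics like those in Tables 3.4/3.5.
# The file name and sheet name below are assumed for the example.
import pandas as pd

df = pd.read_excel("pitch_1_deg.xls", sheet_name="130V", engine="xlrd")
print(df.describe())  # count, mean, std, min, quartiles and max for every column
```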
Finally, we highlight that, overall, the mean acceleration and the angular velocities increased from table 3.4 to table 3.5.

The experimental data was stored in 6 files with the .xls extension (Microsoft Excel), with the filenames corresponding to the pitch angles; within each file, at least 8 sheets were present, one per voltage. Thankfully, pandas, together with the xlrd package, can open .xls files and navigate their sheets.

6len(a⃗) = ||a⃗|| refers to the length, magnitude or norm, not the dimension. For example, by len Acc we mean the length, magnitude or norm of the acceleration vector.

3.4 Methods

3.4.1 Machine Learning

There are several definitions of what machine learning is:

"Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed." [49].

This quote from Arthur Samuel dates from 1959; some consider it outdated and informal. Quoting a more modern definition from Tom Mitchell:

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." [6] [27].

In this case, the experience E is the data set of turbine measurements, the task T is to predict the outputs, and the measure P is how closely the predicted outputs match the real experiments.

3.4.2 Machine learning workflow

The common machine learning workflow can be split into the following two phases [47]:

1. Transform the data in such a way that it is in the correct format for the machine learning algorithm.
2. Select the machine learning algorithms that best fit the problem being solved and tune their parameters.

Part 1 can be further split into three parts:

• Data preprocessing. Process and perform the necessary transformations so that the data is ready to be automatically processed by a program or an application.
• Feature extraction. Process the data in order to extract useful feature candidates for further use down the line.
• Feature selection. Discard variables that are redundant. For example, having the two attributes size-f [feet] and size-m [meters] in the same data set is detrimental: one of them is redundant and should be removed.

Other transformations could be handling noise and missing values or creating new attributes by combining others. This phase is decisive for the performance of our model: if the inputs and outputs are not correlated, our models will hopelessly fail7.

7Or, more informally, "garbage in, garbage out": if the input data is nonsense or contains no information, the output will also contain no information or be nonsensical.

Finally, part 2 can be split into two parts:

• Algorithm selection. The selection part involves the use of the programmer's knowledge in order to apply the correct algorithm to the correct problem type. For example, there are algorithms that need large data sets in order to start to converge, and will fail with small amounts of data. The programmer is responsible for selecting the correct model and having the knowledge to use it.
• Parameter tuning or configuration. This final part is done by re-running the algorithm and tweaking it manually (with the aid of cross-validation8).
Some models have parameters, such as the learning rate, that sometimes have to be adjusted manually because the defaults do not work. This is a process of trial and error. 3.4.3 Areas of machine learning In the following subsection, the three main areas of the machine learning field are described. In the case of this work, only supervised learning is relevant, but there are many more: • Supervised learning. In this area, the type of problem allows the training data to have examples comprised of pairs of inputs and outputs. This means that the data set shows the correct output for a given input. The relation between the inputs and the outputs is inferred by the algorithm. • Unsupervised learning. In unsupervised problems the data only contains the inputs, and the algorithm must find patterns and structure in the data by, for example, grouping it. • Reinforcement learning. In this field, the algorithm learns by selecting the actions that maximize the notion of a cumulative reward. It is inspired by behavioral psychology. A more in-depth introduction to the supervised models workflow is given below. Formulation of the Machine Learning problem We will now define the naming for the different elements that make up the model: • Examples and features: let $n, m \in \mathbb{N}$, with $n$ = number of features and $m$ = number of examples. • Input variables: a matrix $X \in \mathbb{R}^{m \times n}$. Each row of this matrix is an example, a vector $\vec{x}^{(i)} \in \mathbb{R}^n$ with $n$ features. • Output variable or target space: $Y \in \mathbb{R}^{m \times 1}$, or more clearly $\vec{y} \in \mathbb{R}^m$. In order to find a relation, $\vec{y}^{(i)}$ should be dependent on $\vec{x}^{(i)}$. These are the known outputs for an input $X$. • Model parameters: $\theta$. The dimensions of $\theta$ depend on the concrete model used; it can be a vector or a matrix. • Hypothesis function: $h(\theta)$. This function, given parameters $\theta$ and an input vector $\vec{x}^{(i)}$, tries to make the closest guess to the correct output. The concrete implementation depends on the model. For example, in linear regression the hypothesis function could be the straight-line equation $h(\theta) = \theta_1 x + \theta_0$. • Hypothesis $h(x^{(i)})$: an instantiation of the hypothesis function. • Data set: $D \in \mathbb{R}^{m \times (n+1)}$. This is the data set that contains all the data plus an extra column of ones. This column of ones is the bias term. It allows the algorithm to have a basis of bias from which to start. Recalling the example of the hypothesis function $h(\theta) = \theta_1 x + \theta_0$, it allows us to have the $\theta_0$ term instead of only having $h(\theta) = \theta_1 x$. 8 Cross-validation gives us an idea of how well the model generalizes; we will look at it later on. In this work, $n = 2$, corresponding to the wind speed and blade pitch angle input variables. In order to measure the accuracy of the hypotheses, a cost function is introduced. One example of such a cost function could be [4]: $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h(x_i) - y_i \right)^2$. This formula is very similar to the mean squared error (MSE). By minimizing $J(\theta)$ we can improve the performance of the algorithm, because we are reducing the distance (and thus the error) between the predictions (hypotheses) and the target values.
One way of scaling data is using the Standard Scaler [36]: $z = \frac{x - \mu}{\sigma}$ (3.1) Where $z$ is the new value, $x$ is the value to be scaled, $\mu$ is the mean and $\sigma$ is the standard deviation. This scaler is appropriate for features that follow a standard normal distribution. 3.4.5 Supervised learners A total of 6 machine learning models were used: Linear regression [39], Linear with polynomial features [39] [45], Ridge [40], Huber [38], Gaussian [37] and an MLP Regressor9 [43] [36]. 3.4.6 Linear Models Linear Regressor Linear regression is the simplest regressor. In this case the model minimizes the residual sum of squares between the observed targets in the data [39]. It assumes a linear relation between the input values (X) and the output values (Y). $y = ax + b$ (3.2) The model will find the $a$ and $b$ for equation 3.2 that best fit the data; $a$ and $b$ are vectors. Equation 3.3 can be used to build a hypothesis function: $h(\theta) = \theta_1 x + \theta_0$ (3.3) Please note that equations 3.2 and 3.3 are just examples and may not be appropriate for the dimensions of the data. 9 An MLP Regressor is a type of Neural Network. Polynomial Regressor Sometimes the data will not have a linear correlation. In these cases, polynomial regression is extremely useful. One example of a polynomial regression could be the following: $h(\theta) = \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_0$ (3.4) In Python, Sklearn facilitates the use of polynomial regression with the PolynomialFeatures class [45]: calling PolynomialFeatures with degree = 2 on two features $a$ and $b$, sklearn returns $[1, a, b, a^2, ab, b^2]$. This added complexity has, however, some trade-offs that will be discussed later. Ridge Regressor Ridge regression, or Tikhonov regularization, is useful when the independent variables are highly correlated [22]. It addresses some of the problems of the ordinary least squares used by linear regression by imposing a penalty on the size of the coefficients (l2 regularization) [40]. It minimizes the following objective function [40]: $\lVert y - \theta x \rVert_2^2 + \alpha \lVert \theta \rVert_2^2$ (3.5) Where $\alpha$ is the regularization term, $\theta$ are the weights, $x$ is the training data and $y$ is the target values. This model was chosen to compare its performance with the previous two in case the variables were highly correlated. Huber Regressor This is a linear regression model that is robust to outliers: it makes sure that the loss function is not heavily influenced by the outliers while not completely ignoring them [38]. It optimizes the squared loss for the samples where [38]: $\left| \frac{y - \theta x}{\sigma} \right| < \epsilon$ (3.6) and the absolute loss for the samples where [38]: $\left| \frac{y - \theta x}{\sigma} \right| > \epsilon$ (3.7) Where $\sigma$ and $\theta$ are the parameters to be optimized. $\sigma$ makes sure that if $y$ is scaled, $\epsilon$ does not need to be rescaled too (this achieves the same robustness) [38]; $\theta$ is the hypothesis, $x$ is the training data and $y$ is the target values. This model was again chosen for comparison with the aforementioned linear models and, because of its robustness to outliers, in order to know if the outliers were influencing the models. Gaussian Process Regressor This is a non-parametric Bayesian regressor [37] [52] [48]. The implementation is based on algorithm 2.1 of Gaussian Processes for Machine Learning [53], page 19. In contrast to the previous learners, it makes probabilistic predictions, i.e., it infers a probability distribution from the data. This model was chosen because of its different nature from the previous models, in the hope of observing different results.
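The following is a minimal sketch, not the exact configuration used in this work (the actual hyperparameters are detailed in the Supervised Models chapter), of how these six learners can be assembled with scikit-learn, each behind the scaler described in section 3.4.4:

from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge, HuberRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline

# Each entry is a pipeline: the features are standardised (z = (x - mu) / sigma)
# before being fed to the estimator. The polynomial model inserts the expanded
# feature terms [1, a, b, a^2, ab, b^2] between the scaler and the regressor.
models = {
    "linear":     make_pipeline(StandardScaler(), LinearRegression()),
    "polynomial": make_pipeline(StandardScaler(), PolynomialFeatures(degree=2),
                                LinearRegression()),
    "ridge":      make_pipeline(StandardScaler(), Ridge()),
    "huber":      make_pipeline(StandardScaler(), HuberRegressor()),
    "gaussian":   make_pipeline(StandardScaler(), GaussianProcessRegressor()),
    "mlp":        make_pipeline(StandardScaler(),
                                MLPRegressor(solver="lbfgs", max_iter=1000)),
}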
Neural Network (MLP Regressor) This model optimizes the squared error of a neural network using gradient-based methods [43]. The number of hidden layers can be customized. In this case, the solver lbfgs (an optimizer in the family of quasi-Newton methods) will be used because it converges faster and performs better for smaller datasets [43]. Structure of a Neural Network A Neural Network is composed of a set of input, hidden and output nodes. In the case of this work, the input nodes could be the wind speed and angle, and the output nodes could be the acceleration. The following figure represents an example neural network: Figure 3.6: Neural network example. Illustration made thanks to NN-SVG [3]. Note the 2 input nodes, the two hidden layers with 4 and 8 nodes and the output layer with three nodes. The number of nodes in each layer is extremely important, as it determines the dimensions of the matrices used inside the neural network to perform the calculations. This model was chosen because it is very different from the rest of the models and could yield better results. 3.4.7 Evaluation of model performance Normally, 80% of the data set is used for training the models (train set). The remaining 20% of the data set is the test set, and it is piped into the sklearn.pipeline.Pipeline.score function [44] in order to find out the performance of the model. This function returns a float with the score of the model. Note that, depending on the scoring metric (for example the R2 score), it can be negative, i.e., score ∈ (−∞, 1]. It is not possible to say what a good score is in general. For example: a neural network gets a score of 95% when trying to predict if a train is late. This score looks great. However, the neural network is always predicting ”no” and therefore guesses correctly most of the time simply by always answering ”no”, because the trains are rarely late (it is also probably ignoring the input data). There are several techniques and different scoring metrics to avoid these pitfalls. For this work, however, we will consider scores at or above 0.50 (≥50%) acceptable and scores at or above 0.90 (≥90%) excellent. In order to avoid the previously mentioned pitfall (the one with the train delay estimation) and the difficulty of picking a good score, we will also use distances and errors. For example, if the predicted value is 10 and the real value is 10.8, we could calculate the error using the following formula: $E = \frac{|r - p|}{|p|} \times 100 = \frac{|10.8 - 10|}{|10|} \times 100 = 8\%$ (3.8) Where $r$ is the real value and $p$ is the prediction. Again, we cannot fix an overall acceptable or good error at this point in the work. However, we will consider an error of ≈5% to be acceptable and ≤1% to be excellent. Models with errors ≫5% will be considered useless. 3.4.8 Overfitting in Machine Learning The introduction of non-linear, more complex equations has its disadvantages. The main one is that we can make a model ”memorize” the data. Figure 3.7: Overfitted data [18]. In figure 3.7 we can see two lines: in black, a straight line that could represent a traditional linear regression, and in blue, a curved line that represents a degree-10 polynomial that has been made to fit the data perfectly. The first disadvantage is the complexity introduced by a degree-10 polynomial, leading to increased run time, due to having to calculate far more parameters compared to the linear model.
However, the most worrying problem is the following: The blue line has an almost perfect fit, far more precise than the black line. However, this model will fall apart once new points are introduced, i.e.: it will fail to generalize, it will probably be much more imprecise than the linear counterpart once enough new points are introduced. There are several ways of managing this problem, one being examining the models manually or penalizing each time the model increases complexity. Nevertheless, in this work the main way used was dividing the data into two sets. We will use approximately 80% of the data to train our models, and the 20% left will be used to score how good or how bad the models are. The data of this sets will also be shuffled in order to avoid repeatedly creating a biased subset. 3.4.9 Hyperparameter tuning Hyperparameter tuning is the selection of the optimal parameters for the models. One example could be choosing the best degree of the polynomial in a polynomial regressor CHAPTER 3. MATERIALS AND METHODS 32 model. In order to pick the best parameters, an assortment of parameter combinations was manually made and piped into sk-learn’s GridSearchCV [42] [34]. The GridSearchCV model selector has a method that returns the best estimator. After some manual checks (to make sure that the parameters make sense), one of the best estimators was chosen for each model. 3.4.10 Pipelining All of the aforementioned subsections detail a complex workflow. Fortunately, sk-learn offers us a Pipe that can be set up for a group of models or only one model [35]. The concrete pipeline usage will is detailed in the Supervised Models section, but a general pipe example is offered (adapted from [35]): 1. Scaler StandardScaler() 2. Classifier or classifiers PCA(), LogisticRegression() After the pipe setup, only a call to fit and score or the necessary methods is needed instead of several individual calls and parameter passings for each component. 3.4.11 Periodicity Study Another attribute studied in this work is the periodicity of the turbine. The objective of this part is to find out if the turbine is vibrating in some predictable way. And if it is vibrating, how the frequency might change for different wind speeds or pitch angles. The problem is therefore to extract frequencies from a data set that consists of mea- surements made at a certain frequency. This is enabled by Fourier analysis. Fourier analysis Fourier analysis studies the decomposition of a function into the sum of several, simpler trigonometric functions. This process itself is called a Fourier Transform. One example of an application of a Fourier Transform is to detect and remove high frequencies from a recording that may be irrelevant or distracting. The Fourier Transform will be able to, from the signal with all the mixed sounds, detect the peak in the high frequency, allowing for it to be removed. In order to use Fourier Analysis, the data must be equally spaced. Some approaches exist for unequally spaced data but it is outside of the scope of this work. Fast Fourier Transform A Fast Fourier Transform is an algorithm that computes the Discrete Fourier Transform of a sequence [1]. The ”Fast” keyword in FFT is a variant of the algorithm that reduces the run time complexity from O(N2) in a Discrete Fourier Transform (DFT) to O(N logN) [1]. CHAPTER 3. MATERIALS AND METHODS 33 Fast Fourier Transform output representation The output from the FFT is ultimately a representation in the frequency domain. 
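As a minimal, self-contained sketch of how such a frequency-domain representation is obtained (using scipy.fftpack, the module employed later in this work, and a 2 Hz test sine sampled at 10 Hz as in the library test of chapter 5):

import numpy as np
from scipy.fftpack import fft, fftfreq

fs = 10.0                            # sampling rate [Hz], as in the experiments
t = np.arange(0, 10, 1 / fs)         # 10 s of equally spaced samples
signal = np.sin(2 * np.pi * 2 * t)   # test signal vibrating at 2 Hz

spectrum = fft(signal)
freqs = fftfreq(len(signal), d=1 / fs)

# Keep only the positive half of the (mirrored) spectrum.
mask = freqs >= 0
amplitude = 2.0 / len(signal) * np.abs(spectrum[mask])
# amplitude peaks where freqs is approximately 2 Hz, matching the input frequency.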
Figure 3.8 shows an example: Figure 3.8: FFT of a Cosine Summation Function resonating at 10, 20, 30, 40, and 50 Hz. [2]. Note from the bottom graph of figure 3.8 the peaks at x = 10, x = 20, x = 30, x = 40 and x = 50; the FFT has successfully detected the individual frequencies that where mixed in the top graph. An FFT analysis will be performed on all the metrics in the data later on, in order to find out if the turbine is vibrating in a predictable manner. 3.4.12 Data organization The data was distributed in several excel files and inside them, in different sheets. The first step in the program was to load the data using Pandas ’s read excel function. This function returns a DataFrame which is similar to an SQL table. In order to organize the 48 tables (8 wind speeds and 6 angles), each DataFrame was placed inside a Python dictionary that contained another dictionary. The structure of the nested dictionaries was the following: data = (key = angle : value = dict1) (3.9) dict1 = (key = wind speed : value = DataFrame) (3.10) CHAPTER 3. MATERIALS AND METHODS 34 DataFrame from equation 3.10 would contain the data from the turbine at an angle of angle and a wind speed of wind speed. This type of data structure allowed O(1) access time (because it is a Python dictio- nary) to the DataFrames while it also preserved an user-friendly access method. I.e: It is very easy to request the data from the turbine for a given angle and wind speed. Further down the work, the supervised models required matrices of data without a breakdown by angle and wind speed. As the data structure described in the previous paragraphs did not prove to be ap- propriate, the new data structure that was adopted combined all the preprocessed data in a single DataFrame. The DataFrame will be detailed in the following chapters. In any case for the purposes of this section the following table is shown detailing the columns: Table 3.6: Grouped data set columns. angle windspeed median Acc x mean Acc x var Acc x ptp Acc x amax Acc x amin Acc x std Acc x . . . min abs len Magn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The data from table 3.6 can be easily accessed (again in O(1) access time): columns can be selected in order to use them as input or outputs for the supervised models and this columns can also be subdivided in order to divide the data set into test and train sets. We will also highlight that table 3.6 has 48 rows (recall from the previous paragraphs 8 wind speeds and 6 angles) and 122 columns (the column count will be explained later on). Chapter 4 Statistical Analysis 4.1 Data visualization The first step made in the work after loading the data was to visualize the data. All the plots were generated in Python using Matplotlib. In any statistical study it is always recommended to visualize the data as very different data sets can have the same statistical metrics. The following figures show a graph of two 10 second experiments at the same angle of 10◦. One graph is for a wind speed of 8.5ms-1 and the next for 13.8ms-1: Figure 4.1: 2D plots of the data corresponding to a wind speed of 8.5ms-1 and a pitch angle of 10◦. 35 CHAPTER 4. STATISTICAL ANALYSIS 36 Figure 4.2: 2D plots of the data corresponding to a wind speed of 13.8ms-1 and a pitch angle of 10◦. From figure 4.1 we can see some extreme values that might suggest the presence of outliers, specially in the magnetometer (”Magn”) column. The most obvious one being in the ”Magn y” subplot. 
In figure 4.2 we again point out extreme values, the most remarkable one being in the ”Magn z” subplot. Not much more information can be extracted visually from these graphs. We can, however, see that in figure 4.2, compared to figure 4.1, the range of values of almost every subplot (with some exceptions, like Magn x or Magn z) has increased greatly. This suggests that the wind speed is affecting the variables and is correlated with them. Later on in the work, it was concluded that outlier removal was not necessary, and that in fact it might remove useful information and therefore be harmful. This conclusion was reached after the supervised models performed very well without removing the possible outliers, and after the normality analysis of the data, discussed later in the work. From these graphs not much information can be extracted. We can, however, conclude that there are no missing or erroneous values in the data set. The existence or absence of outliers is not clear, but it is unimportant. 4.2 Distribution study 4.2.1 Histograms In order to be able to use parametric methods we first have to observe the distribution of the data. Parametric statistics assume that the data is modeled by a probability distribution. If the data is normally distributed, some models will work much better than others. First, a set of histograms was obtained. There are four histogram figures, allowing the comparison of two different angles and, for the same angle, two wind speeds: Figure 4.3: Histogram of the data corresponding to a wind speed of 8.5ms−1 and a pitch angle of 10◦. Figure 4.4: Histogram of the data corresponding to a wind speed of 8.5ms−1 and a pitch angle of 10◦. Figure 4.5: Histogram of the data corresponding to a wind speed of 13.8ms−1 and a pitch angle of 30◦. Figure 4.6: Histogram of the data corresponding to a wind speed of 13.8ms−1 and a pitch angle of 30◦. From figures 4.3, 4.4, 4.5 and 4.6 it can be seen that most of the graphs are comparable to a normal distribution. Examples of a normal distribution shape are: from figure 4.3 (Acc x, Acc y), figure 4.4 (Gyro x, Gyro y), figure 4.5 (Gyro x, Acc y) and from figure 4.6 (Gyro x, Gyro y). Examples of a dubious (or non-) normal distribution shape are: from figure 4.3 (Magn y, Magn z), figure 4.4 (Acc x, Acc y), figure 4.5 (Gyro z, Magn y, Magn z) and from figure 4.6 (Acc y, Acc z). 4.2.2 Examples of normal distributions Figure 4.7 shows several normal distributions with different means and variances (and therefore shapes): Figure 4.7: Examples of normal distributions with different means and variances [26]. 4.2.3 Normality test A more formal test of normality is needed. The Shapiro-Wilk test for normality presents the null hypothesis that the data came from a normally distributed population. The chance of rejecting the null hypothesis when it is true is close to 5% regardless of sample size [17] [12]. The test was run with α = 5% (alpha level). The results were: passed = 310, failed = 122, total = 432, with a pass rate of 71.76% (i.e., the null hypothesis was rejected 28.24% of the time). We therefore conclude that the data is mostly normally distributed, and that parametric statistical methods can be used. Additionally, when scaling the data later on, this normality should be considered when selecting the appropriate scaler.
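A minimal sketch of how this pass/fail count can be obtained with scipy.stats.shapiro, assuming the experiments are stored in the nested dictionary data[angle][wind speed] described in the Data organization section (variable names here are illustrative):

from scipy.stats import shapiro

ALPHA = 0.05                          # 5% significance level
passed = failed = 0

# `data` is the nested dictionary data[angle][wind_speed] -> DataFrame,
# one 10-second experiment per DataFrame.
for angle, by_speed in data.items():
    for wind_speed, df in by_speed.items():
        for column in df.columns:
            if column.startswith("Time"):
                continue              # the time index is not a measurement
            _, p_value = shapiro(df[column])
            if p_value > ALPHA:
                passed += 1           # normality cannot be rejected
            else:
                failed += 1           # normality rejected at the 5% level

print(passed, failed, 100 * passed / (passed + failed))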
4.3 Creating new variables Recalling from table 3.2, the data set columns were the following: Table 4.1: Data set columns. Time (0.1s) Acc x Acc y Acc z Gyro x Gyro y Gyro z Mag x Mag y Mag z At this point in the work it was concluded that adding three new columns with the modulus of each vector could be relevant to the study and could add new information usable by the models. The modulus was calculated by taking the square root of the sum of the squares of the components of the vector (the standard way of calculating a vector length). Assuming $\vec{a} \in \mathbb{R}^3$: $len(\vec{a}) = |\vec{a}| = \sqrt{a_1^2 + a_2^2 + a_3^2}$ (4.1) For example, for the magnetometer vector we would have (using the values from table 4.2): $len(Mag) = |\vec{m}| = \sqrt{Mag_x^2 + Mag_y^2 + Mag_z^2} = \sqrt{(-26.677)^2 + (-6.142)^2 + (-9.773)^2} = 29.068$ (4.2) Table 4.2 illustrates the new data set with the added length vectors on the right: Table 4.2: Data set columns with length vectors. Time [0.1s] Acc x [g] Acc y [g] Acc z [g] Gyro x [◦s−1] Gyro y [◦s−1] Gyro z [◦s−1] Mag x [µT] Mag y [µT] Mag z [µT] len Acc [g] len Gyro [◦s−1] len Mag [µT] 1 -0.001 0.001 0.978 -0.002 -0.0105 -0.010 -26.677 -6.142 -9.773 0.978 0.014 29.068 . . . 4.4 Statistical functions One goal of this work is, given a blade pitch angle α and a wind speed υ, to determine a certain numerical output of some metric. For example: given α = 10◦ and υ = 20ms−1, what is the expected length of the acceleration vector? Answering this question allows us to model the turbine and to predict new values that might not have been in the training data. The reasoning for adding these variables is to be able to predict an output for a set of inputs. It is not possible to give, for example, an instantaneous value of the acceleration for a given wind speed and angle. It is, however, much more interpretable and useful to give, for example, the range or the maximum acceleration. The outputs chosen were statistical measures of the columns in table 4.2. The statistical functions applied to the columns were: Table 4.3: Statistical functions. median mean var ptp max min std max abs min abs median abs With median being the value at the central position after ordering the data. Max is the maximum and min is the minimum. Mean (arithmetic mean): $average = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ (4.3) Ptp being the range: $ptp = range = max - min$ (4.4) Variance ($\mu$ is the mean): $Var(X) = \sigma^2 = E[(X - \mu)^2] = Cov(X, X) = E[X^2] - E[X]^2$ (4.5) Std is the standard deviation: $std = \sigma = \sqrt{Var(X)}$ (4.6) Max abs, min abs, median abs and average abs: $max\ abs(x) = max(abs(x))$ (4.7) $min\ abs(x) = min(abs(x))$ (4.8) $median\ abs(x) = median(abs(x))$ (4.9) The metrics max abs, min abs and median abs were introduced manually (they are not defined by numpy) as simplified metrics that may be easier to predict. With these values, a new data set is produced, from now on called the grouped data. These are the columns of the new data set: Table 4.4: Grouped data set columns. angle windspeed median Acc x mean Acc x var Acc x ptp Acc x amax Acc x amin Acc x std Acc x . . . min abs len Magn The data set from table 4.4 has 48 rows (6 angles × 8 wind speeds) and 122 columns (2 columns for angle and wind speed, plus 10 × 12: the number of metrics from table 4.3 times the 12 measurement columns), for a total of 5,856 cells. A minimal code sketch of this aggregation is shown below.
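This sketch uses illustrative column and variable names (the real columns carry unit suffixes) and assumes the nested dictionary of DataFrames described in chapter 3:

import numpy as np
import pandas as pd

def add_length_columns(df):
    # Append the modulus of each 3-component vector (equation 4.1).
    for name in ("Acc", "Gyro", "Mag"):
        cols = [f"{name} x", f"{name} y", f"{name} z"]
        df[f"len {name}"] = np.sqrt((df[cols] ** 2).sum(axis=1))
    return df

# Statistical functions of table 4.3 (the *_abs metrics are defined manually).
metrics = {
    "median": np.median, "mean": np.mean, "var": np.var, "ptp": np.ptp,
    "amax": np.max, "amin": np.min, "std": np.std,
    "max abs": lambda v: np.max(np.abs(v)),
    "min abs": lambda v: np.min(np.abs(v)),
    "median abs": lambda v: np.median(np.abs(v)),
}

rows = []
for angle, by_speed in data.items():            # nested dict from chapter 3
    for wind_speed, df in by_speed.items():
        df = add_length_columns(df)
        row = {"angle": angle, "windspeed": wind_speed}
        for col in df.columns:
            if col.startswith("Time"):
                continue                        # the time index is not aggregated
            values = df[col].to_numpy()
            for metric_name, fn in metrics.items():
                row[f"{metric_name} {col}"] = fn(values)
        rows.append(row)

grouped = pd.DataFrame(rows)                    # 48 rows x 122 columns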
A subset of this table will be passed as input to the supervised models later on. 4.5 3D Plots With the new data set from table 4.1 and the statistical functions in table 4.3, 3D plots are obtained; the three axes are: angle, wind speed and statistical function. Some relevant plots are shown: Figure 4.8: Median of the magnetometer measurements in the z axis. Figure 4.9: Standard deviation of the acceleration in the z axis. Figure 4.10: Standard deviation of the angular velocity in the y axis. Figure 4.11: Median of the magnetometer measurements in the z axis. Figures 4.8, 4.9, 4.10 and 4.11 all suggest two observations. First of all, the data can clearly be approximated by a plane (or, even better, a curve). This leads to the second observation: with an increase in wind speed, the measured metric also increases (there is a direct correlation). Figures 4.8, 4.10 and 4.11 also suggest that a higher angle leads to a higher measurement (this is not as clear as the first claim). In figure 4.9, it appears that the range between [-10, 10] degrees shows higher values compared to the values outside that range (it looks like an inverted U shape if we look at the figure from the right point of view). There are also examples of inverted observations: Figure 4.12: Median absolute value of the magnetometer in the z axis. Figure 4.13: Median value of the modulus of the magnetometer vector in the z axis. It is clear that figures 4.12 and 4.13 can again be approximated by a plane. One important difference is that this time the correlation is inverted (inverse correlation): the higher the wind speed, the lower the measured metric. Finally, other graphs did not look predictable (they seem random): Figure 4.14: Median absolute value of the measurements of the magnetometer in the y axis. Figure 4.15: Median angular velocity measurements in the z axis. With these graphs we get a graphical representation of how the metrics change with different wind speeds and angles. We can also see how fast they seem to scale, which in most of the examples looked linear (and by extension can be approximated by a plane). In contrast, other graphs look random, and the metrics associated with them will thus be very hard to predict. 4.6 Correlation matrix Recalling the grouped data table: Table 4.5: Grouped data set columns. angle windspeed median Acc x mean Acc x var Acc x ptp Acc x amax Acc x amin Acc x std Acc x . . . min abs len Magn Using the data from table 4.5 we create a correlation matrix (discarding angle and wind speed). The matrix in figure 4.16 is very dense because of the 120 columns of the data set. Nonetheless, we will try to comment on the matrix: Figure 4.16: Correlation matrix of the grouped data set. Figure 4.17: Correlation matrix of the grouped data set (highlighted areas). From figure 4.17 we highlight two areas: 1. Red: We can see several inverse correlations: std Gyro y - median abs Magn z, median abs Gyro y - median abs Magn z, etc. This is expected, as the changes in angle will also affect the magnetometer components. 2.
Purple: We can see some direct correlations: median abs len Acc - median Acc z, max abs len Acc - amax Acc z, etc. This is expected, as the modulus of the vector and the median should be correlated, specially with the z axis that is the one with the larger values of the three. Overall, the matrix is not very informative as there is a lot of redundancy on it. For example: maximum and minimum will be correlated with the range, as it is a linear combination of the two. Another example: All the metrics of the acceleration in an axis will be correlated with the other metrics of the acceleration in the same axis. There were however, more interesting parts of the figure, as the ones described in the red area, but overall, not very informative. Chapter 5 Periodicity analysis A periodicity analysis was performed. The aim of this analysis was to try to find if the turbine vibrates at a certain frequency. This analysis cannot be easily done if we were to remove the possible outliers, because these values are erased from the data set, resulting in gaps in the measurements. The goals from this section are: find out if any of the metrics of the turbine present a periodic behavior, if they do, obtain the frequency and figure out how the frequency changes with different wind speeds and angles. Additionally, interpret the findings in order to know if they make sense. In order to extract the frequency from the data, a Fast Fourier Transform (FFT) was used. The Python library used was Scipy 1.8.0 with the scipy.fftpack module and the fft and fftfreq functions. 5.1 Test setup The following list describes the configuration of the periodicity analysis: • The experiment lasts 10 seconds. • The number of samples is 99 for each metric. • Each measurement (sample) is spaced by 100ms gaps (the sampling rate is 10Hz). • Before the analysis the FFT library will be tested with a test function to confirm correct usage. • The analysis will be made for each column of the data set for a given angle and wind speed. For example: average Acc z for a wind speed of 8.5 and an angle of 1◦. • The mirrored output from the algorithm, on the negative x range, is ignored and not shown. 5.2 Testing the library First, a test on the function was made to make sure that the libraries were working and that they were being used correctly: 53 CHAPTER 5. PERIODICITY ANALYSIS 54 Figure 5.1: FFT Test with a sin function at 2Hz, with a clear peak at x = 2. The graph from figure 5.1 clearly represents a peak at 2Hz. The FFT has successfully detected the only frequency at 2Hz. The same workflow will be used in the real analysis to make sure that the data is correctly passed to the function. 5.3 Plotting the data set Fast Fourier Transforms 5.3.1 DC Component Most of the graphs showed a peak at x = 0, which is a probable side-effect of a DC component. I.e.: The FFT is detecting the median of the data and plotting it at x = 0. If we take a look at table 5.1, the mean of the len Magn column (the mean of the modulus of the Magn vector) is 28.56. If we multiply this value by 2 we obtain 57.12 which is roughly equal to the value at x = 0 on figure 5.2: Table 5.1: Data statistics at a wind speed of 8.5ms−1 and 30◦pitch angle. 
Metric Time (0.1s) Acc x Acc y Acc z Gyro x Gyro y Gyro z Magn x Magn y Magn z len Acc len Gyro len Magn count 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 mean 50.000000 0.001525 -0.021226 1.033122 -0.028354 0.001044 -0.000453 -26.313516 -5.573011 -9.607244 1.033699 0.089341 28.564459 std 28.722813 0.011927 0.024573 0.060953 0.081943 0.025032 0.041457 0.289135 0.286139 0.298388 0.060954 0.042546 0.288391 min 1.000000 -0.028040 -0.079266 0.801373 -0.187839 -0.061255 -0.100001 -26.969531 -6.288750 -10.476562 0.801555 0.009376 27.976938 25% 25.500000 -0.005765 -0.040277 1.014185 -0.094118 -0.016368 -0.029460 -26.532187 -5.776875 -9.843750 1.014505 0.058281 28.377556 50% 50.000000 0.002338 -0.020795 1.031024 -0.031516 0.003900 -0.001884 -26.313516 -5.557500 -9.562500 1.031553 0.084312 28.540237 75% 74.500000 0.009515 -0.004721 1.046768 0.034030 0.017615 0.027431 -26.167734 -5.411250 -9.421875 1.047564 0.118908 28.766974 max 99.000000 0.032532 0.034198 1.258600 0.151109 0.050737 0.093170 -25.584609 -4.826250 -8.648438 1.259381 0.191606 29.395060 CHAPTER 5. PERIODICITY ANALYSIS 55 Figure 5.2: FFT with wind speed of 8.5 and an angle of 30◦, acceleration of y. This observation: y0 ≈ 2×median(column), holds for most if not all the graphs. Please note that in figure 5.2 there is information on the y axis in the r = (0, 5] range. However, it is shadowed by the value at x = 0 (y0), because it is much, much, larger than the values of the r range (ya, a ∈ r). Matplotlib adapts the axes to the range of the data. The value y0 ≈ 57 (at x = 0, x /∈ r) is the maximum value in the data and in the y axis range, while the rest of the data is at maximum y ≈ 0.15. This will be more clear in graph 5.3. In order to avoid the distortion on the y axis, we can set the y value at x = 0 to 0. We will not destroy any information as we already know that it is approximately equal to two times the median of the data: CHAPTER 5. PERIODICITY ANALYSIS 56 Figure 5.3: FFT with wind speed of 8.5 and an angle of 30◦, acceleration of y. DC component removed. If we compare figure 5.2 with 5.3 we are now able to see the information in the r range. The input data is the same, but the peak at x = 0 has been removed. Another option would have been to subtract the mean of the column to each row of the data. The removal of the peak was chosen for its simplicity and to avoid destroying information in the input data. We will also note that in other graphs the removal at x = 0 it is not necessary. In any case from now on, all the graphs will have the x = 0 preprocessed unless explicitly mentioned otherwise. In the following graphs we can see that it is not necessary to remove the value at x = 0: CHAPTER 5. PERIODICITY ANALYSIS 57 Figure 5.4: FFT with wind speed of 8.5 and an angle of 30◦, acceleration of y. Figure 5.5: FFT with wind speed of 8.5 and an angle of 30◦, acceleration of y. DC component removed. CHAPTER 5. PERIODICITY ANALYSIS 58 5.3.2 Comparison of the FFTs at different wind speeds In this subsection we will explore the possibility of comparing the FFTs at the same angle but different wind speeds, in order to find out if there is for example a displacement in the frequencies to higher or lower values at higher wind speeds. Renaming the axes Before explaining the wind turbine expected behavior we will define the following axes: Figure 5.6: Common naming of an aircraft in flight principal axes [7]. 
We then compare the axes of figure 5.6 with the axes of our wind turbine: Figure 5.7: Front view of the turbine with labeled axes, simplified. From figures 5.6 and 5.7 we describe the following mapping: the Y axis is parallel to the roll axis but in the opposite direction, the X axis is parallel to the pitch axis but again in the opposite direction, and the Z axis corresponds with the yaw axis (it is parallel) but is also in the opposite direction. With this mapping in mind, we can treat a rotation about the X axis as equivalent to a change in pitch, and the same with the rest of the axes (Z - yaw and Y - roll). The polarity of the rotation is not relevant for this part. Expected behavior of the wind turbine The expected behavior of the wind turbine is the following: • This description is for a wind turbine with a fixed pitch angle. • We start with the wind turbine perfectly straight, with no wind affecting it. • We now apply a constant amount of wind to the turbine. This wind causes the wind turbine to tip backwards, i.e., the turbine should pitch back, with the top of the wind turbine being displaced towards the Y axis. • As the wind applied is constant, the wind turbine should stabilize while tipped slightly backwards at a certain spot in space. • The main oscillations of the wind turbine are expected to be found as a change in pitch, displacing the turbine along the Y axis. Significant rotations in the yaw or roll axes should not be observed, as the wind is pushing towards the Y axis. • As the wind turbine vibrates on the pitch axis, displacement on the Z axis should also be observed. The rest of the axes should show negligible changes. • Lastly, we expect the frequency of these changes and oscillations to increase with higher wind speeds. With this in mind, we expect to find the following vibrations: • Repeated displacement on the Z axis, represented by a vibration in the acceleration on Z metric (Acc Z). • Rotation and counter-rotation around the pitch axis, detected in the angular velocity on the X axis (Gyro X). Acceleration on the Z axis (consequence of changes of pitch) All the graphs of the acceleration on the z axis had to have the DC component removed. This is probably a consequence of the frequency not being strong enough in the 0-5Hz range. Normally, we expect the peaks of the frequencies detected by an FFT to be proportional to the DC component. In this case, the peaks are much smaller than the DC component. The following figure illustrates this point: Figure 5.8: FFT with wind speed of 8.5 and an angle of 30◦, acceleration of z. DC component not removed. A comparison of several FFTs is shown below; the pitch angle is kept constant while the wind speed is not, and the DC component is removed: Figure 5.9: FFT with wind speed of 8.5 and an angle of 30◦, acceleration of z. DC component removed. Figure 5.10: FFT with wind speed of 11.6 and an angle of 30◦, acceleration of z. DC component removed. Figure 5.11: FFT with wind speed of 13.8 and an angle of 30◦, acceleration of z. DC component removed. From figures 5.9, 5.10 and 5.11 there is no visible pattern nor displacement of the peaks across the different wind speeds. It is possible that the wind turbine is vibrating, along the acceleration in the z axis, at a frequency much higher than the sample rate, and that it therefore cannot be detected.
This is also supported by the magnitude of the DC component compared to the peaks in the FFT (≈ 0.05 amplitude versus ≈ 2.1). As the acceleration is the derivative of the velocity, we expect the changes of this metric to be much faster than on the velocity. We will now analyze the angular velocity, hopefully being able to detect a pattern and clearer frequencies thanks to expected slower changes. Angular velocity on the X axis (changes in pitch) The graphs from the angular velocity on the x axis (gyro x) were much more clearer. In fact, it was not necessary to remove the DC component, indicating that the peaks are proportional to it and that there is a high probability that the frequencies have been detected: CHAPTER 5. PERIODICITY ANALYSIS 63 Figure 5.12: FFT with wind speed of 8.5 and an angle of 30◦, angular velocity on x. DC component not removed. Figure 5.13: FFT with wind speed of 10.1 and an angle of 30◦, angular velocity on x. DC component not removed. CHAPTER 5. PERIODICITY ANALYSIS 64 Figure 5.14: FFT with wind speed of 11.6 and an angle of 30◦, angular velocity on x. DC component not removed. Figure 5.15: FFT with wind speed of 13.8 and an angle of 30◦, angular velocity on x. DC component not removed. From the progression of figures 5.12, 5.13, 5.14 and 5.15 two things can be observed: The peak present at x ≈ 4.35Hz in 5.12 clearly moves right and ends up at x ≈ 4.8Hz in CHAPTER 5. PERIODICITY ANALYSIS 65 5.15, this suggests an increase in frequency with an increase in wind speed. Additionally, the peak also increases in height, from J(theta) ≈ 0.052 to J(theta) ≈ 0.077 in the last figure. This observation strongly suggests that the wind turbine is vibrating faster at higher wind speeds, in the pitch axis as it was expected. Rest of the metrics and axes The rest of the FFTs did not show clear patterns nor distinct frequency detections. The magnetometer metric was not considered useful for this analysis, as its main use would be to get the orientation of the turbine in relation with the earth’s magnetic field. 5.3.3 Periodicity analysis results In this chapter we successfully confirmed the expected behavior of the wind turbine. Additionally, we managed to detect a direct correlation between the wind speed and the frequency at which the turbine was vibrating. This finding was very clear on the angular velocity on the x axis, but not very conclusive on the displacements on the z axis. One last observation of the magnetometer is made: Table 5.2: Data statistics at a wind speed of 8.5ms−1 and 30◦pitch angle. Metric Magn x [µT ] Magn y [µT ] Magn z [µT ] min -26.969531 -6.288750 -10.476562 max -25.584609 -4.826250 -8.648438 range 1.384922 1.4625 1.828124 From table 5.2 we can observe that the largest range is registered at the magnetometer z axis, followed by y and then x. This greater change in the z and y components of the magnetometer vector further confirms that the main rotation is along the x axis (variation in pitch) and not the other axes (if the wind turbine rotates around an axis A, the magnetometer values measured at that axis will be affected the least, because the vector of the magnetic field will remain at the same angle with the A axis, while the other axes will not maintain the angle). Chapter 6 Supervised Models 6.1 Introduction The main goal of this section is to predict, given a blade pitch angle α and a wind speed υ, a statistical metric. 
This would allow us to have a general idea of the behavior of the wind turbine at different wind speeds and angles without actually having to perform the experiment. For example, knowing the maximum acceleration that the wind turbine will be exposed to at a certain wind speed will allow us to plan how strong should the structure that supports it be, in order for it to not break and to not waste unnecessary material or avoid harder or costlier support structures. This is a regression problem: Given a set of real values, predict a new real value based on previous examples. More concretely this is a supervised machine learning problem. Several supervised linear learning models from scikit learn were used in this section as they are already prepared to use the numpy data formats. 6.2 Data preparation Table 6.1: Data set columns. angle windspeed median Acc x mean Acc x var Acc x . . . median Magn z mean Magn z . . . average abs len Magn 1 8.5 0.00187 0.00153 0.00035 . . . -9.56250 -9.60014 . . . 9.60014 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Recalling the columns from earlier in table 6.1 the input vector X will be the columns angle and wind speed and the target vector Y will be any of the other columns. In order to have better results, lower complexity and more interpretable models, the decision was made to have one model per predicted metric, instead of a highly complex model that predicts all the output columns at the same time. Each model is trained to predict one output metric. In the case of the Neural Network, this results in two input nodes and one output node for each neural network. The hidden layers can also be adjusted in scikit learn, 67 CHAPTER 6. SUPERVISED MODELS 68 along with other parameters such as the algorithm used, α (L2 penalty or regularization term), etc. For each of the learners the following steps were taken: 1. Create the X and Y input and output vectors. 2. Shuffle (randomize) the order of the rows of the data set in order to avoid repeated biases. 3. Split the data set by rows into two sets. The training set and the test set, in 80% and 20% proportion, respectively. 4. Scale the data. 5. Train the model using the training set. 6. Evaluate the model using the test set. 6.3 Models used A total of 6 models were used in this section: Linear regression [39], Linear with poly- nomial features [39][45], Ridge [40], Huber [38], Gaussian [37] and a MLP Regressor [43] [36]. All of the models used a StandardScaler [46] before fitting the data. The pipelines of all the models were: Scaler → Estimator. Except the polynomial model that was: Scaler → PolynomialFeatures → Estimator [45]. With estimator being the model (linear, gaussian, etc). 6.4 Scaler selection The StandardScaler was chosen because it was concluded in Statistical Analysis chapter, Distribution study section, that the data was normally distributed and that parametric statistical methods could be used. Sk-learn mentions the following about the StandardScaler : ”[...] might behave badly if the individual features do not look like standard normally distributed data [...] [46]. We therefore conclude that the StandardScaler is the appropriate scaler in this work, because of its compatibility with normally distributed data. 6.5 Hyperparameters tuning The hyperparameter tuning was performed semi-automatically using sk-learn’s Grid- SearchCV [42]. CHAPTER 6. 
SUPERVISED MODELS 69 The default parameters used in GridSearchCV were the following: scoring = None, n jobs = None, refit = True, cv = None (5-fold cross validation) and error score = np.nan. For example, in the case of polynomial regression, GridSearchCV allows us to specify an array of degrees to test with. After the testing is done, GridSearchCV has a method that returns the best estimator. Additionally, GridSearchCV does cross-validation [34]. By semi-automatically we mean that a number of parameters were fed into the Grid- SearchCV model selector. These parameters were selected according to the size of the data and the complexity of it, which was thought to not be extremely high in this case 1. For example, in the case of the degree of the polynomial model, the values fed were in the range [2, 8], and the best results were obtained with values under 4. Another example is the neural network, the MLPRegressor. It did not make any sense to use a model with a lot of hidden layers and nodes, because the inputs were two and the output is only one. 6.6 Models parameters Some models did not allow or did not have parameters to tune. Other models, however, required adjustment in order to have acceptable performance. If nothing is specified, the default values from sk-learn were used. For detailed docu- mentation on the defaults, check the official documentation for the models in the previous references (were we list the models used). The following list details the chosen model parameters: • Linear No changes, defaults used: fit intercept = True, normalize = False and positive = False. • Polynomial Degrees 2 to 4 were found to be the best performers. The default parameters were: degree = 2, interaction only = False, include bias = True and order = C. Degree 2 was chosen in the end for its good performance and lower chance of overfitting. Same parameters for the model as in Linear. • Ridge No changes, defaults used. Regularization strength α = 1.0, fit intercept = True, normalize = False, max iter = None, tol = 10−3 (precision of the solution) and solver = auto. • Huber No parameters specified, defaults used. Number of samples that should be classified as outliers ε = 1.35,max iter = 100, regularization parameter α = 0.0001, warm start = False, fit intercept = True and tol = 10−5. • Gaussian No tuning, defaults used. kernel = None (ConstantKernel(1.0, con- stant value bounds=”fixed” * RBF(1.0, length scale bounds=”fixed”)), value added to the diagonal of the kernel matrix during fitting α = 10−10, optimizer = fmin l bfgs b, n restarts optimizer = 0, normalize y = False, and random state = None. 1Recall from the Materials and Methods chapter that the number of samples was 42768. Additionally, the models studied have two inputs and one output. CHAPTER 6. SUPERVISED MODELS 70 • MLPRegressor Changes: hidden layer sizes = (2, 4, 2). The rest, were de- fault parameters: activation = relu, solver = lbfgs, L2 penalty (regularization term) α = 0.0001, batch size = auto = min(200, n samples), max iter = 1000, random = None, tol = 10−4, warm start = False and max fun = 15000. The default hidden layer size was (100, ), considered to be excessive. The solver pa- rameter lbfgs was vital for the model to perform acceptably. According to sk-learn: ”For small data sets, lbfgs can converge faster and perform better” [43]. This was found to be the case. The following figure illustrates the mentioned (2, 4, 2) MLPRegressor: Figure 6.1: Representation of the (2,4,2) Neural Network used. 
This network has 3 hidden layers. Illustration made thanks to NN-SVG [3]. Note from figure 6.1 the 2 nodes in the input layer (wind, angle), the (2, 4, 2) node structure in the hidden layers (3 hidden layers, with 2, 4 and 2 nodes respectively) and the single node in the output layer. 6.7 Results This section will detail the overall scores obtained. 6.7.1 R2 score All the scores were calculated using the R2 score (also known as the coefficient of determination), which is computed with the following formula [13] [41]: $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$ (6.1) where $SS_{res}$ (the residual sum of squares) is: $SS_{res} = \sum_i (y_i - f_i)^2 = \sum_i e_i^2$ (6.2) $SS_{tot}$ (the total sum of squares) is: $SS_{tot} = \sum_i (y_i - \bar{y})^2$ (6.3) and $\bar{y}$ (the mean of the observed data) is: $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$ (6.4) According to sklearn, R2 ∈ (−∞, 1], where 1 is the best score possible [41]. Scores over 0.75 are considered good and scores over 0.90 very good. Finally, scores under 0.50 will be considered very bad. The R2 score is inversely correlated with the distance from the prediction to the real value, i.e., the closer the prediction is to the real value (the smaller the distance), the higher the R2 score. If this distance is equal to 0, the R2 score will be equal to 1. 6.7.2 Overall scores The initialization of the model parameters is random; therefore, some variance between runs is expected. The following tables show the number of scores over a certain threshold for four different runs:
Table 6.2: Model scores breakdown. Run 1.
Model        over 0.0  over 0.25  over 0.5  over 0.75  over 0.9  over 0.95
Linear       82        50         23        4          1         0
Polynomial   66        54         30        8          2         0
Ridge        76        47         22        10         2         1
Huber        78        50         22        8          4         0
Gaussian     10        8          3         0          0         0
Mlpr         42        27         14        4          0         0
Table 6.3: Model scores breakdown. Run 2.
Model        over 0.0  over 0.25  over 0.5  over 0.75  over 0.9  over 0.95
Linear       79        52         22        7          3         0
Polynomial   77        66         36        8          1         1
Ridge        75        49         29        5          2         0
Huber        75        49         26        8          2         0
Gaussian     7         5          2         1          0         0
Mlpr         31        24         14        1          0         0
Table 6.4: Model scores breakdown. Run 3.
Model        over 0.0  over 0.25  over 0.5  over 0.75  over 0.9  over 0.95
Linear       73        52         29        10         1         0
Polynomial   77        57         36        13         3         1
Ridge        80        58         27        6          2         0
Huber        64        41         15        7          2         1
Gaussian     7         6          3         1          0         0
Mlpr         45        24         12        4          1         0
Table 6.5: Model scores breakdown. Run 4.
Model        over 0.0  over 0.25  over 0.5  over 0.75  over 0.9  over 0.95
Linear       76        49         31        11         1         0
Polynomial   76        56         34        11         1         0
Ridge        76        55         33        4          2         0
Huber        75        51         26        8          2         1
Gaussian     9         6          4         2          0         0
Mlpr         36        19         8         1          0         0
There is some variance between runs, but the conclusion is the following: the Gaussian model had the worst performance of the six. Nonetheless, later on, this model did perform well in some limited situations (some examples will be shown later). The next model, in increasing order of performance, is the Neural Network (MLPR), which performed much better than the Gaussian model but still only managed half of what the other four better models did. Finally, the best models are the Linear, Polynomial, Ridge and Huber regressors, which in some situations even had an R2 score of over 0.95. A score this high means that the prediction is extremely close to the real value. 6.7.3 Predictions representations The following section provides a representation of the predictions made by the best models. At least one graph is present for each model.
A note about the graphs In some of the subgraphs of this section only one green point is visible. This is due to the prediction being so close to the real value that the prediction is hidden as it is drawn after the real value (drawing over it), therefore, only one green point appears. Magnetometer As previously stated, the Gaussian model was not very good. There were however some instances where it did archive a very good score: CHAPTER 6. SUPERVISED MODELS 73 Figure 6.2: Predictions of the Gaussian model of the median magnetometer values in the z axis. As we can see on figure 6.2 even though the overall scores from the Gaussian model were not very good, in this case it managed a score of 0.91, which is very good. In fact many of the real values cannot be seen as they are in the same place as the predictions (and thus hidden because of the reason already explained). CHAPTER 6. SUPERVISED MODELS 74 Other examples of the modulus of the magnetometer vector are shown: Figure 6.3: Predictions of the Linear model of the length of the magnetometer vector. CHAPTER 6. SUPERVISED MODELS 75 Figure 6.4: Predictions of the Polynomial model of the length of the magnetometer vector. As we can see from figures 6.3 and 6.4 the score in this case is not as good (please note it is a different metric, so it is not directly comparable). In conclusion, the values from the magnetometer grouped data were able to be pre- dicted. Additionally, the predictions were very close to the real values. CHAPTER 6. SUPERVISED MODELS 76 Gyroscope Figure 6.5: Predictions of the Huber model of the standard deviation of the gyroscope in the y axis. CHAPTER 6. SUPERVISED MODELS 77 Figure 6.6: Predictions of the Ridge model of the standard deviation of the gyroscope in the y axis Figures 6.5 and 6.6 seem to indicate that the standard deviation of the gyroscope in the y axis is fairly predictable, as two different models managed to predict it successfully. CHAPTER 6. SUPERVISED MODELS 78 Figure 6.7: Predictions of the MLPR model of the range of the gyroscope in the z axis. CHAPTER 6. SUPERVISED MODELS 79 Figure 6.8: Predictions of the Polynomial model of variance of the gyroscope in the y axis. Figures 6.7 and 6.8 show more very good scores and figure 6.7 introduces the predic- tions of the MLPR model. CHAPTER 6. SUPERVISED MODELS 80 Acceleration The acceleration vector is probably the most important metric to predict together with the gyroscope values. The following section details several of the best performer models: Figure 6.9: Predictions of the Linear model of the mean of the acceleration in the z axis. CHAPTER 6. SUPERVISED MODELS 81 Figure 6.10: Predictions of the Polynomial model of the mean of the acceleration in the z axis. CHAPTER 6. SUPERVISED MODELS 82 Figure 6.11: Predictions of the Ridge model of the mean of the acceleration in the z axis. CHAPTER 6. SUPERVISED MODELS 83 Figure 6.12: Predictions of the Huber model of the mean of the length of the acceleration vector. CHAPTER 6. SUPERVISED MODELS 84 Figure 6.13: Predictions of the MLPR model of the standard deviation of the length of the acceleration vector. As we can see from the previous figures, the models managed to have very good scores, with some even breaking over 0.95 like the Polynomial model of the mean of the acceleration in the z axis on figure 6.10. CHAPTER 6. SUPERVISED MODELS 85 As an important remark for this section, the Gaussian model did not manage to have an score over 0.75. 
This is further supported by tables 6.2, 6.3, 6.4 and 6.5, where the Gaussian model consistently showed the worst scores. Average error rate Using equation 3.8 and the predictions from the previously mentioned models, the average error rate was calculated to be ≈ 1%. This error rate was considered to be very low, and the predictions are therefore also considered to be very precise. If a virtual model of the turbine were to be made, the recommended models to build it would be: Linear, Polynomial, Ridge and Huber. The MLPR model could also be considered in some instances, and the Gaussian model is strongly discouraged in this case. Not all the graphs are shown, as there are 107 (107 models with scores over 0.75), but this overview should give a good idea of the performance of the models. In conclusion, with these results we can say that a virtual model of the grouped metrics of the wind turbine can be built, either by combining the best models for certain metrics or by using just one of the best performers for all the metrics. Additionally, these models can be used to predict values outside the training data set (new pitch angle configurations and wind conditions that were not present in the experiment). This should work fairly well with intermediate values like 15◦, but not with values near 0◦ (this is a special case where the wind turbine blades will not be rotating). One last remark must be made: the values to be predicted should not be very far from the experimental ranges, for example 20ms−1 (the experiment had a maximum of 13.8ms−1), as these values are very far from the measurements and the models will probably fail. Some predictions are shown in the following subsection. 6.7.4 Predictions for inputs outside the experiment Using the previous models, a table was developed that presents the predictions for wind speeds outside the experiment, but only for models that had a score over 0.90:
Table 6.6: Predictions for inputs outside the experiment, only models with scores over 0.90.
Model        Metric        Angle [◦]  Wind speed [ms−1]  Prediction
linear       mean Acc z    10         15.0               1.104912
linear       mean Acc z    10         18.0               1.142317
linear       mean Acc z    10         20.0               1.167254
linear       mean Acc z    30         15.0               1.103160
linear       mean Acc z    30         18.0               1.140565
linear       mean Acc z    30         20.0               1.165502
ridge        mean Acc z    10         15.0               1.104175
ridge        mean Acc z    10         18.0               1.140750
ridge        mean Acc z    10         20.0               1.165133
ridge        mean Acc z    30         15.0               1.103168
ridge        mean Acc z    30         18.0               1.139742
ridge        mean Acc z    30         20.0               1.164125
linear       var Gyro y    10         15.0               0.004726
linear       var Gyro y    10         18.0               0.006617
linear       var Gyro y    10         20.0               0.007877
linear       var Gyro y    30         15.0               0.005020
linear       var Gyro y    30         18.0               0.006911
linear       var Gyro y    30         20.0               0.008171
linear       mean len Acc  10         15.0               1.111576
linear       mean len Acc  10         18.0               1.149678
linear       mean len Acc  10         20.0               1.175079
linear       mean len Acc  30         15.0               1.110320
linear       mean len Acc  30         18.0               1.148421
linear       mean len Acc  30         20.0               1.173822
polynomial   mean len Acc  10         15.0               1.116658
polynomial   mean len Acc  10         18.0               1.161558
polynomial   mean len Acc  10         20.0               1.193637
polynomial   mean len Acc  30         15.0               1.109985
polynomial   mean len Acc  30         18.0               1.155171
polynomial   mean len Acc  30         20.0               1.187439
If we compare the outputs for mean len Acc with the graphs in figure 6.12, we can see an increase in the prediction metric (more acceleration in the linear model, from around 1.25g at 13.8ms−1 to 1.11g at 15.0ms−1 and up to 1.17g at 20.0ms−1).
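As an illustrative sketch (assuming the scaler-plus-estimator pipelines described earlier have already been fitted on the grouped data; the names pipelines, scores and queries are hypothetical), predictions such as those in table 6.6 can be generated as follows:

import numpy as np

# `pipelines` maps a metric name to its fitted Pipeline (Scaler -> Estimator),
# trained with X = [angle, windspeed] and y = the corresponding grouped metric.
# `scores` maps the same metric name to the R2 score obtained on the test set.
queries = np.array([
    [10, 15.0], [10, 18.0], [10, 20.0],
    [30, 15.0], [30, 18.0], [30, 20.0],
])

for metric, pipe in pipelines.items():
    if scores[metric] < 0.90:          # keep only models with R2 >= 0.90
        continue
    for angle, wind in queries:
        prediction = pipe.predict([[angle, wind]])[0]
        print(f"{metric}: angle={angle:.0f} deg, wind={wind:.1f} m/s -> {prediction:.6f}")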
Lastly, an observation can be made: the change in angle from 10° to 30° does not drastically change the acceleration metrics for the same wind speed. It does, however, affect the variance of the gyroscope along the y axis. This suggests that the wind speed is much more influential on the acceleration vector than the angle, at least for these two angles.

As there is no real experimental data to compare these outputs with, no further commentary will be made.

6.8 Conclusions

In this section we showed a representation of the predictions of the best models. All the models had good results, although some were limited to only certain metrics. Additionally, some predictions were made for values outside the experiment data set, and an expected increase in some metrics was observed.

Chapter 7

Conclusions and Future works

7.1 Conclusions

Throughout this work we were able to successfully process the available data from the experiments and identify some of the main metrics of the floating wind turbine.

The data processing was vital for the success of the models. For example, the addition of the vector lengths and the simplified metrics allowed us to find more variables to predict. The visualization of the data was also a vital part of verifying that the code worked correctly and of representing the data.

The periodicity analysis was successful and consistent with the results expected from the physical viewpoint. It adds useful information about the resonant frequencies of the wind turbine that can later be used to study it further.

Most of the supervised models presented had very good scores, the Gaussian model being the only one that was discouraged. These precise models can be used to predict the most important metrics of the turbine in conditions that were not present in the experiment.

7.2 Future works

If this work were to be continued, the following paths are suggested:

• Select the best overall model to predict all the metrics and therefore have a fast approximation of the wind turbine behavior.

• Instead of one model per metric, use one single model, possibly a combination of the best models, to predict all the output metrics at the same time. This model would be more complicated than the previous one.

• Add new variables to the data set. One of the most interesting measurements could be the instantaneous power generation, which, combined with the wind speed and angle, could be very useful. The current setup did not allow such measurements.

• Further explore the periodicity analysis with different-sized turbines and different weight distributions, and study how these configurations affect the most prominent frequency.

• Investigate or program new libraries that allow a faster sampling rate, in order to find or discard frequencies above 10 Hz where the turbine might or might not be vibrating, or use different equipment for these measurements.

• Explore more complex models, like LSTM (Long Short-Term Memory) neural networks, that could allow the prediction of the model metrics over a short future time frame.

Appendices

Appendix A

Virtual environments allow the host computer to have multiple Python binaries with different sets of libraries. This prevents the conflicts that could arise with a single global Python configuration when different projects require specific, mutually incompatible package versions.
See https://docs.python.org/3/library/venv.html for more information on how to create a virtual environment (venv). After setting up the venv, the necessary packages listed in the Programming software section must be installed, with the correct versions.
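As an illustration only, the short Python sketch below creates a virtual environment and installs packages into it programmatically with the standard venv module; the environment name, the POSIX bin/ layout and the package list are assumptions, and the usual equivalent is simply running python -m venv and then pip install from an activated shell.

    # Minimal sketch, assuming a POSIX layout (".venv/bin/pip"); the
    # environment name and the package list are illustrative placeholders.
    import subprocess
    import venv
    from pathlib import Path

    env_dir = Path(".venv")
    venv.EnvBuilder(with_pip=True).create(env_dir)   # equivalent to: python -m venv .venv

    pip = env_dir / "bin" / "pip"                    # on Windows the layout is .venv\Scripts\pip.exe
    subprocess.run([str(pip), "install", "numpy", "pandas", "scipy",
                    "scikit-learn", "matplotlib"], check=True)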