Floating Wind Turbine Dynamics Identification Identificación de la dinámica de una turbina eólica flotante Juan Tecedor Roa Bachelor's Degree in Software Engineering FACULTY OF COMPUTER SCIENCE Supervised by Matilde Santos Peñas Carlos Luis Serrano Barreto MADRID, 2021-2022 Floating Wind Turbine Dynamics Identification Identificación de la dinámica de una turbina eólica flotante Bachelor's Degree in Software Engineering Final Project Juan Tecedor Roa Supervised by Matilde Santos Peñas Carlos Luis Serrano Barreto Department of Computer Architecture and Automation Faculty of Computer Science Universidad Complutense de Madrid MADRID, 2021-2022

Acknowledgements

To my supervisors, Matilde Santos Peñas and Carlos Luis Serrano Barreto from the Complutense University of Madrid, for their guidance throughout the work, and to Enrique Sierra García from the University of Burgos for his suggestions and guidance early in the project. Without their input and knowledge this work would have been a car without wheels. To the staff of the Biblioteca Complutense for their style guide on the formal aspects of writing a final bachelor's project [14]. To the developers, creators and maintainers of the free and open-source software that enabled the creation of this work, as well as the companies that grant educational users partial or complete use of their licensed tools.

Abstract

Climate change is one of the biggest and most worrying problems in the current world. Renewable energy sources are one of the main tools that will allow humanity to fight it. More precisely, floating wind turbines offer unprecedented amounts of generated power compared to their onshore or bottom-fixed offshore counterparts. The technology is, however, at an early stage of development, with many improvements still to be made and countless fields of study. This project studies the behavior of a scale model of a floating wind turbine by building several statistical models that can predict some of its most important statistical metrics. These models depend on the wind speed and on the blade pitch angle of the turbine. Additionally, a periodicity analysis of the wind turbine is carried out in order to determine whether there are frequencies associated with it at different wind speeds and pitch angles. In this work, a data preprocessing phase is first performed with the aid of statistics and graphical representations. Then, two studies are made: a periodicity analysis using several Fourier Transforms, and multiple supervised regression models. The supervised models used were: Linear Regression, Polynomial Regression, Ridge Regressor, Huber Regressor, Gaussian Regressor and a Neural Network (MLP Regressor). Most of the supervised models were very successful and could be used to create a virtual model of a wind turbine. The periodicity analysis was also successful and was consistent with the physical analysis of the wind turbine.

Keywords — Wind turbine, floating, dynamics, identification, regression, neural networks, statistical analysis, data preprocessing, data representation, Fast Fourier Transforms.

Resumen

El cambio climático es uno de los problemas más importantes y preocupantes actualmente en el mundo. Las energías renovables son una de las herramientas que tenemos disponibles para poder luchar contra él.
En concreto, las turbinas eólicas flotantes pueden ofrecer una proporción de enerǵıa eléctrica generada sin precedentes, especialmente si las comparamos con las turbinas emplazadas en tierra o en el mar a baja profundidad cerca de la costa. Sin embargo, la tecnoloǵıa de las turbinas eólicas flotantes está en sus comienzos, con incontables mejoras por implementar y diversos campos de estudio. Este proyecto estudia el comportamiento de un modelo a escala de una turbina flotante, mediante la elaboración de varios modelos estad́ısticos que puedan predecir las métricas estad́ısticas más relevantes de la turbina. Estos modelos estad́ısticos dependen de la velocidad del viento y del ángulo de ataque de las palas de la turbina. Adicional- mente, se ha realizado también un análisis de la periodicidad de la turbina de viento para determinar qué frecuencias están asociadas con ella y a qué velocidades de viento y a qué ángulos de ataque. En este trabajo se realiza una primera fase de preprocesado de datos, con ayuda de he- rramientas estad́ısticas y representaciones gráficas. Posteriormente, se realizan dos estu- dios: un análisis de la periodicidad mediante varias Transformadas Rápidas de Fourier, y múltiples modelos supervisados de regresión. Los modelos supervisados fueron los siguientes: Regresión lineal, Regresión Polinómica, Regresión de Ridge, Regresión de Huber, Regresión Gaussiana y una red neuronal (regresor MLP). La mayoŕıa de los modelos supervisados obtuvieron resultados muy satisfactorios y podŕıan ser usados para crear un modelo virtual de la turbina de viento. El análisis de periodicidad fue también exitoso y consistente con el análisis f́ısico de la turbina de viento. Palabras clave — Turbina eólica, flotante, dinámica, identificación, regresión, redes neuronales, análisis estad́ıstico, preprocesamiento de datos, representación de datos, FFT. v Contents 1 Introduction 1 1.1 Climate change . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.1.1 Main polluters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 1.2 How a turbine generator works . . . . . . . . . . . . . . . . . . . . . . . 3 1.2.1 Parts of a wind turbine . . . . . . . . . . . . . . . . . . . . . . . . 3 1.2.2 Physics of power generation . . . . . . . . . . . . . . . . . . . . . 4 1.2.3 Blade pitch angle . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.3 Types of wind turbines . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.4 Goals and specific objectives . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.4.1 Main objective . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.4.2 Objectives breakdown . . . . . . . . . . . . . . . . . . . . . . . . 8 1.5 Relation of the work with the completed bachelor courses . . . . . . . . . 8 1.6 Work plan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.7 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.7.1 Text editor software . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.7.2 Programming software . . . . . . . . . . . . . . . . . . . . . . . . 10 1.8 Repository . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.9 Structure of the project . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2 State Of The Art 13 2.1 Current floating wind turbines wind farms . . . . . . . . . . . . . . . . . 13 2.2 Evolution of wind turbines power output . . . . . . . . . . . . . . . . . . 13 2.3 Types of platforms for floating wind turbines . . . 
. . . . . . . . . . . . . 15 2.4 Types of floating wind turbines anchors . . . . . . . . . . . . . . . . . . . 16 2.5 Fields of study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.5.1 Overlap between floating wind turbines and offshore turbines . . . 16 2.5.2 Overlap with the oil industry . . . . . . . . . . . . . . . . . . . . 16 2.6 Current floating wind turbines study fields . . . . . . . . . . . . . . . . . 17 3 Materials and Methods 19 3.1 Introduction to the experimental setup . . . . . . . . . . . . . . . . . . . 19 3.2 Experiment setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 3.3 Materials: Data from experiments . . . . . . . . . . . . . . . . . . . . . . 23 3.4 Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.4.1 Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 3.4.2 Machine learning workflow . . . . . . . . . . . . . . . . . . . . . . 25 3.4.3 Areas of machine learning . . . . . . . . . . . . . . . . . . . . . . 26 3.4.4 Feature scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 vii CONTENTS viii 3.4.5 Supervised learners . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.4.6 Linear Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.4.7 Evaluation of model performance . . . . . . . . . . . . . . . . . . 30 3.4.8 Ovefitting in Machine Learning . . . . . . . . . . . . . . . . . . . 31 3.4.9 Hyperparameter tuning . . . . . . . . . . . . . . . . . . . . . . . . 31 3.4.10 Pipelining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.4.11 Periodicity Study . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 3.4.12 Data organization . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4 Statistical Analysis 35 4.1 Data visualization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.2 Distribution study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.2.1 Histograms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.2.2 Examples of normal distributions . . . . . . . . . . . . . . . . . . 39 4.2.3 Normality test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4.3 Creating new variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4.4 Statistical functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 4.5 3D Plots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 4.6 Correlation matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 5 Periodicity analysis 53 5.1 Test setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 5.2 Testing the library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53 5.3 Plotting the data set Fast Fourier Transforms . . . . . . . . . . . . . . . 54 5.3.1 DC Component . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 5.3.2 Comparison of the FFTs at different wind speeds . . . . . . . . . 58 5.3.3 Periodicity analysis results . . . . . . . . . . . . . . . . . . . . . . 65 6 Supervised Models 67 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 6.2 Data preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 6.3 Models used . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 6.4 Scaler selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 6.5 Hyperparameters tuning . . . . . . . . . . . . . . . . . . . . . . . . . . . 
68 6.6 Models parameters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 6.7 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 6.7.1 R2 score . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 6.7.2 Overall scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 6.7.3 Predictions representations . . . . . . . . . . . . . . . . . . . . . . 72 6.7.4 Predictions for inputs outside the experiment . . . . . . . . . . . 85 6.8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 7 Conclusions and Future works 89 7.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 7.2 Future works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89 CONTENTS ix Appendices 92 Appendix A 92 Bibliography 94 List of Figures 1.1 Global Historial CO2 Emissions by Sector [11]. . . . . . . . . . . . . . . . 2 1.2 Renewable Energy Generation [24] [9]. . . . . . . . . . . . . . . . . . . . 3 1.3 Parts of a wind turbine as seen from the back and to the left side. [31]. . 4 1.4 Three types of turbine, with the types labeled. Adapted from [50]. . . . . 6 1.5 Global Wind Speed in January and July year 2001 [30]. . . . . . . . . . 7 2.1 Average rated output of a wind turbine for selected years [21]. . . . . . . 14 2.2 Four types of floating wind turbines. Adapted from [23]. . . . . . . . . . 15 3.1 Front view of the wind turbine with labeled axes. . . . . . . . . . . . . . 20 3.2 Front view of the turbine with labeled axes, simplified. . . . . . . . . . . 21 3.3 Top view of the turbine with labeled axes, simplified. . . . . . . . . . . . 21 3.4 IMU with its axes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 3.5 Polarity of rotation and orientation of the IMU axes. . . . . . . . . . . . 22 3.6 Neural network example. Illustration made thanks to NN-SVG [3]. . . . . 30 3.7 Overfitted data [18]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 3.8 FFT of a Cosine Summation Function resonating at 10, 20, 30, 40, and 50 Hz. [2]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33 4.1 2D plots of the data corresponding to a wind speed of 8.5ms-1 and a pitch angle of 10◦. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 4.2 2D plots of the data corresponding to a wind speed of 13.8ms-1 and a pitch angle of 10◦. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 4.3 Histogram of the data corresponding to a wind speed of 8.5ms-1 and a pitch angle of 10◦. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 4.4 Histogram of the data corresponding to a wind speed of 8.5ms-1 and a pitch angle of 10◦. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 4.5 Histogram of the data corresponding to a wind speed of 13.8ms-1 and a pitch angle of 30◦. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 4.6 Histogram of the data corresponding to a wind speed of 13.8ms-1 and a pitch angle of 30◦. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 4.7 Examples of a normal distributions with different means and variances [26]. 40 4.8 Median of the magnetometer measurements in the z axis. . . . . . . . . . 43 4.9 Standard deviation of the acceleration in the z axis. . . . . . . . . . . . . 43 4.10 Standard deviation of the angular velocity in the y axis. . . . . . . . . . 44 4.11 Median of the magnetometer measurements in the z axis. . . . . . . . . . 
44 4.12 Median absolute value of the magnetometer in the z axis. . . . . . . . . . 45 4.13 Median value of the modulus of the magnetometer vector in the z axis. . 46 x LIST OF FIGURES xi 4.14 Median absolute value of the measurements of the magnetometer in the y axis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 4.15 Median angular velocity measurements in the z axis. . . . . . . . . . . . 47 4.16 Average modulus of the magnetometer vector. . . . . . . . . . . . . . . . 49 4.17 Average modulus of the magnetometer vector. . . . . . . . . . . . . . . . 50 5.1 FFT Test with a sin function at 2Hz, with a clear peak at x = 2. . . . . 54 5.2 FFT with wind speed of 8.5 and an angle of 30◦, acceleration of y. . . . . 55 5.3 FFT with wind speed of 8.5 and an angle of 30◦, acceleration of y. DC component removed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56 5.4 FFT with wind speed of 8.5 and an angle of 30◦, acceleration of y. . . . . 57 5.5 FFT with wind speed of 8.5 and an angle of 30◦, acceleration of y. DC component removed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57 5.6 Common naming of an aircraft in flight principal axes [7]. . . . . . . . . . 58 5.7 Front view of the turbine with labeled axes, simplified. . . . . . . . . . . 59 5.8 FFT with wind speed of 8.5 and an angle of 30◦, acceleration of z. DC component not removed. . . . . . . . . . . . . . . . . . . . . . . . . . . . 60 5.9 FFT with wind speed of 8.5 and an angle of 30◦, acceleration of z. DC component removed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 5.10 FFT with wind speed of 11.6 and an angle of 30◦, acceleration of z. DC component removed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 5.11 FFT with wind speed of 13.8 and an angle of 30◦, acceleration of z. DC component removed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 5.12 FFT with wind speed of 8.5 and an angle of 30◦, angular velocity on x. DC component not removed. . . . . . . . . . . . . . . . . . . . . . . . . . 63 5.13 FFT with wind speed of 10.1 and an angle of 30◦, angular velocity on x. DC component not removed. . . . . . . . . . . . . . . . . . . . . . . . . . 63 5.14 FFT with wind speed of 11.6 and an angle of 30◦, angular velocity on x. DC component not removed. . . . . . . . . . . . . . . . . . . . . . . . . . 64 5.15 FFT with wind speed of 13.8 and an angle of 30◦, angular velocity on x. DC component not removed. . . . . . . . . . . . . . . . . . . . . . . . . . 64 6.1 Representation of the (2,4,2) Neural Network used. . . . . . . . . . . . . 70 6.2 Predictions of the Gaussian model of the median magnetometer values in the z axis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 6.3 Predictions of the Linear model of the length of the magnetometer vector. 74 6.4 Predictions of the Polynomial model of the length of the magnetometer vector. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 6.5 Predictions of the Huber model of the standard deviation of the gyroscope in the y axis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 6.6 Predictions of the Ridge model of the standard deviation of the gyroscope in the y axis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 6.7 Predictions of the MLPR model of the range of the gyroscope in the z axis. 78 6.8 Predictions of the Polynomial model of variance of the gyroscope in the y axis. . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . 79 6.9 Predictions of the Linear model of the mean of the acceleration in the z axis. 80 6.10 Predictions of the Polynomial model of the mean of the acceleration in the z axis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 6.11 Predictions of the Ridge model of the mean of the acceleration in the z axis. 82 LIST OF FIGURES xii 6.12 Predictions of the Huber model of the mean of the length of the acceleration vector. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 6.13 Predictions of the MLPR model of the standard deviation of the length of the acceleration vector. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84 List of Tables 1.1 Work plan table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.2 Official deadlines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.3 Main project tasks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.4 Python packages used. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.1 Equivalence between voltages and wind speeds. . . . . . . . . . . . . . . 19 3.2 Sample of data at 130V (8.5ms-1) and 1◦. . . . . . . . . . . . . . . . . . . 23 3.3 Sample of data at 200V (13.8ms-1) and 1◦. . . . . . . . . . . . . . . . . . 24 3.4 Data at a wind speed of 8.5ms−1 and 1◦pitch angle. . . . . . . . . . . . . 24 3.5 Data at a wind speed of 13.8ms−1 and 1◦pitch angle. . . . . . . . . . . . 24 3.6 Grouped data set columns. . . . . . . . . . . . . . . . . . . . . . . . . . . 34 4.1 Data set columns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40 4.2 Data set columns with length vectors. . . . . . . . . . . . . . . . . . . . . 41 4.3 Statistical functions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 4.4 Grouped data set columns. . . . . . . . . . . . . . . . . . . . . . . . . . . 42 4.5 Grouped data set columns. . . . . . . . . . . . . . . . . . . . . . . . . . . 48 5.1 Data statistics at a wind speed of 8.5ms−1 and 30◦pitch angle. . . . . . . 54 5.2 Data statistics at a wind speed of 8.5ms−1 and 30◦pitch angle. . . . . . . 65 6.1 Data set columns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 6.2 Model scores breakdown. Run 1. . . . . . . . . . . . . . . . . . . . . . . 71 6.3 Model scores breakdown. Run 2. . . . . . . . . . . . . . . . . . . . . . . 71 6.4 Model scores breakdown. Run 3. . . . . . . . . . . . . . . . . . . . . . . 71 6.5 Model scores breakdown. Run 4. . . . . . . . . . . . . . . . . . . . . . . 72 6.6 Predictions for inputs outside the experiment, only models with scores over 0.90. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86 xiii Chapter 1 Introduction 1.1 Climate change Climate change is one of the most important and worrying problems in the current world. Labeled as ”the biggest health threat facing humanity” by the World Health Organization [33], with expectations of 250 000 additional deaths per year between 2030 and 2050 due to climate change alone. The causes of this death toll are from: malnutrition, malaria, diarrhea and heat stress. Direct damage costs to health are estimated to reach between USD 2-4 billion per year by 2030 [32]. The main contributor to climate change is air pollution, and more concretely CO2 emissions. Therefore, we must push for reducing air pollution drastically. 
1.1.1 Main polluters

In order to reduce air pollution we must first identify which elements contribute the most to CO2 emissions. The following graph breaks down the main polluters:

Figure 1.1: Global Historical CO2 Emissions by Sector [11].

We can see that about three quarters of CO2 emissions are due to energy generation. Energy generation is, therefore, the first polluter, and it is where most of the work must be done.

In the generation mix, the main non-renewable sources represent the following percentages: oil (31.2%), coal (27.2%) and gas (24.7%). The rest of the generation comes from renewables (5.7%), hydroelectric (6.9%) and other low-carbon generators such as nuclear (4.3%) [9]. The room for growth in renewable sources is therefore large: they must replace the main polluters such as coal, oil and gas. The following graph illustrates the accelerating growth of wind as a renewable energy source in recent years:

Figure 1.2: Renewable Energy Generation [24] [9].

In conclusion, wind as a renewable energy source is fundamental to reducing global CO2 emissions, and it is currently in a period of rapid adoption. Wind technology is, however, still evolving, and there are still problems to be solved and optimizations to be made.

1.2 How a turbine generator works

Wind power is generated by the use of generators inside wind turbines. A turbine extracts the kinetic energy of the fluid passing through its blades (in this case, wind) and converts it into rotary motion. This mechanical motion is fed into an electrical generator, which converts the rotary mechanical energy into electrical energy (normally alternating current (AC)).

1.2.1 Parts of a wind turbine

To further explain how a turbine works we present a diagram:

Figure 1.3: Parts of a wind turbine as seen from the back and left side [31].

From figure 1.3 we must first focus on the three main components: the tower (the column that supports the nacelle), the nacelle (the streamlined body that houses the mechanical and electrical components) and the blades. The components can be decomposed further; we highlight the generator and the shafts that connect the rotor to the generator. Finally, note the pitch arrows, indicating that the angle at which the turbine blades encounter the wind can be changed.

1.2.2 Physics of power generation

The amount of energy generated by a wind turbine depends on several factors [20]. First, we take into account the kinetic energy of the wind:

E_k = \frac{1}{2} m v^2 \quad (1.1)

Equation 1.1 is the standard kinetic energy equation. Knowing that the density ρ is expressed in [kg m^-3], velocity in [m s^-1], time in [s] and the area A in [m^2], we can substitute the mass m [kg] in the previous equation:

m = \left(\frac{kg}{m^3}\right)\left(\frac{m}{s}\right)(s)(m^2) = \rho v t A \;\; [kg] \quad (1.2)

E = \frac{1}{2} m v^2 = \frac{1}{2} (A v t \rho) v^2 = \frac{1}{2} A t \rho v^3 \quad (1.3)

Power is expressed in [W] (watts), which is the same as [J s^-1] (joules per second). From equation 1.3:

P = \frac{E}{t} = \frac{1}{2} A \rho v^3 \quad (1.4)

From equation 1.4 we conclude that wind power is proportional to the area swept by the blades of the turbine, to the density of the air and to the wind speed (the last one cubed, and therefore much more important). However, it must be noted that equation 1.4 is a simplification for ideal conditions, in which a turbine could extract all the energy from the wind. In real life, the power generated will be much lower [20].
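As a quick illustration of equation 1.4 and its cubic dependence on wind speed, the short Python sketch below evaluates the ideal power for an example rotor. The rotor radius and air density are assumed illustrative values, not parameters of the scale model studied in this work.

```python
# Illustrative only: ideal wind power from equation 1.4, P = (1/2) * rho * A * v^3.
# The rotor radius and air density are assumed example values.
import math

def ideal_wind_power(radius_m: float, wind_speed_ms: float, air_density: float = 1.225) -> float:
    """Ideal power in watts for a rotor of the given radius at the given wind speed."""
    swept_area = math.pi * radius_m ** 2  # A = pi * r^2
    return 0.5 * air_density * swept_area * wind_speed_ms ** 3

# Doubling the wind speed multiplies the ideal power by 2^3 = 8.
for v in (5.0, 10.0):
    print(f"v = {v:4.1f} m/s -> P = {ideal_wind_power(40.0, v) / 1e6:.2f} MW")
```

Real turbines extract only a fraction of this ideal figure, as noted above.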
Nonetheless, equation 1.4 is still useful as it introduces us to the concepts and variables involved.

In conclusion, in order to maximize the power output of a turbine we have to take the previous variables into account. From equation 1.4, the A term, the swept area, can be increased within reason by building larger turbines with longer blades that sweep a bigger area. The second term, ρ, the air density, can hardly be changed, as it depends on several factors such as temperature, humidity or pressure. The last and most important term, v, is cubed; we therefore have to carefully consider where to build wind turbines (each geographical location has its own average wind speed), and they should be built in areas with high wind speeds.

1.2.3 Blade pitch angle

In the previous section we concluded that wind speed is the determining factor in how much power is generated. There is another factor to consider: the blade pitch angle. Pitch-controlled turbines allow the controller to change the angle at which the blade meets the wind, instead of having a fixed, predetermined angle. At a pitch of 0° the blade is considered to be parallel to the wind, with no rotary motion induced. Positive angles increase the pitch from this 0° reference, and negative values indicate the same motion in the opposite direction, resulting in a rotation contrary to that of positive values. The range of rotation of the blades in a wind turbine is usually [0, 90] degrees (from parallel to the wind to perpendicular to it). Pitch control can therefore be used to regulate the power generated by the turbine and to protect it, by keeping its rotor speed within a controllable limit as well as by optimizing the energy generation [28].

1.3 Types of wind turbines

There are three types of turbines depending on their physical location: onshore (placed inland), offshore (installed on a platform with a solid foundation fixed to the seabed) and floating (mounted on a floating platform that is prevented from drifting by cables and anchoring devices). Figure 1.4 illustrates this:

Figure 1.4: Three types of turbine, with the types labeled. Adapted from [50].

This third type of turbine is especially important at depths beyond 60 m, where a platform that runs down to the seabed is no longer economically viable [19]. These turbines are normally placed much farther from the coast than their offshore counterparts. However, this geographical location poses several problems that increase cost and the difficulty of deployment. Some of these problems are:

• Harsher weather compared to inland or bottom-fixed offshore counterparts (with some remarks in the list of advantages below).
• Unstable, material-fatigue-prone behavior due to constant oscillation and vibrations.
• Greatly increased maintenance costs, due to the difficulty for maintenance personnel of reaching the platform and due to its non-static behavior.
• Power lines have to be resistant to the underwater environment, resulting in additional expense.

What, then, are the motivations for using floating offshore technology?

• Higher wind speeds compared to onshore installations, leading to increased power generation.

Figure 1.5: Global Wind Speed in January and July of the year 2001 [30].

From figure 1.5, note that the light shaded areas are located over the sea and, in some cases, close to the coast, but very rarely inland. As a general rule, average wind speed increases with distance from the coast [23].
• Large areas without obstructions or restrictions on where to build the turbine, allowing large projects. This property is known as fetch [23].
• More stable and predictable wind, with reduced turbulence and wind shear [19] [8].

1.4 Goals and specific objectives

1.4.1 Main objective

The main objective of this work is to create a virtual model of a floating wind turbine from the experimental data. The model should be able to reproduce the behavior of the turbine, outputting variables whose values are very close to those of the real model. Additionally, it will extrapolate predictions for values that are not present in the data, which can later be verified experimentally.

This virtual model would allow us to know, up to a certain degree, how a turbine will behave under different wind conditions or blade pitch angle configurations, without actually needing to measure, build or be physically near a real wind turbine. More precisely, the virtual models studied in this work depend on the wind speed and on the pitch angle of the blades of the turbine. Given these two variables, the models output a metric of the wind turbine such as the median acceleration.

Another approach taken in this work is to study the data by performing a periodicity analysis. This method further complements the virtual model by detecting whether the turbine is vibrating at a certain frequency.

1.4.2 Objectives breakdown

This work aims to identify the behavior of a floating wind turbine, given a data set of experiments made on a scale model of a floating wind turbine. More precisely:

• Preprocess the data automatically. Select and extract the correct variables that are informative for the models.
• Perform a statistical analysis of the data in order to determine which type of distribution it most closely follows. Using this information, select the correct algorithms designed for that type of data.
• Visualize the data. Plot the necessary graphs that allow an easy visualization of the data.
• Using machine learning and other techniques (supervised models), identify the system, i.e., find the function (line, plane, curve, etc.) that best fits the data for the given set of inputs: wind speed and pitch angle.
• Study whether there is periodic behavior in the turbine: find out if it is vibrating and, if so, at which frequencies and whether those frequencies change with different wind speeds.
• Extrapolate the models to new wind speeds, pitch angles and parameters, verify whether the trained models successfully predict these unseen situations, and evaluate the results.

1.5 Relation of the work with the completed bachelor courses

The following list relates parts of the work to the courses taken throughout the bachelor's degree:

• How a turbine works. Fundamentals of Electricity and Electronics provided the basic electrical knowledge and helped in understanding how power is generated electromagnetically by the generator.
• Data preprocessing and supervised models. The most important courses for understanding how to preprocess the data and how the models work were Cloud and Big Data, Machine Learning and Big Data, and Linear Algebra.
• Statistical analysis. Applied Statistics provided the foundation for the statistical analysis section.
• Periodicity analysis. Calculus provided an understanding of how a Fast Fourier Transform works.

Additionally, all the subjects related to programming and data structures were fundamental. CHAPTER 1.
INTRODUCTION 9 1.6 Work plan The following tables detail the work plan and the official deadlines. Dates are in YYYY- MM-DD format (ISO 8601): Table 1.1: Work plan table. Task Start date End date Duration [days] Project 2021-10-07 2022-07-06 272 Data preparation, filtering and processing* 2021-10-10 2022-02-07 120 Make LATEXtemplate, write introduction 2022-02-08 2022-02-22 14 Present preliminary results, discuss them 2022-02-22 2022-02-25 3 Periodicity analysis 2022-02-26 2022-03-10 12 Investigate and program prediction models 2022-03-10 2022-03-25 15 Present results, discuss them 2022-03-25 2022-03-26 1 Model refinement, tuning and discussion 2022-03-26 2022-03-30 5 Code refactor 2022-03-30 2022-03-31 1 Work on data representation** 2022-04-01 2022-04-15 14 Work on writing the main document for draft 2022-04-16 2022-04-25 9 Code refactor 2022-04-25 2022-05-02 7 Prepare for draft, resolve erratas 2022-05-02 2022-05-16 14 Submit draft 2022-05-17 2022-05-17 1 Improvements after draft feedback 2022-05-18 2022-05-29 11 Final submission 2022-05-30 2022-05-30 1 Prepare presentation 2022-05-30 2022-06-05 7 Presentations 2022-06-06 2022-06-09 4 *Low-amount of work expected to be done because of an overload of ECTS credits amount. **Downtime contemplated due to vacation period. Table 1.2: Official deadlines. Task Date Draft submission 2022-05-17 Publication of accepted/rejected submissions 2022-05-25 Final submission 2022-05-30 Public presentation of the projects 2022-06-06 Tables 1.1 and 1.2 offer a detailed overview of the initially planed work plan. A more simplified overview is shown below. The main parts of the project were the following: CHAPTER 1. INTRODUCTION 10 Table 1.3: Main project tasks. Manual data inspection Work plan elaboration Data preprocessing Data representation Statistical analysis Supervised models Periodicity study Write report Refinements Presentation 1.7 Software 1.7.1 Text editor software This document is written in LATEXusing the TeXstudio 4.2.2 editor. 1.7.2 Programming software All of the work was done in Python version 3.8.10 inside the PyCharm 2022.1 (Community Edition) IDE (Integrated Development Environment). PyCharm was chosen because it enables the common debugging features (breakpoints, watches, etc), DataFrame visualization with just one click and it is free to use. Python was selected for its numerous packages and the ease of use it provides for data analysis. A creation of a virtual environment is highly recommended 1. The following table details the Python packages used and their versions: Table 1.4: Python packages used. Package Version Motivation or description numpy 1.22.3 Ease of use of arrays and matrices, among other mathematical and statistical functions such as average, median, variance, etc. pandas 1.4.2 Data analysis and manipulation. Used to load, store and manipulate the data throughout the work. matplotlib 3.5.1 Graphs, graphical data representation. scipy 1.8.0 Fast Fourier Transforms. scikit-learn 1.0.2 Supervised Models seaborn 0.11.2 Correlation matrices. xlrd 2.0.1 Reading Excel files. 1.8 Repository The code used to process all the data is publicly and freely available on my GitHub repository: https://github.com/JuanTecedor/FloatingWindTurbinesDynamicsIdentificationPublic. 1See Appendix A. https://github.com/JuanTecedor/FloatingWindTurbinesDynamicsIdentificationPublic CHAPTER 1. INTRODUCTION 11 1.9 Structure of the project The following list details the contents of each chapter: • Chapter 1. Introduction. 
Offers an introduction to wind turbines and to what floating wind turbines are, how they generate power, what affects the power generation, what can be modified to alter it and what types of turbines there are. The goals, work plan, repository and the structure of the project are also detailed.
• Chapter 2. State of the Art. An overview of the current technology related to floating wind turbines is given. A breakdown of multiple fields of study related to floating wind turbines is also offered.
• Chapter 3. Materials and Methods. This chapter introduces the experiment performed and the sources of the data. The different techniques used throughout the work are also discussed.
• Chapter 4. Statistical Analysis. The processing of the data starts in this chapter, with the aid of several plots and statistical methods to study the type of data and its distribution.
• Chapter 5. Periodicity Analysis. The frequency analysis of the turbine metrics is detailed in this chapter with several plots, along with the conclusions reached.
• Chapter 6. Supervised Models. This is the most important chapter of the work. In this part, the supervised models are presented along with the reasons for their selection and tuning. The results and predictions from these models are also shown.
• Chapter 7. Conclusions and Future work. This chapter concludes the work with an overview of the conclusions reached throughout the chapters and offers future extensions of the project in the Future work section.

Additionally, an appendix is present:

• Appendix A. Describes the importance of using a virtual environment and offers resources on how to create one. This is fundamental for reproducing and running the code that does the data processing, model training, Fourier analysis, etc.

Finally, a bibliography with all the citations and references is located in the last pages of the work.

Chapter 2 State Of The Art

2.1 Current floating wind turbines wind farms

As we learned in the introduction, wind is stronger over the ocean than on land [23]. Floating wind turbines are themselves an evolution of the offshore turbine, allowing the installation of turbines in deeper locations. We may ask ourselves: what types of offshore wind turbine farms are there?

The average capacity of current and projected floating wind farms is approximately 25 MW, with some projected farms as small as 1 MW and others of up to 88 MW [15]. To put this into perspective, Spain's peak electricity consumption is approximately in the 30 GW range [51], equivalent to 1200 wind farms of 25 MW or 15000 wind turbines of 2 MW. As we can see, the power generated by a single turbine is adequate but proportionally low compared to other sources. If we were to use only wind turbines, we would need thousands of them, with the footprint and economic costs associated with that. This is one of the reasons why the efficiency of a wind turbine is such a big focus.

2.2 Evolution of wind turbines power output

In the previous section we made an estimate assuming the average wind turbine has a power output of 2 MW. We may ask ourselves whether the typical turbine has increased in power throughout the decades. The following figure illustrates the average rated output of a wind turbine for a given year:

Figure 2.1: Average rated output of a wind turbine for selected years [21].

From figure 2.1 we can clearly see the jump in rated output from 1990 to 2000, which multiplied the output by a factor of 33.
Later on, between 2000 and 2010, the output grew by a factor of 1.2, and between 2010 and 2016 by a factor of 1.42, which is still a very good increase. The total increase in power from 1990 to 2016 is a factor of 56.96 (2848 kW / 50 kW). The trend in average rated output from the year 2000 onward looks faster than linear, but it will very likely slow down due to manufacturing limitations, transport logistics (after manufacture) and practical constraints.

We can also conclude that modern turbines are larger and more efficient. These improvements in power are obviously not free, as the height of the turbine and the radius of its rotor are also increasing. Nonetheless, the most recent, larger turbines are more efficient. This graph further supports equation 1.4 (from the introduction), where one of the determining factors in the power output of a turbine is the area: by increasing the area swept by the blades, we increase the power output.

2.3 Types of platforms for floating wind turbines

One of the components with the most influence on the turbine dynamics is the platform that supports it. Different designs trade off stability for cost or ease of manufacture. The following subsection gives an overview of the main types. There are at least four types of platforms that can support a turbine [23]:

Figure 2.2: Four types of floating wind turbines. Adapted from [23].

• Barge. This design maximizes the surface area in contact with the water. The platform is wider than it is tall, which increases stability. This type resembles the design philosophy of a ship and results in a low amount of draft1.
• Semi-Submersible. This model minimizes the surface area that touches the water while maximizing volume. Due to manufacturing limitations, a sphere, which would be the ideal shape, is not feasible. The platform is instead composed of a set of cylinders or simpler shapes, placed a certain distance apart. The separation between these buoyant elements provides stability.
• Spar. This concept is designed around densities. A large mass is placed at the deepest part, while a hollow structure completes the platform up to the top. This density distribution makes the model float upright. This design has the largest draft of the four. Buoyancy is provided by the lightest materials, while the dense mass at the bottom gives stability and prevents rotatory motions.
• Tension Leg Platform (TLP). This structure is the most complex of the four and the newest development. The platform has excess buoyancy and has to be kept under the waterline by a set of cables connected to weights on the seabed at one end and to the arms of the platform at the other. The distribution and separation of the ends of these arms stabilize the platform [10]. This design seeks to minimize manufacturing costs.

1The draft of a ship is the vertical distance between the waterline and the deepest point of the hull of the ship (the keel).

2.4 Types of floating wind turbines anchors

Another component that influences the stability of the turbine and its behavior is the anchoring system. Several studies have been made on the types of anchors for floating turbines, as well as on their placement [29] or even on a shared-anchor concept intended to reduce cost [16]. Depending on the seabed, there are at least four types of anchoring systems.
The following list offers an overview of the current state-of-the-art systems [23]:

• Dragging anchors. This type is similar to a boat anchor, supporting tension in only one direction.
• Suction buckets. Only appropriate for sandy seabeds. The counter-force is provided by suction forces that keep the bucket in place.
• Drilled piles. Same principle as in fixed foundations: large metal cylinders are forced into the seabed, or drilled in the case of harder bottoms such as rock.
• Gravity anchors. A simple, massive and heavy mass (such as a concrete block) is deposited on the seabed. This type of anchor has a very large footprint but the lowest complexity in terms of installation procedures.

2.5 Fields of study

2.5.1 Overlap between floating wind turbines and offshore turbines

As we explained in the introduction, floating turbines allow the placement of generators in places where constructing a foundation is no longer economically feasible. In the process, floating wind turbines additionally benefit from stronger winds thanks to their distance from the coast.

However, thanks to recent developments, floating wind turbines can also be placed in shallow waters. This is especially useful over shallow seabeds whose depth would allow a fixed-foundation turbine but whose seabed type prevents it [23]. This advancement blurs the line between the usual depths at which each type of turbine is placed, but allows more options and flexibility.

2.5.2 Overlap with the oil industry

Before floating wind turbine designs were adopted, the oil industry had already developed floating platforms that enabled it to extract oil over deeper seabeds. Not all of that knowledge and design work can be transferred, because the economic feasibility calculation is very different for a single oil platform than for dozens of turbines in a wind farm.

Nonetheless, when studying floating wind turbines, prior knowledge from the oil industry can be leveraged, especially in anchoring and platform studies [23] [10]. The oil industry can therefore also be considered, in some respects, state of the art in relation to platform and anchoring technology.

2.6 Current floating wind turbines study fields

Several studies have been made on the interactions of the turbine with its environment. These are the main fields of study:

• Platform design. As we learned in previous sections, several types of platforms are available and under study; they have a significant impact on the turbine stability and thus affect cost and efficiency. It is also imperative in these designs that the turbine does not flip or fall, i.e., that it is safe and stable enough.
• Mooring design (anchoring also under study). This field solves problems related to fixing the turbine anchors to the seabed. As we learned previously, the chosen design is heavily dependent on the seabed type.
• Turbine control and strategy planning. In order to achieve maximum power generation and safe operation, some parameters of the turbine can be tweaked, such as the pitch angle. Studies in this area optimize these parameters.
• Aerodynamic modeling. This area studies the aerodynamic properties of the turbine, with a special focus on the shape of the blades. An efficient blade design that extracts the most power is key to wind generation.
• Hydrodynamic modeling. This area studies the influence of hydrodynamic factors, mainly the water, on floating turbines.
• Wind and wave modeling and forecasting. This field focuses on the prediction of wind and waves. With the information and models extracted from these studies, we can find the ideal geographical spots for the turbines.

Chapter 3 Materials and Methods

3.1 Introduction to the experimental setup

The experimental data was measured using a scale model of the wind turbine, which will be described shortly. The turbine was subjected to different wind speeds at various blade pitch angles. The wind speed was controlled by regulating the voltage of a wind generator, and the blade pitch was changed through its own adjustment mechanism. The distance from the wind generator to the turbine was kept constant during the experiments in order to have consistent measurements.

The experimental data was stored with the wind speed expressed as a voltage. As voltage alone is not even a measure of power (watts, for example, would be much more informative), a transformation into a more useful metric is needed. Thankfully, the data included an equivalence. The following table shows the equivalence between voltages and wind speeds1:

Table 3.1: Equivalence between voltages and wind speeds.
Voltage [V]   Wind speed [ms-1]
130           8.5
140           9.3
150           10.1
160           11.0
170           11.6
180           12.4
190           13.1
200           13.8

The values from table 3.1 will be used later on to convert the voltages of the generator into wind speeds at the turbine.

1Interestingly enough, the relation between voltage and wind speed is almost linear: wind ≈ 0.07571 × voltage − 1.34285. In any case, it is more precise to use the wind speed.

3.2 Experiment setup

Instead of having the turbine float on water, a set of springs was placed around the perimeter of the platform to support it and imitate a turbine floating on the water. This configuration should provide behavior similar to that of a real floating platform. The turbine was placed inside a wind tunnel calibrated for the set of wind speeds from table 3.1. The following figure illustrates the wind turbine setup inside the wind tunnel:

Figure 3.1: Front view of the wind turbine with labeled axes. All axes are perpendicular. The Z axis is perpendicular to the ground plane, the Y axis is parallel to the axis of the turbine, pointing towards the back, and the X axis points right, perpendicular to the Y and Z axes. Note some of the springs visible at the right of the photograph.

In figure 3.1, a set of springs is visible at the right of the photograph, with blue circles marking the contact points on the platform. On the left, next to the turbine, the Arduino microcontroller (a blue PCB2 with a black square chip on it and a USB type B port) can be seen. The Arduino is responsible for registering the measurements from the IMU3 and sending them to a computer. Further to the left and to the back, with a green cable connected to it, almost at the edge of the platform, a second blue PCB can partially be seen. This blue PCB is the IMU. The IMU is responsible for measuring acceleration, angular velocity and magnetic flux density.

2Printed Circuit Board.
3Inertial Measurement Unit, a type of sensor. Explained in more depth later.

Note that the choice of axis representation and configuration is the same throughout the whole work. This configuration is also exactly the same as the one given by the sensor documentation and by the experiment setup, i.e., the axes are labeled in the same configuration that the sensor uses for its measurements.
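Before moving on to the simplified diagrams, and returning briefly to Table 3.1, the sketch below shows one way the voltage-to-wind-speed equivalence could be applied in Python. It is only an illustration: the dictionary values come from the table, but the DataFrame column names are assumptions, not the exact headers of the project's files.

```python
# Minimal sketch (not the project's actual code) of applying the Table 3.1
# voltage-to-wind-speed equivalence to a loaded DataFrame.
# The column names "Voltage" and "Wind speed" are illustrative assumptions.
import pandas as pd

VOLTAGE_TO_WIND_SPEED = {  # [V] -> [m/s], values taken from Table 3.1
    130: 8.5, 140: 9.3, 150: 10.1, 160: 11.0,
    170: 11.6, 180: 12.4, 190: 13.1, 200: 13.8,
}

def add_wind_speed(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with a 'Wind speed' column derived from 'Voltage'."""
    out = df.copy()
    out["Wind speed"] = out["Voltage"].map(VOLTAGE_TO_WIND_SPEED)
    return out
```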
A simplified diagram of the turbine configuration is shown in figures 3.2 and 3.3:

Figure 3.2: Front view of the turbine with labeled axes, simplified. Note the springs on the bottom of the platform.

Figure 3.3: Top view of the turbine with labeled axes, simplified.

The IMU is the sensor responsible for the measurements. It will be detailed later; for now, it is enough to know that it measures quantities in relation to a direction, and therefore we need axes. Two diagrams of the IMU axes and their polarities are also shown, for illustration and axis labeling purposes:

Figure 3.4: IMU with its axes.

Figure 3.5: Polarity of rotation and orientation of the IMU axes. Adapted from [25].

Again, the turbine axes, diagram axes and IMU axes are all the same throughout the work. This is on purpose, to avoid confusion.

Wind turbine components

• Microcontroller: Arduino Mega.
• Inertial Measurement Unit: MPU6050.
• Electric motor: Brushless motor 22M-1000 GPMG4500.
• Wind turbine blade dimensions: 10×2 centimeters.
• Pitch angle range of the turbine blades: -30 to 30 degrees.
• Platform dimensions: 16×24 centimeters.

3.3 Materials: Data from experiments

A total of 48 experiments were carried out, each lasting 10 seconds. The 48 experiments correspond to the combinations of the following voltages: 130, 140, 150, 160, 170, 180, 190 and 200 [V] and the following blade pitch angles: -30, -20, -10, 10, 20 and 30 [degrees]. Instantaneous measurements were taken at a frequency of 10 Hz, i.e., every 100 ms, for a total of 99 measurements per experiment. Each of these measurements consisted of a reading of the acceleration [g]4, angular velocity [°s−1] and magnetic flux density [µT]5 for each axis (x, y, z).

In total, 8 voltages (or wind speeds) and 6 pitch angles were used, with 99 measurements for each combination, totaling 99 × 8 × 6 = 4752 rows of data. This, in combination with the three quantities measured (acceleration, angular velocity and magnetic flux density) along each of the three axes, equates to 4752 × 3 × 3 = 42768 samples.

The following tables illustrate what the data looks like after it is loaded and before any preprocessing. Acc x is the acceleration along the x axis, Gyro y is the angular velocity around the y axis, Mag z is the magnetic flux density along the z axis, and Time [0.1s] is the timestamp at which the row was stored (in tenths of a second). Two tables are shown; both have the same pitch angle but different wind speeds.

Table 3.2: Sample of data at 130V (8.5ms-1) and 1°.
Time [0.1s]  Acc x [g]     Acc y [g]     Acc z [g]    Gyro x [°s−1]  Gyro y [°s−1]  Gyro z [°s−1]  Mag x [µT]    Mag y [µT]  Mag z [µT]
1            -0.001062012  0.000982666   0.978607178  -0.002361081   -0.010571808   -0.010266725   -26.67796875  -6.1425     -9.7734375
2            -0.010552979  -0.00043335   1.0246521    0.041690331    0.010730982    -0.008038288   -26.16773438  -5.338125   -9.7734375
...          ...           ...           ...          ...            ...            ...            ...           ...         ...
99           -0.00824585   -0.018615723  1.048773193  0.004443608    -0.017681582   -0.026210657   -26.31351563  -5.630625   -9.7734375

4 1 g = 9.8 ms−2.
5 The data sheet from [25] did not specify units for the magnetometer. However, as the Earth's magnetic field ranges from 25 to 65 µT and the data from the experiment was in this range, we will consider it to be in µT.

Table 3.3: Sample of data at 200V (13.8ms-1) and 1°.
Time [0.1s] Acc x [g] Acc y [g] Acc z [g] Gyro x [◦s−1] Gyro y [◦s−1] Gyro z [◦s−1] Mag x [µT ] Mag y [µT ] Mag z [µT ] 1 -0.062823 0.010199 1.154712 0.018504 -0.092069 -0.017973 -25.803281 -5.630625 -8.085938 2 0.322821 0.267554 1.694971 0.024261 0.116064 0.197668 -26.386406 -4.680000 -7.875000 ... ... ... ... ... ... ... ... ... ... 99 -0.085931 -0.320795 0.890576 -0.100731 0.017615 -0.048097 -25.220156 -5.265000 -7.382812 Tables 3.4 and 3.5 show some statistical metrics from pandas.describe() function in order to visualize the ranges, maximums, minimums, etc of the data6: Table 3.4: Data at a wind speed of 8.5ms−1 and 1◦pitch angle. Time [0.1s] Acc x [g] Acc y [g] Acc z [g] Gyro x [◦s−1] Gyro y [◦s−1] Gyro z [◦s−1] Mag x [µT ] Mag y [µT ] Mag z [µT ] len Acc len Gyro len Magn count 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 mean 50.000000 0.001532 -0.025455 1.031984 -0.023991 -0.000034 0.002218 -26.532924 -5.578182 -9.600142 1.033129 0.089261 28.766090 std 28.722813 0.018780 0.037758 0.045014 0.075233 0.026225 0.055913 0.268019 0.355799 0.307522 0.045542 0.044851 0.278839 min 1.000000 -0.048370 -0.131812 0.938586 -0.185862 -0.062211 -0.144265 -27.188203 -6.435000 -10.546875 0.940113 0.014925 28.027332 25% 25.500000 -0.009393 -0.048160 1.003128 -0.075707 -0.016375 -0.030296 -26.750859 -5.850000 -9.773438 1.003812 0.055962 28.603734 50% 50.000000 0.001868 -0.020959 1.034302 -0.018928 -0.001645 -0.004550 -26.532187 -5.557500 -9.562500 1.035235 0.086649 28.745159 75% 74.500000 0.014850 0.000546 1.060159 0.026682 0.016899 0.039144 -26.386406 -5.338125 -9.421875 1.062211 0.110022 28.952389 max 99.000000 0.044537 0.061096 1.144220 0.127100 0.063192 0.124381 -25.657500 -3.948750 -8.789062 1.151195 0.226626 29.424713 Table 3.5: Data at a wind speed of 13.8ms−1 and 1◦pitch angle. Time [0.1s] Acc x [g] Acc y [g] Acc z [g] Gyro x [◦s−1] Gyro y [◦s−1] Gyro z [◦s−1] Mag x [µT ] Mag y [µT ] Mag z [µT ] len Acc len Gyro len Magn count 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 mean 50.000000 0.027833 -0.030619 1.100040 -0.041565 0.004803 0.001589 -25.751742 -5.556761 -8.289062 1.113743 0.159670 27.623939 std 28.722813 0.093319 0.140280 0.263003 0.105575 0.062644 0.114191 0.316227 0.396620 0.426263 0.261994 0.064237 0.303083 min 1.000000 -0.203320 -0.346960 0.684369 -0.306463 -0.131929 -0.215256 -26.459297 -7.458750 -10.195312 0.698817 0.038939 26.800792 25% 25.500000 -0.031522 -0.118387 0.926773 -0.125084 -0.040072 -0.089940 -25.985508 -5.703750 -8.507812 0.941567 0.115168 27.418520 50% 50.000000 0.022394 -0.037952 1.013049 -0.046399 0.009802 0.004165 -25.803281 -5.557500 -8.296875 1.025440 0.152568 27.638048 75% 74.500000 0.078992 0.059320 1.198538 0.037200 0.051533 0.079461 -25.548164 -5.338125 -8.015625 1.208403 0.194613 27.837288 max 99.000000 0.322821 0.334290 1.694971 0.210176 0.129700 0.341136 -24.782813 -4.680000 -7.101562 1.746060 0.347568 28.295480 From tables 3.4 and 3.5 note count = 99.0 (the number of measurements). The std rows also gives an idea of the standard deviation of each column of the data. We can also see the mean, max and min changing with the two wind speeds. The 25%, 50% and 75% rows are percentiles. As expected, the 50% percentile from the acceleration on the z axis is approximately 1, this is due to the gravity. 
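As a side note before moving on, summaries like those in Tables 3.4 and 3.5 come from pandas; the sketch below shows one way such a summary could be reproduced once a sheet has been loaded. The file name, sheet name and engine argument are illustrative assumptions, not the project's exact setup.

```python
# Illustrative sketch (not the project's exact code): load one experiment sheet
# from an .xls file and print descriptive statistics like those in Tables 3.4/3.5.
# The file name and sheet name below are assumed for the example.
import pandas as pd

df = pd.read_excel("pitch_1_deg.xls", sheet_name="130V", engine="xlrd")
print(df.describe())  # count, mean, std, min, quartiles and max for every column
```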
Finally, we highlight that, overall, the mean acceleration and the angular velocities increased from table 3.4 to table 3.5.

The experimental data was stored in 6 files with the .xls extension (Microsoft Excel), with the filenames corresponding to the pitch angles; within each file, at least 8 sheets were present, one per voltage. Thankfully, pandas, together with the xlrd package, can open .xls files and navigate their sheets.

6len(a⃗) = ||a⃗|| refers to the length, magnitude or norm, not the dimension. For example, by len Acc we mean the length, magnitude or norm of the acceleration vector.

3.4 Methods

3.4.1 Machine Learning

There are several definitions of what machine learning is:

"Machine learning is the field of study that gives computers the ability to learn without being explicitly programmed." [49].

This quote from Arthur Samuel dates from 1959; some consider it outdated and informal. Quoting a more modern definition from Tom Mitchell:

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." [6] [27].

In this case, the experience E is the data set of turbine measurements, the task T is to predict the outputs, and the measure P is how closely the predicted outputs match the real experiments.

3.4.2 Machine learning workflow

The common machine learning workflow can be split into the following two phases [47]:

1. Transform the data in such a way that it is in the correct format for the machine learning algorithm.
2. Select the machine learning algorithms that best fit the problem being solved and tune their parameters.

Part 1 can be further split into three parts:

• Data preprocessing. Process and perform the necessary transformations so that the data is ready to be automatically processed by a program or an application.
• Feature extraction. Process the data in order to extract useful feature candidates for further use down the line.
• Feature selection. Discard variables that are redundant. For example, having the two attributes size-f [feet] and size-m [meters] in the same data set is detrimental: one of them is redundant and should be removed.

Other transformations could be handling noise and missing values or creating new attributes by combining others. This phase is decisive for the performance of our model: if the inputs and outputs are not correlated, our models will hopelessly fail7.

7Or, more informally, "garbage in, garbage out": if the input data is nonsense or contains no information, the output will also contain no information or be nonsensical.

Finally, part 2 can be split into two parts:

• Algorithm selection. The selection part involves the use of the programmer's knowledge in order to apply the correct algorithm to the correct problem type. For example, there are algorithms that need large data sets in order to start to converge, and will fail with small amounts of data. The programmer is responsible for selecting the correct model and having the knowledge to use it.
• Parameter tuning or configuration. This final part is done by re-running the algorithm and tweaking it manually (with the aid of cross-validation8).
Some models have parameters, such as the learning rate, that sometimes have to be adjusted manually because the defaults do not work. This is a process of trial and error. 3.4.3 Areas of machine learning In the following subsection, the three main areas of the machine learning field are described. In the case of this work, only supervised learning is relevant, but there are many more: • Supervised learning. In this area, the type of problem allows the training data to have examples comprised of pairs of inputs and outputs. This means that the data set shows the correct output for a given input. The relation between the inputs and the outputs is inferred by the algorithm. • Unsupervised learning. In unsupervised problems the data only contains the inputs, and the algorithm must find patterns and structure in the data by, for example, grouping it. • Reinforcement learning. In this field, the algorithm learns by selecting the actions that maximize the notion of a cumulative reward. It is inspired by behavioral psychology. A more in-depth introduction to the supervised models workflow is given below. Formulation of the Machine Learning problem We will now define the naming for the different elements that make up the model: • Examples and features: let $n, m \in \mathbb{N}$, with $n$ = number of features and $m$ = number of examples. • Input variables: a matrix $X \in \mathbb{R}^{m \times n}$. Each row of this matrix is an example, a vector $\vec{x}^{(i)} \in \mathbb{R}^n$ with $n$ features. • Output variable or target space: $Y \in \mathbb{R}^{m \times 1}$, or more clearly $\vec{y} \in \mathbb{R}^m$. In order to find a relation, $\vec{y}^{(i)}$ should be dependent on $\vec{x}^{(i)}$. These are the known outputs for an input $X$. • Model parameters: $\theta$. The dimensions of $\theta$ depend on the concrete model used; it can be a vector or a matrix. • Hypothesis function: $h(\theta)$. This function, given parameters $\theta$ and an input vector $\vec{x}^{(i)}$, tries to make the closest guess to the correct output. The concrete implementation depends on the model. For example, in linear regression the hypothesis function could be the straight-line equation $h(\theta) = \theta_1 x + \theta_0$. • Hypothesis $h(x^{(i)})$: an instantiation of the hypothesis function. • Data set: $D \in \mathbb{R}^{m \times (n+1)}$. This is the data set that contains all the data plus an extra column of ones. This column of ones is the bias term. It allows the algorithm to have a basis of bias from which to start. Recalling the example of the hypothesis function $h(\theta) = \theta_1 x + \theta_0$, it allows us to have the $\theta_0$ term instead of only having $h(\theta) = \theta_1 x$. 8 Cross-validation gives us an idea of how well the model generalizes; we will look at it later on. In this work, $n = 2$, corresponding to the wind speed and blade pitch angle input variables. In order to measure the accuracy of the hypotheses, a cost function is introduced. One example of such a cost function could be [4]: $J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h(x_i) - y_i \right)^2$. This formula is very similar to the mean squared error (MSE). By minimizing $J(\theta)$ we can improve the performance of the algorithm, because we are reducing the distance (and thus the error) between the predictions (hypotheses) and the target values.
One way of scaling data is using the Standard Scaler [36]: $z = \frac{x - \mu}{\sigma}$ (3.1) Where $z$ is the new value, $x$ is the value to be scaled, $\mu$ is the mean and $\sigma$ is the standard deviation. This scaler is appropriate for features that follow a standard normal distribution. 3.4.5 Supervised learners A total of 6 machine learning models were used: Linear regression [39], Linear with polynomial features [39] [45], Ridge [40], Huber [38], Gaussian [37] and an MLP Regressor9 [43] [36]. 3.4.6 Linear Models Linear Regressor Linear regression is the simplest regressor. In this case the model minimizes the residual sum of squares between the observed targets in the data [39]. It assumes a linear relation between the input values (X) and the output values (Y). $y = ax + b$ (3.2) The model will find the $a$ and $b$ for equation 3.2 that best fit the data; $a$ and $b$ are vectors. Equation 3.3 can be used to build a hypothesis function: $h(\theta) = \theta_1 x + \theta_0$ (3.3) Please note that equations 3.2 and 3.3 are just examples and may not be appropriate for the dimensions of the data. 9 An MLP Regressor is a type of Neural Network. Polynomial Regressor Sometimes the data will not have a linear correlation. In these cases, polynomial regression is extremely useful. One example of a polynomial regression could be the following: $h(\theta) = \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_0$ (3.4) In Python, Sklearn facilitates the use of polynomial regression with the PolynomialFeatures class [45]: calling PolynomialFeatures with degree = 2 on two features $a$ and $b$, sklearn returns $[1, a, b, a^2, ab, b^2]$. This added complexity has, however, some trade-offs that will be discussed later. Ridge Regressor Ridge regression, or Tikhonov regularization, is useful when the independent variables are highly correlated [22]. It addresses some of the problems of the ordinary least squares used by linear regression by imposing a penalty on the size of the coefficients (l2 regularization) [40]. It minimizes the following objective function [40]: $\lVert y - \theta x \rVert_2^2 + \alpha \lVert \theta \rVert_2^2$ (3.5) Where $\alpha$ is the regularization term, $\theta$ are the weights, $x$ is the training data and $y$ is the target values. This model was chosen to compare its performance with the previous two in case the variables were highly correlated. Huber Regressor This is a linear regression model that is robust to outliers: it makes sure that the loss function is not heavily influenced by the outliers while not completely ignoring them [38]. It optimizes the squared loss for the samples where [38]: $\left| \frac{y - \theta x}{\sigma} \right| < \epsilon$ (3.6) and the absolute loss for the samples where [38]: $\left| \frac{y - \theta x}{\sigma} \right| > \epsilon$ (3.7) Where $\sigma$ and $\theta$ are the parameters to be optimized. $\sigma$ makes sure that if $y$ is scaled, $\epsilon$ does not need to be rescaled too (this achieves the same robustness) [38]; $\theta$ is the hypothesis, $x$ is the training data and $y$ is the target values. This model was again chosen for comparison with the aforementioned linear models and, because of its robustness to outliers, in order to know if the outliers were influencing the models. Gaussian Process Regressor This is a non-parametric Bayesian regressor [37] [52] [48]. The implementation is based on algorithm 2.1 of Gaussian Processes for Machine Learning [53], page 19. In contrast to the previous learners, it makes probabilistic predictions, i.e., it infers a probability distribution from the data. This model was chosen because of its different nature from the previous models, in the hope of observing different results.
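The following is a minimal sketch, not the exact configuration used in this work (the actual hyperparameters are detailed in the Supervised Models chapter), of how these six learners can be assembled with scikit-learn, each behind the scaler described in section 3.4.4:

from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression, Ridge, HuberRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline

# Each entry is a pipeline: the features are standardised (z = (x - mu) / sigma)
# before being fed to the estimator. The polynomial model inserts the expanded
# feature terms [1, a, b, a^2, ab, b^2] between the scaler and the regressor.
models = {
    "linear":     make_pipeline(StandardScaler(), LinearRegression()),
    "polynomial": make_pipeline(StandardScaler(), PolynomialFeatures(degree=2),
                                LinearRegression()),
    "ridge":      make_pipeline(StandardScaler(), Ridge()),
    "huber":      make_pipeline(StandardScaler(), HuberRegressor()),
    "gaussian":   make_pipeline(StandardScaler(), GaussianProcessRegressor()),
    "mlp":        make_pipeline(StandardScaler(),
                                MLPRegressor(solver="lbfgs", max_iter=1000)),
}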
Neural Network (MLP Regressor) This model optimizes the squared error of a neural network using gradient-based methods [43]. The number of hidden layers can be customized. In this case, the solver lbfgs (an optimizer in the family of quasi-Newton methods) will be used because it converges faster and performs better for smaller datasets [43]. Structure of a Neural Network A Neural Network is composed of a set of input, hidden and output nodes. In the case of this work, the input nodes could be the wind speed and angle, and the output nodes could be the acceleration. The following figure represents an example neural network: Figure 3.6: Neural network example. Illustration made thanks to NN-SVG [3]. Note the 2 input nodes, the two hidden layers with 4 and 8 nodes and the output layer with three nodes. The number of nodes in each layer is extremely important, as it determines the dimensions of the matrices used inside the neural network to perform the calculations. This model was chosen because it is very different from the rest of the models and could yield better results. 3.4.7 Evaluation of model performance Normally, 80% of the data set is used for training the models (train set). The remaining 20% of the data set is the test set, and it is piped into the sklearn.pipeline.Pipeline.score function [44] in order to find out the performance of the model. This function returns a float with the score of the model. Note that, depending on the scoring metric (for example the R2 score), it can be negative, i.e., score ∈ (−∞, 1]. It is not possible to say what a good score is in general. For example: a neural network gets a score of 95% when trying to predict if a train is late. This score looks great. However, the neural network is always predicting ”no” and therefore guesses correctly most of the time simply by always answering ”no”, because the trains are rarely late (it is also probably ignoring the input data). There are several techniques and different scoring metrics to avoid these pitfalls. For this work, however, we will consider scores at or above 0.50 (≥50%) acceptable and scores at or above 0.90 (≥90%) excellent. In order to avoid the previously mentioned pitfall (the one with the train delay estimation) and the difficulty of picking a good score, we will also use distances and errors. For example, if the predicted value is 10 and the real value is 10.8, we could calculate the error using the following formula: $E = \frac{|r - p|}{|p|} \times 100 = \frac{|10.8 - 10|}{|10|} \times 100 = 8\%$ (3.8) Where $r$ is the real value and $p$ is the prediction. Again, we cannot fix an overall acceptable or good error at this point in the work. However, we will consider an error of ≈5% to be acceptable and ≤1% to be excellent. Models with errors ≫5% will be considered useless. 3.4.8 Overfitting in Machine Learning The introduction of non-linear, more complex equations has its disadvantages. The main one is that we can make a model ”memorize” the data. Figure 3.7: Overfitted data [18]. In figure 3.7 we can see two lines: in black, a straight line that could represent a traditional linear regression, and in blue, a curved line that represents a degree-10 polynomial that has been made to fit the data perfectly. The first disadvantage is the complexity introduced by a degree-10 polynomial, leading to increased run time, due to having to calculate far more parameters compared to the linear model.
However, the most worrying problem is the following: The blue line has an almost perfect fit, far more precise than the black line. However, this model will fall apart once new points are introduced, i.e.: it will fail to generalize, it will probably be much more imprecise than the linear counterpart once enough new points are introduced. There are several ways of managing this problem, one being examining the models manually or penalizing each time the model increases complexity. Nevertheless, in this work the main way used was dividing the data into two sets. We will use approximately 80% of the data to train our models, and the 20% left will be used to score how good or how bad the models are. The data of this sets will also be shuffled in order to avoid repeatedly creating a biased subset. 3.4.9 Hyperparameter tuning Hyperparameter tuning is the selection of the optimal parameters for the models. One example could be choosing the best degree of the polynomial in a polynomial regressor CHAPTER 3. MATERIALS AND METHODS 32 model. In order to pick the best parameters, an assortment of parameter combinations was manually made and piped into sk-learn’s GridSearchCV [42] [34]. The GridSearchCV model selector has a method that returns the best estimator. After some manual checks (to make sure that the parameters make sense), one of the best estimators was chosen for each model. 3.4.10 Pipelining All of the aforementioned subsections detail a complex workflow. Fortunately, sk-learn offers us a Pipe that can be set up for a group of models or only one model [35]. The concrete pipeline usage will is detailed in the Supervised Models section, but a general pipe example is offered (adapted from [35]): 1. Scaler StandardScaler() 2. Classifier or classifiers PCA(), LogisticRegression() After the pipe setup, only a call to fit and score or the necessary methods is needed instead of several individual calls and parameter passings for each component. 3.4.11 Periodicity Study Another attribute studied in this work is the periodicity of the turbine. The objective of this part is to find out if the turbine is vibrating in some predictable way. And if it is vibrating, how the frequency might change for different wind speeds or pitch angles. The problem is therefore to extract frequencies from a data set that consists of mea- surements made at a certain frequency. This is enabled by Fourier analysis. Fourier analysis Fourier analysis studies the decomposition of a function into the sum of several, simpler trigonometric functions. This process itself is called a Fourier Transform. One example of an application of a Fourier Transform is to detect and remove high frequencies from a recording that may be irrelevant or distracting. The Fourier Transform will be able to, from the signal with all the mixed sounds, detect the peak in the high frequency, allowing for it to be removed. In order to use Fourier Analysis, the data must be equally spaced. Some approaches exist for unequally spaced data but it is outside of the scope of this work. Fast Fourier Transform A Fast Fourier Transform is an algorithm that computes the Discrete Fourier Transform of a sequence [1]. The ”Fast” keyword in FFT is a variant of the algorithm that reduces the run time complexity from O(N2) in a Discrete Fourier Transform (DFT) to O(N logN) [1]. CHAPTER 3. MATERIALS AND METHODS 33 Fast Fourier Transform output representation The output from the FFT is ultimately a representation in the frequency domain. 
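As a minimal, self-contained sketch of how such a frequency-domain representation is obtained (using scipy.fftpack, the module employed later in this work, and a 2 Hz test sine sampled at 10 Hz as in the library test of chapter 5):

import numpy as np
from scipy.fftpack import fft, fftfreq

fs = 10.0                            # sampling rate [Hz], as in the experiments
t = np.arange(0, 10, 1 / fs)         # 10 s of equally spaced samples
signal = np.sin(2 * np.pi * 2 * t)   # test signal vibrating at 2 Hz

spectrum = fft(signal)
freqs = fftfreq(len(signal), d=1 / fs)

# Keep only the positive half of the (mirrored) spectrum.
mask = freqs >= 0
amplitude = 2.0 / len(signal) * np.abs(spectrum[mask])
# amplitude peaks where freqs is approximately 2 Hz, matching the input frequency.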
Figure 3.8 shows an example: Figure 3.8: FFT of a Cosine Summation Function resonating at 10, 20, 30, 40, and 50 Hz. [2]. Note from the bottom graph of figure 3.8 the peaks at x = 10, x = 20, x = 30, x = 40 and x = 50; the FFT has successfully detected the individual frequencies that where mixed in the top graph. An FFT analysis will be performed on all the metrics in the data later on, in order to find out if the turbine is vibrating in a predictable manner. 3.4.12 Data organization The data was distributed in several excel files and inside them, in different sheets. The first step in the program was to load the data using Pandas ’s read excel function. This function returns a DataFrame which is similar to an SQL table. In order to organize the 48 tables (8 wind speeds and 6 angles), each DataFrame was placed inside a Python dictionary that contained another dictionary. The structure of the nested dictionaries was the following: data = (key = angle : value = dict1) (3.9) dict1 = (key = wind speed : value = DataFrame) (3.10) CHAPTER 3. MATERIALS AND METHODS 34 DataFrame from equation 3.10 would contain the data from the turbine at an angle of angle and a wind speed of wind speed. This type of data structure allowed O(1) access time (because it is a Python dictio- nary) to the DataFrames while it also preserved an user-friendly access method. I.e: It is very easy to request the data from the turbine for a given angle and wind speed. Further down the work, the supervised models required matrices of data without a breakdown by angle and wind speed. As the data structure described in the previous paragraphs did not prove to be ap- propriate, the new data structure that was adopted combined all the preprocessed data in a single DataFrame. The DataFrame will be detailed in the following chapters. In any case for the purposes of this section the following table is shown detailing the columns: Table 3.6: Grouped data set columns. angle windspeed median Acc x mean Acc x var Acc x ptp Acc x amax Acc x amin Acc x std Acc x . . . min abs len Magn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The data from table 3.6 can be easily accessed (again in O(1) access time): columns can be selected in order to use them as input or outputs for the supervised models and this columns can also be subdivided in order to divide the data set into test and train sets. We will also highlight that table 3.6 has 48 rows (recall from the previous paragraphs 8 wind speeds and 6 angles) and 122 columns (the column count will be explained later on). Chapter 4 Statistical Analysis 4.1 Data visualization The first step made in the work after loading the data was to visualize the data. All the plots were generated in Python using Matplotlib. In any statistical study it is always recommended to visualize the data as very different data sets can have the same statistical metrics. The following figures show a graph of two 10 second experiments at the same angle of 10◦. One graph is for a wind speed of 8.5ms-1 and the next for 13.8ms-1: Figure 4.1: 2D plots of the data corresponding to a wind speed of 8.5ms-1 and a pitch angle of 10◦. 35 CHAPTER 4. STATISTICAL ANALYSIS 36 Figure 4.2: 2D plots of the data corresponding to a wind speed of 13.8ms-1 and a pitch angle of 10◦. From figure 4.1 we can see some extreme values that might suggest the presence of outliers, specially in the magnetometer (”Magn”) column. The most obvious one being in the ”Magn y” subplot. 
In figure 4.2 we again point out extreme values, the most remarkable one being in the ”Magn z” subplot. Not much more information can be extracted visually from these graphs. We can, however, see that in figure 4.2, compared to figure 4.1, the range of values of almost every subplot (with some exceptions, like Magn x or Magn z) has increased greatly. This suggests that the wind speed is affecting the variables and is correlated with them. Later on in the work, it was concluded that outlier removal was not necessary, and that in fact it might remove useful information and therefore be harmful. This conclusion was reached after the supervised models performed very well without removing the possible outliers, and after the normality analysis of the data, discussed later in the work. From these graphs not much information can be extracted. We can, however, conclude that there are no missing or erroneous values in the data set. The existence or absence of outliers is not clear, but it is unimportant. 4.2 Distribution study 4.2.1 Histograms In order to be able to use parametric methods we first have to observe the distribution of the data. Parametric statistics assume that the data is modeled by a probability distribution. If the data is normally distributed, some models will work much better than others. First, a set of histograms was obtained. There are four histogram figures, allowing the comparison of two different angles and, for the same angle, two wind speeds: Figure 4.3: Histogram of the data corresponding to a wind speed of 8.5ms−1 and a pitch angle of 10◦. Figure 4.4: Histogram of the data corresponding to a wind speed of 8.5ms−1 and a pitch angle of 10◦. Figure 4.5: Histogram of the data corresponding to a wind speed of 13.8ms−1 and a pitch angle of 30◦. Figure 4.6: Histogram of the data corresponding to a wind speed of 13.8ms−1 and a pitch angle of 30◦. From figures 4.3, 4.4, 4.5 and 4.6 it can be seen that most of the graphs are comparable to a normal distribution. Examples of a normal distribution shape are: from figure 4.3 (Acc x, Acc y), figure 4.4 (Gyro x, Gyro y), figure 4.5 (Gyro x, Acc y) and from figure 4.6 (Gyro x, Gyro y). Examples of a dubious (or non-) normal distribution shape are: from figure 4.3 (Magn y, Magn z), figure 4.4 (Acc x, Acc y), figure 4.5 (Gyro z, Magn y, Magn z) and from figure 4.6 (Acc y, Acc z). 4.2.2 Examples of normal distributions Figure 4.7 shows several normal distributions with different means and variances (and therefore shapes): Figure 4.7: Examples of normal distributions with different means and variances [26]. 4.2.3 Normality test A more formal test of normality is needed. The Shapiro-Wilk test for normality presents the null hypothesis that the data came from a normally distributed population. The chance of rejecting the null hypothesis when it is true is close to 5% regardless of sample size [17] [12]. The test was run with α = 5% (alpha level). The results were: passed = 310, failed = 122, total = 432, with a pass rate of 71.76% (i.e., the null hypothesis was rejected 28.24% of the time). We therefore conclude that the data is mostly normally distributed, and that parametric statistical methods can be used. Additionally, when scaling the data later on, this normality should be considered when selecting the appropriate scaler.
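A minimal sketch of how this pass/fail count can be obtained with scipy.stats.shapiro, assuming the experiments are stored in the nested dictionary data[angle][wind speed] described in the Data organization section (variable names here are illustrative):

from scipy.stats import shapiro

ALPHA = 0.05                          # 5% significance level
passed = failed = 0

# `data` is the nested dictionary data[angle][wind_speed] -> DataFrame,
# one 10-second experiment per DataFrame.
for angle, by_speed in data.items():
    for wind_speed, df in by_speed.items():
        for column in df.columns:
            if column.startswith("Time"):
                continue              # the time index is not a measurement
            _, p_value = shapiro(df[column])
            if p_value > ALPHA:
                passed += 1           # normality cannot be rejected
            else:
                failed += 1           # normality rejected at the 5% level

print(passed, failed, 100 * passed / (passed + failed))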
4.3 Creating new variables Recalling from table 3.2, the data set columns were the following: Table 4.1: Data set columns. Time (0.1s) Acc x Acc y Acc z Gyro x Gyro y Gyro z Mag x Mag y Mag z At this point in the work it was concluded that adding three new columns with the modulus of each vector could be relevant to the study and could add new information usable by the models. The modulus was calculated by taking the square root of the sum of the squares of the components of the vector (the standard way of calculating a vector length). Assuming $\vec{a} \in \mathbb{R}^3$: $len(\vec{a}) = |\vec{a}| = \sqrt{a_1^2 + a_2^2 + a_3^2}$ (4.1) For example, for the magnetometer vector we would have (using the values from table 4.2): $len(Mag) = |\vec{m}| = \sqrt{Mag_x^2 + Mag_y^2 + Mag_z^2} = \sqrt{(-26.677)^2 + (-6.142)^2 + (-9.773)^2} = 29.068$ (4.2) Table 4.2 illustrates the new data set with the added length vectors on the right: Table 4.2: Data set columns with length vectors. Time [0.1s] Acc x [g] Acc y [g] Acc z [g] Gyro x [◦s−1] Gyro y [◦s−1] Gyro z [◦s−1] Mag x [µT] Mag y [µT] Mag z [µT] len Acc [g] len Gyro [◦s−1] len Mag [µT] 1 -0.001 0.001 0.978 -0.002 -0.0105 -0.010 -26.677 -6.142 -9.773 0.978 0.014 29.068 . . . 4.4 Statistical functions One goal of this work is, given a blade pitch angle α and a wind speed υ, to determine a certain numerical output of some metric. For example: given α = 10◦ and υ = 20ms−1, what is the expected length of the acceleration vector? Answering this question allows us to model the turbine and to predict new values that might not have been in the training data. The reasoning for adding these variables is to be able to predict an output for a set of inputs. It is not possible to give, for example, an instantaneous value of the acceleration for a given wind speed and angle. It is, however, much more interpretable and useful to give, for example, the range or the maximum acceleration. The outputs chosen were statistical measures of the columns in table 4.2. The statistical functions applied to the columns were: Table 4.3: Statistical functions. median mean var ptp max min std max abs min abs median abs With median being the value at the central position after ordering the data. Max is the maximum and min is the minimum. Mean (arithmetic mean): $average = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ (4.3) Ptp being the range: $ptp = range = max - min$ (4.4) Variance ($\mu$ is the mean): $Var(X) = \sigma^2 = E[(X - \mu)^2] = Cov(X, X) = E[X^2] - E[X]^2$ (4.5) Std is the standard deviation: $std = \sigma = \sqrt{Var(X)}$ (4.6) Max abs, min abs, median abs and average abs: $max\ abs(x) = max(abs(x))$ (4.7) $min\ abs(x) = min(abs(x))$ (4.8) $median\ abs(x) = median(abs(x))$ (4.9) The metrics max abs, min abs and median abs were introduced manually (they are not defined by numpy) as simplified metrics that may be easier to predict. With these values, a new data set is produced, from now on called the grouped data. These are the columns of the new data set: Table 4.4: Grouped data set columns. angle windspeed median Acc x mean Acc x var Acc x ptp Acc x amax Acc x amin Acc x std Acc x . . . min abs len Magn The data set from table 4.4 has 48 rows (6 angles × 8 wind speeds) and 122 columns (2 columns for angle and wind speed, plus 10 × 12: the number of metrics from table 4.3 times the 12 measurement columns), for a total of 5,856 cells. A minimal code sketch of this aggregation is shown below.
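This sketch uses illustrative column and variable names (the real columns carry unit suffixes) and assumes the nested dictionary of DataFrames described in chapter 3:

import numpy as np
import pandas as pd

def add_length_columns(df):
    # Append the modulus of each 3-component vector (equation 4.1).
    for name in ("Acc", "Gyro", "Mag"):
        cols = [f"{name} x", f"{name} y", f"{name} z"]
        df[f"len {name}"] = np.sqrt((df[cols] ** 2).sum(axis=1))
    return df

# Statistical functions of table 4.3 (the *_abs metrics are defined manually).
metrics = {
    "median": np.median, "mean": np.mean, "var": np.var, "ptp": np.ptp,
    "amax": np.max, "amin": np.min, "std": np.std,
    "max abs": lambda v: np.max(np.abs(v)),
    "min abs": lambda v: np.min(np.abs(v)),
    "median abs": lambda v: np.median(np.abs(v)),
}

rows = []
for angle, by_speed in data.items():            # nested dict from chapter 3
    for wind_speed, df in by_speed.items():
        df = add_length_columns(df)
        row = {"angle": angle, "windspeed": wind_speed}
        for col in df.columns:
            if col.startswith("Time"):
                continue                        # the time index is not aggregated
            values = df[col].to_numpy()
            for metric_name, fn in metrics.items():
                row[f"{metric_name} {col}"] = fn(values)
        rows.append(row)

grouped = pd.DataFrame(rows)                    # 48 rows x 122 columns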
A subset of this table will be passed as input to the supervised models later on. 4.5 3D Plots With the new data set from table 4.1 and the statistical functions in table 4.3, 3D plots are obtained; the three axes are: angle, wind speed and statistical function. Some relevant plots are shown: Figure 4.8: Median of the magnetometer measurements in the z axis. Figure 4.9: Standard deviation of the acceleration in the z axis. Figure 4.10: Standard deviation of the angular velocity in the y axis. Figure 4.11: Median of the magnetometer measurements in the z axis. Figures 4.8, 4.9, 4.10 and 4.11 all suggest two observations. First of all, the data can clearly be approximated by a plane (or, even better, a curve). This leads to the second observation: with an increase in wind speed, the measured metric also increases (there is a direct correlation). Figures 4.8, 4.10 and 4.11 also suggest that a higher angle leads to a higher measurement (this is not as clear as the first claim). In figure 4.9, it appears that the range between [-10, 10] degrees shows higher values compared to the values outside that range (it looks like an inverted U shape if we look at the figure from the right point of view). There are also examples of inverted observations: Figure 4.12: Median absolute value of the magnetometer in the z axis. Figure 4.13: Median value of the modulus of the magnetometer vector in the z axis. It is clear that figures 4.12 and 4.13 can again be approximated by a plane. One important difference is that this time the correlation is inverted (inverse correlation): the higher the wind speed, the lower the measured metric. Finally, other graphs did not look predictable (they seem random): Figure 4.14: Median absolute value of the measurements of the magnetometer in the y axis. Figure 4.15: Median angular velocity measurements in the z axis. With these graphs we get a graphical representation of how the metrics change with different wind speeds and angles. We can also see how fast they seem to scale, which in most of the examples looked linear (and by extension can be approximated by a plane). In contrast, other graphs look random, and the metrics associated with them will thus be very hard to predict. 4.6 Correlation matrix Recalling the grouped data table: Table 4.5: Grouped data set columns. angle windspeed median Acc x mean Acc x var Acc x ptp Acc x amax Acc x amin Acc x std Acc x . . . min abs len Magn Using the data from table 4.5 we create a correlation matrix (discarding angle and wind speed). The matrix in figure 4.16 is very dense because of the 120 columns of the data set. Nonetheless, we will try to comment on the matrix: Figure 4.16: Correlation matrix of the grouped data set. Figure 4.17: Correlation matrix of the grouped data set (highlighted areas). From figure 4.17 we highlight two areas: 1. Red: We can see several inverse correlations: std Gyro y - median abs Magn z, median abs Gyro y - median abs Magn z, etc. This is expected, as the changes in angle will also affect the magnetometer components. 2.
Purple: We can see some direct correlations: median abs len Acc - median Acc z, max abs len Acc - amax Acc z, etc. This is expected, as the modulus of the vector and the median should be correlated, specially with the z axis that is the one with the larger values of the three. Overall, the matrix is not very informative as there is a lot of redundancy on it. For example: maximum and minimum will be correlated with the range, as it is a linear combination of the two. Another example: All the metrics of the acceleration in an axis will be correlated with the other metrics of the acceleration in the same axis. There were however, more interesting parts of the figure, as the ones described in the red area, but overall, not very informative. Chapter 5 Periodicity analysis A periodicity analysis was performed. The aim of this analysis was to try to find if the turbine vibrates at a certain frequency. This analysis cannot be easily done if we were to remove the possible outliers, because these values are erased from the data set, resulting in gaps in the measurements. The goals from this section are: find out if any of the metrics of the turbine present a periodic behavior, if they do, obtain the frequency and figure out how the frequency changes with different wind speeds and angles. Additionally, interpret the findings in order to know if they make sense. In order to extract the frequency from the data, a Fast Fourier Transform (FFT) was used. The Python library used was Scipy 1.8.0 with the scipy.fftpack module and the fft and fftfreq functions. 5.1 Test setup The following list describes the configuration of the periodicity analysis: • The experiment lasts 10 seconds. • The number of samples is 99 for each metric. • Each measurement (sample) is spaced by 100ms gaps (the sampling rate is 10Hz). • Before the analysis the FFT library will be tested with a test function to confirm correct usage. • The analysis will be made for each column of the data set for a given angle and wind speed. For example: average Acc z for a wind speed of 8.5 and an angle of 1◦. • The mirrored output from the algorithm, on the negative x range, is ignored and not shown. 5.2 Testing the library First, a test on the function was made to make sure that the libraries were working and that they were being used correctly: 53 CHAPTER 5. PERIODICITY ANALYSIS 54 Figure 5.1: FFT Test with a sin function at 2Hz, with a clear peak at x = 2. The graph from figure 5.1 clearly represents a peak at 2Hz. The FFT has successfully detected the only frequency at 2Hz. The same workflow will be used in the real analysis to make sure that the data is correctly passed to the function. 5.3 Plotting the data set Fast Fourier Transforms 5.3.1 DC Component Most of the graphs showed a peak at x = 0, which is a probable side-effect of a DC component. I.e.: The FFT is detecting the median of the data and plotting it at x = 0. If we take a look at table 5.1, the mean of the len Magn column (the mean of the modulus of the Magn vector) is 28.56. If we multiply this value by 2 we obtain 57.12 which is roughly equal to the value at x = 0 on figure 5.2: Table 5.1: Data statistics at a wind speed of 8.5ms−1 and 30◦pitch angle. 
Metric Time (0.1s) Acc x Acc y Acc z Gyro x Gyro y Gyro z Magn x Magn y Magn z len Acc len Gyro len Magn count 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 99.000000 mean 50.000000 0.001525 -0.021226 1.033122 -0.028354 0.001044 -0.000453 -26.313516 -5.573011 -9.607244 1.033699 0.089341 28.564459 std 28.722813 0.011927 0.024573 0.060953 0.081943 0.025032 0.041457 0.289135 0.286139 0.298388 0.060954 0.042546 0.288391 min 1.000000 -0.028040 -0.079266 0.801373 -0.187839 -0.061255 -0.100001 -26.969531 -6.288750 -10.476562 0.801555 0.009376 27.976938 25% 25.500000 -0.005765 -0.040277 1.014185 -0.094118 -0.016368 -0.029460 -26.532187 -5.776875 -9.843750 1.014505 0.058281 28.377556 50% 50.000000 0.002338 -0.020795 1.031024 -0.031516 0.003900 -0.001884 -26.313516 -5.557500 -9.562500 1.031553 0.084312 28.540237 75% 74.500000 0.009515 -0.004721 1.046768 0.034030 0.017615 0.027431 -26.167734 -5.411250 -9.421875 1.047564 0.118908 28.766974 max 99.000000 0.032532 0.034198 1.258600 0.151109 0.050737 0.093170 -25.584609 -4.826250 -8.648438 1.259381 0.191606 29.395060 CHAPTER 5. PERIODICITY ANALYSIS 55 Figure 5.2: FFT with wind speed of 8.5 and an angle of 30◦, acceleration of y. This observation: y0 ≈ 2×median(column), holds for most if not all the graphs. Please note that in figure 5.2 there is information on the y axis in the r = (0, 5] range. However, it is shadowed by the value at x = 0 (y0), because it is much, much, larger than the values of the r range (ya, a ∈ r). Matplotlib adapts the axes to the range of the data. The value y0 ≈ 57 (at x = 0, x /∈ r) is the maximum value in the data and in the y axis range, while the rest of the data is at maximum y ≈ 0.15. This will be more clear in graph 5.3. In order to avoid the distortion on the y axis, we can set the y value at x = 0 to 0. We will not destroy any information as we already know that it is approximately equal to two times the median of the data: CHAPTER 5. PERIODICITY ANALYSIS 56 Figure 5.3: FFT with wind speed of 8.5 and an angle of 30◦, acceleration of y. DC component removed. If we compare figure 5.2 with 5.3 we are now able to see the information in the r range. The input data is the same, but the peak at x = 0 has been removed. Another option would have been to subtract the mean of the column to each row of the data. The removal of the peak was chosen for its simplicity and to avoid destroying information in the input data. We will also note that in other graphs the removal at x = 0 it is not necessary. In any case from now on, all the graphs will have the x = 0 preprocessed unless explicitly mentioned otherwise. In the following graphs we can see that it is not necessary to remove the value at x = 0: CHAPTER 5. PERIODICITY ANALYSIS 57 Figure 5.4: FFT with wind speed of 8.5 and an angle of 30◦, acceleration of y. Figure 5.5: FFT with wind speed of 8.5 and an angle of 30◦, acceleration of y. DC component removed. CHAPTER 5. PERIODICITY ANALYSIS 58 5.3.2 Comparison of the FFTs at different wind speeds In this subsection we will explore the possibility of comparing the FFTs at the same angle but different wind speeds, in order to find out if there is for example a displacement in the frequencies to higher or lower values at higher wind speeds. Renaming the axes Before explaining the wind turbine expected behavior we will define the following axes: Figure 5.6: Common naming of an aircraft in flight principal axes [7]. 
We then compare the axes of figure 5.6 with the axes of our wind turbine: Figure 5.7: Front view of the turbine with labeled axes, simplified. From figures 5.6 and 5.7 we describe the following mapping: the Y axis is parallel to the roll axis but in the opposite direction, the X axis is parallel to the pitch axis but again in the opposite direction, and the Z axis corresponds with the yaw axis (it is parallel) but is also in the opposite direction. With this mapping in mind, we can treat a rotation about the X axis as equivalent to a change in pitch, and the same with the rest of the axes (Z - yaw and Y - roll). The polarity of the rotation is not relevant for this part. Expected behavior of the wind turbine The expected behavior of the wind turbine is the following: • This description is for a wind turbine with a fixed pitch angle. • We start with the wind turbine perfectly straight, with no wind affecting it. • We now apply a constant amount of wind to the turbine. This wind causes the wind turbine to tip backwards, i.e., the turbine should pitch back, with the top of the wind turbine being displaced towards the Y axis. • As the wind applied is constant, the wind turbine should stabilize while tipped slightly backwards at a certain spot in space. • The main oscillations of the wind turbine are expected to be found as a change in pitch, displacing the turbine along the Y axis. Significant rotations in the yaw or roll axes should not be observed, as the wind is pushing towards the Y axis. • As the wind turbine vibrates on the pitch axis, displacement on the Z axis should also be observed. The rest of the axes should show negligible changes. • Lastly, we expect the frequency of these changes and oscillations to increase with higher wind speeds. With this in mind, we expect to find the following vibrations: • Repeated displacement on the Z axis, represented by a vibration in the acceleration on Z metric (Acc Z). • Rotation and counter-rotation around the pitch axis, detected in the angular velocity on the X axis (Gyro X). Acceleration on the Z axis (consequence of changes of pitch) All the graphs of the acceleration on the z axis had to have the DC component removed. This is probably a consequence of the frequency not being strong enough in the 0-5Hz range. Normally, we expect the peaks of the frequencies detected by an FFT to be proportional to the DC component. In this case, the peaks are much smaller than the DC component. The following figure illustrates this point: Figure 5.8: FFT with wind speed of 8.5 and an angle of 30◦, acceleration of z. DC component not removed. A comparison of several FFTs is shown below; the pitch angle is kept constant while the wind speed is not, and the DC component is removed: Figure 5.9: FFT with wind speed of 8.5 and an angle of 30◦, acceleration of z. DC component removed. Figure 5.10: FFT with wind speed of 11.6 and an angle of 30◦, acceleration of z. DC component removed. Figure 5.11: FFT with wind speed of 13.8 and an angle of 30◦, acceleration of z. DC component removed. From figures 5.9, 5.10 and 5.11 there is no visible pattern nor displacement of the peaks across the different wind speeds. It is possible that the wind turbine is vibrating, along the acceleration in the z axis, at a frequency much higher than the sample rate, and that it therefore cannot be detected.
This is also supported by the magnitude of the DC component compared to the peaks in the FFT (≈ 0.05 amplitude versus ≈ 2.1). As the acceleration is the derivative of the velocity, we expect the changes of this metric to be much faster than on the velocity. We will now analyze the angular velocity, hopefully being able to detect a pattern and clearer frequencies thanks to expected slower changes. Angular velocity on the X axis (changes in pitch) The graphs from the angular velocity on the x axis (gyro x) were much more clearer. In fact, it was not necessary to remove the DC component, indicating that the peaks are proportional to it and that there is a high probability that the frequencies have been detected: CHAPTER 5. PERIODICITY ANALYSIS 63 Figure 5.12: FFT with wind speed of 8.5 and an angle of 30◦, angular velocity on x. DC component not removed. Figure 5.13: FFT with wind speed of 10.1 and an angle of 30◦, angular velocity on x. DC component not removed. CHAPTER 5. PERIODICITY ANALYSIS 64 Figure 5.14: FFT with wind speed of 11.6 and an angle of 30◦, angular velocity on x. DC component not removed. Figure 5.15: FFT with wind speed of 13.8 and an angle of 30◦, angular velocity on x. DC component not removed. From the progression of figures 5.12, 5.13, 5.14 and 5.15 two things can be observed: The peak present at x ≈ 4.35Hz in 5.12 clearly moves right and ends up at x ≈ 4.8Hz in CHAPTER 5. PERIODICITY ANALYSIS 65 5.15, this suggests an increase in frequency with an increase in wind speed. Additionally, the peak also increases in height, from J(theta) ≈ 0.052 to J(theta) ≈ 0.077 in the last figure. This observation strongly suggests that the wind turbine is vibrating faster at higher wind speeds, in the pitch axis as it was expected. Rest of the metrics and axes The rest of the FFTs did not show clear patterns nor distinct frequency detections. The magnetometer metric was not considered useful for this analysis, as its main use would be to get the orientation of the turbine in relation with the earth’s magnetic field. 5.3.3 Periodicity analysis results In this chapter we successfully confirmed the expected behavior of the wind turbine. Additionally, we managed to detect a direct correlation between the wind speed and the frequency at which the turbine was vibrating. This finding was very clear on the angular velocity on the x axis, but not very conclusive on the displacements on the z axis. One last observation of the magnetometer is made: Table 5.2: Data statistics at a wind speed of 8.5ms−1 and 30◦pitch angle. Metric Magn x [µT ] Magn y [µT ] Magn z [µT ] min -26.969531 -6.288750 -10.476562 max -25.584609 -4.826250 -8.648438 range 1.384922 1.4625 1.828124 From table 5.2 we can observe that the largest range is registered at the magnetometer z axis, followed by y and then x. This greater change in the z and y components of the magnetometer vector further confirms that the main rotation is along the x axis (variation in pitch) and not the other axes (if the wind turbine rotates around an axis A, the magnetometer values measured at that axis will be affected the least, because the vector of the magnetic field will remain at the same angle with the A axis, while the other axes will not maintain the angle). Chapter 6 Supervised Models 6.1 Introduction The main goal of this section is to predict, given a blade pitch angle α and a wind speed υ, a statistical metric. 
This would allow us to have a general idea of the behavior of the wind turbine at different wind speeds and angles without actually having to perform the experiment. For example, knowing the maximum acceleration that the wind turbine will be exposed to at a certain wind speed will allow us to plan how strong should the structure that supports it be, in order for it to not break and to not waste unnecessary material or avoid harder or costlier support structures. This is a regression problem: Given a set of real values, predict a new real value based on previous examples. More concretely this is a supervised machine learning problem. Several supervised linear learning models from scikit learn were used in this section as they are already prepared to use the numpy data formats. 6.2 Data preparation Table 6.1: Data set columns. angle windspeed median Acc x mean Acc x var Acc x . . . median Magn z mean Magn z . . . average abs len Magn 1 8.5 0.00187 0.00153 0.00035 . . . -9.56250 -9.60014 . . . 9.60014 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Recalling the columns from earlier in table 6.1 the input vector X will be the columns angle and wind speed and the target vector Y will be any of the other columns. In order to have better results, lower complexity and more interpretable models, the decision was made to have one model per predicted metric, instead of a highly complex model that predicts all the output columns at the same time. Each model is trained to predict one output metric. In the case of the Neural Network, this results in two input nodes and one output node for each neural network. The hidden layers can also be adjusted in scikit learn, 67 CHAPTER 6. SUPERVISED MODELS 68 along with other parameters such as the algorithm used, α (L2 penalty or regularization term), etc. For each of the learners the following steps were taken: 1. Create the X and Y input and output vectors. 2. Shuffle (randomize) the order of the rows of the data set in order to avoid repeated biases. 3. Split the data set by rows into two sets. The training set and the test set, in 80% and 20% proportion, respectively. 4. Scale the data. 5. Train the model using the training set. 6. Evaluate the model using the test set. 6.3 Models used A total of 6 models were used in this section: Linear regression [39], Linear with poly- nomial features [39][45], Ridge [40], Huber [38], Gaussian [37] and a MLP Regressor [43] [36]. All of the models used a StandardScaler [46] before fitting the data. The pipelines of all the models were: Scaler → Estimator. Except the polynomial model that was: Scaler → PolynomialFeatures → Estimator [45]. With estimator being the model (linear, gaussian, etc). 6.4 Scaler selection The StandardScaler was chosen because it was concluded in Statistical Analysis chapter, Distribution study section, that the data was normally distributed and that parametric statistical methods could be used. Sk-learn mentions the following about the StandardScaler : ”[...] might behave badly if the individual features do not look like standard normally distributed data [...] [46]. We therefore conclude that the StandardScaler is the appropriate scaler in this work, because of its compatibility with normally distributed data. 6.5 Hyperparameters tuning The hyperparameter tuning was performed semi-automatically using sk-learn’s Grid- SearchCV [42]. CHAPTER 6. 
SUPERVISED MODELS 69 The default parameters used in GridSearchCV were the following: scoring = None, n jobs = None, refit = True, cv = None (5-fold cross validation) and error score = np.nan. For example, in the case of polynomial regression, GridSearchCV allows us to specify an array of degrees to test with. After the testing is done, GridSearchCV has a method that returns the best estimator. Additionally, GridSearchCV does cross-validation [34]. By semi-automatically we mean that a number of parameters were fed into the Grid- SearchCV model selector. These parameters were selected according to the size of the data and the complexity of it, which was thought to not be extremely high in this case 1. For example, in the case of the degree of the polynomial model, the values fed were in the range [2, 8], and the best results were obtained with values under 4. Another example is the neural network, the MLPRegressor. It did not make any sense to use a model with a lot of hidden layers and nodes, because the inputs were two and the output is only one. 6.6 Models parameters Some models did not allow or did not have parameters to tune. Other models, however, required adjustment in order to have acceptable performance. If nothing is specified, the default values from sk-learn were used. For detailed docu- mentation on the defaults, check the official documentation for the models in the previous references (were we list the models used). The following list details the chosen model parameters: • Linear No changes, defaults used: fit intercept = True, normalize = False and positive = False. • Polynomial Degrees 2 to 4 were found to be the best performers. The default parameters were: degree = 2, interaction only = False, include bias = True and order = C. Degree 2 was chosen in the end for its good performance and lower chance of overfitting. Same parameters for the model as in Linear. • Ridge No changes, defaults used. Regularization strength α = 1.0, fit intercept = True, normalize = False, max iter = None, tol = 10−3 (precision of the solution) and solver = auto. • Huber No parameters specified, defaults used. Number of samples that should be classified as outliers ε = 1.35,max iter = 100, regularization parameter α = 0.0001, warm start = False, fit intercept = True and tol = 10−5. • Gaussian No tuning, defaults used. kernel = None (ConstantKernel(1.0, con- stant value bounds=”fixed” * RBF(1.0, length scale bounds=”fixed”)), value added to the diagonal of the kernel matrix during fitting α = 10−10, optimizer = fmin l bfgs b, n restarts optimizer = 0, normalize y = False, and random state = None. 1Recall from the Materials and Methods chapter that the number of samples was 42768. Additionally, the models studied have two inputs and one output. CHAPTER 6. SUPERVISED MODELS 70 • MLPRegressor Changes: hidden layer sizes = (2, 4, 2). The rest, were de- fault parameters: activation = relu, solver = lbfgs, L2 penalty (regularization term) α = 0.0001, batch size = auto = min(200, n samples), max iter = 1000, random = None, tol = 10−4, warm start = False and max fun = 15000. The default hidden layer size was (100, ), considered to be excessive. The solver pa- rameter lbfgs was vital for the model to perform acceptably. According to sk-learn: ”For small data sets, lbfgs can converge faster and perform better” [43]. This was found to be the case. The following figure illustrates the mentioned (2, 4, 2) MLPRegressor: Figure 6.1: Representation of the (2,4,2) Neural Network used. 
This network has 3 hidden layers. Illustration made thanks to NN-SVG [3]. Note from figure 6.1 the 2 nodes in the input layer (wind, angle), the (2, 4, 2) node structure in the hidden layers (3 hidden layers, with 2, 4 and 2 nodes respectively) and the single node in the output layer. 6.7 Results This section will detail the overall scores obtained. 6.7.1 R2 score All the scores were calculated using the R2 score (also known as the coefficient of determination), which is computed with the following formula [13] [41]: $R^2 = 1 - \frac{SS_{res}}{SS_{tot}}$ (6.1) where $SS_{res}$ (the residual sum of squares) is: $SS_{res} = \sum_i (y_i - f_i)^2 = \sum_i e_i^2$ (6.2) $SS_{tot}$ (the total sum of squares) is: $SS_{tot} = \sum_i (y_i - \bar{y})^2$ (6.3) and $\bar{y}$ (the mean of the observed data) is: $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$ (6.4) According to sklearn, R2 ∈ (−∞, 1], where 1 is the best score possible [41]. Scores over 0.75 are considered good and scores over 0.90 very good. Finally, scores under 0.50 will be considered very bad. The R2 score is inversely correlated with the distance from the prediction to the real value, i.e., the closer the prediction is to the real value (the smaller the distance), the higher the R2 score. If this distance is equal to 0, the R2 score will be equal to 1. 6.7.2 Overall scores The initialization of the model parameters is random; therefore, some variance between runs is expected. The following tables show the number of scores over a certain threshold for four different runs:
Table 6.2: Model scores breakdown. Run 1.
Model        over 0.0  over 0.25  over 0.5  over 0.75  over 0.9  over 0.95
Linear       82        50         23        4          1         0
Polynomial   66        54         30        8          2         0
Ridge        76        47         22        10         2         1
Huber        78        50         22        8          4         0
Gaussian     10        8          3         0          0         0
Mlpr         42        27         14        4          0         0
Table 6.3: Model scores breakdown. Run 2.
Model        over 0.0  over 0.25  over 0.5  over 0.75  over 0.9  over 0.95
Linear       79        52         22        7          3         0
Polynomial   77        66         36        8          1         1
Ridge        75        49         29        5          2         0
Huber        75        49         26        8          2         0
Gaussian     7         5          2         1          0         0
Mlpr         31        24         14        1          0         0
Table 6.4: Model scores breakdown. Run 3.
Model        over 0.0  over 0.25  over 0.5  over 0.75  over 0.9  over 0.95
Linear       73        52         29        10         1         0
Polynomial   77        57         36        13         3         1
Ridge        80        58         27        6          2         0
Huber        64        41         15        7          2         1
Gaussian     7         6          3         1          0         0
Mlpr         45        24         12        4          1         0
Table 6.5: Model scores breakdown. Run 4.
Model        over 0.0  over 0.25  over 0.5  over 0.75  over 0.9  over 0.95
Linear       76        49         31        11         1         0
Polynomial   76        56         34        11         1         0
Ridge        76        55         33        4          2         0
Huber        75        51         26        8          2         1
Gaussian     9         6          4         2          0         0
Mlpr         36        19         8         1          0         0
There is some variance between runs, but the conclusion is the following: the Gaussian model had the worst performance of the six. Nonetheless, later on, this model did perform well in some limited situations (some examples will be shown later). The next model, in increasing order of performance, is the Neural Network (MLPR), which performed much better than the Gaussian model but still only managed half of what the other four better models did. Finally, the best models are the Linear, Polynomial, Ridge and Huber regressors, which in some situations even had an R2 score of over 0.95. A score this high means that the prediction is extremely close to the real value. 6.7.3 Predictions representations The following section provides a representation of the predictions made by the best models. At least one graph is present for each model.
A note about the graphs In some of the subgraphs of this section only one green point is visible. This is due to the prediction being so close to the real value that the prediction is hidden as it is drawn after the real value (drawing over it), therefore, only one green point appears. Magnetometer As previously stated, the Gaussian model was not very good. There were however some instances where it did archive a very good score: CHAPTER 6. SUPERVISED MODELS 73 Figure 6.2: Predictions of the Gaussian model of the median magnetometer values in the z axis. As we can see on figure 6.2 even though the overall scores from the Gaussian model were not very good, in this case it managed a score of 0.91, which is very good. In fact many of the real values cannot be seen as they are in the same place as the predictions (and thus hidden because of the reason already explained). CHAPTER 6. SUPERVISED MODELS 74 Other examples of the modulus of the magnetometer vector are shown: Figure 6.3: Predictions of the Linear model of the length of the magnetometer vector. CHAPTER 6. SUPERVISED MODELS 75 Figure 6.4: Predictions of the Polynomial model of the length of the magnetometer vector. As we can see from figures 6.3 and 6.4 the score in this case is not as good (please note it is a different metric, so it is not directly comparable). In conclusion, the values from the magnetometer grouped data were able to be pre- dicted. Additionally, the predictions were very close to the real values. CHAPTER 6. SUPERVISED MODELS 76 Gyroscope Figure 6.5: Predictions of the Huber model of the standard deviation of the gyroscope in the y axis. CHAPTER 6. SUPERVISED MODELS 77 Figure 6.6: Predictions of the Ridge model of the standard deviation of the gyroscope in the y axis Figures 6.5 and 6.6 seem to indicate that the standard deviation of the gyroscope in the y axis is fairly predictable, as two different models managed to predict it successfully. CHAPTER 6. SUPERVISED MODELS 78 Figure 6.7: Predictions of the MLPR model of the range of the gyroscope in the z axis. CHAPTER 6. SUPERVISED MODELS 79 Figure 6.8: Predictions of the Polynomial model of variance of the gyroscope in the y axis. Figures 6.7 and 6.8 show more very good scores and figure 6.7 introduces the predic- tions of the MLPR model. CHAPTER 6. SUPERVISED MODELS 80 Acceleration The acceleration vector is probably the most important metric to predict together with the gyroscope values. The following section details several of the best performer models: Figure 6.9: Predictions of the Linear model of the mean of the acceleration in the z axis. CHAPTER 6. SUPERVISED MODELS 81 Figure 6.10: Predictions of the Polynomial model of the mean of the acceleration in the z axis. CHAPTER 6. SUPERVISED MODELS 82 Figure 6.11: Predictions of the Ridge model of the mean of the acceleration in the z axis. CHAPTER 6. SUPERVISED MODELS 83 Figure 6.12: Predictions of the Huber model of the mean of the length of the acceleration vector. CHAPTER 6. SUPERVISED MODELS 84 Figure 6.13: Predictions of the MLPR model of the standard deviation of the length of the acceleration vector. As we can see from the previous figures, the models managed to have very good scores, with some even breaking over 0.95 like the Polynomial model of the mean of the acceleration in the z axis on figure 6.10. CHAPTER 6. SUPERVISED MODELS 85 As an important remark for this section, the Gaussian model did not manage to have an score over 0.75. 
This is further supported by tables 6.2, 6.3, 6.4 and 6.5, where the Gaussian model consistently showed the worst scores. Average error rate Using equation 3.8 and the predictions from the previously mentioned models, the average error rate was calculated to be ≈ 1%. This error rate was considered to be very low, and the predictions are therefore also considered to be very precise. If a virtual model of the turbine were to be made, the recommended models to build it would be: Linear, Polynomial, Ridge and Huber. The MLPR model could also be considered in some instances, and the Gaussian model is strongly discouraged in this case. Not all the graphs are shown, as there are 107 (107 models with scores over 0.75), but this overview should give a good idea of the performance of the models. In conclusion, with these results we can say that a virtual model of the grouped metrics of the wind turbine can be built, either by combining the best models for certain metrics or by using just one of the best performers for all the metrics. Additionally, these models can be used to predict values outside the training data set (new pitch angle configurations and wind conditions that were not present in the experiment). This should work fairly well with intermediate values like 15◦, but not with values near 0◦ (this is a special case where the wind turbine blades will not be rotating). One last remark must be made: the values to be predicted should not be very far from the experimental ranges, for example 20ms−1 (the experiment had a maximum of 13.8ms−1), as these values are very far from the measurements and the models will probably fail. Some predictions are shown in the following subsection. 6.7.4 Predictions for inputs outside the experiment Using the previous models, a table was developed that presents the predictions for wind speeds outside the experiment, but only for models that had a score over 0.90:
Table 6.6: Predictions for inputs outside the experiment, only models with scores over 0.90.
Model        Metric        Angle [◦]  Wind speed [ms−1]  Prediction
linear       mean Acc z    10         15.0               1.104912
linear       mean Acc z    10         18.0               1.142317
linear       mean Acc z    10         20.0               1.167254
linear       mean Acc z    30         15.0               1.103160
linear       mean Acc z    30         18.0               1.140565
linear       mean Acc z    30         20.0               1.165502
ridge        mean Acc z    10         15.0               1.104175
ridge        mean Acc z    10         18.0               1.140750
ridge        mean Acc z    10         20.0               1.165133
ridge        mean Acc z    30         15.0               1.103168
ridge        mean Acc z    30         18.0               1.139742
ridge        mean Acc z    30         20.0               1.164125
linear       var Gyro y    10         15.0               0.004726
linear       var Gyro y    10         18.0               0.006617
linear       var Gyro y    10         20.0               0.007877
linear       var Gyro y    30         15.0               0.005020
linear       var Gyro y    30         18.0               0.006911
linear       var Gyro y    30         20.0               0.008171
linear       mean len Acc  10         15.0               1.111576
linear       mean len Acc  10         18.0               1.149678
linear       mean len Acc  10         20.0               1.175079
linear       mean len Acc  30         15.0               1.110320
linear       mean len Acc  30         18.0               1.148421
linear       mean len Acc  30         20.0               1.173822
polynomial   mean len Acc  10         15.0               1.116658
polynomial   mean len Acc  10         18.0               1.161558
polynomial   mean len Acc  10         20.0               1.193637
polynomial   mean len Acc  30         15.0               1.109985
polynomial   mean len Acc  30         18.0               1.155171
polynomial   mean len Acc  30         20.0               1.187439
If we compare the outputs for mean len Acc with the graphs in figure 6.12, we can see an increase in the prediction metric (more acceleration in the linear model, from around 1.25g at 13.8ms−1 to 1.11g at 15.0ms−1 and up to 1.17g at 20.0ms−1).
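As an illustrative sketch (assuming the scaler-plus-estimator pipelines described earlier have already been fitted on the grouped data; the names pipelines, scores and queries are hypothetical), predictions such as those in table 6.6 can be generated as follows:

import numpy as np

# `pipelines` maps a metric name to its fitted Pipeline (Scaler -> Estimator),
# trained with X = [angle, windspeed] and y = the corresponding grouped metric.
# `scores` maps the same metric name to the R2 score obtained on the test set.
queries = np.array([
    [10, 15.0], [10, 18.0], [10, 20.0],
    [30, 15.0], [30, 18.0], [30, 20.0],
])

for metric, pipe in pipelines.items():
    if scores[metric] < 0.90:          # keep only models with R2 >= 0.90
        continue
    for angle, wind in queries:
        prediction = pipe.predict([[angle, wind]])[0]
        print(f"{metric}: angle={angle:.0f} deg, wind={wind:.1f} m/s -> {prediction:.6f}")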
Lastly, an observation can be made: the change in angle from 10° to 30° does not drastically change the acceleration metrics for the same wind speed. It does, however, affect the variance of the gyroscope along the y axis. This suggests that the wind speed is much more influential on the acceleration vector than the angle, at least for these two angles.

As there is no real experimental data to compare these outputs with, no further commentary will be made.

6.8 Conclusions

In this section we showed a representation of the predictions of the best models. All the models had good results, although some were limited to only certain metrics. Additionally, some predictions were made for values outside the experiment data set, and an expected increase in some metrics was observed.

Chapter 7

Conclusions and Future works

7.1 Conclusions

Throughout this work we were able to successfully process the available data from the experiments and identify some of the main metrics of the floating wind turbine.

The data processing was vital for the success of the models. For example, the addition of the vector lengths and the simplified metrics allowed us to find more variables to predict. The visualization of the data was also a vital part of verifying that the code worked correctly and of representing the data.

The periodicity analysis was successful and consistent with the results expected from the physical viewpoint. It adds useful information about the resonant frequencies of the wind turbine that can later be used to study it further.

Most of the supervised models presented had very good scores, the Gaussian model being the only one that was discouraged. These precise models can be used to predict the most important metrics of the turbine in conditions that were not present in the experiment.

7.2 Future works

If this work were to be continued, the following paths are suggested:

• Select the best overall model to predict all the metrics and therefore have a fast approximation of the wind turbine behavior.

• Instead of one model per metric, use one single model, possibly a combination of the best models, to predict all the output metrics at the same time. This model would be more complicated than the previous one.

• Add new variables to the data set. One of the most interesting measurements could be the instantaneous power generation, which, combined with the wind speed and angle, could be very useful. The current setup did not allow such measurements.

• Further explore the periodicity analysis with different-sized turbines and different weight distributions, and study how these configurations affect the most prominent frequency.

• Investigate or program new libraries that allow a faster sampling rate, in order to find or discard frequencies above 10 Hz where the turbine might or might not be vibrating, or use different equipment for these measurements.

• Explore more complex models, like LSTM (Long Short-Term Memory) neural networks, that could allow the prediction of the model metrics over a short future time frame.

Appendices

Appendix A

Virtual environments allow the host computer to have multiple Python binaries with different sets of libraries. This prevents the conflicts that could arise with a single global Python configuration when different projects require specific, mutually incompatible package versions.
See https://docs.python.org/3/library/venv.html for more information on how to create a virtual environment (venv). After setting up the venv, the necessary packages listed in the Programming software section must be installed, with the correct versions.
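As an illustration only, the short Python sketch below creates a virtual environment and installs packages into it programmatically with the standard venv module; the environment name, the POSIX bin/ layout and the package list are assumptions, and the usual equivalent is simply running python -m venv and then pip install from an activated shell.

    # Minimal sketch, assuming a POSIX layout (".venv/bin/pip"); the
    # environment name and the package list are illustrative placeholders.
    import subprocess
    import venv
    from pathlib import Path

    env_dir = Path(".venv")
    venv.EnvBuilder(with_pip=True).create(env_dir)   # equivalent to: python -m venv .venv

    pip = env_dir / "bin" / "pip"                    # on Windows the layout is .venv\Scripts\pip.exe
    subprocess.run([str(pip), "install", "numpy", "pandas", "scipy",
                    "scikit-learn", "matplotlib"], check=True)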