
Medical & Biological Engineering & Computing 
https://doi.org/10.1007/s11517-022-02630-z

ORIGINAL ARTICLE

GA‑MADRID: design and validation of a machine learning tool 
for the diagnosis of Alzheimer’s disease and frontotemporal dementia 
using genetic algorithms

Fernando García‑Gutierrez1 · Josefa Díaz‑Álvarez2  · Jordi A. Matias‑Guiu1 · Vanesa Pytel1 · Jorge Matías‑Guiu1 · 
María Nieves Cabrera‑Martín1 · José L. Ayala3

Received: 16 February 2022 / Accepted: 29 June 2022 
© The Author(s) 2022

Abstract
Artificial Intelligence aids early diagnosis and the development of new treatments, which is key to slowing down the
progress of diseases that to date have no cure. The patients’ evaluation is carried out through diagnostic techniques
such as clinical assessments and neuroimaging techniques, which provide high-dimensional data. In this work, a
computational tool is presented that deals with the data provided by these clinical diagnostic techniques. It is a
Python-based framework with a modular and fully extendable design. It integrates (i) data processing and management
of missing values and outliers; (ii) an evolutionary feature engineering approach, developed as a Python package
called PyWinEA, using mono-objective and multi-objective genetic algorithms (NSGAII); (iii) a module for designing
predictive models based on a wide range of machine learning algorithms; and (iv) a multiclass decision stage based on
evolutionary grammars and Bayesian networks. Developed under the eXplainable Artificial Intelligence and open science
perspective, this framework provides promising advances and opens the door to the understanding of neurodegenerative
diseases from a data-centric point of view. In this work, we have successfully evaluated the potential of the
framework for early and automated diagnosis with neuroimages and neurocognitive assessments from patients with
Alzheimer’s disease (AD) and frontotemporal dementia (FTD).

Keywords Alzheimer’s disease · Frontotemporal dementia · Neurodegenerative diseases · Machine learning · Artificial 
Intelligence

1 Introduction

Artificial Intelligence (AI) provides innovative solutions
to complex real-world problems. Machine learning
(ML) is one of its most representative and fastest-growing
branches. The health sector frequently generates
large volumes of high-dimensional data, such as those produced
by neuroimaging techniques like magnetic resonance
imaging (MRI) and positron emission tomography (PET)
[10]; ML algorithms help to provide diagnoses, decisions,
or even predictions related to the health status of patients.

 * Josefa Díaz-Álvarez 
 mjdiaz@unex.es

 Fernando García-Gutierrez 
 ga.gu.fernando@gmail.com

 Jordi A. Matias-Guiu 
 jordimatiasguiu@hotmail.com

 Vanesa Pytel 
 vanesa.pytel@gmail.com

 Jorge Matías-Guiu 
 matiasguiu@gmail.com

 María Nieves Cabrera-Martín 
 mncabreram@hotmail.com

 José L. Ayala 
 jayala@ucm.es

1 Departments of Neurology, Hospital Clinico San Carlos, 
San Carlos Research Health Institute (IdISSC), Universidad 
Complutense, Madrid, Spain

2 Department of Computer Architecture and Communications, 
Centro Universitario de Mérida, Universidad de 
Extremadura, Mérida, Spain

3 Department of Computer Architecture and Automation, 
Universidad Complutense, Madrid, Spain

http://orcid.org/0000-0003-2105-3905



Adjusting the hyperparameters of ML algorithms to get 
the best performance is not a trivial task; it requires exper-
tise [36]. Therefore, ML models need to be endowed with 
explainability and transparency on the basis of the eXplain-
able Artificial Intelligence (XAI) paradigm [28], which will 
generate confidence and reliability in the results. This fact 
is connected to AI democratization [35] and the open sci-
ence perspective, where sharing and collaborating are two 
essential objectives.

The interest of the scientific and medical community in
providing AI-based solutions to enhance and assist in the
diagnosis, prevention and/or development of new treatments
has increased significantly [45]. Despite this assistance,
caution is needed to prevent unintended negative
consequences, for instance when some data are
not contextualised [9].

In the scientific literature in this domain, [45]
analysed the potential of AI and ML in medicine
and identified changes and challenges to reach an accurate and
comprehensive diagnosis. [1] presented a review of different
solutions, approaches and perspectives of AI and ML,
especially for the healthcare sector. The authors included a critical
view, pointing out issues to be improved
in order to guarantee privacy and data security and to
enhance accuracy. Recently, [52] provided a review of
current computational approaches applied across the spectrum of
neurodegenerative diseases.

Focusing on neurodegenerative diseases and studies developed
in Python, [29] carried out an ML-based analysis for the
data-driven diagnosis of dementia, using post-mortem
confirmed cases as the gold standard; [43] implemented
a pipeline based on a DeepSymNet architecture to
detect the AD progression pattern. Recently, [53] applied
feature engineering to build voice biomarkers and improve
the early detection of Parkinson’s disease. [8] analysed a group
of individuals diagnosed with the behavioural and language
variants of FTD using a deep learning algorithm. [17] assessed
18F-2-fluoro-2-deoxy-D-glucose positron emission tomography
(18F-FDG PET) brain images from the Alzheimer’s Disease
Neuroimaging Initiative (ADNI) dataset and a retrospective
independent test set with an InceptionV3 convolutional neural
network. [14] tackled the classification of Alzheimer’s
disease into four classes using 3D Diffusion Tensor
Imaging (3D-DTI) processing.

Neurodegenerative diseases include a wide spectrum of
disorders with different clinical manifestations and pathological
patterns, for which an early and accurate diagnosis is
challenging. AD is one of the most prevalent [20] and causes
progressive and irreversible brain damage that prevents
patients from performing daily activities. FTD is the third
most common cause of dementia, particularly the behavioral variant
(bvFTD), and its onset occurs in middle age [20]. In this
work, we specifically focus on AD and FTD, although other
neurodegenerative diseases could be similarly addressed by
our proposal.

The assessment of patients who suffer from neurodegenerative
disorders entails neuroimaging techniques,
neuropsychological tests and
clinical histories [23]. Neuropsychological tests assess
the cognitive functions affected by AD and FTD. Among
the neuroimaging techniques, 18F-FDG PET is minimally
invasive. It gives a map of
brain coordinates associated with metabolism rates, which
measure alterations of glucose consumption in the
brain. The presence of alterations in brain metabolism
has proven to be a useful biomarker for the early diagnosis of
AD and FTD [7, 26, 39]. These techniques provide a large
volume of data [10], whose analysis and interpretation
require trained experts, and the risk of inaccurate
diagnoses is real, especially considering the need
for early detection of these disorders [18]. In this context,
ML techniques are a reliable alternative for designing
decision-making models that support specialists in the
early diagnosis of the disease, in monitoring and in designing
personalized treatments [45], where accuracy is extremely
important.

Over the last decade, many researchers have demonstrated the
potential of ML techniques for supporting decision-making in the clinical
arena [11, 25, 34]. However, to the best of our knowledge,
no other framework in the literature
targets the fully automated diagnosis of AD and FTD from
multiple and heterogeneous data sources. The proposed
computational tool embodies all the steps required by
the data modelling process.

Thus, it integrates the following functionalities:

1. An automated methodology for dataset preprocessing,
including imputation techniques to deal with missing values,
outlier handling and categorization of nominal variables.

2. A feature engineering module implemented by means 
of evolutionary algorithms to extract the most relevant 
features for the diagnosis.

3. A meta-model based on evolutionary grammars and 
Bayesian networks (BN) for multi-class classification.

4. A basic visualization tool.
5. Different tools for assessing the results.

This work is structured as follows. Firstly, the framework
designed is presented. Subsequently, the results of the tests
using the proposed AI-based tool are summarized, and the
following section discusses the results of the test case. Next,
the conclusions are presented. Finally, the general methodology
is described.



2  Methods

This Python-based framework simplifies data
modelling and is fully extendable
thanks to its modular, data-driven design.
According to the general scheme presented in Fig. 1,
it provides resources to address
data pre-processing, feature selection, a wide set of machine
learning models, and different AI-based modelling strategies
with mono-objective and multi-objective evolutionary algorithms.
It also implements a multiclass classification model
using evolutionary grammars (EG) or Bayesian classifiers. In addition, it provides
graphical evaluation tools based on different metrics to assess
the results obtained.

To supply a fast, robust and reliable AI-based
tool, the proposed framework is able to deal with different
datasets, including cognitive evaluations, neuroimaging
techniques and the patients’ history, to help in the diagnosis
of AD and FTD, two neurodegenerative diseases that may
present similar symptoms and cognitive and behavioral deficits.
Although episodic memory dysfunction is one of the
cognitive hallmarks of AD, FTD usually presents these
symptoms as well. Similarly, behavioral deficits are increasingly
recognized symptoms in AD [24, 41]. Data are managed and
organised through a relational database,
specifically MySQL. Data are structured in indexed
tables that ensure accessibility and availability and simplify
data preprocessing. Additionally, the relational database
and its processes are designed to incorporate new
data easily and guarantee data integrity.

As mentioned above, this framework manages three
types of data: (1) Demographic Data provide variables that
describe the sample; (2) Cognitive Test Data contain variables
associated with cognitive tests, where each cognitive
test provides a rating scale and scores to identify specific
kinds of cognitive problems and abilities. These tests gather

Fig. 1  General scheme of the AI framework proposed, including data pre-processing, feature engineering and AI-based modelling



information about the following cognitive functions: memory,
visuospatial, executive, attention and language. (3) Brain
Metabolism Data include the brain hypometabolism data
from the FDG-PET analysis.

Regarding the brain regions, this framework considers
two different atlases: Brodmann’s atlas (47 regions)
[6] and the Automated Anatomical Labelling (AAL) atlas
(90 regions) [54]. Data related to brain metabolism
are divided into qualitative and quantitative. Quantitative
data are defined by the number of hypometabolic voxels in
a given region. A voxel is a 3D unit of an image, which
can be associated with a single value, such as metabolism.
Hypometabolic voxels are computed through a voxel-based
mapping analysis against a healthy control group.
The qualitative data indicate whether a certain area is
hypometabolic or not. Although the number of voxels needed to
consider a region as hypometabolic may vary, in this study
we selected a threshold of 1 voxel in each region. Therefore,
a region was defined as hypometabolic when it had one or
more hypometabolic voxels. Although we agree that this
is a very limiting threshold, the purpose of this work is to
present a parameterizable computing framework, in which
this threshold, like many other parameters, can be selected by
the expert user in order to meet their clinical goals. The clinical
value of the results obtained with the proposed
framework is outside the scope of this publication, but it has
already been proven in [27].
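The relationship between the quantitative and qualitative tables can be illustrated with a minimal sketch. The function name, the region labels and the voxel counts below are hypothetical and are not taken from the framework; only the rule (a region is hypometabolic when its count reaches a user-selected threshold, 1 voxel by default) comes from the text.

```python
from typing import Dict


def to_qualitative(voxel_counts: Dict[str, int], threshold: int = 1) -> Dict[str, bool]:
    """Flag each region as hypometabolic when its voxel count reaches the threshold."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1 voxel")
    return {region: count >= threshold for region, count in voxel_counts.items()}


# Hypothetical numbers of hypometabolic voxels per region (illustrative only).
example_counts = {"region_a": 0, "region_b": 1, "region_c": 250}

default_flags = to_qualitative(example_counts)               # threshold of 1 voxel
strict_flags = to_qualitative(example_counts, threshold=100)  # stricter, user-chosen
```

Raising the threshold only changes which regions are flagged; the quantitative counts themselves are left untouched.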

In order to reduce the effort needed to reproduce experiments,
adapt the implementation to the XAI perspective and gain
trust and reliability, both the data and the script developed to
process them are available at
https://github.com/greendiscbio/neuro_MiningAndModeling/tree/Diagnostic_aid_model
on request from computational and clinical researchers¹,
where all necessary explanations are provided.

The aim of this work is the development of the computational
framework, which is widely customisable and scalable.
In this publication, we do not target the accuracy of the
clinical assessment provided by the tool, which is presented in
publications such as [27]; instead, we discuss a case study
to show the functionalities of the AI-based tool.

This computational framework has been evaluated using
a dataset that includes cognitive and PET data from 329
patients (171 AD, 72 bvFTD and 87 healthy controls). As
this work is focused on the presentation of the computational
framework, we use our own dataset because the data
labelling is controlled, although the framework has been
designed to work with publicly available datasets as well.
This dataset is structured using different
combinations in order to present a comprehensive and
consistent study. Patients included in this study had
neuroimaging compatible with FDG-PET meeting the current
diagnostic criteria [2, 38, 46]. The diagnosis was confirmed
after over two years of follow-up. Spouses and volunteers 
were recruited as Healthy Controls meeting the following 
criteria: (1) absence of cognitive impairment, according to a 
MMSE score ≥ 27 and Clinical Dementia Rating of 0 (Mor-
ris, 1993); (2) absence of functional impairment measured 
by Functional Activities Questionnaire scores of 0 [40]. The 
exclusion criteria were as follows: (1) prior or current his-
tory of other neurological diseases (e.g. stroke, brain tumour, 
seizures); (2) history of psychiatric disease, alcohol or psy-
chotropic drugs abuse; (3) visual, hearing, or any physical 
problem with a negative impact on test performance.

Regarding data, the Institutional Research Ethics Committee
of Hospital Clinico San Carlos approved the
research protocol, in accordance with the 1964 Helsinki declaration and
its later amendments. Informed consent was obtained from
all individual participants included in the study or their
caregivers.

Once the dataset is defined, the data preprocessing and
feature selection tasks are carried out. Subsequently,
AI-based modelling strategies can be launched and, finally,
the results obtained are analysed through the metrics available
in this framework.

3  Results

This section presents the framework design. The code is
made available through the GitHub and PyPI platforms.
Figure 1 represents the general scheme of the proposed framework,
which is divided into three different parts: (i) Data
pre-processing, (ii) Feature engineering, and (iii) AI-based
modelling.

3.1  Data pre‑processing

Considering the specifications in Section 2, the database has
been structured according to the following layout: each brain
atlas has two associated tables, one with quantitative
hypometabolism data and the other with qualitative data.
Cognitive evaluations, in turn, are subdivided into screening
and specific tests. The specific tests contain either
raw scores (specific_raw) or scores corrected according to
gender, age and years of education (specific_corrected).

Data pre-processing includes all the tasks described 
below.

Data cleaning This task analyses data and eliminates variables
that are either irrelevant or implicit in the data. Thus,
PET date, date of birth, age of disease onset, date of visit,
read/write and Mini Mental State Examination (MMSE)
1 Data that support this study are available on request from research-
ers




were excluded. It also examines the brain data for inconsistencies
or incoherencies, e.g. 9 instances with normal brain
metabolism which are classified as AD or FTD patients. This
information is presented to the user in order to request an
action on those instances and/or variables.

Processing of missing values This task is responsible for
identifying empty values in the available dataset. It also
handles missing data imputation. The appropriate imputation
techniques depend on the given dataset and the prediction
model to be used, so their applicability to another dataset
should be analysed. In this framework, missing values were
imputed using the non-parametric MissForest
imputation technique [50], which is able to identify non-linear
and complex relationships between variables. MissForest
is an extension of the MICE methods that apply a
multivariate and iterative imputation [4], and gives more
realistic results than parametric techniques [50].

Categorization of nominal variables This task is responsible
for applying encoding techniques to nominal variables.
One-hot encoding is the most frequently used coding scheme;
it transforms a single variable with “n” different values
into “n” binary variables. Each binary variable represents a
single value, whose presence is indicated with a 1 and
absence with a 0.

Since the first step of the analysis consisted of a selection 
of characteristics and each variable in the one hot vector rep-
resents a new characteristic, it was not necessary to remove 
a variable to avoid multi-collinearity problems.
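As an illustration of the encoding described above, the following minimal sketch expands one nominal variable into one binary column per distinct value. The helper name and the example variable are hypothetical; the framework's own implementation may differ.

```python
from typing import Dict, List


def one_hot(values: List[str]) -> Dict[str, List[int]]:
    """Return one binary column per distinct value (1 = present, 0 = absent)."""
    categories = sorted(set(values))
    return {cat: [1 if v == cat else 0 for v in values] for cat in categories}


# Hypothetical nominal variable with n = 3 distinct values.
handedness = ["right", "left", "right", "ambidextrous"]
encoded = one_hot(handedness)
```

All n columns are kept here, in line with the remark that no column needs to be dropped before feature selection.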

3.2  Feature selection

In high-dimensional problems, identifying the most
relevant attributes is a crucial step when modelling data
through ML, and it is an NP-hard problem [13]. Reducing
the dimensionality enhances interpretability, a key aspect
under the XAI perspective, makes clinical diagnosis easier,
improves the performance of classification models, reduces
the computational cost and helps prevent the models from
overfitting [48]. This task aims to remove irrelevant and overlapping
features from the whole set of features, while retaining
the most relevant ones. Hybrid approaches using wrapping
techniques and heuristic and metaheuristic search strategies
[32, 59] explore the feature space very efficiently.
Feature selection via evolutionary algorithms [58], one of the
most popular metaheuristics, is the approach selected for the
implemented computational tool.

This AI-based tool integrates feature selection
through the PyWinEA module, a Python package developed
on top of the scikit-learn library that implements
the most widely used genetic algorithms. This module is
capable of working with data provided by current evaluation
and diagnostic techniques. The PyWinEA package has been
endowed with a basic GA and a MOEA (NSGAII) to explore
the feature space. These techniques and their use throughout this
work are introduced below.

3.2.1  Evolutionary algorithms

Evolutionary algorithms (EA) are population-based techniques
inspired by the process of natural selection. They
evolve a population of individuals that represent potential
solutions. Individuals undergo variations that simulate
genetic changes, which guide the evolutionary process.

EAs show a high exploratory capacity, even in discontinuous
search spaces, with a lower tendency to become trapped in local
maxima. This work considered a maximization problem,
given the interest in improving the models’ performance.
The PyWinEA package defines the genotype of the individual
as a variable-length array of integer values. Each
integer represents an attribute, and the mapping process
consists of substituting the integer with the values associated
with the attribute. The fitness function is given by
the classification model and its classification performance.
PyWinEA implements two stochastic selection operators,
fitness proportional selection and tournament selection, and
two survivor selection strategies, elitism and annihilation.
Finally, random resetting and one-point crossover are
implemented as the mutation and recombination operators,
respectively.
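The operators named above can be combined as in the following minimal sketch of a mono-objective GA for feature selection. It does not use the PyWinEA API; every function name, the parameter values and the toy fitness function are assumptions made for illustration. In the framework the fitness is the performance of a classifier, which is replaced here by a stand-in that rewards a hypothetical set of informative features.

```python
import random
from typing import Callable, List

Genotype = List[int]  # each integer is the index of one selected feature


def dedupe(genes: Genotype) -> Genotype:
    """Drop repeated feature indices while keeping their first-seen order."""
    return list(dict.fromkeys(genes))


def tournament(pop: List[Genotype], fitness: Callable, rng: random.Random, k: int = 3) -> Genotype:
    """Tournament selection: the fittest of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=fitness)


def one_point_crossover(a: Genotype, b: Genotype, rng: random.Random) -> Genotype:
    """Join the head of one parent to the tail of the other at a single cut point."""
    cut = rng.randint(0, min(len(a), len(b)))
    return dedupe(a[:cut] + b[cut:])


def random_resetting(genes: Genotype, n_features: int, p_mut: float, rng: random.Random) -> Genotype:
    """Replace each gene, with probability p_mut, by a random feature index."""
    return dedupe([rng.randrange(n_features) if rng.random() < p_mut else g for g in genes])


def evolve(n_features: int, fitness: Callable, pop_size: int = 30, generations: int = 40,
           elite: int = 2, p_mut: float = 0.1, seed: int = 0) -> Genotype:
    rng = random.Random(seed)
    # Variable-length genotypes: random subsets of 1..n_features indices.
    pop = [rng.sample(range(n_features), rng.randint(1, n_features)) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        next_pop = ranked[:elite]  # elitism: the best individuals survive unchanged
        while len(next_pop) < pop_size:
            child = one_point_crossover(tournament(pop, fitness, rng),
                                        tournament(pop, fitness, rng), rng)
            next_pop.append(random_resetting(child, n_features, p_mut, rng))
        pop = next_pop
    return max(pop, key=fitness)


INFORMATIVE = {1, 4, 7}  # hypothetical "useful" features for the toy fitness


def toy_fitness(genes: Genotype) -> float:
    """Stand-in for classifier performance (assumption, not the paper's fitness)."""
    return len(INFORMATIVE & set(genes)) - 0.1 * len(genes)


best = evolve(n_features=10, fitness=toy_fitness)
```

Because of elitism, the best fitness in the population never decreases from one generation to the next.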

3.2.2  Multiobjective evolutionary algorithms

Most real problems require more than one metric to evaluate
the quality of a potential solution. Frequently, there are several
objectives to maximize, and they usually conflict.
Optimal solutions in multi-objective optimisation
are defined through the concept of domination, which determines
the non-dominated front of solutions, also called Pareto’s
front [19, 195–198].

Consequently, when there are two objective functions that 
are contradictory (e.g. the classification performance and the 
number of characteristics in the subset), a unique solution 
may not dominate the rest. In this situation, we are interested 
in finding the set of non-dominated solutions that are closest 
to the optimal Pareto’s front.
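The domination concept can be made concrete with a short sketch. Both objectives are expressed as quantities to maximise (the number of features is negated), and the candidate values are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """a dominates b when it is no worse in every objective and better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(points: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical (classification performance, -number of features) pairs.
candidates = [(0.90, -12), (0.88, -5), (0.85, -5), (0.92, -20), (0.80, -30)]
front = non_dominated(candidates)
```

Here (0.85, -5) is dominated by (0.88, -5), and (0.80, -30) by every other candidate, so three solutions remain on the front.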

One of the most used MOEAs is the NSGAII [15], which 
has been implemented in PyWinEA. Solutions in the optimal 
Pareto’s front were evaluated by the hypervolume indicator 
(IH), which has been applied using the inclusion-exclusion 
algorithm [57]. IH is a unitary measure defined in [5] as “the 
d-dimensional volume of the hole-free orthogonal polytope”.
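For two objectives the hypervolume reduces to the area dominated by the front with respect to a reference point. The sketch below uses a simple sweep rather than the inclusion-exclusion algorithm cited above, and it assumes that both objectives are maximised and that the reference point (0, 0) is dominated by every solution; the points are hypothetical.

```python
from typing import Iterable, Tuple


def hypervolume_2d(points: Iterable[Tuple[float, float]],
                   reference: Tuple[float, float] = (0.0, 0.0)) -> float:
    """Area dominated by the points and bounded below by the reference point."""
    ref_x, ref_y = reference
    # Points that do not strictly improve on the reference contribute nothing.
    useful = [(x, y) for x, y in points if x > ref_x and y > ref_y]
    area = 0.0
    best_y = ref_y
    # Sweep from the largest first objective; each point adds a new horizontal slab.
    for x, y in sorted(useful, key=lambda p: p[0], reverse=True):
        if y > best_y:
            area += (x - ref_x) * (y - best_y)
            best_y = y
    return area


example_front = [(0.9, 0.4), (0.8, 0.7), (0.6, 0.9)]
hv = hypervolume_2d(example_front)
hv_with_dominated = hypervolume_2d(example_front + [(0.5, 0.5)])
```

Adding a dominated point leaves the indicator unchanged, which is why it can be used to compare fronts of different sizes.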

A set of supervised classification algorithms has been 
used to evaluate the quality of the solutions in the feature 



Fig. 2  Structure of the PyWinEA package used for feature selection. This package is available through PyPI and GitHub



selection process implemented in PyWinEA (Fig. 2), and to
develop ML-based solutions.

3.3  ML‑based solutions

This section presents the methodology used in developing 
several learning models to assist clinicians in the diagnosis 
of AD and FTD.

3.3.1  Machine learning models

This computational tool integrates several classification
models to provide clinicians with a broad comparative
framework. In this light, different classifiers and their
performance can be analyzed using the features selected by the
EA approach. Although any parameter of the classification
algorithms can be adjusted, for each classifier we
Fig. 3  Meta-model scheme considering a problem with three classes A, B and C. The modeling strategy takes the output of the binary classifiers 
of the previous layer and the class assigned to an example will be the one with the highest value

Fig. 4  Genotype to phenotype mapping process following the syntax described in the grammar. The next node to be chosen during the mapping
process is determined by the modulo of the genotype codon, and the process ends when a terminal node is reached



only mention the most significant ones when addressing this 
particular problem.

• Bernoulli naive Bayes. This model allows the
prior probabilities of each class and the smoothing of the
variance to be adjusted.

• Support Vector Machines. The RBF (Radial Basis 
Function) was used as a kernel function and the γ and C 
parameters were adjusted.

• K-Nearest Neighbors. Different numbers of neighbours
and distance metrics were explored.

• Decision Trees. Alternative ways of partitioning the 
nodes (using the best split given by the Gini criterion 
or by randomly partitioning the nodes), the maximum 
depth, the minimum number of samples in each split and 
the minimum number of samples to declare a node as a 
leaf were the adjusted hyperparameters.

In addition, three ensembles based on decision trees were 
used. For each one, the number of base estimators and their 
hyperparameters were tuned:

• Random Forest.
• AdaBoost. Different learning rates were considered.
• Gradient Boosting. The learning rate, the fraction of
samples used to train each of the estimators and
the loss function were adjusted.

Four functionalities were also developed: (1) Graphical 
evaluation using training and validation; (2) General func-
tionalities such as loading datasets and exception control; (3) 
Graphical representations among several classification mod-
els; (4) Performance evaluation using accuracy, F1-score, 
precision, recall, sensitivity and specificity metrics, as well 
as receiver operating characteristic (ROC) curve and clas-
sification errors.

Using these functionalities, every classification model
provides graphical resources to evaluate the performance
of the results; thus, the confusion matrix, accuracy,
F1-score, precision, recall, learning rate, sensitivity,
specificity, the area under the Receiver Operating Characteristic
(ROC) curve and the classification errors are graphically
represented.
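Most of the metrics listed above follow directly from the four cells of a binary confusion matrix. The following sketch shows those standard definitions; the function name and the label vectors are hypothetical and do not reproduce the framework's evaluation module.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Compute standard metrics from the cells of a binary confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0  # avoid division by zero on empty cells

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # recall is the same quantity as sensitivity
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical labels: 4 positive and 6 negative examples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
```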

Fig. 5  Grammar to handle the 
genotype to phenotype mapping 
process



The proposed classifiers cover most of the problems that
can be defined with the data processed in Section 3.1.
Moreover, a new multiclass classification strategy
for cognitive tests is described below.

3.3.2  Meta‑model strategy

This work explores a new strategy to improve
classification performance, especially designed for cognitive
tests when tackling One vs Rest problems. It integrates
the information provided by each binary classifier into a
single multiclass model.

The proposed meta-model is a two-layer design, as presented
in Fig. 3, following a stacking strategy [56]. The
first layer is responsible for the binary classification. It
uses SVMs as binary classifiers and forwards their results
to the second layer, which generates a multiclass output by
applying a modeling strategy based on evolutionary
grammars or Bayesian networks. In this model, each of
the binary classifiers of the first layer operates in a different
feature space, using the characteristics selected during the
feature selection phase. Additionally, every binary classifier was
trained using different examples, which were determined by the
binary problem addressed.

Every target class is associated with one or more binary 
classifiers. The classification process generates a probabil-
ity or binary value, which indicates the class that a sample 

Fig. 6  Methodology designed 
for the development and valida-
tion of EG as modeling strategy. 
A class stratification following 
a CV scheme was implemented. 
If classes were imbalanced, 
classes would be balanced to the 
minority class

Table 1  Parameters used for the development of evolutionary grammars using PonyGE2

Parameter                                       Parameter setting
Algorithm                                       NSGAII [15]
Population size                                 300
Elite size                                      30
Generations                                     1500
Crossover                                       subtree (a)
Crossover probability                           0.9
Mutation                                        subtree (a)
Mutation events                                 1
Selection proportion                            0.5
Fitness 1                                       F1
Fitness 2                                       Minimizing the number of nodes
Maximum derivation tree initialization depth    10
Maximum derivation tree depth                   15
Initialization strategy                         PI grow [21]

(a) The crossover strategy is analogous to the one-point operator but by
mixing tree structures. The mutation operator is applied only to the
population resulting from the crossover



belongs to. The modeling strategy maps the results of
the binary classifiers to a single real value. The highest value
identifies the final class.

Figure 3 considers a problem with three classes A, B
and C and three binary classifiers C_AvsB, C_AvsC and C_BvsC,
which use different characteristics to perform the classification.
Given a training dataset T, the first step consists of
the generation of three datasets T1, T2 and T3. The dataset
T1, associated with C_AvsB, is composed of the characteristics
selected for the A vs B problem and the examples labelled
with classes A and B, excluding the examples belonging to
C. The same is applied to datasets T2 and T3.

During the prediction phase, there is a modelling
strategy associated with each class. The modelling strategy
associated with class A receives the outputs of classifiers
C_AvsC and C_AvsB, and the same holds for the rest of the classes and their
associated classifiers. The selected class is the one whose
modelling strategy provides the highest value.
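The two phases described for the three-class example can be sketched as follows. The binary classifiers are not trained here: their outputs are given as hypothetical probabilities that a sample belongs to the first class of each pair. The averaging strategy and the conversion of each output into a score for the class of interest are assumptions for illustration; in the framework the strategy is an evolved grammar or a Bayesian network.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Example = Tuple[Dict[str, float], str]  # (feature values, class label)


def build_binary_dataset(data: Iterable[Example], classes: Set[str],
                         features: List[str]) -> List[Example]:
    """Keep only the examples of the two classes and the features selected for that problem."""
    return [({f: x[f] for f in features}, y) for x, y in data if y in classes]


def predict_class(pairwise: Dict[Tuple[str, str], float],
                  strategies: Dict[str, Callable[[List[float]], float]]) -> str:
    """Feed each class strategy with its classifiers' outputs and return the top-scoring class."""
    scores = {}
    for cls, strategy in strategies.items():
        inputs = []
        for (first, second), p_first in sorted(pairwise.items()):
            if cls == first:
                inputs.append(p_first)
            elif cls == second:
                inputs.append(1.0 - p_first)
        scores[cls] = strategy(inputs)
    return max(scores, key=scores.get)


def mean(values: List[float]) -> float:
    """Hypothetical modelling strategy: average of the received outputs."""
    return sum(values) / len(values)


# Training phase: T1 keeps classes A and B and the features chosen for A vs B.
training = [
    ({"f1": 1.0, "f2": 2.0, "f3": 0.5}, "A"),
    ({"f1": 0.2, "f2": 1.5, "f3": 0.9}, "B"),
    ({"f1": 0.7, "f2": 0.1, "f3": 0.3}, "C"),
]
t1 = build_binary_dataset(training, {"A", "B"}, ["f1", "f3"])

# Prediction phase: hypothetical outputs of C_AvsB, C_AvsC and C_BvsC for one sample.
outputs = {("A", "B"): 0.8, ("A", "C"): 0.3, ("B", "C"): 0.4}
predicted = predict_class(outputs, {"A": mean, "B": mean, "C": mean})
```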

Evolutionary grammars as a modelling strategy Evolu-
tionary grammars (EG) are part of EAs and an approach 
to genetic programming. Solutions are generated using a 
grammar representation. EG has obtained promising results 
in many domains such as the prediction of migraine crisis 
[42] or glucose levels [12, 30, 55].

The genotype is represented as an array of integer or
binary values, and the genotype-to-phenotype decoding uses a
Backus-Naur Form (BNF) grammar [47]. Figure 4 shows
an example of the mapping process. A grammar is represented
by the tuple {N, T, P, S}, where N and T are the non-terminal
and terminal symbols, respectively; P are the production rules
applied to generate T from N; and S is the initial expression.
The result is a tree structure where S represents the root, N
the intermediate nodes, P the potential paths and T the leaves.
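A simplified version of this decoding is sketched below. The grammar is a toy example, not the grammar of Fig. 5, and the mapping rule (one codon consumed per expansion, production chosen by the codon modulo the number of alternatives, genome reused up to a fixed number of wraps) is a common convention assumed here rather than the exact behaviour of PonyGE2.

```python
from typing import Dict, List, Optional

# Toy grammar: non-terminals are the keys, each with its list of alternative productions.
GRAMMAR: Dict[str, List[List[str]]] = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<var>"]],
    "<op>": [["+"], ["*"]],
    "<var>": [["x[0]"], ["x[1]"], ["x[2]"]],
}


def map_genotype(genotype: List[int], grammar: Dict[str, List[List[str]]],
                 start: str = "<expr>", max_wraps: int = 2) -> Optional[str]:
    """Expand the leftmost non-terminal until only terminals remain."""
    symbols = [start]
    output: List[str] = []
    used = 0
    limit = len(genotype) * (max_wraps + 1)  # codons available, including wraps
    while symbols:
        symbol = symbols.pop(0)
        if symbol not in grammar:  # terminal symbol: emit it
            output.append(symbol)
            continue
        if used >= limit:  # ran out of codons: invalid individual
            return None
        codon = genotype[used % len(genotype)]
        used += 1
        choices = grammar[symbol]
        symbols = list(choices[codon % len(choices)]) + symbols
    return " ".join(output)


phenotype = map_genotype([4, 7, 2, 9, 5, 3], GRAMMAR)
invalid = map_genotype([0], GRAMMAR)  # always picks the recursive rule
```

With the genotype above, the codons select, in order, the recursive rule for the expression, a variable, x[2], the product operator, a variable and x[0].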

Figure 5 shows the grammar used to define the
genotype-to-phenotype mapping process, where the gender and
age variables are not included to avoid bias. The variable x
refers to the set of predictions made by the binary classifiers
of the previous layer; the index therefore indicates the position
of the output of the algorithm associated with a given
binary problem.

Figure 6 shows the methodology followed by the pro-
posed meta-model using EG as modelling strategy. The steps 
are described below:

1. The dataset was divided into 5 disjoint datasets with class
stratification following a cross-validation (CV) scheme.

2. One of the datasets is reserved independently for the 
validation process. With the remaining four, the binary 
classification phase is launched for 10 iterations with a 
5-CV scheme. The predictions of the binary classifica-
tion models generate a new dataset.

3. If classes were unbalanced, at this point they would be 
balanced to the minority class. This is carried out by ran-
domly removing predictions from the majority classes 
until all classes are balanced.

4. The grammar development uses 50% of the dataset samples
for training and 50% for testing. This process can be
defined as a new supervised classification problem.

5. This grammar is integrated into the model as a modelling
strategy. The validation of the meta-model is performed
with the independent dataset from step 1.

6. The steps from 2 to 5 are repeated for each of the 5 
separate folds in step 1.
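Steps 1 and 3 of the procedure above can be sketched with the standard library alone. The function names, the random seeds and the class sizes are hypothetical; the example only illustrates class-stratified folds and random undersampling to the minority class.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def group_by_class(labels: Sequence[str]) -> Dict[str, List[int]]:
    """Map each class label to the indices of its examples."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        groups[label].append(idx)
    return groups


def stratified_folds(labels: Sequence[str], n_folds: int = 5, seed: int = 0) -> List[List[int]]:
    """Split the indices into disjoint folds that preserve the class proportions."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(n_folds)]
    groups = group_by_class(labels)
    for label in sorted(groups):
        indices = groups[label]
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)  # deal indices round-robin
    return folds


def balance_to_minority(labels: Sequence[str], seed: int = 0) -> List[int]:
    """Randomly drop examples of the larger classes until all classes have equal size."""
    rng = random.Random(seed)
    groups = group_by_class(labels)
    target = min(len(indices) for indices in groups.values())
    kept: List[int] = []
    for label in sorted(groups):
        kept.extend(rng.sample(groups[label], target))
    return sorted(kept)


# Hypothetical class sizes, chosen only to keep the example small.
labels = ["AD"] * 10 + ["FTD"] * 5 + ["HC"] * 5
folds = stratified_folds(labels)
balanced = balance_to_minority(labels)
```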

The described procedure makes it possible to estimate
the generalization capacity of the meta-model that
incorporates EG in the second layer. Based on the approach
of [44] and given their influence on AD [3], the methodology
described was repeated after introducing the gender and age
variables into the prediction dataset and including them in
the grammars. Thus, production rule VII was modified
to include the gender (x[5]) and age (x[6]) variables:

Fig. 7  The structure of the database designed. Tables brodmann_
qualitative/quantitative and aal_qualitative/quantitative, correspond-
ing to the brain metabolism data have been shortened. Complete data 
are available on request on GitHub


2 Supplementary Material is provided for this multiclass meta-model

The grammar was implemented using the Python package 
PonyGE2 [22]. Table 1 shows the default selected param-
eters, although other parameter values can be applied.

Bayesian networks as a modelling strategy Bayesian net-
works represent a sub-type of probabilistic graphical models. 
This type of model uses directed acyclic graphs (DAG) to 
represent the probabilistic relationship between variables. 
Nodes correspond to variables and an arc between two nodes 

shows the dependency relationship. In this type of model,
every node is associated with a local probability distribution,
which is usually specified by a conditional probability table
(CPT) and depends on its parents [33, 42–92]. Each node
receives an input and gives as output the probability distribution
of the variable associated with the node (see footnote 2).



In this computational tool, the meta-model based on
Bayesian networks was implemented on top of the Pomegranate
library [49]. Each node in the network corresponds
to a binary classifier associated with a given problem. Thus,
the Bayesian network models the joint probability
distribution of the outputs of the binary classifiers by assigning
a probability to each possible combination of outputs.
Building a Bayesian network requires two steps:
learning the structure and determining the probability
distribution associated with each node based on the data. The
structure was determined using a score-based approach,
applying dynamic programming and the A* algorithm in
order to maximize the probability of the data given the
model by means of maximum likelihood estimation.

The dataset generated by the grammars during step 1
was used to learn the network. Predictions were binarized
by rounding up to the nearest integer. A Bayesian
network was developed to model the joint probability for
each of the classes, so that the label assigned to
a new example corresponds to the class whose associated
Bayesian network, given the evidence (binary classifier
outputs), yields the maximum probability.
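The decision rule (one model per class, label given by the maximum probability of the evidence) can be sketched as follows. As a simplifying assumption, each per-class Bayesian network is replaced here by a full joint probability table with Laplace smoothing, so no structure learning is performed and the Pomegranate library is not used. The 0.5 threshold in `binarize`, the smoothing parameter `alpha` and the toy data are likewise assumptions.

```python
from collections import Counter
from itertools import product


def binarize(probabilities, threshold=0.5):
    """Turn classifier scores into 0/1 outputs (the threshold is an assumption)."""
    return tuple(1 if p >= threshold else 0 for p in probabilities)


def fit_joint_tables(outputs, labels, alpha=1.0):
    """Estimate P(binary outputs | class) as a smoothed full joint table per class."""
    n_outputs = len(outputs[0])
    combos = list(product((0, 1), repeat=n_outputs))
    tables = {}
    for cls in sorted(set(labels)):
        counts = Counter(tuple(o) for o, y in zip(outputs, labels) if y == cls)
        total = sum(counts.values()) + alpha * len(combos)
        tables[cls] = {c: (counts[c] + alpha) / total for c in combos}
    return tables


def predict(tables, evidence):
    """Return the class whose model assigns the highest probability to the evidence."""
    evidence = tuple(evidence)
    return max(tables, key=lambda cls: tables[cls][evidence])


# Hypothetical outputs of three one-vs-rest binary classifiers (AD, FTD, HC).
outputs = [(1, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 1, 0), (0, 0, 1), (0, 0, 1)]
labels = ["AD", "AD", "AD", "FTD", "FTD", "HC", "HC"]
tables = fit_joint_tables(outputs, labels)
label = predict(tables, binarize([0.8, 0.3, 0.1]))
```

A learned network structure would factorize each table into conditional probability tables; the argmax over classes stays the same.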

4  Discussion in a case of study

This section presents some outcomes that can be achieved
with the proposed framework in a particular case study of
neurological diseases: the clinical diagnosis of AD and FTD.
A description of the data used in this study is presented in
Section 2. Although this study is not focused on the clinical
analysis of AD and FTD by means of the proposed framework, we
present a case study using PET data in order to show
the functionalities of the tool: the data preprocessing
phase, feature engineering using NSGAII, classification
using different ML algorithms, the multiclass meta-model
with EG and Bayesian networks, and some of the graphical
resources used to outline the results. We expect that, with this
case study, the reader will understand the capabilities
of the proposed computing framework and will be able to
assess the potential of the tool in clinical practice.

4.1  Data preprocessing

Regarding the data pre-processing described in Section 3.1,
data were structured in a relational MySQL database, shown
in Fig. 7, which can be extended as needed.

Once the database was ready, the data were analysed as part of
the cleaning process and irrelevant data were removed. Then,
the missing values were analysed; they represented
11.28% of the database. After their identification,
the MissForest imputation technique was applied with 100

Table 2  Selected parameters for the NSGAII algorithm used to carry
out the feature selection

Algorithm parameter                   Parameter setting

Mutation strategy                     Random Resetting
Selection strategy                    Tournament selection (a)
Fitness 1                             Accuracy or F1 (b)
Fitness 2                             Number of features (c)
Number of different initializations   2

a  k = 2, winners = 1 without replacement
b  5 repetitions of 5-CV with class stratification
c  Defined by the equation: $1 - \frac{\text{Length(Individual)}}{\text{Num. Features}}$

Fig. 8  Comparative results of 
the Feature engineering using 
NSGAII, Naive Bayes and 
SVM algorithms as the fitness 
function. X axis represents 
the experiments addressed, 
and Y axis is the number of 
features obtained. Block means 
Demographic data + Cognitive 
test data without separating 
the scores associated with the 
same test



as the maximum number of iterations and the following
parameter settings: mean for the initial imputation; 1e-03
as the early stopping criterion; 50 as the number of trees (default
parameters for decision trees); mean squared error as the evaluation
criterion of each partition; and random as the strategy for splitting each node.

The last step was dealing with nominal variables following
the methodology presented in Section 3.1. The imputation
generates real values, which are rounded to the
nearest integer. Next, the one-hot encoding scheme is applied,
adding as many variables as there are different values.
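The rounding and encoding of an imputed nominal variable can be sketched as follows. The function name and the example values are hypothetical, and snapping each value to the nearest valid category code (which also clamps out-of-range values) is an assumption of this sketch.

```python
def round_and_one_hot(values, categories):
    """Round imputed codes to the nearest valid category and one-hot encode them."""
    rows = []
    for value in values:
        # Nearest valid integer code; out-of-range values are clamped (assumption).
        code = min(categories, key=lambda c: abs(c - value))
        rows.append([1 if code == c else 0 for c in categories])
    return rows


# Hypothetical nominal variable with three categories coded 0, 1, 2.
imputed = [0.0, 1.7, 2.2, 0.4]
encoded = round_and_one_hot(imputed, categories=[0, 1, 2])
```

Each nominal variable is therefore replaced by one binary column per category.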

4.2  Feature engineering

The aforementioned use case was addressed with a bi-objective
MOEA approach, previously described in Section 3, and
a customization of hyperparameters as shown in Table 2,
where the two objective functions are also described.
The algorithm was run for 10 iterations of 5-CV and the
performance of the best subset obtained for each classifier was
evaluated.

The NSGAII MOEA obtains several solutions as part
of the Pareto front. The set of features selected for each
potential solution can be visualized by physicians to validate
its clinical impact. Figure 8 shows results for several
datasets: Demographic, Cognitive Test, and Brain Metabolism
Data. Table 3 shows the features selected with SVM
as fitness function for AD and FTD vs HC. In this example,
Bayesian classifiers obtain the best results, with an average
reduction of features of 91.52% compared to 87.97% for
SVM classifiers. Since the reduction percentage
reached for both classifiers is very high, it is necessary to
evaluate the performance of each individual with Bayesian and
SVM classifiers as fitness function.
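The two objectives and the notion of a Pareto front can be illustrated with a minimal sketch. It is not the PyWinEA implementation: the candidate feature subsets and their accuracies are hypothetical, and only the non-dominated filtering is shown, not the full NSGAII (ranking, crowding distance, variation operators).

```python
def feature_reduction(individual, num_features):
    """Second objective as given in Table 2: 1 - Length(Individual) / Num. Features."""
    return 1.0 - len(individual) / num_features


def dominates(a, b):
    """a dominates b if it is no worse in every objective and better in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Return the indices of the non-dominated solutions (both objectives maximized)."""
    return [
        i for i, s in enumerate(scores)
        if not any(dominates(other, s) for j, other in enumerate(scores) if j != i)
    ]


NUM_FEATURES = 10
# Hypothetical individuals (lists of selected feature indices) and their accuracies.
individuals = [[0, 1, 2, 3, 4], [0, 1], [0, 1, 2], [5]]
accuracies = [0.90, 0.85, 0.84, 0.70]
scores = [(acc, feature_reduction(ind, NUM_FEATURES))
          for ind, acc in zip(individuals, accuracies)]
front = pareto_front(scores)
```

Here the third individual is dominated by the second one (lower accuracy and more features), so it is excluded from the front; the remaining solutions are the trade-offs that would be offered to the physician.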

The solutions provided by the feature engineering
approach are fed to the ML-based phase: classifiers and
meta-model using EG as in Fig. 1, described in Section 3.
For each problem, only one of the solutions provided in
the feature selection phase has been selected for testing.
Accuracy and F1-score have been selected as the metrics
for the analysis. We remind the reader that this work
does not focus on the clinical analysis but on the possibilities
opened by the developed tool. Hence, selected problems
from the case study are presented in order to evaluate
the capabilities of the computational tool.

As a result of the evaluation tests, the SVM classifiers
performed slightly better than the Bayesian classifiers, as
shown in Fig. 9. Table 4 presents the average values of the
metrics used to assess the solutions from the Pareto front
obtained with NSGA II and applied in this case study.
High values of sensitivity and specificity indicate
reliability in predicting positive and negative cases, respectively.
The significance of these results is evaluated using
the p-values in Table 5. The very small p-values support
the statistical significance of the results. According to the results,
our ML-based tool is able to clearly differentiate between individuals
with AD, FTD and healthy controls, especially when PET
data are provided. A slightly lower performance is obtained
when working with the cognitive dataset. Although cognitive test
performance is closely associated with the brain metabolism
of some regions, not all brain regions are covered during
the neuropsychological examinations [16, 31, 37]. In addition,
other factors such as cognitive reserve may limit the
Table 3  AD − FTD vs HC: features selected using NSGA II with SVM as
fitness function for Cognitive, Block, PET and PET + Cognitive

Cognitive Block PET PET + Cognitive

ace3_fluidity education_years o1_l f1m_l
rocf_type_3min_4 cbtt_direct brodmann_47 brodmann_35
mst_direct cbtt_indirect f1m_l pcl_r
tmt_a tmt_a brodmann_35 sma_l
fcsrt_lt tmt_b brodmann_37 put_r
rocf_30min fcsrt_l1 cau_l
education_years fcsrt_lt brodmann_19

fcsrt_total fcsrt_dif_free
fcsrt_dif_free ace3_total
fcsrt_dif_total fcsrt_dif_total
addensbrook sdmt
ace3_total rocf_type_copy_4
ace3_attention rocf_type_3min_7
ace3_memory rocf_type_copy_3
ace3_fluidity
ace3_language
ace3_visospatial



diagnostic capacity of neuropsychological examination in 
some cases [51].
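For reference, the metrics reported in Table 4 follow from the confusion matrix of a binary problem. The sketch below uses the standard definitions; the counts in the example are hypothetical and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)      # recall, true positive rate
    specificity = tn / (tn + fp)      # true negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts: 100 patients (positive class) and 50 healthy controls.
metrics = binary_metrics(tp=90, fp=10, tn=40, fn=10)
```

With unbalanced classes such as these, accuracy and F1-score can remain high while specificity is noticeably lower, which is why the tables report all five values.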

Moreover, this tool provides information about the progress
of the feature engineering process by means of a graphical
representation of the convergence of the
MOEAs. For instance, for the case study, Fig. 10
represents the MOEA convergence for the PET datasets
addressed using Naive Bayes and SVMs.

In addition, this tool supplies different visual
representations to evaluate the performance of the
classifiers. It implements the receiver operating
characteristic (ROC) curve, the confusion matrix, and a comparative
graphical representation of the variation in classification
performance among the different classifiers with respect
to the best result obtained with the best feature subset
during the feature engineering phase (Fig. 9). Fig. 11
shows the performance variation for the case study,
where the hyperparameters were adjusted using a grid
search strategy. In particular, the tuned hyperparameters were
the regularization parameters λ and C for SVM, the loss function
(binomial deviance or exponential), the percentage of examples
used to train the base models of the ensemble (footnote 3) and
the number of characteristics (footnote 4). SVM and Gradient
Boosting obtained the best performance with F1-score = 0.925,
although the rest of the algorithms also reached high values.
Table 6 shows the p-values computed from the performance metrics
along all iterations. The obtained values are
much smaller than 0.05, so we can state that the results
are significant.
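The grid search strategy can be sketched as an exhaustive evaluation of every hyperparameter combination. The sketch below is illustrative only: `toy_score` is a hypothetical stand-in for the cross-validated score of a real model, and the parameter names and candidate values are assumptions, not the grid used in the case study.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and keep the best-scoring one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical stand-in for a cross-validated F1-score."""
    penalty = abs(params["subsample"] - 0.8)
    penalty += 0.01 if params["loss"] == "exponential" else 0.0
    penalty += 0.02 if params["max_features"] == "log2" else 0.0
    return 0.9 - penalty


# Hypothetical grid for a gradient boosting ensemble.
param_grid = {
    "loss": ["deviance", "exponential"],
    "subsample": [0.6, 0.8, 1.0],          # fraction of examples per base model
    "max_features": ["sqrt", "log2"],      # number of attributes per base model
}
best_params, best_score = grid_search(param_grid, toy_score)
```

The cost grows with the product of the number of candidate values, which is acceptable for small grids such as this one.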

One of the most important challenges for clinical
experts is the interpretability of the models. In this light,
decision tree models have been developed to provide this
capability to clinicians. This kind of algorithm provides a
clear and simple set of rules that make it possible to distinguish between
different clinical conditions. Figure 12 represents the decision
tree for the case study that we are presenting to give insights
Fig. 9  Classification perfor-
mance achieved by one of the 
best feature subsets given by 
the NSGAII for each of the 
algorithms used to evaluate the 
fitness, when individuals with 
AD, FTD and healthy controls 
were evaluated. Cognitive C. 
denotes groupings of scores 
from the same cognitive test;
Cognitive I. considers each of
the scores independently

Table 4  Pareto front assessment 
resulting from NSGA II

Dataset Accuracy Precision Sensitivity Specificity F1−score

Naive Bayes
  Cognitive C. 0.849 0.914 0.879 0.766 0.895
  Cognitive I. 0.867 0.908 0.913 0.741 0.910
  PET 0.881 0.919 0.918 0.782 0.918
  PET + Cognitive 0.902 0.954 0.912 0.875 0.932

Support Vector Machine
  Cognitive C. 0.837 0.941 0.832 0.852 0.882
  Cognitive I. 0.872 0.927 0.898 0.800 0.912
  PET 0.885 0.919 0.924 0.782 0.921
  PET + Cognitive 0.926 0.966 0.933 0.906 0.949

3 This parameter generates a behavior analogous to bagging helping 
to reduce the variance of the model
4 This refers to the attributes used to train each of the base classifiers.
This parameter generates a behavior similar to randomization. The
parameters taken were $\sqrt{\text{num\_features}}$ or $\log_2(\text{num\_features})$



about the functionalities of this framework. This result was
validated by expert neurologists, who agreed on its clinical
significance. According to the expert neurologists, the
decision tree includes several key areas in the pathophysiology of AD
and/or FTD, specifically regions in the frontal
lobe (frontal superior medial gyrus and inferior frontal gyrus/
Brodmann area 47), the temporal cortex (Brodmann area 37)
and the occipital lobe. According to the tree, hypometabolism
in any of these areas suggests the presence of a neurodegenerative
disorder, while normal metabolism in all areas is
required for a subject to be classified as control.

4.3  Meta‑models

This meta-model was designed to work with independent
cognitive test scores. The result is a multiclass
Table 5  P−value for metrics 
applied to assess results from 
NSGA II

P−value

Dataset Accuracy Precision Sensitivity Specificity F1−score

Naive Bayes
  Cognitive C. 3.02E-25 1.83E-16 2.76E-21 1.50E-15 9.14E-25
  Cognitive I. 1.14E-27 2.27E-24 2.81E-22 3.82E-23 9.54E-28
  PET 2.19E-28 1.25E-27 1.72E-19 1.91E-26 9.72E-28
  PET + Cognitive 7.10E-29 3.72E-22 3.52E-23 8.59E-21 3.88E-28

Support Vector Machine
  Cognitive C. 3.19E-28 1.37E-13 5.43E-27 8.74E-13 3.84E-27
  Cognitive I. 3.41E-27 1.42E-18 2.71E-25 3.54E-17 1.65E-27
  PET 2.17E-26 3.80E-27 6.16E-19 1.91E-26 7.95E-26
  PET + Cognitive 1.67E-22 6.19E-14 4.91E-19 1.12E-13 2.63E-22

Fig. 10  Convergence of the NSGAII for the Neurodegenerative Disorders
(NEU) vs Healthy Controls (HC) diagnosis including PET data
using (a) Naive Bayes classifier or (b) SVMs. NEU represents AD
or bvFTD disorders. The Pareto front subfigure is defined by the
equation $1 - \frac{\text{Length(Individual)}}{\text{Num. Features}}$.
The results of two different initialisations are shown



classification model, which integrates the output of the
binary classifiers using EG or Bayesian classifiers. In order
to validate this module, we show the results obtained using
the one-vs-rest classification models with the AD condition.
Figure 13 shows the results obtained using accuracy
and F1-score as metrics to evaluate the performance. The
strategy with EG improved the
classification results compared to the best results obtained
using the binary classifiers independently and using Bayesian
networks. Even after including the gender and age variables,
which produce a loss of performance, EG outperforms
the other approaches.

Results from this model with EG have demonstrated
great potential to improve the classification accuracy with
limited datasets, such as cognitive assessments.

Table 6  P-value for metrics 
applied in the case study

P-value

Classifier Accuracy Precision Sensitivity Specificity F1-Score

Naive Bayes 1.95E-25 1.08E-24 4.05E-17 8.25E-23 6.42E-25
SVM 3.20E-25 2.35E-24 1.96E-17 1.30E-22 3.47E-25
KNN 2.16E-22 1.32E-23 5.38E-16 1.30E-22 4.04E-22
Decision Trees 4.59E-25 3.54E-24 7.41E-19 1.30E-22 1.02E-24
Random Forest 9.36E-25 1.37E-24 2.09E-15 1.30E-22 4.11E-24
Gradient Boosting 3.20E-25 2.35E-24 1.96E-17 1.30E-22 3.47E-25

Fig. 11  Variation in clas-
sification performance for the 
NEU (AD or bvFTD) vs HC 
diagnosis and PET data. The 
reference values Acc = 0.885, 
Pre = 0.919, Rec = 0.924, F1 = 
0.921 correspond to the highest 
scores in Fig. 10

Fig. 12  Decision tree corresponding to the classification
problem NEU (AD or bvFTD) vs HC, presented as
an example of a more interpretable graph. (Performance: Acc
= 0.885 ± 0.04; Pre = 0.919 ± 0.03; Rec = 0.924 ± 0.04; F1
= 0.921 ± 0.03). Squares and ellipses represent nodes and
leaves, respectively. The color blue denotes that most instances
belong to the class indicated on the leaf but at least 1/4
correspond to the opposite class



5  Conclusions

This paper has presented the design and implementation of a
machine learning–based framework for automatic diagnosis,
especially of neurodegenerative diseases. Neuropsychological
and neuroimaging assessments provide large, heterogeneous
datasets, with high potential for knowledge
mining and the development of diagnostic tools. Our tool
is proposed under the XAI perspective to support clinicians
in the diagnosis, as it provides all the steps required to
analyse these datasets: data preprocessing, feature
selection through an evolutionary approach, and modeling
of the mentioned diseases.

As a case study, we have evaluated the performance of
our approach in the diagnosis of two widespread neurodegenerative
diseases, AD and FTD. The proposed framework allows smooth
processing of the cognitive and image assessments, with a high
reduction in the number of features needed for the diagnosis and
high classification accuracy. A strong effort has been
put into the interpretability of the results, showing how a
data-centric point of view helps to understand AD and FTD
disorders.

Electronic supplementary material The online version of this article
(https://doi.org/10.1007/s11517-022-02630-z) contains supplementary
material, which is available to authorized users.

Author contribution Fernando Garcia-Gutierrez, José Luis Ayala, 
Jordi A Matias-Guiu. Data acquisition: Vanesa Pytel, María Nieves 
Cabrera. Methodology: Fernando Garcia-Gutierrez, Jose Luis Ayala. 
Writing original draft preparation: Fernando Garcia-Gutierrez, Josefa 
Diaz-Alvarez, Jose Luis Ayala, Jordi A Matias-Guiu. Writing review 
and editing: all. Formal analysis and investigation: Fernando Garcia-
Gutierrez, Josefa Diaz-Alvarez, Jose Luis Ayala. Funding acquisition: 

Jorge Matias-Guiu, Jordi A Matias-Guiu, Josefa Diaz-Alvarez, Jose 
Luis Ayala. Supervision: Josefa Diaz-Alvarez, Jorge Matias-Guiu, Jose 
Luis Ayala, Jordi A Matias-Guiu

Funding Open Access funding provided thanks to the CRUE-CSIC 
agreement with Springer Nature. This work is supported by the Insti-
tuto de Salud Carlos III through the project INT20/00079 (co-funded 
by European Regional Development Fund, A way to make Europe) 
and the Spanish Ministry of Science and Innovation under project 
PID2019-110866RB-I00, part of the Grant PID2020-115570GB-C21 
funded by MCIN/AEI/10.13039/501100011033 and Junta de Extrema-
dura, project GR15068.

Data availability All data are available in a systematic database created
by the Department of Neurology of the San Carlos Hospital, in Madrid, 
and accessible to clinicians and researchers participating in the project. 
These data are not publicly available due to data privacy laws.

Code availability Code is available on PyPI and GitHub.

Declarations 

Conflict of interest The authors declare no competing interests.

Ethics approval and consent to participate The Institutional Research
Ethics Committee from Hospital Clinico San Carlos approved the
research protocol in accordance with the 1964 Helsinki Declaration and its later
amendments. Written informed consent was obtained from all individual
participants included in the study or their caregivers.

Consent for publication Not applicable

Open Access This article is licensed under a Creative Commons Attri-
bution 4.0 International License, which permits use, sharing, adapta-
tion, distribution and reproduction in any medium or format, as long 
as you give appropriate credit to the original author(s) and the source, 
provide a link to the Creative Commons licence, and indicate if changes 
were made. The images or other third party material in this article are 
included in the article's Creative Commons licence, unless indicated 

Fig. 13  Performance obtained 
in multi-class classification 
integrating the output binary 
classifiers into a multiclass 
output. Ten repetitions of 5 CV 
were applied for the validation 
process of the reference model 
and the modeling strategy with 
Bayesian networks (described in 
Section 3.3.2)




otherwise in a credit line to the material. If material is not included in 
the article's Creative Commons licence and your intended use is not 
permitted by statutory regulation or exceeds the permitted use, you will 
need to obtain permission directly from the copyright holder. To view a 
copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

 1. Ahmed Z, Mohamed K, Zeeshan S, Dong X (2020) Artificial
intelligence with multi-functional machine learning platform
development for better healthcare and precision medicine.
Database 2020. https://doi.org/10.1093/database/baaa010

 2. Albert MS, DeKosky ST, Dickson D, Dubois B, Feldman HH, Fox 
NC, Gamst A, Holtzman DM, Jagust WJ, Petersen RC, Snyder PJ, 
Carrillo MC, Thies B, Phelps CH (2011) The diagnosis of mild 
cognitive impairment due to Alzheimer’s disease: recommenda-
tions from the National Institute on Aging-Alzheimer’s Associa-
tion workgroups on diagnostic guidelines for Alzheimer’s disease. 
Alzheimer’s Dement 7:270–279

 3. Association A (2019) 2019 Alzheimer’s disease facts and figures. 
Alzheimer’s & Dementia 15:321–387

 4. Azur MJ, Stuart EA, Frangakis C, Leaf PJ (2011) Multiple imputa-
tion by chained equations: what is it and how does it work?. Int J 
Methods Psychiatr Res 20:40–49

 5. Beume N, Fonseca CM, Lopez-Ibanez M, Paquete L, Vahrenhold 
J (2009) On the complexity of computing the hypervolume indica-
tor. IEEE Trans Evol Comput 13:1075–1082

 6. Bitam S, Mellouk A (2006) Brodmann’s localisation in the cer-
ebral cortex. Springer, Berlin, p 298

 7. Brown KJ, Bohnen NI, Wong KK, Minoshima S, Frey KA (2014) 
Brain pet in suspected dementia: patterns of altered fdg metabo-
lism. Radiographics 34:684–701

 8. Brzezicki MA, Kobetíc MD, Neumann S, Pennington C (2019) 
Diagnostic accuracy of frontotemporal dementia. an artificial 
intelligence-powered study of symptoms, imaging and clinical 
judgement. Adv Med Sci 64:292–302. https://doi.org/10.1016/j.advms.2019.03.002

 9. Cabitza F, Gensini GF (2017) Unintended consequences of 
machine learning in medicine. JAMA 318:517–518.
https://doi.org/10.1001/jama.2017.7797

 10. Casanova R, Wagner B, Whitlow CT, Williamson JD, Shumaker 
SA, Maldjian JA, Espeland MA (2011) High dimensional clas-
sification of structural MRI Alzheimer’s disease data based on 
large scale regularization. Front Neuroinformatics 5:22

 11. Castro AP, Fernandez-Blanco E, Pazos A, Munteanu CR (2020) 
Automatic assessment of Alzheimer’s disease diagnosis based on 
deep learning techniques. Comput Biol Med, 103764

 12. Contreras I, Oviedo S, Vettoretti M, Visentin R, Veh J (2017) 
Personalized blood glucose prediction: a hybrid approach using 
grammatical evolution and physiological models. PLOS ONE
12:1–16. https://doi.org/10.1371/journal.pone.0187754

 13. D. S, R. S (1994) Np-completeness of searches for smallest pos-
sible feature sets. In: AAAI Symposium on Intelligent Relevance, 
AAAI Press, pp 37–39

 14. De A, Chowdhury AS (2020) Dti based Alzheimer’s disease classifi-
cation with rank modulated fusion of cnns and random forest. Expert 
Syst Appl 114338. https://doi.org/10.1016/j.eswa.2020.114338

 15. Deb K, Pratap A, Agarwal S, Meyarivan T (2002) A fast and elit-
ist multiobjective genetic algorithm: Nsga-ii. IEEE Trans Evol 
Comput 6:182–197

 16. Delgado-Álvarez A, Cabrera-Martín MN, Pytel V, Delgado-Alonso
C, Matías-Guiu J, Matias-Guiu JA (2021) Design and verbal fluency
in Alzheimer’s disease and frontotemporal dementia: clinical
and metabolic correlates. J Int Neuropsychol Soc 1–16.
https://doi.org/10.1017/S1355617721001144

 17. Ding Y, Sohn JH, Kawczynski MG, Trivedi H, Harnish R, Jen-
kins NW, Lituiev D, Copeland TP, Aboian MS, Mari Aparici C, 
Behr SC, Flavell RR, Huang S-Y, Zalocusky KA, Nardo L, Seo 
Y, Hawkins RA, Hernandez Pampaloni M, Hadley D, Franc BL 
(2019) A deep learning model to predict a diagnosis of Alzheimer 
disease by using 18f-FDG PET of the brain. Radiology 290:456–464.
https://doi.org/10.1148/radiol.2018180958

 18. Dror IE, Kukucka J, Kassin SM, Zapf PA (2018) When expert 
decision making goes wrong: consensus, bias, the role of experts, 
and accuracy. J Appl Res Memory Cognit 7:162–163.
https://doi.org/10.1016/j.jarmac.2018.01.007

 19. Eiben AE, Smith JE (2015) Introduction to evolutionary computing.
2 ed., Springer. https://doi.org/10.1007/978-3-662-44874-8

 20. Erkkinen MG, Kim M-O, Geschwind MD (2018) Clinical neurol-
ogy and epidemiology of the major neurodegenerative diseases. 
Cold Spring Harbor Perspectives in Biology 10:a033118

 21. Fagan D, Fenton M, O’Neill M (2016) Exploring position inde-
pendent initialisation in grammatical evolution. In: 2016 IEEE 
Congress on Evolutionary Computation (CEC), IEEE. pp. 
5060-5067

 22. Fenton M, McDermott J, Fagan D, Forstenlechner S, Hemberg E, 
O’Neill M (2017) PonyGE2: Grammatical evolution in Python. 
In: Proceedings of the Genetic and Evolutionary Computation 
Conference Companion, ACM, Berlin, Germany. pp. 1194-1201

 23. Fernández-Matarrubia M, Matías-Guiu JA, Moreno-Ramos T, 
Matías-Guiu J (2014) Demencia frontotemporal variante conduc-
tual: aproximación clínica y terapéutica. Neurología 29:464–472

 24. Fernández-Matarrubia M, Matías-Guiu JA, Cabrera-Martín MN, 
Moreno-Ramos T, Valles-Salgado M, Carreras JL, Matías-Guiu J 
(2017) Episodic memory dysfunction in behavioral variant fron-
totemporal dementia: A clinical and fdg-pet study. J Alzheimer’s 
Dis: 1251–1264. https://doi.org/10.3233/JAD-160874

 25. Fisher CK, Smith AM, Walsh JR, Simon EA (2019) Machine 
learning for comprehensive forecasting of Alzheimer’s disease 
progression. Scient Reports 9:13622.
https://doi.org/10.1038/s41598-019-49656-2

 26. Foster NL, Heidebrink JL, Clark CM, Jagust WJ, Arnold SE, Bar-
bas NR, DeCarli CS, Scott Turner R, Koeppe RA, Higdon R et al 
(2007) Fdg-pet improves accuracy in distinguishing frontotempo-
ral dementia and Alzheimer’s disease. Brain 130:2616–2635

 27. Garcia-Gutierrez F, Delgado-Alvarez A, Delgado-Alonso C, Díaz-
Álvarez J, Pytel V, Valles-Salgado M, Gil MJ, Hernández-Lorenzo 
L, Matías-Guiu J, Ayala JL et al (2022) Diagnosis of Alzheimer’s 
disease and behavioural variant frontotemporal dementia with 
machine learning-aided neuropsychological assessment using fea-
ture engineering and genetic algorithms. Int J Geriatr Psychiatry:37

 28. Gunning D (2017) Explainable artificial intelligence (xai).
http://www.darpa.mil/program/explainable-artificial-intelligence.
Accessed 10/11/2021

 29. Harper L, Fumagalli GG, Barkhof F, Scheltens P, OBrien JT, 
Bouwman F, Burton EJ, Rohrer JD, Fox NC, Ridgway GR, 
Schott JM (2016) MRI visual rating scales in the diagnosis of 
dementia: evaluation in 184 post-mortem confirmed cases. Brain 
139(4):1211–1225. https://doi.org/10.1093/brain/aww005

 30. Hidalgo JI, Colmenar JM, Kronberger G, Winkler SM, Garnica 
O, Lanchares J (2017) Data based prediction of blood glucose 
concentrations using evolutionary methods. J Med Syst 41:142

 31. JA M-G, MN C-M, Valles-Salgado M EA (2017) Neural basis
of cognitive assessment in Alzheimer disease, amnestic mild
cognitive impairment, and subjective memory complaints.
Am J Geriatr Psychiatry 25(7):730–740.
https://doi.org/10.1016/j.jagp.2017.02.002




 32. John GH, Kohavi R, Pfleger K (1994) Irrelevant features and the 
subset selection problem. In: Cohen WW, Hirsh H (eds) Machine 
Learning Proceedings 1994. Morgan Kaufmann, pp 121–129

 33. Koller D, Friedman N Probabilistic graphical models. Principles
and Techniques. The MIT Press.
https://books.google.co.in/books?id=7dzpHCHzNQ4C. Accessed 23 Oct 2021

 34. Kourou K, Exarchos TP, Exarchos KP, Karamouzis MV, Fotiadis 
DI (2015) Machine learning applications in cancer prognosis and 
prediction. Comput Struct Biotechnol J 13:8–17

 35. Luce L (2019) Democratization and impacts of ai. Apress, Berkeley,
pp 185–195. https://doi.org/10.1007/978-1-4842-3931-5_12

 36. Luo G (2016) A review of automatic selection methods for
machine learning algorithms and hyper-parameter values. Network
Modeling Analysis in Health Informatics and Bioinformatics, 5.
https://doi.org/10.1007/s13721-016-0125-6

 37. Matías-Guiu J A, N. C-MM, Valles-Salgado M, Rognoni T, Galán 
L, Moreno-Ramos T, Carreras JL, Matías-Guiu J (2019) Inhibition 
impairment in frontotemporal dementia, amyotrophic lateral scle-
rosis, and Alzheimer’s disease: clinical assessment and metabolic 
correlates. Brain Imaging and Behavior 13(3):651–659.
https://doi.org/10.1007/s11682-018-9891-3

 38. McKhann GM, Knopman DS, Chertkow H, Hyman BT, Jack 
CR Jr, Kawas CH, Klunk WE, Koroshetz WJ, Manly JJ, Mayeux 
R, et al. (2011) The diagnosis of dementia due to Alzheimer’s 
disease: recommendations from the national institute on aging-
Alzheimer’s association workgroups on diagnostic guidelines for 
Alzheimer’s disease. Alzheimer’s & Dementia 7:263–269

 39. Nordberg A, Rinne JO, Kadir A, Långström B (2010) The use of 
pet in Alzheimer disease. Nat Rev Neurol 6:78–87

 40. Olazarán J, Mouronte P, Bermejo F (2005) [Clinical validity of 
two scales of instrumental activities in Alzheimer’s disease]. Neu-
rologia 20:395–401

 41. Ossenkoppele R, Singleton EH, Groot C, Dijkstra AA, Eikelboom 
WS, Seeley WW, Miller B, Laforce RJ, Scheltens P, Papma JM, 
Rabinovici GD, Pijnenburg YAL (2022) Research criteria for the 
behavioral variant of Alzheimer disease: a systematic review and 
meta-analysis. JAMA Neurol 79:48–60.
https://doi.org/10.1001/jamaneurol.2021.4417

 42. Pagán J, Risco-Martín JL, Moya JM, Ayala JL (2016) Grammati-
cal evolutionary techniques for prompt migraine prediction. In: 
Proceedings of the Genetic and Evolutionary Computation Con-
ference 2016, Association for Computing Machinery, New York, 
NY, USA. pp. 973–980. https://doi.org/10.1145/2908812.2908897

 43. Pena D, Barman A, Suescun J, Jiang X, Schiess MC, Giancardo L,
The Alzheimer’s Disease Neuroimaging Initiative (2019) Quantifying
neurodegenerative progression with deepsymnet, an end-to-end
data-driven approach. Front Neurosci 13:1053.
https://doi.org/10.3389/fnins.2019.01053

 44. Puente-Castro A, Fernandez-Blanco E, Pazos A, Munteanu 
CR (2020) Automatic assessment of Alzheimer’s disease diagnosis 
based on deep learning techniques. Comput Biol Med 
120:103764. https://doi.org/10.1016/j.compbiomed.2020.103764

 45. Rajkomar A, Dean J, Kohane I (2019) Machine learning in medicine. 
N Engl J Med 380(14):1347–1358

 46. Rascovsky K, Hodges JR, Knopman D, Mendez MF, Kramer JH, 
Neuhaus J, van Swieten JC, Seelaar H, Dopper EG, Onyike CU, 
Hillis AE, Josephs KA, Boeve BF, Kertesz A, Seeley WW, Rankin 
KP, Johnson JK, Gorno-Tempini ML, Rosen H, Prioleau-Latham 
CE, Lee A, Kipps CM, Lillo P, Piguet O, Rohrer JD, Rossor MN, 
Warren JD, Fox NC, Galasko D, Salmon DP, Black SE, Mesulam 
M, Weintraub S, Dickerson BC, Diehl-Schmid J, Pasquier F, 
Deramecourt V, Lebert F, Pijnenburg Y, Chow TW, Manes F, 
Grafman J, Cappa SF, Freedman M, Grossman M, Miller BL (2011) 
Sensitivity of revised diagnostic criteria for the behavioural 
variant of frontotemporal dementia. Brain 134:2456–2477

 47. Ryan C, O’Neill M (1998) Grammatical evolution: a steady state 
approach. In: Koza JR (ed) Late Breaking Papers at the Genetic 
Programming 1998 Conference, Stanford University Bookstore, 
University of Wisconsin, Madison, Wisconsin, USA. pp. 180–185

 48. Saeys Y, Inza I, Larrañaga P (2007) A review of feature selection 
techniques in bioinformatics. Bioinformatics 23:2507–2517. 
https://doi.org/10.1093/bioinformatics/btm344

 49. Schreiber J (2017) Pomegranate: fast and flexible probabilistic 
modeling in Python. J Mach Learn Res 18:5992–5997

 50. Stekhoven DJ, Bühlmann P (2012) MissForest: non-parametric missing 
value imputation for mixed-type data. Bioinformatics 28:112–118

 51. Stern Y (2021) How can cognitive reserve promote cognitive and 
neurobehavioral health? Arch Clin Neuropsychol 36:1291–1295. 
https://doi.org/10.1093/arclin/acab049

 52. Tăuţan A-M, Ionescu B, Santarnecchi E (2021) Artificial intelligence 
in neurodegenerative diseases: a review of available tools with a focus 
on machine learning techniques. Artif Intell Med 117:102081

 53. Tracy JM, Özkanca Y, Atkins DC, Hosseini Ghomi R (2020) 
Investigating voice as a biomarker: Deep phenotyping methods 
for early detection of Parkinson’s disease. J Biomed Inform 
104:103362. https://doi.org/10.1016/j.jbi.2019.103362

 54. Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, 
Etard O, Delcroix N, Mazoyer B, Joliot M (2002) Automated 
anatomical labeling of activations in SPM using a macroscopic 
anatomical parcellation of the MNI MRI single-subject brain. 
NeuroImage 15:273–289. https://doi.org/10.1006/nimg.2001.0978

 55. Vehí J, Contreras I, Oviedo S, Biagi L, Bertachi A (2020) Prediction 
and prevention of hypoglycaemic events in type-1 diabetic 
patients using machine learning. Health Inf J 26:703–718. 
https://doi.org/10.1177/1460458219850682 PMID: 31195880

 56. Wolpert DH (1992) Stacked generalization. Neural Netw 5:241–259. 
https://doi.org/10.1016/S0893-6080(05)80023-1

 57. Wu J, Azarm S (2001) Metrics for quality assessment of a multi-
objective design optimization solution set. J Mech Des 123:18–25

 58. Xue B, Zhang M, Browne WN, Yao X (2016) A survey on evolutionary 
computation approaches to feature selection. IEEE Trans 
Evol Comput 20(4):606–626

 59. Zhang H, Sun G (2002) Feature selection using tabu search 
method. Pattern Recogn 35:701–711. 
https://doi.org/10.1016/S0031-3203(01)00046-2

Publisher’s note Springer Nature remains neutral with regard to 
jurisdictional claims in published maps and institutional affiliations.

Fernando García‑Gutierrez holds an MSc in Bioinformatics and is pursuing 
a PhD in Computer Science. His research interests are in the application 
of machine learning to the diagnosis and prognosis of 
neurodegenerative diseases.

Josefa Díaz‑Álvarez is an Assistant Professor in Computer Architecture 
at Extremadura University. She holds a PhD in Computer Engineering from 
the Complutense University of Madrid. She is interested in bioinspired 
algorithms and computing applied to bioengineering.

Jordi A. Matias‑Guiu is a neurologist in the Neurology Service of the 
Hospital Clínico San Carlos, Madrid. He holds a PhD in Medicine from the 
Complutense University of Madrid. His research activity is focused on 
neuropsychological assessments, neuroimaging, and non-invasive neuromodulation.

Vanesa Pytel is a neurologist at the ACE Alzheimer Center, Barcelona. 
She holds a PhD in Medicine from the Complutense University of Madrid. 
Her research interests are in the field of transcranial stimulation 
applied to neurodegenerative diseases and genetic profiling.

Jorge Matías‑Guiu is a Professor of Neurology at the Universidad 
Complutense de Madrid. He is Director of the Neuroscience Institute of 
Hospital Clínico San Carlos and Head of its Neurology Service.

María Nieves Cabrera‑Martín is a specialist in Nuclear Medicine at the 
Hospital Clínico San Carlos, Madrid. She holds a PhD in Medicine from the 
Complutense University of Madrid. Her research interests focus on the 
validation of 18-FDG PET for the diagnosis of neurodegenerative diseases.

José L. Ayala is a Professor in Computer Architecture and Automation 
at the Complutense University of Madrid. He holds a PhD in Computer and 
Electrical Engineering from the Technical University of Madrid. His 
research interests are in computing applied to bioengineering.

