Ink of Insight: Data Augmentation for Dementia Screening through Handwriting Analysis Nina Hosseini-Kivanani nina.hosseinikivanani@uni.lu Department of Computer Science, University of Luxembourg Esch-sur-Alzette, Luxembourg Elena Salobrar-García elenasalobrar@med.ucm.es Ramon Castroviejo Institute of Ophthalmologic Research, Universidad Complutense de Madrid Madrid, Spain Lorena Elvira-Hurtado marelvir@ucm.es Ramon Castroviejo Institute of Ophthalmologic Research, Universidad Complutense de Madrid Madrid, Spain Inés López-Cuenca inelopez@ucm.es Ramon Castroviejo Institute of Ophthalmologic Research, Universidad Complutense de Madrid Madrid, Spain Rosa de Hoz rdehoz@ucm.es Ramon Castroviejo Institute of Ophthalmologic Research, Universidad Complutense de Madrid Madrid, Spain José M. Ramírez ramirezs@med.ucm.es Ramon Castroviejo Institute of Ophthalmologic Research, Universidad Complutense de Madrid Madrid, Spain Pedro Gil pgil@salud.madrid.org Memory Unit, Geriatrics Service, Hospital Clínico San Carlos Madrid, Spain Mario Salas-Carrillo mario.salas@salud.madrid.org Memory Unit, Geriatrics Service, Hospital Clínico San Carlos Madrid, Spain Christoph Schommer christoph.schommer@uni.lu Department of Computer Science, University of Luxembourg Esch-sur-Alzette, Luxembourg Luis A. Leiva luis.leiva@uni.lu Department of Computer Science, University of Luxembourg Esch-sur-Alzette, Luxembourg ABSTRACT We investigate the use of handwriting data as a means of predicting early symptoms of Alzheimer’s disease (AD). Thirty-six subjects were classified based on the standardized pentagon drawing test (PDT) using deep learning (DL) models. We also compare and contrast classic machine learning (ML) models with DL by employing different data augmentation (DA) techniques. Our findings indicate that DA greatly improves the performance of all models, with the DL-based ones achieving the best results.
The best model (EfficientNet) achieved a classification accuracy of 87% and an area under the receiver operating characteristic curve (AUC) of 91% for binary classification (healthy or AD patients), whereas for multiclass classification (healthy, mild AD, or moderate AD) accuracy was 76% and AUC was 77%. These results underscore the potential of DA as a simple, cost-effective approach to aid practitioners in screening AD in larger populations, suggesting DL models are capable of analyzing handwriting data with a high degree of accuracy, which may lead to better and earlier detection of AD. This work is licensed under a Creative Commons Attribution 4.0 International License. ICMHI 2024, May 17–19, 2024, Yokohama, Japan © 2024 Copyright held by the owner/author(s). ACM ISBN 979-8-4007-1687-4/24/05 https://doi.org/10.1145/3673971.3673992 CCS CONCEPTS • Applied computing; • Life and medical sciences; • Health care information systems; KEYWORDS Alzheimer’s Disease; Screening; Pentagon Drawing Test; Data Augmentation; Image Classification; Machine Learning; Deep Learning ACM Reference Format: Nina Hosseini-Kivanani, Elena Salobrar-García, Lorena Elvira-Hurtado, Inés López-Cuenca, Rosa de Hoz, José M. Ramírez, Pedro Gil, Mario Salas-Carrillo, Christoph Schommer, and Luis A. Leiva. 2024. Ink of Insight: Data Augmentation for Dementia Screening through Handwriting Analysis. In 2024 8th International Conference on Medical and Health Informatics (ICMHI 2024), May 17–19, 2024, Yokohama, Japan. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3673971.3673992 1 INTRODUCTION AND RELATED WORK Alzheimer’s disease refers to a dementia syndrome characterized by primary impairments of cortical cognitive functions, including memory, language, and praxis, that gradually progress over time [15]. These impairments have a high functional impact and are often accompanied by various neuropsychiatric symptoms [7].
As the disease progresses, the number of damaged neurons and the extent of affected brain regions increase, leading to a greater need for assistance from family members, friends, and professional caregivers for daily tasks [1]. The early stages of AD are characterized by memory loss, recognition problems (such as object or face recognition [8]), visual impairments, and deficits in spatial perception [21], despite relatively normal visual acuity values and intact visual fields [23]. Recent research has shown that assessing visuospatial function, in addition to brain scanning, can aid in the early detection of impairments. Effective screening tests can identify visuospatial dysfunction, which may manifest years before the onset of clinical symptoms [18]. However, existing screening measures for cognitive changes face challenges, particularly with regard to their limited intra-individual reliability, which hinders accurate tracking of cognitive changes over time. Drawing tests, frequently used in dementia screening, can reflect the presence of the condition through changes in a person’s drawing ability [22]. However, the subjectivity of the scoring systems used in these tests and their limited scope in capturing a range of drawing attributes often result in missing subtle yet clinically significant indicators of cognitive decline. As a result, no single scoring system is reported as the most effective and reliable evaluation method (e.g., [12]). This highlights the need for more comprehensive and objective screening methods.
There is a growing interest in exploring more advanced analytical approaches, such as the integration of machine learning (ML) techniques, to augment the diagnostic effectiveness of cognitive screening tools. Recent advancements in artificial intelligence (AI), particularly in deep learning (DL), have significantly impacted healthcare, especially when it comes to diagnosing neurodegenerative diseases like AD (e.g., [6, 12, 24]). DL models have played an instrumental role in the analysis of neuroimaging, detecting complex patterns in brain scans that are imperceptible to the human eye. Our study focuses on refining DL models for dementia screening and emphasizing the importance of data augmentation (DA) techniques in contexts with limited high-quality and diverse data. This approach is vital for improving model robustness, especially in applications like automated analysis of scanned paper-based handwriting and drawings, which are crucial in AD screening. Recent research has highlighted DL’s transformative role in healthcare, particularly in the early detection and management of cognitive impairments [3, 6, 11, 14]. Relevant studies (e.g., [6, 11]) have highlighted the precision of DL models, particularly convolutional neural networks (CNNs). However, the effectiveness of these models is often limited by the small size of available datasets. Maruta et al. [19] demonstrated that a fine-tuned GoogleNet CNN outperforms other CNN models, such as VGG-16, ResNet-50, and Inception-v3, in automatically evaluating pentagon drawings for constructional apraxia. Additionally, Tasaki et al. [28] conducted a study on a DL model called PentaMind, which analyzes hand-drawn images of intersecting pentagons to extract cognition-related features. The model was trained on 13,777 images and successfully extracted features such as line waviness, which shows improvement over conventional visual assessment methods. Jiménez-Mesa et al.
[16] proposed a CNN-based method for diagnosing cognitive impairment through the Clock Drawing Test (CDT), effectively classifying drawings as healthy or patient, indicating its potential for hospital and clinic use, particularly in resource-limited areas. The use of DL in cognitive impairment tests is not without limitations, primarily due to limited dataset sizes and variability. DA emerges as a pivotal solution to enhance model robustness and accuracy. It involves generating additional training data from existing datasets, increasing their size, diversity, and quality. However, challenges exist in preserving clinical relevance and avoiding artificial biases. Summary of Contributions Our research builds upon significant advancements in ML for cognitive impairment screening, aiming to tackle the existing challenges. This brings us to the core objectives of our research. Firstly, we aim to develop robust DL models for AD screening, refining and enhancing existing models. Secondly, our study focuses on the importance of DA in clinical settings, emphasizing the preservation of data integrity and reliability. Thirdly, we explore the comparative advantages of DL over classic ML in the context of AD screening, providing a comprehensive insight into the future of digital screening in AD. 2 MATERIALS AND METHODS Our goal is to improve the performance of ML models in classifying handwriting data by implementing suitable DA techniques. Although DA has shown advantages in other scientific domains, its application to handwriting data in clinical contexts has received little attention. This is primarily because the augmented data is often either too similar to the original data or too distorted for the models to learn effectively from it (e.g., [25]). This study compares classic ML models (SVMs, RFs, 𝑘-NNs) and DL models (CNNs) in the context of binary (healthy vs.
patient) and multiclass (healthy, mild AD, and moderate AD) classification tasks, both with and without applying DA. 2.1 Data Collection and Tasks The study recruited 36 subjects (13 female and 23 male) from the Memory Unit of the Hospital Clínico San Carlos (HCSC) for a study on cognitive and neurophysiological characteristics of individuals at high risk of dementia. Subjects were categorized according to the guidelines of the National Institute of Neurological and Communicative Disorders and Stroke-AD and Related Disorders Association (NINCDS-ADRDA) [20] and the Diagnostic and Statistical Manual of Mental Disorders, fifth edition (DSM-5) [9]. Based on these guidelines, the subjects were classified into two groups of patients (mild AD, n=3, and moderate AD, n=3) and one group of healthy subjects (control, n=30). All the subjects provided informed consent prior to participation. The participants’ ages ranged from 61 to 88 years, with a mean age of 73.92 ± 6.78 years. No significant differences in age were observed among the healthy, mild AD, and moderate AD groups (𝑝-value > .05). The study included 30 individuals aged between 61 and 84 years who were cognitively healthy, with no evidence of brain injury, and had Mini-Mental State Examination (MMSE) scores above 26. Non-healthy participants had MMSE scores between 25 and 17. Figure 1: Example of the preprocessing steps for Pentagon Drawing Test (PDT) images: prompting pentagon (A) on top with participants’ drawings at the bottom; image processing (B, C); and final image (D) for model input. 2.2 Image Preprocessing and Data Augmentation Participants were given a blank A4-sized paper and asked to copy a figure of two overlapping pentagons with an interlocking shape (as shown in Figure 1a).
The paper-and-pencil drawings were converted from PDF files to image format (PNG) (Figure 1a and b) to be processed with our classic ML and DL models. The PNG images were then converted from color images (three channels) to grayscale (one channel) (Figure 1c). The resulting images were resized to standard dimensions (224×224 px). Any nonrelevant information, such as the original printed images from the clinicians, that appeared on the top side of the original file (Figure 1a), was removed during the preprocessing pipeline. Finally, images were padded to a uniform shape, and the Canny edge detector from the OpenCV library [4] was applied to enhance the resulting image (Figure 1d). Low-quality and noisy images (in total, 14 images from the healthy group) were manually filtered out. ML (and, particularly, DL) models perform better when trained on large datasets; however, obtaining such large-scale datasets is particularly challenging in clinical fields. To address this, DA techniques can be used to artificially increase the size of the dataset. By generating additional images from the input images, these techniques can help reduce the risk of overfitting and increase the model’s generalizability, leading to better overall performance. These techniques include applying geometric transformations (such as flipping, cropping, rotating, and translating), changing the color space of the images, mixing images, or even using generative adversarial networks [25]. In this study, we only applied geometric transformations to images for DA, carefully avoiding transformations that would potentially destroy the semantics of the original image and are not suitable for our grayscale handwritten images. Therefore, techniques commonly used in broader computer vision applications, such as hue adjustments or color inversion, were deliberately excluded from our process.
Our approach was to maintain the integrity of the original handwritten samples, ensuring that the essential characteristics of these images were preserved. To determine the quality of the augmented data, we used the structural similarity index (SSIM) [29]. SSIM measures the similarity between two images by considering the human visual perception of differences in terms of luminance, contrast, and structure. SSIM is a widely used measurement tool because of its low computational complexity and ability to compare synthetic and original images. The SSIM method uses a sliding window to analyze the structural distortion between two similar images. The SSIM score ranges from 0 to 1, with a score of 1 indicating that the images are the same and a score of 0 indicating that the images are totally different. To apply DA techniques such as elastic transformation, grid distortion, and rotation to the images in our training set, we used the Albumentations open-source toolkit [5]. Applying these techniques to the training set increased its sample size. Crucially, we allocated all original images from patient subjects exclusively to the test set to ensure a robust testing protocol. The training set consisted of 60 images for the healthy class and 60 for the patient class. The test set comprised 6 images for the healthy class and 6 images for the patient class. After DA, as shown in Figure 2, the SSIM values ranged from 0.6 to 0.7, indicating that the augmented images are not near-duplicates of the original data but are rather new images. However, when all DA techniques from the Albumentations toolkit were applied, the distribution of SSIM values ranged from 0.1 to 0.7, indicating that the augmented images differ much more from the original images, which is not desirable in our research. 2.3 Classic ML and DL models for AD screening We selected classic ML and DL models based on their proven strengths and applicability to medical image analysis.
Figure 2: SSIM distributions for Pentagon drawings. Dashed lines represent the SSIM results of all augmentation techniques (All aug.), while solid lines correspond to the selected augmentation techniques (Sel. aug.). Classic ML models were support vector machines (SVM), random forest (RF), and 𝑘-nearest neighbors (𝑘-NN). They require manual feature extraction, whereas DL models automatically identify and optimize relevant features from the data. Among DL models, CNNs are the most popular and widely used in image-related tasks [31], owing to their ability to automatically detect features by composing different types of layers: (i) convolution layers (CONV) are the primary building blocks of a CNN model, extracting features such as colors, edges, and corners from the input by applying the convolution operation through a sliding kernel; (ii) pooling layers (POOL) reduce the dimensionality of the feature maps computed by the CONV layers; and (iii) fully-connected layers (FC) are placed at the end of the model’s architecture to flatten the output of the previous layer and to add non-linearities to the model. We evaluated various state-of-the-art CNN architectures for AD screening. All models have the same input layer (224×224 px grayscale images) and the same output layer (with a Softmax activation function): VGG-16 [26] features 13 CONV layers with 3×3 kernels, followed by 3 FC layers before the output layer. ResNet-152 [10] is a deep residual network architecture with 152 layers. It uses skip connections between CONV layers, a kernel size of 3×3, and batch normalization. The model has two FC layers before the output layer.
DenseNet-121 [13] is a deep CNN composed of 121 layers, including CONV layers with 7×7 kernels and DenseBlocks, which are groups of CONV layers with 1×1 and 3×3 kernels interconnected through transition layers, and finally an FC layer followed by the output layer. EfficientNet [27] has multiple CONV layers with a mix of different kernels, followed by corresponding POOL layers and a single FC layer before the output layer. Custom CNN is our own design, with five CONV layers with 3×3 kernels, followed by two POOL layers and one FC layer before the output layer. Except for our Custom CNN model, the other CNNs are pre-trained on the ImageNet dataset, which contains 1M images distributed over 1000 classes. Therefore, we used transfer learning to fine-tune those architectures on our dataset. Accordingly, the dimensionality of the output layer is reduced from 1000 classes to 2 or 3, depending on the classification experiment: we used 2 classes in binary classification experiments and 3 classes in multiclass classification experiments. 2.4 Model training To train the classic ML models (SVM, 𝑘-NN, RF), we used 5-fold cross-validation, which involves randomly dividing the dataset into 5 groups or folds. For the SVM classifier, a “C” value of 0.1, a “gamma” of 0.0001, and a “linear” kernel were determined to be best. For the 𝑘-NN classifier, the “manhattan” metric with “n_neighbors” set to 3 and “weights” configured as “distance” was used. The RF classifier, on the other hand, used a “max_depth” of 15, “max_features” of 9, a “min_impurity_decrease” of 1e-05, and “n_estimators” set at 70. To train the DL models (CNNs), we used grid search to find the optimal parameters for each model. The learning rate varied between 0.0001 and 0.1, weight decay was fixed at 0.01, and the Adam optimizer was employed. The models were trained over 50 epochs, using a batch size of 16, and the Cross-Entropy loss function was applied to optimize classification performance.
3 RESULTS AND DISCUSSION The performance of the ML and DL models was evaluated using accuracy and the area under the receiver operating characteristic curve (AUC). Accuracy represents the ratio of correct classifications to the total number of samples. AUC reports the performance of a classifier as a trade-off between the true positive rate and the false positive rate, ranging from 0.5 (indicating random performance) to 1 (indicating perfect performance). We explored various classic ML and DL models for binary (healthy and patients) and multiclass (healthy, mild AD, and moderate AD) classification. The results are presented in Figure 3. Both classic ML and DL models showed an increase in accuracy after DA. This improvement was significant when compared to the baseline models (without DA). The classifiers that employed EfficientNet and our custom CNN outperformed all the other models, with 0.87 accuracy and 0.91 AUC scores for binary classification and 0.76 accuracy and 0.8 AUC for multiclass classification. In sum, DA led to a 10 to 30% increase in binary classification experiments and to a 10 to 20% increase in multiclass classification experiments. Our work showcases the ability of classic ML and DL models to accurately classify AD patients, with a particular emphasis on integrating DA techniques. These DA methods were carefully selected based on their suitability for analyzing cognitive assessment tests used in AD diagnosis, addressing the limitations of current approaches in the existing literature (e.g., [30]). According to the SSIM results (Figure 2), the most appropriate DA techniques for PDT images include elastic transformation, grid distortion, horizontal flipping, translation offset, and rotations. Hosseini-Kivanani et al. [12] highlighted the importance of accurately choosing the DA techniques, showing that flipping and rotation can destroy the semantics of a CDT image.
In contrast, in this work, flipping and rotation are appropriate DA techniques for PDT images, given their symmetry. Figure 3: Accuracy and AUC values of classic ML and DL models, both before and after DA, for both binary and multiclass classification experiments. Dashed lines represent the performance of a random classifier, illustrating the empirical lower bound in classification performance. DL models have been used in various studies for different types of cognitive assessments, such as the paper-and-pencil CDT or cube drawing (e.g., [2, 6, 16, 24]). However, none of these studies have specifically focused on the use of DA. Furthermore, while there have been a few efforts to apply DL to automatically screen PDT images, these have not included the use of DA, as seen in [17, 19]. Our Custom CNN model, enhanced with DA, demonstrated exceptional proficiency in evaluating PDT images and outperformed the results of previous studies while using less data. After benchmarking our custom CNN against other state-of-the-art models, we found that it performs better in many cases, particularly when the data has a simple underlying pattern.
The simpler structure of our Custom model allows it to learn and generalize these patterns more effectively, leading to higher performance. This suggests that our Custom CNN model with well-designed augmented images excels at certain tasks, such as simple drawings by patients, and is valuable for distinguishing AD patients from healthy individuals. Furthermore, it outperforms recent work that used pre-trained CNN models in a similar task [19]. Our findings underscore the transformative potential of DA in enhancing the performance of DL models. By artificially increasing the dataset’s size and diversity, both ML and DL models can be trained to be more robust and accurate, ultimately leading to improved patient outcomes in clinical settings. This research lays the groundwork for future advancements in AD treatment and care, aiming to ultimately improve the quality of life for those affected by AD. 4 CONCLUSION This work provides valuable insights into the effectiveness of using DA on small clinical datasets for AD screening through handwriting analysis. Both classic ML and DL models were able to achieve better performance than they could without DA. Our method, which is practical for clinical use, offers a cost-effective solution to assist healthcare professionals in patient screening and minimizes subjectivity in interpreting clinical data, particularly in resource-limited settings. It can have a significant impact by helping doctors make more informed decisions and eventually provide better treatment options for patients. ACKNOWLEDGMENTS Work supported by the UCM research group (Grupo de Investigación básica en Ciencias de la Visión del IIORC, UCM-GR17-920105), the Horizon 2020 FET program of the European Union (grant CHIST-ERA-20-BCI-001), and the European Innovation Council Pathfinder program (grant 101071147). REFERENCES [1] Alzheimer’s Association. 2022. 2022 Alzheimer’s disease facts and figures.
Alzheimer’s & Dementia: The Journal of the Alzheimer’s Association 18, 4 (April 2022), 700–789. [2] Samad Amini, Lifu Zhang, Boran Hao, Aman Gupta, Mengting Song, Cody Karjadi, Honghuang Lin, Vijaya B. Kolachalama, Rhoda Au, and Ioannis Ch. Paschalidis. 2021. An Artificial Intelligence-Assisted Method for Dementia Detection Using Images from the Clock Drawing Test. Journal of Alzheimer’s Disease 83, 2 (2021), 581–589. https://doi.org/10.3233/JAD-210299 [3] Sabyasachi Bandyopadhyay, Jack Wittmayer, David J. Libon, Patrick Tighe, Catherine Price, and Parisa Rashidi. 2023. Explainable semi-supervised deep learning shows that dementia is associated with small, avocado-shaped clocks with irregularly placed hands. Scientific Reports 13, 1 (May 2023), 7384. https://doi.org/10.1038/s41598-023-34518-9 [4] Gary Bradski. 2000. The OpenCV library. Miller Freeman Inc 25, 11 (2000), 120–123. [5] Alexander Buslaev, Vladimir I. Iglovikov, Eugene Khvedchenya, Alex Parinov, Mikhail Druzhinin, and Alexandr A. Kalinin. 2020. Albumentations: Fast and Flexible Image Augmentations. Information 11, 2 (Feb. 2020), 125. [6] Shuqing Chen, Daniel Stromer, Harb Alnasser Alabdalrahim, Stefan Schwab, Markus Weih, and Andreas Maier. 2020. Automatic dementia screening and scoring by applying deep learning on clock-drawing tests. Scientific Reports 10, 1 (Nov. 2020), 1–11. [7] Jeffrey L. Cummings. 2004. Alzheimer’s disease. The New England Journal of Medicine 351, 1 (July 2004), 56–67. https://doi.org/10.1056/NEJMra040223 [8] John D. W. Greene and John R. Hodges. 1996. Identification of famous faces and famous names in early Alzheimer’s disease: Relationship to anterograde episodic and general semantic memory. Brain 119, 1 (Feb. 1996), 111–128. https://doi.org/10.1093/brain/119.1.111 [9] Martin Guha. 2014. Diagnostic and Statistical Manual of Mental Disorders: DSM-5 (5th edition). Reference Reviews 28, 3 (Jan. 2014), 36–37.
[10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition. 770–778. [11] Nina Hosseini-Kivanani, Elena Salobrar-García, Lorena Elvira-Hurtado, Inés López-Cuenca, Rosa de Hoz, José M. Ramírez, Pedro Gil, Mario Salas, Christoph Schommer, and Luis A. Leiva. 2023. Better Together: Combining Different Handwriting Input Sources Improves Dementia Screening. In IEEE 19th International Conference on e-Science (e-Science). IEEE, Cyprus, 1–7. [12] Nina Hosseini-Kivanani, Christoph Schommer, and Luis A. Leiva. 2023. The Magic Number: Impact of Sample Size for Dementia Screening Using Transfer Learning and Data Augmentation of Clock Drawing Test Images. In International Conference on E-health Networking, Application & Services (Healthcom). IEEE, China, 23–28. [13] Gao Huang, Zhuang Liu, Laurens van der Maaten, and Kilian Q. Weinberger. 2017. Densely Connected Convolutional Networks. In Proceedings of the IEEE conference on computer vision and pattern recognition. 4700–4708. [14] Donato Impedovo and Giuseppe Pirlo. 2018. Dynamic Handwriting Analysis for the Assessment of Neurodegenerative Diseases: A Pattern Recognition Perspective. IEEE reviews in biomedical engineering 12 (2018), 209–220. [15] Jessica J. Jalbert, Lori A. Daiello, and Kate L. Lapane. 2008. Dementia of the Alzheimer Type. Epidemiologic reviews 30, 1 (2008), 15–34. https://academic.oup.com/epirev/article/30/1/15/623289 [16] C. Jiménez-Mesa, Juan E. Arco, M. Valentí-Soler, B. Frades-Payo, M. A.
Zea-Sevilla, A. Ortiz, M. Ávila Villanueva, Diego Castillo-Barnes, J. Ramírez, T. del Ser-Quijano, C. Carnero-Pardo, and J. M. Górriz. 2022. Automatic Classification System for Diagnosis of Cognitive Impairment Based on the Clock-Drawing Test. Lecture Notes in Computer Science 13258 LNCS (2022), 34–42. [17] Yike Li, Jiajie Guo, and Peikai Yang. 2022. Developing an Image-Based Deep Learning Framework for Automatic Scoring of the Pentagon Drawing Test. Journal of Alzheimer’s Disease 85, 1 (2022), 129–139. [18] José Eduardo Martinelli, Juliana Francisca Cecato, Marcos Oliveira Martinelli, Brian Alvarez Ribeiro de Melo, and Ivan Aprahamian. 2018. Performance of the Pentagon Drawing test for the screening of older adults with Alzheimer’s dementia. Dementia & Neuropsychologia 12, 1 (Jan. 2018), 54–60. [19] Jumpei Maruta, Kentaro Uchida, Hideo Kurozumi, Satoshi Nogi, Satoshi Akada, Aki Nakanishi, Miki Shinoda, Masatsugu Shiba, and Koki Inoue. 2022. Deep convolutional neural networks for automated scoring of pentagon copying test results. Scientific Reports 12, 1 (Dec. 2022), 9881. [20] Guy McKhann, David Drachman, Marshall Folstein, Robert Katzman, Donald Price, and Emanuel M. Stadlan. 1984. Clinical diagnosis of Alzheimer’s disease: Report of the NINCDS-ADRDA Work Group under the auspices of Department of Health and Human Services Task Force on Alzheimer’s Disease. Neurology 34, 7 (July 1984), 939–939. https://doi.org/10.1212/WNL.34.7.939 [21] Mario F. Mendez, Monique M. Cherrier, and Robert S. Meadows. 1996. Depth Perception in Alzheimer’s Disease. Perceptual and motor skills 83, 3 (1996), 987–995. [22] Gabriel Poirier, Alice Ohayon, Adrien Juranville, France Mourey, and Jeremie Gaveau. 2021. Deterioration, Compensation and Motor Control Processes in Healthy Aging, Mild Cognitive Impairment and Alzheimer’s Disease. Geriatrics 6, 1 (2021), 33.
[23] Elena Salobrar-García, Rosa de Hoz, Ana I. Ramírez, Inés López-Cuenca, Pilar Rojas, Ravi Vazirani, Carla Amarante, Raquel Yubero, Pedro Gil, María D. Pinazo-Durán, Juan J. Salazar, and José M. Ramírez. 2019. Changes in visual function and retinal structure in the progression of Alzheimer’s disease. PLOS ONE 14, 8 (Aug. 2019), e0220535. https://doi.org/10.1371/journal.pone.0220535 [24] Kenichiro Sato, Yoshiki Niimi, Tatsuo Mano, Atsushi Iwata, and Takeshi Iwatsubo. 2022. Automated Evaluation of Conventional Clock-Drawing Test Using Deep Neural Network: Potential as a Mass Screening Tool to Detect Individuals With Cognitive Decline. Frontiers in Neurology 13 (2022). https://www.frontiersin.org/articles/10.3389/fneur.2022.896403 [25] Connor Shorten and Taghi M. Khoshgoftaar. 2019. A survey on Image Data Augmentation for Deep Learning. Journal of Big Data 6, 1 (July 2019). [26] Karen Simonyan and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. [27] Mingxing Tan and Quoc Le. 2019. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the 36th International Conference on Machine Learning. PMLR, 6105–6114. [28] Shinya Tasaki, Namhee Kim, Tim Truty, Ada Zhang, Aron S. Buchman, Melissa Lamar, and David A. Bennett. 2023. Interpretable deep learning approach for extracting cognitive features from hand-drawn images of intersecting pentagons in older adults. [29] Zhou Wang, Alan Conrad Bovik, Hamid Rahim Sheikh, and Eero P. Simoncelli. 2004. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing 13, 4 (April 2004), 600–612. [30] Victor Wasserman, Sheina Emrani, Emily F. Matusz, Jamie Peven, Seana Cleary, Catherine C. Price, Terrie Beth Ginsberg, Rodney Swenson, Kenneth M. Heilman, Melissa Lamar, and David J. Libon. 2020.
Visuospatial performance in patients with statistically-defined mild cognitive impairment. Journal of Clinical and Experimental Neuropsychology 42, 3 (April 2020), 319–328. https://doi.org/10.1080/13803395.2020.1714550 [31] Guangle Yao, Tao Lei, and Jiandan Zhong. 2019. A review of Convolutional-Neural-Network-based action recognition. Pattern Recognition Letters 118 (Feb. 2019), 14–22. https://doi.org/10.1016/j.patrec.2018.05.018