SMC 2023 – Proceedings of the Sound and Music Computing Conference 2023
Royal College of Music and KTH Royal Institute of Technology, Stockholm, Sweden
Bresin, R., & Falkenberg, K. (Eds.). 2023. Proceedings of the 20th Sound and Music Computing Conference, June 15-17, 2023, Stockholm, Sweden.
DOI: 10.5281/zenodo.8136568
ISBN: 978-91-527-7372-7
Conference website: smcnetwork.org/smc2023/
Video recordings of the conference concerts and keynotes: www.youtube.com/@navetresearch

Table of Contents

CREPE NOTES: A NEW METHOD FOR SEGMENTING PITCH CONTOURS INTO DISCRETE NOTES (Xavier Riley and Simon Dixon), p. 1
Design Process in Visual Programming: Methods for Visual and Temporal Analysis (Jack Armitage, Thor Magnusson and Andrew McPherson), p. 6
Comparing various sensors for capturing human micromotion (Maham Riaz, Finn Upham, Kayla Burnim, Laura Bishop and Alexander Refsum Jensenius), p. 14
Introducing stateful conditional branching in Ciaramella (Paolo Marrone, Stefano D'Angelo and Federico Fontana), p. 21
F0 ANALYSIS OF GHANAIAN POP SINGING REVEALS PROGRESSIVE ALIGNMENT WITH EQUAL TEMPERAMENT OVER THE PAST THREE DECADES: A CASE STUDY (Iran Roman, Daniel Faronbi, Isabelle Burger-Weiser and Leila Adu-Gilmore), p. 27
Conditional sound effects generation with regularized WGAN (Yunyi Liu and Craig Jin), p. 34
INTERACTIVE MUSIC SCORE COMPLETION (Gregory Beller, Jacob Sello, Georg Hajdu and Thomas Görne), p. 41
STRUCTURING MUSIC FOR ANY SOURCES (Orestis Karamanlis), p. 45
Daisy Dub: a modular and updateable real-time audio effect for music production and performance (Rasmus Kjærbo, Leo Fogadic, Oliver Bjørk Winkel and Stefania Serafin), p. 49
The effect of actuating the bass trombone second valve on the quality of note transition in legato (Renato Lisboa, Gustavo Machado, Thiago Campolina and Maurício Loureiro), p. 58
TickTacking – Drawing trajectories with two buttons and rhythm (Davide Rocchesso, Alessio Bellino and Antonino Perez), p. 63
VIBROTACTILE FEEDBACK ENHANCES PERCEIVED AROUSAL AND LISTENING EXPERIENCE IN MUSIC (Hanna Järveläinen and Eric Larrieux), p. 72
WebChucK IDE: A Web-Based Programming Sandbox for ChucK (Terry Feng, Celeste Betancur, Michael Mulshine, Chris Chafe and Ge Wang), p. 79
XR etudes for augmented piano (Giovanni Santini), p. 84
Dynamical Complexity Measurement with Random Projection: A Metric Optimised for Realtime Signal Processing (Chris Kiefer), p. 89
Temporality Across Three Media: Inner Transmissions (Julia Mills), p. 96
VocalHUM: real-time whisper-to-speech enhancement for patients with vocal frailty (Francesco Roberto Dani, Sonia Cenceschi and Alessandro Trivilini), p. 103
A Programmable Linux-Based FPGA Platform for Audio DSP (Pierre Cochard, Maxime Popoff, Antoine Fraboulet, Tanguy Risset, Stephane Letz and Romain Michon), p. 110
DJeye: Towards an Accessible Gaze-Based Musical Interface for Quadriplegic DJs (Fabio Bottarelli, Nicola Davanzo, Giorgio Presti and Federico Avanzini), p. 117
EFFICIENT SIMULATION OF ACOUSTIC PHYSICAL MODELS WITH NONLINEAR DISSIPATION (Riccardo Russo, Michele Ducceschi, Stefan Bilbao and Matthew Hamilton), p. 125
musif: a Python package for symbolic music feature extraction (Ana Llorens, Federico Simonetta, Martín Serrano and Álvaro Torrente), p. 132
Score-Informed MIDI Velocity Estimation for Piano Performance by FiLM Conditioning (Hyon Kim, Marius Miron and Xavier Serra), p. 139
A Web-Based MIDI 2.0 Monitor (Federico Avanzini, Vanessa Faschi and Luca Andrea Ludovico), p. 148
Web Applications for Automatic Audio-to-Score Synchronization with Iterative Refinement (Adriano Baratè, Goffredo Haus, Luca Andrea Ludovico, Giorgio Presti, Stefano Di Bisceglie, Alessandro Minoli and Davide Andrea Mauro), p. 154
Multi-Source Contrastive Learning for Musical Audio (Christos Garoufis, Athanasia Zlatintsi and Petros Maragos), p. 162
A Microcontroller-Based Network Client Towards Distributed Spatial Audio (Thomas Rushton, Romain Michon and Stéphane Letz), p. 170
Principal Component Analysis of binaural HRTF pairs (Georgios Marentakis), p. 178
EXPLORING POLYPHONIC ACCOMPANIMENT GENERATION USING GENERATIVE ADVERSARIAL NETWORKS (Danae Charitou, Christos Garoufis, Athanasia Zlatintsi and Petros Maragos), p. 186
A real-time cent-sensitive strobe-like tuning software based on spectral estimates of the Snail-Analyser (Thomas Hélie, Charles Picasso, Robert Piéchaud, Michaël Jousserand and Tom Colinot), p. 194
Accessible Sonification of Movement: A case in Swedish folk dance (Olof Misgeld, Hans Lindetorp and Andre Holzapfel), p. 201
The "Collective Rhythms Toolbox": an audio-visual interface for coupled-oscillator rhythmic generation (Nolan Lem), p. 209
AUTOMATIC LEGATO TRANSCRIPTION BASED ON ONSET DETECTION (Simon Falk, Bob Sturm and Sven Ahlbäck), p. 214
Post-mix vocoding and the making of All You Need Is Lunch (Miller Puckette and Kerry Hagan), p. 222
Using Deep Learning and Low-Frequency Fourier Analysis to Predict Parameters of Coupled Non-Linear Oscillators for the Generation of Complex Rhythms (Luc Döbereiner), p. 227
Music Boundary Detection Using Local Contextual Information Based on Implication-Realization Model (Kaede Noto, Akira Maezawa, Yoshinari Takegawa, Takuya Fujishima and Keiji Hirata), p. 232
Sound Design Strategies For Latent Audio Space Explorations Using Deep Learning Architectures (Kivanç Tatar, Kelsey Cotton and Daniel Bisig), p. 239
POLYSPRING: A PYTHON TOOLBOX TO MANIPULATE 2-D SOUND DATABASE REPRESENTATIONS (Victor Paredes, Jules Françoise and Frederic Bevilacqua), p. 247
OUR SOUND SPACE (OSS) – AN INSTALLATION FOR PARTICIPATORY AND INTERACTIVE EXPLORATION OF SOUNDSCAPES (Maurizio Goina, Roberto Bresin and Romina Rodela), p. 255
Developing and evaluating a Musical Attention Control Training game application (Anja Volk, Ermis Chalkiadakis, Sander Bakkes, Laurien Hakvoort and Rebecca S. Schaefer), p. 261
Modeling Piano Fingering Decisions with Conditional Random Fields (David Randolph, Barbara Di Eugenio and Justin Badgerow), p. 269
Generating symbolic music using diffusion models (Lilac Atassi), p. 277
Embodied Tempo Tracking with a Virtual Quadruped (Alex Szorkovszky, Frank Veenstra, Olivier Lartillot, Alexander Jensenius and Kyrre Glette), p. 283
A COMPARATIVE ANALYSIS OF LATENT REGRESSOR LOSSES FOR SINGING VOICE CONVERSION (Brendan O'Connor and Simon Dixon), p. 289
Quantifying the Extended Acceptance of Pioneering Art Music Through the Creation of Electroacoustic Music (Soma Arai, Hiroyuki Yaguchi, Hidefumi Ohmura, Ludger Brümmer and Takuro Shibayama), p. 296
A Qualitative Investigation of Binaural Spatial Music in Virtual Reality (Sandra Mahlamäki), p. 304
A digital toolbox for musical analysis of computer music: exploring music and technology through sonic experience (Michael Clarke, Frédéric Dufeu and Keitaro Takahashi), p. 312
Citation is not Collaboration: Music-Genre Dependence of Graph-Related Metrics in a Music Credits Network (Giulia Clerici and Marco Tiraboschi), p. 317
Ding-dong: Meaningful Musical Interactions with Minimal Input (Ciaran Frame), p. 323
A Comparative Computational Approach to Piano Modeling Analysis (Riccardo Simionato and Stefano Fasciani), p. 330
RESURRECTING THE VIOLINO ARPA: A MUSEUM EXHIBITION (Simon Rostami Mosen, Stefania Serafin, Ali Adjorlu, Ulla Hahn Ranmar and Marie Martens), p. 338
Song Popularity Prediction using Ordinal Classification (Michael Vötter, Maximilian Mayerl, Eva Zangerle and Günther Specht), p. 346
Real-Time Implementation of the Kirchhoff Plate Equation using Finite-Difference Time-Domain Methods on CPU (Zehao Wang, Stefan Bilbao, Tom Erbe and Miller Puckette), p. 354
Playing the Design: Creating Soundscapes through Playful Interaction (Ricardo Atienza, Hans Lindetorp and Kjetil Falkenberg), p. 362
WHAT IS THE COLOR OF CHORO? COLOR PREFERENCES FOR AN INSTRUMENTAL BRAZILIAN POPULAR MUSIC GENRE (Philip Berrez, Tiago Maranhao, Martin Kihl and Roberto Bresin), p. 370
Sculpting Algorithmic Pattern: Informal and Visuospatial Interaction in Musical Instrument Design (Jack Armitage, Thor Magnusson and Andrew McPherson), p. 377
"Video Accompaniment": Synchronous Live Playback for Score-Aligned Animation (Kaitlin Pet, Nikki Pet and Christopher Raphael), p. 385
Heat-sensitive sonic textiles: increasing awareness of the energy we save by wearing warm fabrics (Vincenzo Madaghiele, Arife Dila Demir and Sandra Pauletto), p. 395
Sonifying energy consumption using SpecSinGAN (Sandra Pauletto, Adrián Barahona-Ríos, Vincenzo Madaghiele and Yann Seznec), p. 403
A Real-Time Cochlear Implant Simulator - Design and Evaluation (Christina Steinhauer, Tobias Lykke Sønderbo and Razvan Paisa), p. 410
AN ARTISTIC AUDIOTACTILE INSTALLATION FOR AUGMENTED MUSIC (Jeremy Marozeau), p. 419
Salient Sights and Sounds: Comparing Visual and Auditory Stimuli Remembrance using Audio Set Ontology and Sonic Mapping (Laura McHugh, Chih-Wei Wu, Xuanling Xu and Kjetil Falkenberg), p. 426
musif: a Python package for symbolic music feature extraction

Ana Llorens 1, Federico Simonetta 2, Martín Serrano 2, Álvaro Torrente 1,2
1 Department of Musicology, Universidad Complutense de Madrid, Madrid, Spain
2 Instituto Complutense de Ciencias Musicales, Madrid, Spain
{first letter of the first name}{surname}@iccmu.es

ABSTRACT

In this work, we introduce musif, a Python package that facilitates the automatic extraction of features from symbolic music scores. The package includes the implementation of a large number of features, which have been developed by a team of experts in musicology, music theory, statistics, and computer science. Additionally, the package allows for the easy creation of custom features using commonly available Python libraries. musif is primarily geared towards processing high-quality musicological data encoded in MusicXML format, but it also supports other formats commonly used in music information retrieval tasks, including MIDI, MEI, Kern, and others. We provide comprehensive documentation and tutorials to aid in the extension of the framework and to facilitate the introduction of new and inexperienced users to its usage.

1. INTRODUCTION

The abstraction represented in music scores, which are symbolic representations of music, has been shown to be highly relevant for both cognitive and musicological studies. In cognitive studies, the abstraction process used by human music cognition to categorize sound is important to understand how we identify and perceive different musical aspects, such as timbres, pitches, durations, and rhythms [1]. In musicological studies, the abstraction represented in music scores is important as it provides a direct source of information to understand how the music was constructed. Throughout history, these aspects have been encoded in different forms, with common Western music notation being the most widely used in the Western world for centuries. Therefore, music notation is considered of paramount importance in the field of musicology.

In the field of sound and music computing, however, research has primarily focused on analyzing music in the audio domain, while other modalities such as images and scores have received less attention [2]. Researchers interested in applying machine learning methods to the analysis of music scores will likely seek methods for representing them in a suitable way. In the context of modern deep learning and machine learning, two main approaches have emerged: feature learning [3] and feature extraction [4, 5].

Copyright: © 2023 Ana Llorens et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Feature learning – or representation learning – involves using algorithms to learn the features from the data in a way that is optimal for the specific statistical inference problem and is mainly applied with neural networks [6–8]; feature extraction, instead, involves the computation of generic and hand-crafted features, requiring further steps such as feature selection and dimensionality reduction.
Both approaches have their own advantages and disadvantages, and the choice between them depends on the specific task and the available data. Here we focus exclusively on the latter.

Feature extraction has been widely used in various machine learning tasks and has been partially successful in music computing [9–11]. However, a major drawback is the effort and time required to craft useful features for a specific task. To address this issue, researchers have previously proposed software tools that assist in extracting features from music, such as audio files and scores. Additionally, with the advancement of modern computer languages such as Python and JavaScript, the implementation of new features has become easier and more accessible.

Musicologists may also resort to feature extraction, especially in the context of so-called corpus studies. In fact, existing software for symbolic music feature extraction – e.g. jSymbolic [5] – was partly designed to help musicologists obtain the data they required in a fast and accurate way. This is especially important because the computation of the features could hardly be achieved by the manual work of musicologists, who, as of today, devote time to manual annotations such as harmony [12] and cadence [13, 14]. Examples of such feature-driven, computational musicology can be found in studies of musical form [15], harmony [16], and compositional styles [17–21], among others.

In this work, we introduce a software tool named musif, which offers a comprehensive collection of features that are extracted from various file formats. The tool is designed to be easily extensible using the Python programming language and is specifically tailored for 18th-century opera arias, although it has been tested on a variety of other repertoires, including Renaissance and pop music. Furthermore, in contrast to previous software [4, 5], musif is developed with a focus on musicological studies and is thus geared towards high-quality music datasets, addressing the issue of limited data availability that is commonly encountered in feature learning methods.

To aid in its usage, musif is accompanied by detailed software documentation (https://musif.didone.eu). This documentation provides adequate information for both novice and advanced users, enabling them to take full advantage of the tool and add new features and file formats as needed. The project is developed using open-source methods and adopts GitHub to manage issues and pull requests, as well as to distribute the source code (https://github.com/DIDONEproject/musif).

2. DESIGN PRINCIPLES

The development of musif was guided by four key design principles.

The foremost principle was the ability to customize and extend the framework to meet the user's specific requirements.
This includes the capability to alter the feature extraction process by introducing new features coded by the user and by modifying the existing pipeline.

The second principle was to ensure the usability of the software by individuals with minimal technical expertise, with musicologists as the primary target audience. This principle mainly entailed providing a user-friendly interface for the entire feature extraction process, with default settings that are deemed optimal. Additionally, comprehensive documentation was produced to aid novice users in understanding the feature extraction process of symbolic music.

As musicologists were identified as the primary target audience, special attention was paid to the file types supported by the system. Specifically, an effort was made to find a combination of file formats that were both easy to create and able to represent musicological annotations, which could be used as sources for feature extraction.

The final principle that underpins the entire structure of musif is its suitability for big data analysis. Specifically, measures were taken to ensure that the framework is computationally efficient on commercially available computers.

3. IMPLEMENTATION

3.1 General pipeline

The implementation of musif is mainly based on music21 [4] and is divided into two primary stages, both of which are highly configurable. Fig. 1 shows a flowchart of the general pipeline.

Figure 1. Flow chart of the general pipeline for feature extraction with musif from a single score. First, features are extracted from a music score. If window-level features are used, a row is generated for each window of measures; otherwise, a single row is generated for the whole score. Then, the DataProcessor cleans the table by replacing NaN values with 0 and by merging or removing undesired columns.

The initial stage pertains to the actual extraction of features, during which a substantial number of features are derived from the data. Among these features, some are solely designed for the calculation of "second-order" features, which are derived from the primary ones. For instance, the number of notes in a score may not hold inherent significance, but it acquires meaning when considered in relation to the total length of the score. Therefore, an additional operation is required to compute the ratio between the number of notes and features that denote the total duration of the score, such as the total number of beats. As a result, certain "first-order" features may not be relevant for the specific task at hand.

To address this issue, we have implemented an additional step that we refer to as "post-processing". In this stage, certain "first-order" features are eliminated, while others are aggregated according to the user's specifications. For example, to lower the overall number of features and attain a more succinct representation, the user may choose to aggregate features that originate from similar instruments, such as strings, by utilizing statistical measures such as the mean, the variance, and other statistical moments. Another crucial task accomplished during post-processing is the standardization of the representation of missing data, such as NaN values or empty strings.

The aforementioned two steps correspond to two Python objects, namely the FeaturesExtractor and the DataProcessor. Both of these objects take as input an extensible configuration, which can be expressed in various ways, namely variadic Python arguments in the class constructor and/or a YAML file.
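For instance, a minimal configuration file could be written to disk and its path passed to the extractor, as in the sketch below. The YAML keys shown here simply mirror the constructor arguments used in Listing 1; they are an assumption to be checked against the online documentation rather than a complete or authoritative schema.

from pathlib import Path
from musif.extract.extract import FeaturesExtractor

# Hypothetical configuration file mirroring the keyword arguments of Listing 1.
Path("config.yml").write_text(
    "xml_dir: data_notation\n"
    "musescore_dir: data_harmony\n"
    "basic_modules: [scoring]\n"
    "features: [core, ambitus, interval, tempo]\n"
)

# The first constructor argument accepts the path to a YAML file (cf. Listing 1).
features = FeaturesExtractor("config.yml").extract()

Keyword arguments passed alongside such a file override the corresponding YAML entries, as noted in the comments of Listing 1.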
The configuration of the FeaturesExtractor object includes the path to the data, the features that should be extracted, the paths or objects containing custom features, and other similar requirements. For its part, the configuration of the DataProcessor object offers the flexibility to specify the columns that should be aggregated or removed, as well as the columns in which NaN values should be replaced with a default value, such as zero.

The outcome of the entire process is a tabular representation, with one column per feature and one row per musical score. Optionally, scores can be analyzed using moving windows, in which case the output table will have one row per window. When using windows, the window size and overlap can be specified as a number of measures, as shown in Fig. 2. A sample code that demonstrates the usage of the tool is provided in Listing 1.

from musif.extract.extract import FeaturesExtractor
from musif.process.processor import DataProcessor

features = FeaturesExtractor(
    # here we use `None`, but it could be the path to a YAML file containing
    # specifications
    None,
    # the options below override the YAML file if it is provided
    xml_dir="data_notation",
    musescore_dir="data_harmony",
    basic_modules=["scoring"],
    features=["core", "ambitus", "interval", "tempo", "density", "texture",
              "lyrics", "scale", "key", "dynamics", "rhythm"]
).extract()

# For the DataProcessor, the arguments are the extracted table and the path to a
# YAML file. As before, the YAML file can be overridden by variadic arguments.
processed_features = DataProcessor(features, None).process().data
# the output is a pandas DataFrame!

Listing 1: Example of feature extraction with default options and stock features.

Figure 2. Example of windowing on a music score. The window length and overlap are specified in measures. In this case, windows have a length of 3 and an overlap of 2. At the top of the score, in red, an example of harmonic analysis is shown.

3.2 File formats

Given that our primary objective was to develop a software tool for musicological applications, it was imperative to support file formats that are easily usable in musicological analysis. As such, we carefully considered file formats such as MusicXML, MEI, and IEEE 1599. These file formats can represent common Western music notation with a high degree of detail and have been utilized for both musicological and MIR tasks. However, it was determined that only MusicXML is fully supported by user-end graphical interfaces. Requiring users to possess both musicological training and the ability to effectively edit large XML files with advanced software is a rare combination and was therefore not deemed viable in the design of the system. Moreover, certain features implemented by musif are derived from functional, Roman-numeral harmonic analysis, which cannot be represented in the standard MusicXML format. To solve this issue, we have adopted the MuseScore file format, in line with previous works in this field [12, 13].

Overall, the recommended file formats for the musif system are MusicXML for notation parsing and MuseScore for harmonic annotations. However, if only MuseScore files are available, the MuseScore software can be utilized to generate the necessary MusicXML files. Additionally, alternative file formats may be employed in place of MusicXML by leveraging the music21 library for parsing notation files, which supports a comprehensive array of file formats. Furthermore, any file format supported by MuseScore can be utilized through automatic conversion to MusicXML. This pipeline is particularly recommended for extracting features from MIDI files.
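As a concrete illustration of the alternative route through music21, a MIDI file can also be converted to MusicXML up front and the result handed to the extractor; the file names below are purely illustrative, and the automatic MuseScore-based conversion described above does not require this manual step.

from music21 import converter

# Parse a MIDI file and export it as MusicXML (illustrative file names).
score = converter.parse("aria.mid")
score.write("musicxml", fp="aria.musicxml")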
However, the parsing approach adopted in this system may be relatively slow when working with a large number of files. To mitigate this issue, a caching system has been implemented that saves to disk any property, function, or method result that originates from music21 objects. This approach has been tested and has demonstrated a significant improvement in processing speed, with a 2 to 3 times speed-up observed when cached files are used. The caching system is particularly useful when designing or debugging feature extraction on a large number of files, as it allows for more efficient and expedient processing.

3.3 Customization

To facilitate customization of the feature extraction process, three main tools are available. These tools allow for more flexibility and precision in the feature extraction process, enabling users to tailor the process to their specific needs and requirements. They are described in the following list; a minimal custom-feature sketch is given after the list.

1. Custom features: The user can add custom features by developing two simple functions: one to extract features from each individual part in the score, and another to extract features from the entire score. The second function can optionally utilize the features extracted from the individual parts. Additionally, the user can specify the extraction order and feature dependencies, allowing previously extracted features to be used in the computation of newer features. The implementation of these custom features can be easily accomplished using the music21 Python library.

2. Hooks: Hooks are user-provided functions that are called at specific stages of the extraction process. In the current version of musif, only one type of hook is available, namely just after the parsing of the input files is completed and just before the caching mechanism is initialized. The user can provide a list of functions that accept the parsed score as input and that are run before the caching mechanism is initialized. When cached files are used, these hooks are no longer run. This hook is particularly useful for modifying the input scores before caching, such as deleting or modifying unsupported notation elements in music21 objects, thus mitigating the constraints of the caching mechanism, which only allows read-only operations on the scores.

3. Python mechanisms: The Python programming language offers a range of advanced methods for modifying and extending existing software. As musif is fully implemented in pure Python, these methods are fully applicable. They include, but are not limited to, class inheritance, method and property overriding, and type casting.
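The sketch below gives an idea of what such a pair of functions might look like for a trivial note-counting feature. The function names and signatures are hypothetical; the exact interface expected by musif, including how a custom feature module is registered in the configuration, is described in the online documentation.

from music21 import note

def part_note_count(part, part_features):
    # Hypothetical part-level feature: count the notes of a single part.
    notes = list(part.flatten().getElementsByClass(note.Note))
    part_features["NoteCount"] = len(notes)

def score_note_count(score, parts_features, score_features):
    # Hypothetical score-level feature: aggregate the per-part counts.
    score_features["TotalNoteCount"] = sum(
        pf.get("NoteCount", 0) for pf in parts_features
    )

In musif, such functions would then be made available to the FeaturesExtractor through its configuration, alongside the stock feature modules described in the next section.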
4. STOCK FEATURES

musif is distributed with a wide variety of features already implemented. These sets of features can be selected for extraction using the FeaturesExtractor's constructor arguments – see Listing 1 –, while the DataProcessor can be utilized for further refining the desired features. Each set corresponds to a specific Python sub-package. The total number of features varies based on the instrumentation used in the score and usually ranges from about 500 feature values for simple monophonic scores to more than 10,000 feature values for orchestral scores. In this section, we provide a brief summary of each of these modules. For those who wish to carefully select features, more detailed information can be found in the online documentation, including pre-made Python regular expressions that can be used to easily select the desired features; a short selection example is given after the list below.

In general, all the features were designed to be meaningful for musicologists and music theorists, giving value to studies attempting to explain statistical results on the basis of the features. The modular structure of the features also allows researchers to conveniently focus their analysis on only certain aspects of the music. Here, we will use the word sound to refer to a specific timbre – e.g. violin – which can be repeated multiple times in the score – e.g. violin I and violin II. Moreover, we will use family to refer to a family of instruments – e.g. strings, voices, brasses, and so on. The stock feature modules available in musif are as follows:

• Core: These features are essential for the identification of music scores and for subsequent elaboration. They are always required and include the total number of measures and notes, as well as the number of measures containing notes and their averages for each sound or part and for each family and/or score. Other examples of such features include the filename of the score, the time signature, and the key signature.

• Scoring: This module computes features related to the instrumentation and voices used in the score. Examples of features in this module include the instruments, families, and parts present in the score, as well as the number of parts for each instrument and family. This module can be used to get a better understanding of the orchestration used in the composition.

• Key: This module computes features related to the key signature and tonality, i.e., the key, of the piece. Examples of features in this module include the Krumhansl-Schmuckler tonality estimation [22], the key signature, and the mode (major or minor). This module allows for analyzing the underlying tonal system used in the composition.

• Tempo: This module computes features related to the tempo marking on the score. It should be noted that, since some features depend on the terminology used by the composer for the tempo indication, some of these features may not be reliable for all repertoires. In fact, as the composer's marking need not be expressed quantitatively – it is actually more typical in some repertoires to have just a verbal indication – the numerical values extracted by musif ultimately depend on the BPM value given during the engraving process, if available.

• Density: These features relate the number of notes to the total number of measures, as well as to the total number of measures that contain sound, for a single part, sound, or family. This module provides insights into the density of the sound in the composition and allows comparing the activity level of different parts or families in the score.
• Harmony: This is one of the largest feature modules; it computes features based on the harmonic annotations provided in the MuseScore files according to a previous standard [12, 13]. Examples of these features include the number of harmonic annotations; the number of chords performing the tonic, dominant, and sub-dominant functions; the harmonic rhythm – i.e. the rate of harmonic changes in relation to the number of beats or measures –; and features related to modulations annotated in the MuseScore files. This module can be used to get a better understanding of the harmonic structure of the composition and to analyze the harmonic progressions used in it.

• Rhythm: This module computes features related to note durations and to particular rhythmic figures, such as dotted and double-dotted rhythms. Examples of features in this module include the average note duration and the frequency of particular rhythmic figures. This module analyzes the rhythmic structure of the composition and the rhythmic patterns used in it.

• Scale: This module computes features related to specific melodic degrees with respect to the main key of the score, as computed in the Key module, and to the local key, as provided in the MuseScore harmonic annotations. Examples of features in this module include the frequency of specific scale degrees in a given part.

• Dynamics: This module computes features related to the distribution of dynamic markings across the score, by assigning numerical values to each dynamic marking according to its corresponding intensity. As is the case with tempo, the specific numerical value of a given dynamic marking is assigned during the engraving process, with some software assigning default values that the engraver may need to modify depending on the notation conventions. Similarly to other features, this module may not be completely generalizable to some repertoires, as the interpretation of dynamic markings can vary across different compositions and styles, or dynamic markings may be completely absent. Examples of features in this module include the frequency of specific dynamic markings, the average dynamic level, and the distribution of dynamic markings across the score. This module extracts information about the expressivity of the composition and analyzes the use of dynamic contrasts in it.

• Ambitus: This module computes the ambitus, or melodic range, of the piece in semitones, for the whole piece as well as for each individual part, sound, or family. It also computes the lowest and highest pitches and their note names.

• Melody: This module computes an extensive number of features related to the distribution and types of melodic intervals for each part, voice, sound, and family. This is the largest set of features within musif. Examples of features in this module include the frequency of specific interval types, the distribution of interval sizes, and the proportion of ascending and descending intervals. This module provides insights into the melodic structure of the composition by analyzing the use of specific intervals in it.

• Lyrics: This module considers the alignment between lyrics, if available, and the notes, and computes features related to their distribution. Examples of features in this module include the total number of syllables in each vocal part, the average number of notes per syllable, and the proportion of measures that contain notes for each vocal part in the score. This module can facilitate a more profound comprehension of the relationships between lyrics and music in the composition.

• Texture: This module computes the ratio of the number of notes between two parts, considering all possible pairs of parts. This feature can provide insight into the relative density and activity level of different parts in the score and can be used to analyze the texture of the composition.
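As an example of the column selection mentioned at the beginning of this section, the processed output is a pandas DataFrame, so subsets of feature columns can be picked with ordinary regular expressions. The patterns below are illustrative only; they are not the pre-made expressions shipped with the documentation, and the actual column names should be checked on a concrete extraction.

# `processed_features` is the DataFrame produced in Listing 1.
# Keep only columns whose names mention intervals or key-related information.
interval_columns = processed_features.filter(regex="Interval")
key_columns = processed_features.filter(regex="Key|Mode")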
5. DISCUSSION AND FUTURE WORKS

This work presents the musif module to the scientific community as a tool for the extraction of features from symbolic music scores. It is designed with a focus on extensibility and customization, while also providing good defaults for the novice user and supporting musicologically curated datasets. The module is implemented in Python, and it provides a comprehensive set of features covering various aspects of music scores, including harmony, rhythm, melody, and many more. The modular structure of musif makes it easy to use and customize according to the user's needs.

In comparison to existing software such as jSymbolic [5] and music21 [4], musif offers a significantly larger number of features, approximately twice as many. Additionally, jSymbolic computes features based on pure MIDI encoding, with only 2 features based on the MEI format. This is an essential aspect for musicological studies, as MIDI, although commonly used in the MIP field, is not capable of representing various characteristics of music notation, such as alterations, key signatures, rhythmic and dynamic annotations, chords, and lyrics.

music21 already implements several features based on its powerful parsing engine, which allows it to take full advantage of MusicXML, MEI, and Kern. However, musif expands upon this set of computable features while remaining completely based on music21 and allowing the automatic extraction of features at the window level. Furthermore, it includes a caching system that improves performance during the feature extraction process by saving the results of computations to disk, reducing the need to perform the same calculations multiple times. Thus, musif provides a more extensive set of features while being highly performant in its extraction process, making it a valuable tool for researchers in the fields of music information retrieval and musicology.

While this paper describes the release of musif 1.0, we are aware that there is wide room to improve musif further, making it faster, more general, more usable, and more accurate. Specifically, we want to improve three aspects of the software:

• Data visualization: we want to provide the user with tools that help the visualization of the data that musif extracts; this would be particularly useful for preliminary analyses.

• Repertoire: As of now, musif has been tested on several other types of corpora covering different music styles, including EWLD, the Humdrum database [23], piano scores and performances [24], and masses from the Renaissance [25]. It has additionally been utilized on an in-house corpus of more than 1600 opera arias. For this reason, most of the design choices and of the implemented features target this repertoire. We want to make it more powerful and efficient for other repertoires too.
• More numerical features: Although musif already provides a wide set of musical features, we are sure that many other features could be defined and included in musif, empowering both musicological analysis and data science studies.

We also plan to study in more depth the comparison between existing tools for music feature extraction, including benchmarks and performance tests. While we continue working on these paths, we hope that musif can be a valuable tool for the Sound and Music Computing community, and we welcome any suggestions or contributions to the software. We encourage the community to use and test musif and provide feedback so that we can continue to improve and develop it further. It is our goal to make musif a widely used and reliable tool for MIP and musicology research.

Acknowledgments

This work is a result of the Didone Project [26], which has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program, Grant agreement No. 788986. It has also been conducted with funding from Spain's Ministry of Science and Innovation (IJC2020-043969-I/AEI/10.13039/501100011033).

6. REFERENCES

[1] D. Deutsch, The Psychology of Music, 3rd ed. Academic Press, 2013.

[2] F. Simonetta, S. Ntalampiras, and F. Avanzini, "Multimodal Music Information Processing and Retrieval: Survey and Future Challenges," in Proceedings of the 2019 International Workshop on Multilayer Music Representation and Processing. Milan, Italy: IEEE Conference Publishing Services, 2019, pp. 10–18. doi: 10.1109/mmrp.2019.00012

[3] Y. Bengio, A. Courville, and P. Vincent, "Representation Learning: A Review and New Perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 1798–1828, Aug. 2013. doi: 10.1109/TPAMI.2013.50

[4] M. S. Cuthbert, C. Ariza, and L. Friedland, "Feature extraction and machine learning on symbolic music using the music21 toolkit," in Proceedings of the 12th International Society for Music Information Retrieval Conference, ISMIR 2011, Miami, Florida, USA, October 24-28, 2011. University of Miami, 2011, pp. 387–392. doi: 10.5281/zenodo.1416288

[5] C. McKay, J. Cumming, and I. Fujinaga, "jSymbolic 2.2: Extracting features from symbolic music for use in musicological and MIR research," in Proceedings of the 19th International Society for Music Information Retrieval Conference, 2018, pp. 348–354. ISBN 978-2-9540351-2-3. doi: 10.5281/zenodo.1492421

[6] F. Simonetta, C. E. Cancino-Chacón, S. Ntalampiras, and G. Widmer, "A convolutional approach to melody line identification in symbolic scores," in Proceedings of the 20th International Society for Music Information Retrieval Conference. Delft, The Netherlands: ISMIR, Nov. 2019, pp. 924–931. doi: 10.5281/zenodo.3527966

[7] M. Prang and P. Esling, "Signal-domain representation of symbolic music for learning embedding spaces," Stockholm, Sweden, p. 10, Oct. 2020.

[8] P. Lisena, A. Meroño-Peñuela, and R. Troncy, "MIDI2vec: Learning MIDI embeddings for reliable prediction of symbolic music metadata," Semantic Web, vol. 13, no. 3, pp. 357–377, Jan. 2022. doi: 10.3233/SW-210446
[9] L. Bigo, M. Giraud, R. Groult, N. Guiomard-Kagan, and F. Levé, "Sketching sonata form structure in selected classical string quartets," in Proceedings of the 18th International Society for Music Information Retrieval Conference. Suzhou, China: ISMIR, Oct. 2017, pp. 752–759. doi: 10.5281/zenodo.1415020

[10] K. C. Kempfert and S. W. K. Wong, "Where does Haydn end and Mozart begin? Composer classification of string quartets," Journal of New Music Research, vol. 49, no. 5, pp. 457–476, Oct. 2020. doi: 10.1080/09298215.2020.1814822

[11] F. Simonetta, F. Avanzini, and S. Ntalampiras, "A Perceptual Measure for Evaluating the Resynthesis of Automatic Music Transcriptions," Multimedia Tools and Applications, 2022. doi: 10.1007/s11042-022-12476-0

[12] M. Neuwirth, D. Harasim, F. C. Moss, and M. Rohrmeier, "The Annotated Beethoven Corpus (ABC): A Dataset of Harmonic Analyses of All Beethoven String Quartets," Frontiers in Digital Humanities, vol. 5, 2018. doi: 10.3389/fdigh.2018.00016

[13] J. Hentschel, M. Neuwirth, and M. Rohrmeier, "The Annotated Mozart Sonatas: Score, Harmony, and Cadence," Transactions of the International Society for Music Information Retrieval, vol. 4, no. 1, pp. 67–80, May 2021. doi: 10.5334/tismir.63

[14] O. Raz, D. Chawin, and U. B. Rom, "The Mozart Expositional Punctuation Corpus: A Dataset of Interthematic Cadences in Mozart's Sonata-Allegro Expositions," Empirical Musicology Review, vol. 16, pp. 134–144, 2021. doi: 10/grq2fp

[15] F. Moss, W. Fernandes de Souza, and M. Rohrmeier, "Harmony and Form in Brazilian Choro: A Corpus-Driven Approach to Musical Style Analysis," Journal of New Music Research, vol. 49, pp. 416–437, 2020. doi: 10/grq2fm

[16] J. Hentschel, F. C. Moss, A. McLeod, M. Neuwirth, and M. Rohrmeier, "Towards a Unified Model of Chords in Western Harmony," in Music Encoding Conference 2021, 2022, pp. 143–149. doi: 10/grq2fk

[17] A. Llorens and A. Torrente, "Constructing opera seria in the Iberian Courts: Metastasian Repertoire for Spain and Portugal," Anuario Musical, vol. 76, pp. 73–110, Jul. 2021. doi: 10/grq2fn

[18] M. E. Cuenca and C. McKay, "Exploring Musical Style in the Anonymous and Doubtfully Attributed Mass Movements of the Coimbra Manuscripts: A Statistical Approach," in Medieval and Renaissance Music Conference, 2019.

[19] V. Anzani and A. Llorens, "Shaping Eighteenth-Century Opera: The Singer's Impact," in Tosc@ Junior Conference, 2021.

[20] A. Torrente, "Didone trasmutata: Aria Settings and the Expression of Emotions in Metastasian Operas," in Mapping Artistic Networks of Italian Theatre and Opera across Europe, 1600–1800, 2019.

[21] E. Rodriguez-Garcia and C. McKay, "Ave festiva ferculis: Exploring Attribution by Combining Manual and Computational Analysis," in Medieval and Renaissance Music Conference, 2021.

[22] C. L. Krumhansl, Cognitive Foundations of Musical Pitch. Oxford University Press, 1990. ISBN 0-19-505475-X

[23] C. S. Sapp, "Online Database of Scores in the Humdrum File Format," in Proceedings of the 6th International Conference on Music Information Retrieval, 2005, p. 2. doi: 10.5281/zenodo.1417281

[24] F. Foscarin, A. McLeod, P. Rigaux, F. Jacquemard, and M. Sakai, "ASAP: A dataset of aligned scores and performances for piano transcription," in Proceedings of the 21st International Society for Music Information Retrieval Conference, 2020. doi: 10.5281/zenodo.4245489

[25] J. Cumming, C. McKay, J. Stuchbery, and I. Fujinaga, "Methodologies for Creating Symbolic Corpora of Western Music Before 1600," in Proceedings of the ISMIR. Paris, France: ISMIR, Sep. 2018, pp. 491–498. doi: 10.5281/zenodo.1492459
[26] A. Torrente and A. Llorens, "The Musicology Lab: Teamwork and the Musicological Toolbox," in Music Encoding Conference 2021, 2022, pp. 9–20. doi: 10/grqp2b