A web interface for acoustic-semantic analysis of extreme metal vocal styles

Activity: Talk or presentation types: Oral presentation

Description

Background
Vocal styles in extreme metal involve a range of characteristic effects and expressive techniques (e.g., growling, screaming, shrieks) that can serve as markers for different subgenres and convey specific semantic qualities. In particular, guttural vocals in extreme metal are characterized by low harmonicity and high roughness and are associated with expressive traits like "aggressiveness" (Tsai et al., 2010). Audio features can help classify metal vocals into broad style categories (Nieto, 2013; Kalbag & Lerch, 2022). However, despite the variety of subgenres and vocal effects specific to the genre, the perceptual organization of these styles has not yet been examined empirically in terms of the verbal associations they evoke. It also remains unknown which audio features correspond with the perception of different expressive techniques and styles.

Aims
We aim to explore extreme metal vocals through a semantically meaningful space of verbal associations and their correspondence with audio features. Based on this analysis, we developed an interactive web application for exploring examples of metal vocals in terms of their perception and acoustic properties.

Methods
To this end, we extracted short phrases from 115 professional metal vocal tracks without accompaniment. In a pilot experiment, ten excerpts were rated for pairwise similarity (45 comparisons) by 11 listeners with varying preferences for metal music. The resulting similarity matrix forms the basis for a perceptual similarity space computed using multidimensional scaling (MDS). In a second experiment, we collected verbal associations for the whole dataset of vocal excerpts. Associations could be selected from a predefined list of 71 descriptive tags and additionally entered in a free-text field. From these descriptors, we constructed a semantic co-occurrence network. Further, we investigated the structure of associations via multiple correspondence analysis (MCA). Finally, 69 audio features relating to timbral characteristics of the voice were extracted via Praat (Jadoul et al., 2018), Timbre Toolbox (Peeters et al., 2011), and Essentia (Bogdanov et al., 2013) and related to the resulting semantic dimensions via correlation analysis.
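The pairwise design of the pilot experiment can be illustrated with a minimal sketch: ten excerpts yield 45 unordered pairs, and the ratings of the 11 listeners are averaged into one symmetric dissimilarity matrix of the kind an MDS routine takes as input. The rating scale (similarity between 0 and 1), the conversion to dissimilarity as 1 minus mean similarity, the function name `mean_dissimilarity`, and the random ratings are all assumptions made for illustration; the abstract does not specify them.

```python
from itertools import combinations
import random

N_EXCERPTS = 10   # excerpts in the pilot experiment
N_LISTENERS = 11  # listeners in the pilot experiment

# All unordered pairs of excerpts: 10 * 9 / 2 = 45 comparisons.
pairs = list(combinations(range(N_EXCERPTS), 2))


def mean_dissimilarity(ratings):
    """Average per-listener similarity ratings into a dissimilarity matrix.

    ratings: one dict per listener mapping (i, j) with i < j to a
    similarity in [0, 1] (assumed scale, not taken from the study).
    """
    matrix = [[0.0] * N_EXCERPTS for _ in range(N_EXCERPTS)]
    for i, j in pairs:
        mean_similarity = sum(r[(i, j)] for r in ratings) / len(ratings)
        # Assumed conversion: dissimilarity = 1 - mean similarity.
        matrix[i][j] = matrix[j][i] = 1.0 - mean_similarity
    return matrix


# Synthetic ratings stand in for real listener data.
rng = random.Random(0)
ratings = [{pair: rng.random() for pair in pairs} for _ in range(N_LISTENERS)]
dissimilarity = mean_dissimilarity(ratings)
```

The resulting matrix is symmetric with a zero diagonal, which is the form expected by common MDS implementations that accept precomputed dissimilarities.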

Results
MDS reveals a three-dimensional similarity space whose first dimension contrasts harmonic vs. inharmonic/rough vocals (Harmonics-to-Noise Ratio: r = 0.837, p = 0.005). While the second perceptual dimension shows no correlations with the extracted audio features, the third dimension relates to the position of the higher formants (e.g., F2: r = -0.855, p = 0.003). A dichotomy between harmonic and inharmonic vocals constitutes the predominant axis of the semantic network of verbal associations, which shows subclusters relating to fine-grained descriptions. Along the same lines, the first dimension of the MCA shows a very strong correlation with acoustic harmonicity (r = 0.932, p < 0.001). The second MCA dimension seems to relate to higher-level concepts, potentially distinguishing traditional metal genres from newer, more controversial styles like metalcore (see Smialek, 2023). When the verbal associations are projected onto the acoustic space, descriptions like "demonic," "scratchy," "powerful," or "angelic" map onto specific positions.
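As a minimal sketch of the kind of correlation analysis reported above, the following computes a correlation coefficient between excerpt coordinates on one perceptual dimension and one audio feature. It assumes a Pearson coefficient, which the abstract does not state explicitly. The function name `pearson_r` and all numeric values are hypothetical and serve only as illustration; they are not data from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)


# Hypothetical values for ten excerpts: coordinates on a first perceptual
# dimension and a harmonics-to-noise ratio in dB (both invented).
dimension_1 = [-1.2, -0.9, -0.5, -0.3, 0.0, 0.2, 0.4, 0.8, 1.0, 1.3]
hnr_db = [1.5, 2.0, 4.1, 3.2, 6.0, 5.5, 8.3, 9.0, 11.2, 12.4]

r = pearson_r(dimension_1, hnr_db)
print(f"r = {r:.3f}")
```

With only ten excerpts, a coefficient of this kind rests on few data points, so the associated p-value matters as much as the magnitude of r.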

A web application allows interactive exploration of the acoustic-semantic space with psychoacoustically relevant audio features. We plan to extend the web application to allow self-recorded examples to be uploaded, which we hope can serve as a helpful resource for vocalists training and evaluating their performance of expressive vocal styles.
Period: 6 Sep 2024
Event title: 40th Annual Conference of the German Society for Music Psychology: Digitisation in Music Psychology
Event type: Conference
Conference number: 40
Location: Munich, Bavaria, Germany
Degree of Recognition: International