Exploring the perception of extreme metal vocals via verbal associations and audio feature analysis

Growled vocals in extreme metal exhibit low harmonicity and high roughness and are associated with expressive traits such as "aggressiveness" (Tsai et al., 2010). Audio features can help classify them into broad style categories (Nieto, 2013; Kalbag & Lerch, 2022). However, despite the variety of subgenres and genre-specific vocal effects, the perceptual organization of these styles has not yet been examined empirically in terms of verbal associations. It also remains unknown which audio features correspond to the perception of different expressive techniques and styles.

We aim to analyze extreme metal vocals via a semantically meaningful space of verbal associations and their correspondence with audio features.

To this end, we extracted short phrases from 115 professional metal vocal tracks. In a pilot experiment, participants rated 10 of these excerpts for pairwise similarity (45 pairs, i.e., all 10 × 9 / 2 combinations). The resulting similarity matrix forms the basis for a perceptual similarity space computed using multidimensional scaling (MDS). In a second experiment, we collected verbal associations (adjectives) for the full set of vocal excerpts and constructed a semantic map from them computationally via co-occurrence analysis. Finally, we analyzed the verbal associations in relation to spectral and temporal audio features extracted from the excerpts.
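
The two computational steps above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that similarity ratings are averaged per pair and rescaled to [0, 1], uses classical (Torgerson) MDS even though the study may have used a different metric or non-metric variant, and only counts adjective co-occurrences per excerpt, without the normalization or layout step that a finished semantic map would add. All function names, ratings, and adjective assignments are hypothetical placeholders.

```python
import itertools

import numpy as np


def dissimilarity_matrix(pair_ratings, n_items):
    """Assemble pairwise similarity ratings into a symmetric dissimilarity matrix.

    pair_ratings maps (i, j) with i < j to a mean similarity rating. The
    rescaling to [0, 1] (1 = identical) is an assumption of this sketch.
    """
    d = np.zeros((n_items, n_items))
    for (i, j), sim in pair_ratings.items():
        d[i, j] = d[j, i] = 1.0 - sim  # high similarity -> small distance
    return d


def classical_mds(d, n_dims=3):
    """Classical (Torgerson) MDS: coordinates whose distances approximate d."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering  # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:n_dims]      # indices of the largest ones
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))  # drop negative noise
    return eigvecs[:, top] * scale                # shape (n, n_dims)


def cooccurrence(assignments, vocabulary):
    """Count how often two adjectives were given to the same excerpt."""
    index = {word: k for k, word in enumerate(vocabulary)}
    counts = np.zeros((len(vocabulary), len(vocabulary)), dtype=int)
    for words in assignments:  # one collection of adjectives per excerpt
        for a, b in itertools.combinations(sorted(set(words)), 2):
            counts[index[a], index[b]] += 1
            counts[index[b], index[a]] += 1
    return counts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pairs = list(itertools.combinations(range(10), 2))
    print(len(pairs))  # 45
    ratings = {p: rng.uniform() for p in pairs}  # placeholder ratings, not real data
    coords = classical_mds(dissimilarity_matrix(ratings, 10), n_dims=3)
    print(coords.shape)  # (10, 3)

    # Toy adjective assignments, one set per excerpt (invented for illustration).
    toy = [{"demonic", "scratchy"}, {"demonic", "powerful"}, {"angelic"}]
    vocab = ["angelic", "demonic", "powerful", "scratchy"]
    print(cooccurrence(toy, vocab))
```

Ten excerpts yield exactly 45 unordered pairs, which matches the number of comparisons in the pilot.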

Preliminary analyses reveal a three-dimensional similarity space whose first major axis contrasts harmonic with inharmonic/rough vocals (Harmonic-to-Noise Ratio: r = 0.837, p = 0.005; Spectral Complexity: r = -0.959, p < 0.001). While the second perceptual dimension shows no correlation with the extracted audio features, the third dimension relates to the position of the higher formants (e.g., F2: r = -0.855, p = 0.003). A dichotomy between harmonic and inharmonic vocals also constitutes the predominant axis of the semantic network of verbal associations, which contains subclusters corresponding to more fine-grained descriptions. When the verbal associations are projected onto the acoustic space, descriptions such as "demonic," "scratchy," "powerful," or "angelic" occupy distinct positions.
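
The analysis above reports correlations between perceptual dimensions and audio features but does not state the exact test used. The sketch below is one way to compute such a table, assuming Pearson correlations and using a permutation test as a dependency-free stand-in for a parametric p-value. The function names, feature keys (hnr_db, f2_hz), and data are hypothetical.

```python
import numpy as np


def dimension_feature_correlations(coords, features):
    """Pearson r between every MDS dimension and every audio feature.

    coords:   array of shape (n_items, n_dims), e.g. an MDS solution
    features: dict mapping a feature name to an array of shape (n_items,)
    Returns a dict keyed by (feature name, 1-based dimension number).
    """
    table = {}
    for name, values in features.items():
        for dim in range(coords.shape[1]):
            table[(name, dim + 1)] = np.corrcoef(coords[:, dim], values)[0, 1]
    return table


def permutation_p(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the Pearson correlation of x and y."""
    rng = np.random.default_rng(seed)
    observed = abs(np.corrcoef(x, y)[0, 1])
    hits = sum(
        abs(np.corrcoef(x, rng.permutation(y))[0, 1]) >= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)  # +1 avoids reporting p = 0


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    coords = rng.normal(size=(10, 3))  # stand-in for a 3-D MDS solution
    # Hypothetical feature values: one tied to dimension 1, one unrelated.
    features = {
        "hnr_db": 2.0 * coords[:, 0] + rng.normal(scale=0.3, size=10),
        "f2_hz": rng.normal(1500.0, 200.0, size=10),
    }
    for (name, dim), r in dimension_feature_correlations(coords, features).items():
        p = permutation_p(coords[:, dim - 1], features[name], n_perm=2000)
        print(f"{name:7s} dim {dim}: r = {r:+.3f}, p = {p:.4f}")
```

Because an MDS solution is only defined up to rotation and reflection, the sign of a correlation such as r = -0.855 for F2 reflects the orientation of the solution rather than a property of the voices alone. For parametric p-values, scipy.stats.pearsonr returns both the coefficient and a p-value.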

Discussion and conclusion
A web application allows interactive exploration of the acoustic-semantic space with psychoacoustically relevant audio features.
