An investigation into the factors that affect user ratings of music productions

  • Daisy Newman-Hills

Student thesis: Master's Thesis


Subjective listening tests such as MUSHRA (Multiple Stimuli with Hidden Reference and Anchor) usually take place under controlled conditions, using high-quality listening equipment and critical listeners. These measures mitigate uncontrollable factors that can cause ambiguity in test results. MUSHRA is considered the gold standard for evaluating audio quality and the quality of audio systems, but it may not be representative of real-world listening to popular music productions. This research used a mixed-methods approach to investigate whether a range of factors affect music production ratings in a non-controlled environment.

For the main experiment, forty-four participants completed a listening test in their own time, using their own listening device, in a location of their choice. The listening test was devised and distributed online to both critical and naïve listeners. Pre-test questions were included to identify the participants' demographic information, listening device and listening environment. The listening test required the participants to evaluate the music production of the ten most-streamed songs on Spotify. The ten songs came from a variety of genres, to cover several production techniques and to establish the genre preference of the participants.

The first stage of the listening test incorporated aspects of Wilson & Fazenda’s GUI (2016): participants were asked about their genre preference, their familiarity with each song, and the likability of each song. This was to identify whether the musical content of a song affected its production ratings. The second and third stages of the listening test incorporated aspects of Overall Listening Experience (OLE). For the second stage, participants were presented with the ten songs and asked to rate the music production. The songs were unmanipulated and not normalised, so that the ratings reflected each song’s original production content. For the third stage, participants rated the music production of the same 6 songs with added audio degradations. This was to establish whether both critical and naïve listeners can perceive differences in music production. The audio degradations were created using a variety of processing techniques, such as low-pass filtering, dynamic range compression, narrowing the stereo image and reducing the clarity of the music production.

During the statistical analysis, participants were split into data groups based on their responses to the pre-test questions. This identified which of the factors affected the music production ratings. The results suggest that popular music production evaluations can be conducted in a non-controlled environment with both critical and naïve listeners. Furthermore, the participants’ genre preference, their familiarity with the song being assessed and its likability do not appear to affect their music production ratings.

This research has two main limitations. The first is self-selection bias: because the experiment was distributed online, an equal number of participants could not be recruited for each data group. The second is that all factors were investigated in a single experiment, so there were multiple independent variables and unequal data groups when evaluating each individual factor. Further controlled experiments, focusing on each factor individually and with more participants, could be performed to validate and support the findings of this research.
Date of Award: 2 Feb 2023
Original language: English
Supervisor: Christopher Dewey (Main Supervisor) & Jonathan Wakefield (Co-Supervisor)
