Reference Sample Size and the Computation of Numerical Likelihood Ratios Using Articulation Rate

Vincent Hughes, Ashley Brereton, Erica Gold

Research output: Contribution to journal › Article

Abstract

This paper explores the effects of variability in the amount of reference data used in quantifying the strength of speech evidence using numerical likelihood ratios (LRs). Monte Carlo simulations (MCS) are performed to generate synthetic data from a sample of existing raw local articulation rate (AR) data. LRs are computed as the number of reference speakers (up to 1000) and the number of tokens per reference speaker (up to 200) are systematically increased. The distributions of same-speaker and different-speaker LRs and system performance (log LR cost (Cllr) and equal error rate (EER)) are assessed as a function of the size of the reference data. Results reveal that LRs based on AR are relatively robust to small reference samples, but that system calibration plays an important role in determining the sensitivity of the LRs to sample size.
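The log LR cost (Cllr) named in the abstract has a standard definition, and a minimal sketch of it may help readers unfamiliar with the metric. The function name `cllr` and the sample values are illustrative assumptions; they are not taken from the paper, and this is not the authors' implementation.

```python
import math

def cllr(same_speaker_lrs, different_speaker_lrs):
    """Log LR cost for two lists of (non-log) likelihood ratios.

    Standard definition: the mean of log2(1 + 1/LR) over same-speaker
    comparisons and the mean of log2(1 + LR) over different-speaker
    comparisons, averaged with equal weight.
    """
    ss_penalty = sum(math.log2(1 + 1 / lr) for lr in same_speaker_lrs) / len(same_speaker_lrs)
    ds_penalty = sum(math.log2(1 + lr) for lr in different_speaker_lrs) / len(different_speaker_lrs)
    return 0.5 * (ss_penalty + ds_penalty)

# Hypothetical LRs for illustration only (not data from the paper).
uninformative = cllr([1.0, 1.0], [1.0, 1.0])    # every LR equals 1
well_behaved = cllr([1000.0], [0.001])          # strong, correct support
misleading = cllr([0.001], [1000.0])            # strong, incorrect support
```

A system whose LRs are all 1 scores exactly 1, values approaching 0 indicate strong and correct LRs, and values above 1 indicate that the LRs are misleading on balance.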
Language: English
Journal: York Papers in Linguistics
Issue number: 13
Publication status: Published - Dec 2013
Externally published: Yes

Fingerprint

reference sample
calibration
rate
cost
simulation

Cite this

@article{2f70fe7d70dd4c789edd0b685a050f06,
title = "Reference Sample Size and the Computation of Numerical Likelihood Ratios Using Articulation Rate",
abstract = "This paper explores the effects of variability in the amount of reference data used in quantifying the strength of speech evidence using numerical likelihood ratios (LRs). Monte Carlo simulations (MCS) are performed to generate synthetic data from a sample of existing raw local articulation rate (AR) data. LRs are computed as the number of reference speakers (up to 1000) and the number of tokens per reference speaker (up to 200) are systematically increased. The distributions of same-speaker and different-speaker LRs and system performance (log LR cost (Cllr) and equal error rate (EER)) are assessed as a function of the size of the reference data. Results reveal that LRs based on AR are relatively robust to small reference samples, but that system calibration plays an important role in determining the sensitivity of the LRs to sample size.",
author = "Vincent Hughes and Ashley Brereton and Erica Gold",
year = "2013",
month = "12",
language = "English",
journal = "York Papers in Linguistics",
issn = "1758-0315",
number = "13",

}

Reference Sample Size and the Computation of Numerical Likelihood Ratios Using Articulation Rate. / Hughes, Vincent; Brereton, Ashley; Gold, Erica.

In: York Papers in Linguistics, No. 13, 12.2013.


TY - JOUR

T1 - Reference Sample Size and the Computation of Numerical Likelihood Ratios Using Articulation Rate

AU - Hughes, Vincent

AU - Brereton, Ashley

AU - Gold, Erica

PY - 2013/12

Y1 - 2013/12

N2 - This paper explores the effects of variability in the amount of reference data used in quantifying the strength of speech evidence using numerical likelihood ratios (LRs). Monte Carlo simulations (MCS) are performed to generate synthetic data from a sample of existing raw local articulation rate (AR) data. LRs are computed as the number of reference speakers (up to 1000) and the number of tokens per reference speaker (up to 200) are systematically increased. The distributions of same-speaker and different-speaker LRs and system performance (log LR cost (Cllr) and equal error rate (EER)) are assessed as a function of the size of the reference data. Results reveal that LRs based on AR are relatively robust to small reference samples, but that system calibration plays an important role in determining the sensitivity of the LRs to sample size.

AB - This paper explores the effects of variability in the amount of reference data used in quantifying the strength of speech evidence using numerical likelihood ratios (LRs). Monte Carlo simulations (MCS) are performed to generate synthetic data from a sample of existing raw local articulation rate (AR) data. LRs are computed as the number of reference speakers (up to 1000) and the number of tokens per reference speaker (up to 200) are systematically increased. The distributions of same-speaker and different-speaker LRs and system performance (log LR cost (Cllr) and equal error rate (EER)) are assessed as a function of the size of the reference data. Results reveal that LRs based on AR are relatively robust to small reference samples, but that system calibration plays an important role in determining the sensitivity of the LRs to sample size.

M3 - Article

JO - York Papers in Linguistics

T2 - York Papers in Linguistics

JF - York Papers in Linguistics

SN - 1758-0315

IS - 13

ER -