TY - JOUR
T1 - Identification and quantification of chimeric sequencing reads in a highly multiplexed RAD-seq protocol
AU - Martin Cerezo, Maria Luisa
AU - Raval, Rohan
AU - De Haro Reyes, Bernardo
AU - Kucka, Marek
AU - Chan, Frank Yingguang
AU - Bryk, Jarosław
N1 - Funding Information:
This work was supported by the University of Huddersfield, the Friedrich Miescher Laboratory of the Max Planck Society and the Microsoft Azure for Research Grant awarded to MLMC and JB.
Publisher Copyright:
© 2022 The Authors. Molecular Ecology Resources published by John Wiley & Sons Ltd.
PY - 2022/6/27
Y1 - 2022/6/27
N2 - Highly multiplexed approaches have become common in genomic studies. They have improved the cost-effectiveness of genotyping hundreds of individuals using combinatorially barcoded adapters. These strategies, however, can potentially misassign reads to incorrect samples. Here, we used a modified quaddRAD protocol to analyse the occurrence of index hopping and PCR chimeras in a series of experiments with up to 100 multiplexed samples per sequencing lane (639 samples in total). We created two types of sequencing libraries: four libraries of type A, where PCRs were run on individual samples before multiplexing, and three libraries of type B, where PCRs were run on pooled samples. We used fixed pairs of inner barcodes to identify chimeric reads. Type B libraries show a higher percentage of misassigned reads (1.15%) than type A libraries (0.65%). We also quantify the commonly undetectable chimeric sequences that occur whenever multiplexed groups of samples with different outer barcodes are sequenced together on a single flow cell. Our results suggest that these types of chimeric sequences represent up to 1.56% and 1.29% of reads in type A and B libraries, respectively. We also show that increasing the number of mismatches allowed for barcode rescue to above 2 dramatically increases the number of recovered chimeric reads. We provide recommendations for developing highly multiplexed RAD-seq protocols and analysing the resulting data to minimize the generation of chimeric sequences, allowing their quantification and finer control over the number of PCR cycles necessary to generate enough input DNA for library preparation.
KW - Chimeras
KW - RAD-seq
KW - quaddRAD
KW - index hopping
KW - read misassignment
KW - adapters
KW - barcodes
UR - http://www.scopus.com/inward/record.url?scp=85132726128&partnerID=8YFLogxK
U2 - 10.1111/1755-0998.13661
DO - 10.1111/1755-0998.13661
M3 - Article
JO - Molecular Ecology Resources
JF - Molecular Ecology Resources
SN - 1755-098X
ER -