RT Journal Article
T1 Predicting haplogroups using a versatile machine learning program (PredYMaLe) on a new mutationally balanced 32 Y-STR multiplex (CombYplex): unlocking the full potential of the human STR mutation rate spectrum to estimate forensic parameters
A1 Bouakaze, Caroline
A1 Delehelle, Franklin
A1 Saenz-Oyhéréguy, Nancy
A1 Moreira, Andreia
A1 Schiavinato, Stéphanie
A1 Croze, Myriam
A1 Delon, Solène
A1 Fortes-Lima, César
A1 Gibert, Morgane
A1 Bujan, Louis
A1 Huyghe, Eric
A1 Bellis, Gil
A1 Calderón Fernández, María Del Rosario
A1 Hernández, Candela
A1 Avendaño-Tamayo, Efren
A1 Bedoya, Gabriel
A1 Salas, Antonio
A1 Mazières, Stéphane
A1 Chiaroni, Jacques
A1 Migot-Nabias, Florence
A1 Ruiz-Linares, Andres
A1 Dugoujon, Jean-Michel
A1 Thèves, Catherine
A1 Mollereau-Manaute, Catherine
A1 Noûs, Camille
A1 Poulet, Nicolas
A1 King, Turi
A1 D'Amato, Maria Eugenia
A1 Balaresque, Patricia
AB We developed a new mutationally well-balanced 32 Y-STR multiplex (CombYplex) together with a machine learning (ML) program, PredYMaLe, to assess the impact of STR mutability on haplogroup prediction while respecting forensic community criteria, namely high discrimination capacity (DC) and haplotype diversity (HD). We designed CombYplex around two sub-panels, M1 and M2, composed of average- and high-mutation-rate STRs, respectively. Using these two sub-panels, we tested how PredYMaLe responds to mutability when considering basal branches and, moving down the tree, terminal branches. We first tested the discrimination capacity of CombYplex on 996 human samples using various forensic and statistical parameters and showed that its resolution is sufficient to separate haplogroup classes. In parallel, PredYMaLe was designed and used to test whether an ML approach can predict haplogroup classes from Y-STR profiles. Applied to our kit, SVM and Random Forest classifiers perform very well (average 97%), better than Neural Network (average 91%) and Bayesian (<90%) classifiers.
We observe heterogeneity in haplogroup assignment accuracy among classes: most haplogroups have high prediction scores (99-100%), while two (E1b1b and G) have lower scores (67%). The small sample sizes of these classes explain the higher tendency to misclassify Y-profiles from these haplogroups; results improved measurably as soon as more training data were added. We provide evidence that our ML approach is a robust method to accurately predict haplogroups when combined with a sufficient number of markers, Y-STR panels with well-balanced mutation rates, and large ML training sets. Further research on confounding factors (such as gene conversion) and on ideal STR panels with regard to the branches analysed could help classifiers further optimize prediction scores.
PB Science Direct
SN 1872-4973
YR 2020
FD 2020
LK https://hdl.handle.net/20.500.14352/98074
UL https://hdl.handle.net/20.500.14352/98074
LA eng
NO Bouakaze C, Delehelle F, Saenz-Oyhéréguy N, Moreira A, Schiavinato S, Croze M, Delon S, Fortes-Lima C, Gibert M, Bujan L, Huyghe E, Bellis G, Calderón R, Hernández CL, Avendaño-Tamayo E, Bedoya G, Salas A, Mazières S, Chiaroni J, Migot-Nabias F, Ruiz-Linares A, Dugoujon JM, Thèves C, Mollereau-Manaute C, Noûs C, Poulet N, King T, D'Amato ME, Balaresque P. Predicting haplogroups using a versatile machine learning program (PredYMaLe) on a new mutationally balanced 32 Y-STR multiplex (CombYplex): Unlocking the full potential of the human STR mutation rate spectrum to estimate forensic parameters. Forensic Sci Int Genet. 2020 Sep;48:102342.
NO University Toulouse III
NO Ministerio de Economía y Competitividad (España)
NO Observatory Man-Environment Haut-Vicdessos (France)
NO National Research Foundation
DS Docta Complutense
RD 8 Apr 2025