RT Journal Article
T1 Automatic extraction of ranked SNP-phenotype associations from text using a BERT-LSTM-based method
A1 Bokharaeian, Behrouz
A1 Dehghani, Mohammad
A1 Díaz Esteban, Alberto
AB Extracting associations between single nucleotide polymorphisms (SNPs) and phenotypes from the biomedical literature is a vital task in BioNLP. Several methods have recently been developed to extract mutation-disease associations. However, no available method for extracting SNP-phenotype associations from text considers their degree of certainty. In this paper, several machine learning methods were developed to extract ranked SNP-phenotype associations from biomedical abstracts, and they were then compared with each other. The methods developed in this study include shallow machine learning methods (random forest, logistic regression, and decision tree), two kernel-based methods (subtree and local context), a rule-based method, a deep CNN-LSTM-based method, and two BERT-based methods. The experiments indicated that although the linguistic features used could support an association extraction method that outperforms its kernel-based counterparts, the deep learning and BERT-based methods exhibited the best performance, with PubMedBERT-LSTM outperforming all the other developed methods. Similar experiments were conducted to estimate the degree of certainty of each extracted association, which can be used to assess the strength of the reported association. These experiments revealed that the proposed PubMedBERT–CNN-LSTM method outperformed other sophisticated methods on this task.
PB Springer Nature
SN 1471-2105
YR 2023
FD 2023-04-12
LK https://hdl.handle.net/20.500.14352/92387
UL https://hdl.handle.net/20.500.14352/92387
LA eng
DS Docta Complutense
RD 24 Apr 2025