Person:
Díaz Esteban, Alberto

First Name
Alberto
Last Name
Díaz Esteban
Affiliation
Universidad Complutense de Madrid
Faculty / Institute
Informática
Department
Ingeniería del Software e Inteligencia Artificial
Area
Lenguajes y Sistemas Informáticos
Identifiers
UCM identifier · ORCID · Scopus Author ID · Dialnet ID · Google Scholar ID

Search Results

Now showing 1 - 10 of 10
  • Item
    Automatic SignWriting Recognition: Combining Machine Learning and Expert Knowledge to Solve a Novel Problem
    (IEEE Access, 2023) García Sevilla, Antonio Fernando; Díaz Esteban, Alberto; Lahoz Bengoechea, José María
    Sign languages are visuo-gestural languages, using space and movement to convey meaning. To transcribe them, SignWriting uses an iconic system of symbols meaningfully arranged on the page. This two-dimensional system, however, is very different from traditional writing systems, so its automatic processing poses a novel challenge for computational linguistics. In this article, we present a novel problem for the state of the art in artificial intelligence: automatic SignWriting recognition. We examine the problem, model the underlying data domain, and present a first solution in the form of an expert system that exploits the domain knowledge encoded in the data model. This system uses an adaptable pipeline of neural networks and deterministic processing, overcoming the challenges posed by the novelty and originality of the problem. Thanks to our data modelling, it improves accuracy by 17% compared to a straightforward deep learning approach. All of our data and code are publicly available, and our approach may be useful not only for SignWriting processing but also for other similar graphical data.
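The abstract does not spell out the hybrid architecture, but the general pattern it describes, neural proposals corrected by deterministic, domain-informed rules, can be sketched as follows (all names and the hand-count constraint are illustrative assumptions, not the authors' code):

```python
# Sketch of a hybrid recognition pipeline: a neural classifier proposes
# symbol labels, then deterministic rules encoding domain knowledge
# correct implausible outputs. All names here are hypothetical.
from dataclasses import dataclass

@dataclass
class Detection:
    label: str      # symbol class proposed by the neural network
    score: float    # classifier confidence
    x: float        # position on the page (meaningful in SignWriting)
    y: float

# Assumed domain constraint: a sign shows at most two hand symbols.
MAX_HANDS = 2

def apply_domain_rules(detections: list[Detection]) -> list[Detection]:
    # Keep only the most confident hand detections; pass the rest through.
    hands = sorted((d for d in detections if d.label.startswith("hand")),
                   key=lambda d: d.score, reverse=True)[:MAX_HANDS]
    others = [d for d in detections if not d.label.startswith("hand")]
    return hands + others

def recognize(image, classifier) -> list[Detection]:
    # `classifier` stands in for any neural model exposing
    # .predict(image) -> list[Detection]; deterministic rules run after it.
    return apply_domain_rules(classifier.predict(image))
```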
  • Item
    Enhancing Extraction of Drug-Drug Interaction from Literature Using Neutral Candidates, Negation, and Clause Dependency
    (PLoS ONE, 2016) Bokharaeian, Behrouz; Chitsaz, Hamidreza; Díaz Esteban, Alberto; Couto, Francisco M.
    Motivation: Supervised biomedical relation extraction plays an important role in biomedical natural language processing, aiming to obtain the relations between biomedical entities. Drug-drug interactions (DDIs), which are investigated in the present paper, are notably among the critical biomedical relations. Many methods have been developed with the aim of extracting DDI relations, but there has been a scarcity of comprehensive studies on the effects of negation, complex sentences, clause dependency, and neutral candidates in DDI extraction from biomedical articles. Results: Our study proposes clause dependency features and a number of features for identifying neutral candidates as well as negation cues and scopes. Furthermore, our experiments indicate that the proposed features, combined with other kernel methods, significantly improve the performance of the relation extraction task. We characterize the contribution of each category of features and conclude that neutral candidate features play the most prominent role among the three categories.
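As a rough illustration of the kind of features the paper studies (the actual feature set and cue lists are the paper's own and are not reproduced here), a minimal sketch might look like this:

```python
import re

# Illustrative features for a DDI candidate pair: negation cues between
# the mentions, a simple neutral-candidate cue (coordination), and a
# crude clause-boundary signal. Cue lists below are assumptions.
NEGATION_CUES = {"no", "not", "without", "neither", "nor", "lack"}
CLAUSE_MARKERS = {",", "which", "that", "because", "while"}

def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+|,", text.lower())

def between(tokens: list[str], a: str, b: str) -> list[str]:
    # Tokens strictly between the first mentions of a and b.
    try:
        i, j = tokens.index(a), tokens.index(b)
    except ValueError:
        return []
    lo, hi = sorted((i, j))
    return tokens[lo + 1:hi]

def pair_features(sentence: str, drug_a: str, drug_b: str) -> dict:
    mid = between(tokenize(sentence), drug_a.lower(), drug_b.lower())
    return {
        "neg_between": any(t in NEGATION_CUES for t in mid),
        "coordinated": mid == ["and"],   # "A and B": often a neutral pair
        "cross_clause": any(t in CLAUSE_MARKERS for t in mid),
    }

print(pair_features("Aspirin does not interact with ibuprofen.",
                    "aspirin", "ibuprofen"))
# {'neg_between': True, 'coordinated': False, 'cross_clause': False}
```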
  • Item
    Building the VisSE Corpus of Spanish SignWriting
    (Language Resources and Evaluation) Díaz Esteban, Alberto; García Sevilla, Antonio Fernando; Lahoz Bengoechea, José María
    SignWriting is a system for transcribing sign languages, using iconic depictions of the hands and other body parts and exploiting the possibilities of the page as a two-dimensional medium to capture the three-dimensional nature of signs. This goes beyond the usual line-oriented nature of oral writing systems, and thus requires a different approach to its processing. In this article we present a corpus of handwritten SignWriting, a collection of images which transcribe signs from Spanish Sign Language. We explain the annotation schema we have devised and the decisions that were necessary to deal with the challenges that both sign language and SignWriting present. These challenges include the transformational nature of symbols in SignWriting, which can rotate and otherwise transform to convey meaning, as well as how to properly encode location, a fundamental part of SignWriting that is completely different from oral writing systems. The data in the corpus is fully annotated and can serve as a tool for computational training and evaluation of algorithms, as well as provide a window into the nature of SignWriting and the distribution of its features across a real vocabulary. The corpus is freely available online at https://zenodo.org/record/6337885.
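Purely as an illustration of what such an annotation schema might capture (field names are assumptions, not the actual VisSE format, which is documented in the Zenodo record):

```python
# Hypothetical annotation record for one symbol in a transcription image.
# It mirrors the properties the abstract highlights: symbol class,
# transformations (rotation, reflection), and meaningful page location.
from dataclasses import dataclass

@dataclass
class SymbolAnnotation:
    image_id: str        # which transcription image the symbol belongs to
    symbol_class: str    # e.g. a hand shape or a movement marker
    rotation_deg: int    # symbols rotate to convey orientation
    reflected: bool      # mirror transforms distinguish left/right hands
    x: int               # location carries meaning in SignWriting,
    y: int               # unlike in line-oriented oral writing systems
```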
  • Item
    Integración de técnicas de clasificación de texto y modelado de usuario para la personalización en servicios de noticias
    (2006) Díaz Esteban, Alberto; Gervás Gómez-Navarro, Pablo; Buenaga Rodríguez, Manuel de
    In recent years, the information available in electronic form has grown to the point where it is very difficult not to feel overloaded when trying to find the information one is actually interested in. Web content appears in many forms over different domains of application, but in most cases the form of presentation is the same for all users: the contents are static in the sense that they are not adapted to each user, since they are neither presented differently for each user nor capable of adapting to changes in the user's interests over time. Web content personalization tries to remove this information overload by adapting content to each type of user and to the evolution of their interests.

    This thesis presents an integrated approach to Web content personalization, applied to news services, based on three main functionalities: content selection, user model adaptation, and results presentation. All of these processes rely on a representation of the user's interests, reflected in a profile or user model. Content selection refers to choosing, from all the incoming documents, those most interesting to a given user. User model adaptation is necessary because user needs change over time, above all as a result of their interaction with the information they receive. Results presentation consists of building, once the most relevant information items have been selected, a result document that contains, for each selected item, an extract indicative of its content; in particular, a personalized summary is generated for each item selected for each user.

    The user model integrates four types of reference systems that represent the user's interests from different points of view. These interests are divided into two kinds: long-term and short-term. The former represent interests that remain constant over time, while the latter represent interests that change. The long-term model in turn uses three classification methods that let the user define their information needs from three different perspectives: a domain-dependent classification system, where documents are pre-classified by their author (e.g., sections in a newspaper); a domain-independent classification system, obtained from the first-level categories of Yahoo! Spain; and a set of keywords. The different personalization processes are based on statistical text classification techniques applied both to the documents and to the user models; the text classification tasks involved are related to information retrieval, text categorization, relevance feedback, and summarization.

    Evaluating personalization systems is especially complex because the opinions of different users are needed to draw relevant conclusions about system performance. To evaluate the different personalization processes, several evaluation collections were built, storing the relevance judgments of several users over several days of system use. These collections made it possible to test the proposed approaches and determine which of them was the best choice, and they can later be used by other researchers to compare the results of their own personalization techniques. The evaluations showed that the proposed personalization approach, based on combining long-term and short-term user models, with personalized summaries as the way of presenting the final results, reduces the information overload of users, independently of domain and language, in a Web content personalization system applied to news services.
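A minimal sketch of the central scoring idea, not the thesis implementation: incoming news items are ranked against a weighted combination of long-term and short-term profiles represented in a common vector space (the weighting parameter and function names are assumptions):

```python
# Rank news items by similarity to a combined user profile.
# Both profiles are plain text here; the thesis uses richer models.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def rank_news(items: list[str], long_term: str, short_term: str,
              alpha: float = 0.6) -> list[tuple[float, str]]:
    """alpha weights long-term vs. short-term interests (assumed value)."""
    vec = TfidfVectorizer()
    matrix = vec.fit_transform(items + [long_term, short_term])
    docs, lt, st = matrix[:-2], matrix[-2], matrix[-1]
    scores = (alpha * cosine_similarity(docs, lt)
              + (1 - alpha) * cosine_similarity(docs, st)).ravel()
    return sorted(zip(scores, items), reverse=True)

ranked = rank_news(["election results announced", "new football season"],
                   long_term="politics government election",
                   short_term="sports football")
print(ranked[0][1])  # the item best matching the combined profile
```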
  • Item
    Conceptual Representations for Computational Concept Creation
    (ACM Computing Surveys, 2019) Xiao, Ping; Toivonen, Hannu; Gross, Oskar; Cardoso, Amílcar; Correia, João; Machado, Penousal; Martins, Pedro; Oliveira, Hugo Gonçalo; Sharma, Rahul; Pinto, Alexandre Miguel; León, Carlos; Forth, Jamie; Purver, Matthew; Wiggins, Geraint A.; Miljković, Dragana; Podpečan, Vid; Pollak, Senja; Kralj, Jan; Žnidaršič, Martin; Bohanec, Marko; Lavrač, Nada; Urbančič, Tanja; Van Der Velde, Frank; Battersby, Stuart; Díaz Esteban, Alberto; Francisco Gilmartín, Virginia; Gervás Gómez-Navarro, Pablo; Hervás Ballesteros, Raquel
    Computational creativity seeks to understand computational mechanisms that can be characterized as creative. The creation of new concepts is a central challenge for any creative system. In this article, we outline different approaches to computational concept creation and then review conceptual representations relevant to concept creation, and therefore to computational creativity. The conceptual representations are organized in accordance with two important perspectives on the distinctions between them. One distinction is between symbolic, spatial and connectionist representations. The other is between descriptive and procedural representations. Additionally, conceptual representations used in particular creative domains, such as language, music, image and emotion, are reviewed separately. For every representation reviewed, we cover the inference it affords, the computational means of building it, and its application in concept creation.
  • Item
    Project number: 294
    Sensibilización y formación en la accesibilidad e inclusión de las personas con discapacidad visual al proceso de Enseñanza-Aprendizaje. SENSIVISUAL-UCM
    (2021) Guijarro Mata-García, María; Bartolomé Barolomé, Gema; Bautista Villar, Jesús; Bermúdez Cabra, Antonio; Cabrero Martín, Nestor; Carreño Gea, Pablo; Carrera García, Juan Manuel; Casas Torres, Laura; García Sevilla, Antonio Fernando; González Montero, María Guadalupe; Gutiérrez Hernández, Ángel Luis; Hernando Hernández, David; Manero Iglesias, José Borja; Martín Pérez, Yolanda; Recas Piorno, Joaquín; Muñoz Carenas, Jaime; Santos Peñas, Matilde; Díaz Esteban, Alberto
    The general objective of this project is defined by the need to include people with visual impairment, whether partial or total, in the academic world, as well as to favour their entry into the labour market under formalized and stable conditions. Through the actions carried out in this teaching innovation and quality improvement project, accessibility can be improved across the different degree programmes of the Universidad Complutense de Madrid, helping to generate teaching materials and to compose working groups that foster collaborative work and enable academic reinforcement.
  • Item
    SNPPhenA: a corpus for extracting ranked associations of single-nucleotide polymorphisms and phenotypes from literature
    (Journal of Biomedical Semantics, 2017) Bokharaeian, Behrouz; Taghizadeh, Nasrin; Chitsaz, Hamidreza; Chavoshinejad, Ramyar; Díaz Esteban, Alberto
    Background: Single Nucleotide Polymorphisms (SNPs) are among the most important types of genetic variation influencing common diseases and phenotypes. Recently, some corpora and methods have been developed for extracting mutations and diseases from texts. However, no corpus for extracting associations from text is available that is annotated with linguistic negation, modality markers, neutral candidates, and the confidence level of associations. Method: This research presents the steps followed to produce the SNPPhenA corpus. They include automatic Named Entity Recognition (NER) followed by manual annotation of SNP and phenotype names, annotation of the SNP-phenotype associations and their level of confidence, as well as modality markers. Moreover, the corpus was annotated with negation scopes and cues, as well as with neutral candidates, which play a crucial role in negation and modality phenomena for extraction tasks. Result: Agreement between annotators was measured with Cohen's Kappa coefficient, and the resulting scores indicate the reliability of the corpus: 0.79 for annotating the associations and 0.80 for the confidence degree of associations. We also present the basic statistics of the annotated features of the corpus, together with the results of our first experiments on extracting ranked SNP-phenotype associations. The prepared guideline documents make the corpus easier to use. The corpus, guidelines, and inter-annotator agreement analysis are available on the website of the corpus: http://nil.fdi.ucm.es/?q=node/639. Conclusion: Specifying the confidence degree of SNP-phenotype associations from articles helps identify the strength of associations, which can in turn assist genomics scientists in determining phenotypic plasticity and the importance of environmental factors. Moreover, our first experiments with the corpus show that linguistic confidence cues, alongside other non-linguistic features, can be used to estimate the strength of the observed SNP-phenotype associations.
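Cohen's Kappa, the agreement measure reported above, is available directly in scikit-learn; a small worked example with made-up labels:

```python
# Inter-annotator agreement corrected for chance agreement.
# The labels below are invented for illustration; the corpus reports
# kappa = 0.79 for associations and 0.80 for confidence degrees.
from sklearn.metrics import cohen_kappa_score

annotator_1 = ["positive", "negative", "neutral", "positive", "positive"]
annotator_2 = ["positive", "negative", "positive", "positive", "neutral"]

kappa = cohen_kappa_score(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```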
  • Item
    Multilingual extension and evaluation of a poetry generator
    (Natural Language Engineering, 2017) Oliveira, Hugo Gonçalo; Díaz Esteban, Alberto; Gervás Gómez-Navarro, Pablo; Hervás Ballesteros, Raquel
    Poetry generation is a specific kind of natural language generation where several sources of knowledge are typically exploited to handle features on different levels, such as syntax, semantics, form or aesthetics. But although this task has been addressed by several researchers and has targeted different languages, all known systems have focused on a limited purpose and a single language. This article describes the effort of adapting the same architecture to generate poetry in three different languages: Portuguese, Spanish, and English. An existing architecture is first described and then complemented with the adaptations required for each language, including the linguistic resources used for handling morphology, syntax, semantics and metrical scansion. An automatic evaluation was designed so that it would be applicable to all target languages. It covered three relevant aspects of the generated poems, namely: the presence of poetic features, the variation of the linguistic structure, and the semantic connection to a given topic. The automatic measures applied for the second and third aspects can be seen as novel in the evaluation of poetry. Overall, poems were successfully generated in all three languages. Despite minor differences across languages and seed words, the poems proved to have a regular metre and frequent rhymes, to exhibit an interesting degree of variation, and to be semantically associated with the initially given seeds.
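One of the evaluated poetic features, rhyme, can be approximated very crudely by comparing line endings; the following toy check is purely illustrative and is not the authors' scansion machinery:

```python
# Toy rhyme detector: two lines "rhyme" if their final words differ but
# share the last n characters. Real rhyme detection works from the last
# stressed vowel onward, which requires phonological resources.
def crude_rhyme(line_a: str, line_b: str, n: int = 3) -> bool:
    a = line_a.strip().split()[-1].lower()
    b = line_b.strip().split()[-1].lower()
    return a != b and a[-n:] == b[-n:]

print(crude_rhyme("the night was long", "he sang his song"))  # True
```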
  • Item
    Automatic extraction of ranked SNP-phenotype associations from text using a BERT-LSTM-based method
    (BMC bioinformatics, 2023) Bokharaeian, Behrouz; Dehghani, Mohammad; Díaz Esteban, Alberto
    Extraction of associations between single nucleotide polymorphisms (SNPs) and phenotypes from biomedical literature is a vital task in BioNLP. Recently, some methods have been developed to extract mutation-disease associations, but no accessible method for extracting SNP-phenotype associations from text considers their degree of certainty. In this paper, several machine learning methods were developed to extract ranked SNP-phenotype associations from biomedical abstracts and were compared to each other: shallow machine learning methods (random forest, logistic regression, and decision tree), two kernel-based methods (subtree and local context), a rule-based method, a deep CNN-LSTM-based method, and two BERT-based methods. The experiments indicate that although the linguistic features used can be employed to build an association extraction method that outperforms the kernel-based counterparts, the deep learning and BERT-based methods exhibit the best performance, with PubMedBERT-LSTM outperforming all the other methods developed. Moreover, similar experiments were conducted to estimate the degree of certainty of the extracted associations, which can be used to assess the strength of the reported association; these experiments revealed that our proposed PubMedBERT-CNN-LSTM method outperforms the other methods on this task.
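A minimal sketch of a BERT-LSTM classifier of the kind described, assuming the public PubMedBERT checkpoint and hypothetical hyperparameters (this is not the authors' implementation):

```python
# BERT token embeddings fed to a bidirectional LSTM, then a linear head
# predicting the association class. Checkpoint name and sizes are
# assumptions for illustration.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

CHECKPOINT = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract"

class BertLstmClassifier(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        self.bert = AutoModel.from_pretrained(CHECKPOINT)
        self.lstm = nn.LSTM(self.bert.config.hidden_size, hidden,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, input_ids, attention_mask):
        states = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        _, (h_n, _) = self.lstm(states)
        # Concatenate final forward and backward hidden states.
        pooled = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        return self.head(pooled)

tokenizer = AutoTokenizer.from_pretrained(CHECKPOINT)
model = BertLstmClassifier(num_classes=3)  # e.g. positive/negative/neutral
batch = tokenizer(["rs123 is strongly associated with asthma."],
                  return_tensors="pt", padding=True)
logits = model(batch["input_ids"], batch["attention_mask"])
```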
  • Item
    Evolución de un espacio de trabajo multidisciplinar para el aprendizaje de la programación basado en casos prácticos: de los repositorios a los cursos adaptativos en el Campus Virtual de la UCM
    (V Jornada Campus Virtual UCM: Buenas prácticas e indicios de calidad, 2009) Gómez Albarrán, Mercedes; Jiménez Díaz, Guillermo; López Fernández, Marta; Gómez Martín, Marco Antonio; Hernández Yáñez, Luis Antonio; Ruiz Iniesta, Almudena; Díaz Esteban, Alberto
    Adaptation to the European Higher Education Area entails rethinking the ways of teaching and learning at university. For university teaching staff, the emphasis on students' self-directed learning, and consequently on the use of Information and Communication Technologies in teaching, poses a genuine didactic challenge. The Virtual Campus is the environment in which a virtualization of practical case studies has been implemented to facilitate active learning of the subject «Introducción a la programación» at the Universidad Complutense de Madrid. The multidisciplinary character of the collection of case studies, useful across several degree programmes of that university, together with the large teaching team behind it, has favoured the continuous improvement of the shared thematic workspace hosting this virtualization, from an initial set of practical material to the current formalization of the learning sequence. This article presents the evolution of the virtualization of practical case studies from its beginnings, offering a historical-comparative view.