Hey ML, what can you do for me?

IEEE AIKE 2020 - IEEE International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), Irvine, CA, USA, 2020 - Virtual

Abstract: Machine learning (ML) algorithms are data-driven: given a goal task and a dataset of prior experience relevant to that task, one can attempt to solve the task with ML, seeking high accuracy. There is usually a big gap in understanding between ML experts and dataset providers due to limited cross-disciplinary expertise. Narrowing down a suitable set of problems to solve using ML is possibly the most ambiguous yet important item for data providers to consider before initiating collaborations with ML experts. We proposed an ML-fueled pipeline to identify potential problems (i.e., the tasks) so that data providers can easily explore potential problem areas to investigate with ML. The autonomous pipeline integrates information theory and graph-based unsupervised learning paradigms to generate a ranked retrieval of the top-$k$ problems for the given dataset, supporting a successful ML-based collaboration. We conducted experiments on diverse, well-known real-world datasets, and from a supervised learning standpoint, the proposed pipeline achieved $72\%$ top-$5$ task retrieval accuracy on average, which surpasses the retrieval performance achieved for the same paradigm using popular exploratory data analysis tools. Detailed experiment results and our source code are available on GitHub.

Authors: Javier Pastorino, Ashis Kumer Biswas.

Paper link. - Presentation Link

TexAnASD: Text Analytics for ASD Risk Gene Predictions

IEEE BIBM 2019 - IEEE International Conference on Bioinformatics and Biomedicine (BIBM), San Diego, CA, USA, 2019, pp. 1350-1357

Abstract: Autism Spectrum Disorder (ASD) is an extreme neurodevelopmental disease affecting 1 in every 59 children in the United States, and approximately 1% of the US population. The clinical traits of the disorder include noticeable deficits in social interactions and language development, and in many cases very narrow and repetitive interests and behaviors. ASD is a highly heritable genetic disease, but its known causes, including the biomarkers behind it, form only the tip of the iceberg. Over the past decade, extensive research on exome sequences has revealed only around one hundred genes that cause it with very high confidence. The number of putative ASD-causing genes is growing rapidly with the advent of new technologies, while researchers struggle to assess which genes are truly causal. Manual curation of each gene in the long list is a cumbersome process that requires a huge amount of expert work-hours and is expensive. An in silico prediction method can assist human experts by letting them check only a short list of genes filtered by a machine learning system. Most existing prediction algorithms require high-performance computing platforms to analyze large-scale genetic data, which runs counter to the actual benefit of using an in silico method in the first place. We proposed TexAnASD, a text-analytics-based ASD gene prediction algorithm that uses only what is known about each gene from the published literature. The proposed method outperforms most state-of-the-art prediction systems. Moreover, the method builds a less complex model than all the others.

Authors: Javier Pastorino, Ashis Kumer Biswas.

Paper link


Methodological guide for accessible virtual curriculum developments implementation


Universidad de Alcalá. April 2013

Abstract: This methodological guide for implementing accessible virtual curriculum developments has been developed as part of the ESVI-AL project. The guide is designed as a support tool for everyone involved in accessible virtual educational projects, primarily teachers, but also the management, administrative, and technical staff of institutions seeking to implement inclusive virtual training activities in which all students can participate on equal terms.

Authors: José Ramón Hilera, Regina Motz, Javier Pastorino,

ISBN: 978-84-15834-07-6

Transformations between temporal evolution models

CACIC 2002

Abstract: Temporal databases store how information evolves over time. Such evolution may be classified as schema evolution or extension evolution. This allows temporal information systems to be classified into four types according to their ability to handle these dimensions of temporal evolution. The key goal of the present study is to define a model that allows information stored in such databases to be shared, based on methodologies for converting data between the models. We conclude that this kind of transformation is possible without losing the semantics of the information or the evolution recorded in the source system, by transforming from another model to a Bi-Temporal Evolution System and using the latter as an equivalent of the original model.

Authors: Javier Pastorino, Regina Motz

Available in Spanish only: