Faculty of Computer Science

Research Group Theoretical Computer Science


Oberseminar: Heterogene formale Methoden


Date: 2020, November 10
Time: 09:00 a.m.
Place: Online
Author: Hastings, Janna and Glauer, Martin
Title: Learning Chemistry: Structure-Based Chemical Ontology Classification Using Machine Learning

Abstract:

Chemical data is increasingly openly available in databases such as PubChem, which contains more than 110 million compound entries as of October 2020. With data available at such scale, the burden has shifted to its organisation, analysis and interpretation. Chemical ontologies provide structured classifications of chemical entities that can be used to navigate and filter the large chemical space. ChEBI is a prominent example of a chemical ontology, widely used in the life sciences. However, ChEBI is manually maintained and therefore cannot easily scale to the full scope of public chemical data. There is thus a need for tools that can automatically classify chemical data into chemical ontologies, a task that can be framed as a hierarchical multi-class classification problem.
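
To make the hierarchical framing concrete: if an entity belongs to a class, it also belongs to every ancestor of that class, so a training target can be built by closing the asserted classes under the is-a relation. The minimal sketch below assumes a toy hierarchy with illustrative class names (not actual ChEBI data or identifiers). It encodes the closed set of classes as a 0/1 vector over all classes and is not the pipeline used in the work presented.

```python
# Toy is-a hierarchy: class -> list of direct parent classes.
# Names are illustrative only (assumption), not real ChEBI identifiers.
# Lists allow several parents, since ontology hierarchies are DAGs, not trees.
PARENTS = {
    "acetic acid": ["monocarboxylic acid"],
    "monocarboxylic acid": ["carboxylic acid"],
    "carboxylic acid": ["organic acid"],
    "organic acid": [],
}


def ancestors(cls, parents):
    """Return cls together with every class reachable via is-a links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def label_vector(direct_classes, parents, all_classes):
    """0/1 target: 1 for each asserted class and all of its ancestors."""
    closed = set()
    for c in direct_classes:
        closed |= ancestors(c, parents)
    return [1 if c in closed else 0 for c in all_classes]


CLASSES = sorted(PARENTS)
print(CLASSES)
print(label_vector(["monocarboxylic acid"], PARENTS, CLASSES))  # [0, 1, 1, 1]
```

Closing labels under the hierarchy in this way keeps predictions and training targets consistent with the ontology: an entity labelled with a specific class is never treated as a negative example for that class's superclasses.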

In this presentation we report on the results of our recent work evaluating machine learning approaches for this task. We compare different learning frameworks, including logistic regression, decision trees and LSTMs, as well as different encodings of the chemical structures, including cheminformatics fingerprints and character-based encodings of chemical line notation representations. We also explore how 'neuro-symbolic' approaches to encoding logical axioms could support this task, which we plan to test in future work.
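
A character-based encoding of a chemical line notation can be illustrated as follows, assuming SMILES as the notation. The structure string is split into tokens, each token is mapped to an integer, and the sequence is padded to a fixed length so it could be fed to a sequence model such as an LSTM. The tokenization rule (bracket atoms and the two-letter atoms Cl and Br kept whole), the function names and the padding convention are assumptions for illustration, not the encoding used in the work presented.

```python
import re

# Assumed tokenization: bracket atoms such as [NH4+] and the two-letter atoms
# Cl and Br are single tokens; every other character is its own token.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize(smiles):
    """Split a SMILES string into character-level tokens."""
    return TOKEN_PATTERN.findall(smiles)


def build_vocab(smiles_list):
    """Map each observed token to an integer id; 0 is reserved for padding."""
    tokens = sorted({t for s in smiles_list for t in tokenize(s)})
    return {tok: i + 1 for i, tok in enumerate(tokens)}


def encode(smiles, vocab, max_len):
    """Integer sequence truncated or zero-padded to max_len.

    Raises KeyError for tokens not seen when the vocabulary was built.
    """
    ids = [vocab[t] for t in tokenize(smiles)][:max_len]
    return ids + [0] * (max_len - len(ids))


data = ["CC(=O)O", "ClCCl", "[NH4+]"]  # acetic acid, dichloromethane, ammonium
vocab = build_vocab(data)
print(tokenize("ClCCl"))  # ['Cl', 'C', 'Cl']
print(encode("CC(=O)O", vocab, 10))  # [4, 4, 1, 3, 6, 2, 6, 0, 0, 0]
```

Unlike a fingerprint, which summarises a structure as a fixed-length bit vector of substructure features, this encoding preserves the order of symbols in the line notation, which a recurrent model can exploit.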

