Faculty of Computer Science

Research Group Theoretical Computer Science


Oberseminar: Heterogene formale Methoden


Date: January 5, 2021
Time: 09:00 a.m.
Place: Online
Author: Memariani, Adel
Title: Neural Classification of Molecules

Abstract:

This seminar presents transfer learning for chemical classification. Transfer learning is a machine learning approach in which knowledge gained from one problem is applied to a different but related problem. We focus on the chemical compounds in the ChEBI ontology and their classes, and treat the problem as a multi-label classification task in which each molecule is assigned a set of classes. The molecules are represented as SMILES strings, which lets us borrow ideas from language models such as Bidirectional Encoder Representations from Transformers (BERT). As with other language models, the inputs must first be tokenized; we experimented with two tokenization methods, byte-level Byte-Pair Encoding (BPE) and WordPiece. Training proceeds in two stages. We first pre-train the models in an unsupervised manner using Masked Language Modeling (MLM). Then, as a fine-tuning step, we add the label vectors and train the pre-trained models on the multi-label classification problem (supervised learning). The models were trained on a dataset of around 21k molecules and evaluated on a test set of about 6k molecules.
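The data preparation described in the abstract can be illustrated with a minimal Python sketch. It is not the speaker's implementation. The naive tokenizer below is only a stand-in for the BPE and WordPiece tokenizers named in the abstract, the 15% masking rate and the 80/10/10 replacement split are assumptions taken from the original BERT recipe, and the function names and class names are hypothetical.

```python
import random

MASK = "[MASK]"  # placeholder mask token (assumed name)

def tokenize_smiles(smiles):
    """Naive stand-in tokenizer: bracket atoms and Cl/Br stay whole,
    everything else becomes a single-character token."""
    tokens, i = [], 0
    while i < len(smiles):
        if smiles[i] == "[":                   # bracket atom, e.g. [NH4+]
            j = smiles.index("]", i) + 1       # raises ValueError if unclosed
        elif smiles[i:i + 2] in ("Cl", "Br"):  # two-letter atoms
            j = i + 2
        else:
            j = i + 1
        tokens.append(smiles[i:j])
        i = j
    return tokens

def mask_tokens(tokens, vocab, rng, mask_prob=0.15):
    """MLM corruption (assumed BERT-style 80/10/10 split).
    Returns (inputs, labels); a label is None where no prediction is required."""
    inputs, labels = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            labels.append(tok)                    # model must recover this token
            r = rng.random()
            if r < 0.8:
                inputs.append(MASK)               # replace with the mask token
            elif r < 0.9:
                inputs.append(rng.choice(vocab))  # replace with a random token
            else:
                inputs.append(tok)                # keep the token unchanged
        else:
            inputs.append(tok)
            labels.append(None)
    return inputs, labels

def multi_hot(classes, all_classes):
    """Label vector for fine-tuning: one 0/1 entry per candidate class."""
    return [1 if c in classes else 0 for c in all_classes]

if __name__ == "__main__":
    tokens = tokenize_smiles("CC(=O)Cl")
    print(tokens)  # ['C', 'C', '(', '=', 'O', ')', 'Cl']
    inputs, labels = mask_tokens(tokens, sorted(set(tokens)), random.Random(0))
    print(inputs, labels)
    # Hypothetical class names, for illustration only.
    all_classes = ["acyl chloride", "alcohol", "organochlorine compound"]
    print(multi_hot({"acyl chloride", "organochlorine compound"}, all_classes))
```

In pre-training, a model would be trained to predict the non-None labels from the corrupted inputs. In fine-tuning, the same encoder would instead be trained against the multi-hot label vector, typically with an independent sigmoid output per class.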

