|Date:||2020, September 22|
|Title:||Automatic Chemical Compound Classification Based on Modern Deep Neural Networks|
In this thesis project, we evaluate how recent advances in deep learning, namely Bidirectional Encoder Representations from Transformers (BERT) and tree-structured recursive neural networks, can support manual annotation in chemical ontologies such as ChEBI. The project is inspired by the resemblance between language modeling and chemical classification. We use the SMILES representation of chemical compounds as the input to our models. From our perspective, SMILES is a language whose alphabet consists of atoms and their bonds, combined according to a number of grammatical rules. This suggests a correspondence between a chemist's understanding of a compound and a language speaker's understanding of a term. To this end, we formulate chemical classification as a multi-label classification task, similar to the sentence tagging task in natural language processing.
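
To make the analogy concrete, the following is a minimal sketch of how a SMILES string might be split into tokens and how a compound's classes might be encoded as a multi-label target. It is not the tokenizer or label scheme used in the thesis: the regular expression, the function names `tokenize_smiles` and `multi_hot`, and the three class names are all assumptions chosen for illustration, and the pattern covers only a simplified subset of SMILES syntax.

```python
import re

# Assumed, simplified token pattern (not taken from the thesis):
# bracket atoms, two-letter halogens, two-digit ring closures,
# single-letter atoms, bond/branch symbols, single-digit ring closures.
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|[=#\-+\\/():.@~*$]|\d"
)


def tokenize_smiles(smiles):
    """Split a SMILES string into a list of tokens."""
    tokens = SMILES_TOKEN.findall(smiles)
    # Reject input containing characters the pattern does not cover.
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognised characters in SMILES: {smiles!r}")
    return tokens


def multi_hot(labels, classes):
    """Encode a set of class labels as a 0/1 vector ordered like `classes`."""
    return [1 if c in labels else 0 for c in classes]


# Hypothetical class vocabulary, for illustration only.
CLASSES = ["carboxylic acid", "alcohol", "aromatic compound"]

acetic_acid_tokens = tokenize_smiles("CC(=O)O")
acetic_acid_target = multi_hot({"carboxylic acid"}, CLASSES)
```

In this framing, the token list plays the role of a sentence and the 0/1 vector plays the role of its tags: a compound may belong to several classes at once, so each position in the vector is predicted independently rather than as a single mutually exclusive choice.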