Pure sequence learning is out, multiscale data is in
ESM3 is great! Before ESM3, SaProt proposed combining amino-acid and discrete structural tokens for pre-training, and showed that this helps scale to larger datasets (see the sketch below for the basic idea).
SaProt: Protein Language Modeling with Structure-aware Vocabulary
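For readers unfamiliar with SaProt's structure-aware vocabulary: roughly, each residue's amino-acid letter is fused with a discrete Foldseek 3Di structural state into a single token, so masked language modeling sees sequence and local structure jointly. A minimal sketch of that tokenization, assuming 3Di states come from Foldseek; the exact token layout and the '#' placeholder here are illustrative, not SaProt's actual implementation:

```python
import itertools

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 standard residues
FOLDSEEK_3DI = "acdefghiklmnpqrstvwy"  # 20 discrete 3Di structure states (lowercase by convention)

# Combined vocabulary: one token per (amino acid, structure state) pair,
# plus '#' as an illustrative placeholder for residues without resolved structure.
VOCAB = {
    aa + s: i
    for i, (aa, s) in enumerate(itertools.product(AMINO_ACIDS, FOLDSEEK_3DI + "#"))
}

def tokenize(sequence: str, structure_states: str) -> list[int]:
    """Map each residue to a single structure-aware token id."""
    assert len(sequence) == len(structure_states)
    return [VOCAB[aa + s] for aa, s in zip(sequence, structure_states)]

# Example: a short peptide with per-residue 3Di states.
print(tokenize("MKV", "dpv"))
```

The appeal of this design is that the downstream model stays a plain sequence transformer; structure enters only through the enlarged vocabulary, so the pre-training recipe itself is unchanged.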
Do you have any hypothesis for why a simple BERT-like architecture for ESM performs so much better than the alternatives? It sounds almost too simple.