2 Comments

ESM3 is great! Before ESM3, SaProt proposed combining AA + discrete structural tokens for pre-training and showed that this helps scale to larger datasets.

SaProt: Protein Language Modeling with Structure-aware Vocabulary
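For context, the core idea in SaProt is a structure-aware vocabulary: each residue is represented by its amino acid paired with a discrete structural token (Foldseek 3Di states), so the model's alphabet is roughly the 20 × 20 product of the two. Below is a minimal sketch of that pairing, not SaProt's actual code; the token naming and the `encode` helper are assumptions for illustration.

```python
# Minimal sketch of a structure-aware vocabulary: each residue becomes an
# (amino acid, structural token) pair, giving ~20 x 20 combined tokens.
from itertools import product

AMINO_ACIDS = list("ACDEFGHIKLMNPQRSTVWY")    # 20 standard residues
STRUCT_TOKENS = list("acdefghiklmnpqrstvwy")  # assumed lowercase 3Di-style structural states

# Combined tokens such as "Aa", "Ac", ... mapped to integer ids
vocab = {aa + s: i for i, (aa, s) in enumerate(product(AMINO_ACIDS, STRUCT_TOKENS))}

def encode(sequence: str, structure: str) -> list[int]:
    """Map an AA sequence and its per-residue structural string to token ids."""
    assert len(sequence) == len(structure)
    return [vocab[aa + s] for aa, s in zip(sequence, structure)]

# Example: a short fragment with hypothetical structural annotations
print(encode("MKT", "dvq"))
```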


Do you have any hypothesis as to why a simple BERT-like architecture for ESM performs so much better than others? It almost sounds too simple.
