Technique can analyse nearly a million protein sequences at a time to provide data for machine learning models
A new high-throughput technique can analyse the folding stabilities of nearly one million protein sequences at a time. The unprecedented method, which is fast, accurate and scalable, promises to help researchers understand how amino acid sequences fold into three-dimensional conformations, while also providing the data needed to improve machine learning models.
A protein's propensity to fold spontaneously is governed by subtle, hidden energetics, such as hydrogen bonding and hydrophobic effects, that are unique to its particular amino acid sequence. Because even a single mutation in that sequence can affect folding, measuring protein folding stability is important for understanding disease, as well as for drug development and protein design.
However, for decades it has only been possible to measure the folding stability of a few proteins at a time. Although thousands of measurements have been gathered, there is still not enough data for machine learning models to begin predicting and unravelling the hidden thermodynamics of folding stability.