State-aware protein-ligand complex prediction using AlphaFold3 with purified sequences
Abstract
Deep learning-based prediction of protein-ligand complexes has advanced significantly with the development of architectures such as AlphaFold3, Boltz-1, Chai-1, Protenix, and NeuralPLexer. Multiple sequence alignments (MSAs) have been a key input, providing coevolutionary information critical for structural inference. However, recent benchmarks reveal a major limitation: these models often memorize ligand poses from training data and perform poorly on novel chemotypes or on dynamic binding events that involve substantial conformational changes in the binding pocket. To overcome this, we introduce a state-aware protein-ligand prediction strategy that leverages purified sequence subsets generated by AF-ClaSeq, a method previously developed by our group. AF-ClaSeq isolates coevolutionary signals and selects sequences that preferentially encode distinct structural states as predicted by AlphaFold2. Supplying these subsets as the MSA input acts as an MSA-derived conformational restraint, and with it we observed significant improvements in predicted ligand poses. In cases where AlphaFold3 previously failed, producing incorrect ligand placements and associated protein conformations, we were able to correct the predictions by using sequence subsets corresponding to the relevant functional state, such as the inactive form of an enzyme bound to a negative allosteric modulator. We believe this approach represents a powerful and generalizable strategy for improving protein-ligand complex predictions, with potential applications across a broad range of molecular modeling tasks.
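To make the idea of a state-specific sequence subset concrete, the following is a minimal sketch of restricting a full MSA in A3M format to a chosen set of sequence identifiers while always keeping the query sequence. It is an illustration only, not the AF-ClaSeq implementation: the function names `parse_a3m` and `select_state_subset`, the toy alignment, and the identifiers in `inactive_ids` are all hypothetical, and the selection of which sequences encode a given state is assumed to have been done upstream.

```python
from typing import Iterable, List, Tuple


def parse_a3m(text: str) -> List[Tuple[str, str]]:
    """Parse A3M/FASTA-style text into (header, sequence) records."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def _seq_id(header: str) -> str:
    """Return the first whitespace-delimited token of a header ('' if empty)."""
    parts = header.split()
    return parts[0] if parts else ""


def select_state_subset(a3m_text: str, keep_ids: Iterable[str]) -> str:
    """Keep the query (first record) plus records whose ID is in keep_ids.

    The original order of the alignment is preserved.
    """
    keep = set(keep_ids)
    records = parse_a3m(a3m_text)
    if not records:
        raise ValueError("MSA contains no sequences")
    query, rest = records[0], records[1:]
    kept = [query] + [rec for rec in rest if _seq_id(rec[0]) in keep]
    return "".join(f">{h}\n{s}\n" for h, s in kept)


# Toy alignment (hypothetical sequences, for illustration only).
full_msa = (
    ">query\nMKTAYIAK\n"
    ">seq1 desc\nMKTAYLAK\n"
    ">seq2\nMRTAYIAK\n"
    ">seq3\nMKSAY-AK\n"
)

# Hypothetical IDs assumed to have been assigned to the inactive state.
inactive_ids = {"seq1", "seq3"}

subset = select_state_subset(full_msa, inactive_ids)
print(subset)
```

The resulting A3M text could then be supplied as a custom MSA in place of the full alignment when running a structure predictor; how that is done depends on the tool and is not shown here.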