The release of Boltz-2 – a new machine learning model from MIT and Recursion that can not only predict the 3D structure of ligand/protein complexes but also predict their binding affinity with new levels of speed and accuracy – was met with open arms by the research community.
Today, Recursion is open-sourcing its SynFlowNet-Boltz trainer, enabling researchers to reproduce and build on the generative screening results presented in the paper, and push the boundaries of what’s possible in structure-based hit discovery.
SynFlowNet was first introduced in 2024 by researchers at Valence Labs – Recursion’s AI research engine – as a way to address the problem of generative models producing novel molecules that are difficult, costly, or even impossible to synthesize in the real world. Using known chemical reactions and purchasable reactants, SynFlowNet provides constraints on suggested in silico molecules, guiding the development of realistic synthesis pathways, even for completely novel compounds.
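The idea of restricting generation to known reactions and purchasable reactants can be illustrated with a toy sketch. This is not the SynFlowNet code: the `Molecule` and `Reaction` classes, the functional-group tags, and the building-block names below are all hypothetical, and a real system would operate on actual chemical structures and reaction templates rather than string tags.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Molecule:
    name: str
    groups: frozenset             # reactive functional-group tags (toy stand-in for a structure)
    route: Tuple[str, ...] = ()   # names of the reactions applied so far


@dataclass(frozen=True)
class Reaction:
    name: str
    needs_state: str   # group required on the growing molecule
    needs_block: str   # group required on the purchasable building block
    forms: str         # group created by the reaction


def valid_actions(state: Molecule, reactions: List[Reaction], blocks: List[Molecule]):
    """List every (reaction, building block) pair that is chemically allowed from this state."""
    return [
        (rxn, blk)
        for rxn in reactions
        for blk in blocks
        if rxn.needs_state in state.groups and rxn.needs_block in blk.groups
    ]


def apply(rxn: Reaction, state: Molecule, blk: Molecule) -> Molecule:
    """Consume the two reacting groups, add the newly formed one, and record the step."""
    remaining = (state.groups - {rxn.needs_state}) | (blk.groups - {rxn.needs_block})
    return Molecule(
        name=f"{rxn.name}({state.name}, {blk.name})",
        groups=frozenset(remaining | {rxn.forms}),
        route=state.route + (rxn.name,),
    )


# Hypothetical catalogue of purchasable building blocks and allowed reactions.
BLOCKS = [
    Molecule("acid_1", frozenset({"carboxylic_acid"})),
    Molecule("amine_1", frozenset({"amine", "aryl_halide"})),
    Molecule("boronic_1", frozenset({"boronic_acid"})),
]
REACTIONS = [
    Reaction("amide_coupling", "carboxylic_acid", "amine", "amide"),
    Reaction("suzuki_coupling", "aryl_halide", "boronic_acid", "biaryl"),
]

state = BLOCKS[0]
while True:
    actions = valid_actions(state, REACTIONS, BLOCKS)
    if not actions:
        break
    rxn, blk = actions[0]  # a trained policy would sample an action here instead
    state = apply(rxn, state, blk)

print(state.route)
```

Because every step is drawn from `valid_actions`, any molecule the loop produces comes with a synthesis route built only from catalogue reactants and allowed reactions, which is the property the constraint is meant to guarantee.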
Now, Valence is making it easier for scientists to combine SynFlowNet with Boltz-2 via a new repository release that could enable more efficient design of high-affinity binders needed for successful early-stage drug discovery.
The repository is the official implementation of SynFlowNet, a GFlowNet model with a synthesis action space.
Valence worked with the research team at MIT to develop the breakthrough Boltz-2 model, which is the first to combine structure and binding affinity prediction, and which approaches the accuracy of physics-based free energy perturbation (FEP) calculations while being over 1,000 times faster and less computationally expensive. The open-source tool (available under an MIT license) is already in use by hundreds of scientists.
In the related paper, the authors demonstrate how the model can be combined with SynFlowNet to find diverse, synthesizable, high-affinity binders, as estimated by absolute FEP simulations on the tyrosine kinase 2 (TYK2) target.
They chose TYK2 – a protein that’s part of the Janus kinase (JAK) family and plays a critical role in regulating immune response and inflammation – because it was a target the Boltz-2 affinity model was known to perform well on.
They began by screening two commercially available compound libraries from Enamine – the Hit Locator Library (HLL, 460,160 compounds) and the Kinase Library (64,960 compounds). Boltz-2 successfully prioritized high-affinity ligands in both cases. Based on Absolute Binding Free Energy (ABFE) estimates, 8 of the top 10 compounds from HLL and all of the top 10 compounds from the Kinase Library were predicted to bind, while all 10 randomly selected compounds were predicted to be non-binders.
Next, they coupled Boltz-2 with SynFlowNet to explore a much more extensive chemical space – the more than 76 billion synthesizable chemicals in the Enamine REAL space – to identify novel compounds using a generative approach. Out of thousands of molecules generated by SynFlowNet, 10 were selected for ABFE evaluation – and all 10 were predicted to bind TYK2.
These experiments suggest that forward-synthesis generative models can venture much further into chemical space to find novel, high-affinity compounds, while requiring a substantially smaller computational budget than traditional fixed-library screens.
In further assessing the novelty of the SynFlowNet-generated compounds, they found that the compounds did not exhibit significant similarity to public TYK2 binders, while noting that the results may be somewhat optimistic given Boltz-2’s strong performance on this target.
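The post does not say which similarity metric was used for this novelty check. As an illustration only, one common choice in cheminformatics is the Tanimoto coefficient between molecular fingerprints. The sketch below assumes fingerprints are already available as sets of "on" bit indices; the fingerprints and the helper name `max_similarity` are hypothetical, and in practice the bits would come from a cheminformatics toolkit.

```python
from typing import Iterable, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient: shared bits divided by bits set in either fingerprint."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def max_similarity(candidate: Set[int], references: Iterable[Set[int]]) -> float:
    """Highest similarity between a generated compound and any known binder."""
    return max((tanimoto(candidate, ref) for ref in references), default=0.0)


# Hypothetical toy fingerprints (sets of on-bit indices), not real compounds.
generated = {1, 2, 3}
known_binders = [{2, 3, 4}, {7, 8}]

print(max_similarity(generated, known_binders))
```

A generated compound whose maximum similarity to every known binder stays low would count as novel under this measure; where to set that cutoff is a judgment call that the post does not specify.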
Scientists looking to use SynFlowNet in combination with Boltz-2 can find more information and the open-source code in the SynFlowNet repository.
Author: Brita Belli, Senior Communications Manager, Recursion