Turning drug discovery into a search problem

Central to our mission is the Recursion Operating System (OS), a platform powered by one of the world’s largest proprietary biological and chemical datasets. Instead of looking narrowly at a handful of diseases with existing therapeutic hypotheses, we build Maps of Biology and Chemistry that broaden our search and allow us to explore unknown areas of disease biology.

Virtuous cycles of atoms and bits


Better data = better predictions

In machine learning, the quality of the training dataset is critical to the accuracy of a model’s predictions. We control data generation in-house in our highly automated wet laboratories, where we conduct millions of experiments across every human gene and our library of chemical compounds to generate our multi-layered dataset for mapping. This has resulted in more than 50 petabytes of high-quality data – one of the world’s largest proprietary biological and chemical datasets.

Our data generation strategy follows these three principles:

Scalability

No static dataset will ever be sufficient to decode the vast space of biology. Our dataset is designed to expand over time as we test and validate predictions experimentally.

Reliability

Reliable and accurate data is essential to reproducibility. We use highly controlled and standardized protocols while correcting for any variability in the technical execution of experiments to generate quality data.


Relatability

We build connected datasets, enabling comparisons across time and experimental methods. That way, the data we generate tomorrow can be related to data generated five years ago.

Explore our datasets & models


Imaging is our bread and butter, but we are so much more

We are pioneers of phenomics, the analysis of high-dimensional data from microscopy images of human cells. Images are rich with data, yet relatively cheap to capture and analyze at scale. AI turns unstructured images into computable data, creating biologically meaningful representations of cells that can be compared and contrasted to understand relationships across genes, compounds, and other perturbations. These relationships form the basis of our Maps of Biology and Chemistry.
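To make the idea of comparing representations concrete, here is a minimal sketch of one common way to relate embedding vectors: cosine similarity. It is an illustration only, not a description of how the Recursion OS computes its maps. The perturbation names, the three-dimensional vectors, and the helper functions `cosine_similarity` and `rank_neighbors` are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def rank_neighbors(query, embeddings):
    """Return (name, similarity) pairs for every other entry, most similar first."""
    q = embeddings[query]
    scores = [
        (name, cosine_similarity(q, vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; real image-derived representations would
# have far more dimensions and come from a trained model.
embeddings = {
    "gene_A_knockout": [1.0, 0.0, 0.5],
    "gene_B_knockout": [0.9, 0.1, 0.6],
    "compound_X": [-1.0, 0.0, -0.5],
}

for name, score in rank_neighbors("gene_A_knockout", embeddings):
    print(f"{name}: {score:.3f}")
```

In this toy setup, a score near 1 suggests two perturbations produce similar cellular phenotypes, while a score near -1 suggests opposing effects, such as a compound that might counteract a gene knockout.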


Over the years, we’ve expanded our data generation to incorporate additional modalities that, when combined, allow us to gather a holistic picture of causal biological relationships. Recently, we unveiled LOWE, our LLM-based software capable of performing complex drug discovery tasks by orchestrating both the wet-lab and dry-lab components of Recursion OS.

Partner with us