Exploring the Dark Proteome: The Next Frontier in Drug Discovery

Written by Michael Palovich.

The role that a given protein plays in a specific disease or indication is a fundamental aspect of drug discovery. The ability to modulate the protein’s function with a suitable agent (e.g., a small molecule or a biologic) should not be underestimated either. While the 3D structure of some proteins has been solved or is largely understood, many proteins have yet to be structurally characterized. These proteins are often referred to as low data proteins, or the dark proteome. It is estimated that 93% of the protein targets in the human body are low data and remain undrugged, meaning that industry has struggled to identify novel chemical probes for these targets.

So, what lies within this dark proteome, and what new medicines could be developed by exploring it? To answer these questions, Cyclica is using our proprietary platform, MatchMaker™, which models potential protein-ligand interactions. This AI-enabled machine learning engine uses AlphaFold2 structures and homology models to predict the polypharmacology of small molecules across the proteome. Here, we highlight a few of Cyclica’s many successful discoveries in the low data space, which may lead to the development of breakthrough medicines and treatment options for patients living with aggressive types of cancer.


As part of the Target2035 initiative, Cyclica’s drug discovery team collaborated with the Structural Genomics Consortium (SGC) in 2021 to explore DCAF1, a low data target, with the goal of finding a novel DCAF1 ligand that may ultimately support targeted therapeutics discovery. DCAF1 has been identified as having an essential role in cellular processes and in the development of certain cancer indications. A member of the WD40 repeat (WDR) family and part of an E3 ligase complex, DCAF1 is a key component in proteasomal degradation. After testing approximately 100 compounds predicted by MatchMaker™, we identified a novel DCAF1 ligand and obtained the first disclosed co-crystallized DCAF1 structure, opening the door to the development of a targeted oncology therapeutic.

Mutated oncoproteins

In an effort to identify novel drugs for difficult-to-treat cancers, Cyclica teamed up with Perturba Therapeutics, a spin-out company from the Stagljar lab at the University of Toronto’s Donnelly Centre for Cellular and Biomolecular Research. Integrating Cyclica’s platform, MatchMaker™, with two of Perturba’s live cell-based assays (MaMTH-Drug Screening and SIMPL) resulted in the identification of a selective inhibitor of the epidermal growth factor receptor (EGFR) triple mutant protein. This low data target discovery has the potential to benefit patients with non-small cell lung cancer. In addition, for specific KRAS and EGFR mutations, we have been able to identify molecules that are active at the mutated protein rather than the wild type. To expand on these successes, we have initiated efforts to identify novel chemical matter for eight other mutated oncoproteins with potential therapeutic benefit for additional types of cancer.

Classes of low data protein targets

In addition to the examples provided, to date we have successfully used MatchMaker™ to identify novel chemical series for roughly a dozen low data protein targets spanning a variety of protein classes. A few examples include a DNA polymerase and members of the solute carrier and kinesin protein families. Typically, we are able to identify 2-3 hit molecules from testing ~100 compounds or fewer, making this process highly scalable. We also plan to greatly expand the scope of the work done to date by tactically selecting sets of low data proteins (currently thousands, based on our assessment process) and executing screening to find novel chemical matter that can be progressed into lead discovery and lead optimization activities.
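To put the "2-3 hits from ~100 compounds" figure in perspective, it can be expressed as a hit rate. The short Python sketch below is illustrative only: the function name `hit_rate` is hypothetical, and it assumes exactly 100 compounds were tested, which is the upper end of the range quoted above. Testing fewer compounds for the same number of hits would give a higher rate.

```python
def hit_rate(hits: int, compounds_tested: int) -> float:
    """Return the fraction of tested compounds that were confirmed as hits."""
    if compounds_tested <= 0:
        raise ValueError("compounds_tested must be positive")
    if not 0 <= hits <= compounds_tested:
        raise ValueError("hits must be between 0 and compounds_tested")
    return hits / compounds_tested


# Assumption: 100 compounds tested (the post says "~100 compounds or fewer").
low = hit_rate(2, 100)
high = hit_rate(3, 100)
print(f"Hit rate: {low:.1%} to {high:.1%}")
```

Under that assumption, the quoted figures correspond to a hit rate of roughly 2-3% per target.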

Future outlook

By exploring the dark proteome, there are exciting opportunities to generate data, create new starting points for chemical series, and diversify our portfolio of drug programs. Cyclica’s low data successes validate our technology and exemplify our scientific differentiation. Unlike the current industry standard, we are not limited to high data targets and have the capabilities to explore the undrugged protein universe, bringing novel medicines to patients. Of course, once a novel small molecule has been identified, we recognize that development candidates will still need to undergo clinical validation (i.e., assessment of the drug’s therapeutic safety and efficacy). Overall, our platform and our ability to pursue low data targets offer significant potential to transform the preclinical stages of drug discovery, leading to faster development of medications for patients in need.

This blog was originally published on January 25, 2023 by Cyclica prior to being acquired by Recursion.