Generative design for de novo drugs

Apr 21, 2021

Introduction

Recent years have seen a surge in the use of computation to design de novo drugs. In most instances, the computation has consisted entirely of screening virtual chemical libraries with predictive models to identify potent compounds. This virtual screening approach towards drug discovery has several shortcomings:

The chemical space that is explored is limited.
The molecules that are selected have undesirable structural properties.
Lead optimisation must be done manually.

In this article we examine each of these shortcomings, before introducing a new paradigm in computational drug design called generative design.

The chemical space that is explored is restricted to the size of the initial library.

According to (Walters, 2019), the number of drug-like molecules is estimated to be between 10³³ and 10⁶⁰, whereas the largest chemical libraries used in virtual screening contain of the order of 10¹² molecules. Even measured against the lower estimate, the largest virtual screen therefore explores no more than 10⁻¹⁹% of the total chemical space (roughly one molecule in every 10²¹), a negligible slice.
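The coverage figure can be checked with a few lines of exact arithmetic. The sketch below uses only the three numbers quoted from (Walters, 2019); the function and constant names are our own, chosen for illustration.

```python
from fractions import Fraction

# Figures quoted from (Walters, 2019) in the paragraph above.
LIBRARY_SIZE = 10**12  # order of magnitude of the largest screening libraries
SPACE_LOW = 10**33     # lower estimate of the number of drug-like molecules
SPACE_HIGH = 10**60    # upper estimate of the number of drug-like molecules


def percent_explored(library_size: int, space_size: int) -> Fraction:
    """Exact percentage of the chemical space covered by one library."""
    return Fraction(library_size, space_size) * 100


# The lower estimate of the space gives the most generous coverage figure.
best_case = percent_explored(LIBRARY_SIZE, SPACE_LOW)
worst_case = percent_explored(LIBRARY_SIZE, SPACE_HIGH)

print(f"coverage against the lower estimate: {float(best_case):.0e} %")
print(f"coverage against the upper estimate: {float(worst_case):.0e} %")
```

Using Fraction keeps the result exact, which matters here because the quantities span dozens of orders of magnitude.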

The likelihood of finding the best drug molecules with virtual screening rests on the extremely unlikely scenario that the most active chemical scaffold is present in this infinitesimal slice of chemical space.

The molecules that are selected often possess undesirable structural characteristics.

In (Stokes et al., 2020), a team of researchers from the Massachusetts Institute of Technology (MIT) used a graph convolutional network called Chemprop to screen chemical libraries containing more than 107 million compounds in total, and presented halicin as the molecule most likely to inhibit the growth of E. coli. The team also presented two other molecules. Figure 1 shows the structures of the three molecules:

Figure 1: Molecules recommended by Chemprop

Medicinal chemists inspecting these structures would raise a number of objections, some of which are summarised in Table 1:

Table 1: Some properties of the Chemprop molecules

As the table shows, Chemprop’s molecules violate PAINS (pan-assay interference compounds) filters, which means that they may exhibit poor selectivity against the specified target. They also violate Brenk filters, which means that they may contain toxic functional groups. Additionally, they score poorly on the quantitative estimate of drug-likeness (QED). These molecules are therefore unlikely to progress through clinical trials to approval.

Lead optimisation of the selected molecules must be done manually.

The identification of hit molecules from a virtual screen is followed by a process known as lead optimisation, during which medicinal chemists attempt to generate new analogues with better potency and selectivity from the recommended scaffolds. This is a manual undertaking. Its laboriousness, compounded by the structural deficiencies described above, usually precludes a thorough search of the chemical space near the most promising scaffolds, so the most active analogues often go unidentified.

These difficulties are not hard to anticipate. A good drug molecule must possess numerous characteristics simultaneously: potency against the disease, low toxicity, high selectivity, bioavailability, and structural dissimilarity from patented drugs, among many others. Virtual screening, by itself, cannot resolve the trade-offs between these often conflicting objectives.

These considerations led Richard E. Lee, Endowed Chair in Medicinal Chemistry at St. Jude Children’s Research Hospital, to remark about Chemprop in (Lemonick, 2020): “We need new antibacterial chemotypes that may be hard to find through [Chemprop’s] approach.”

Generative design

Norachem goes beyond predictive modelling alone, and uses a new paradigm called generative design to create and optimise molecules for multiple objectives simultaneously. Our platform automates the process of combining the most important characteristics of the best molecules at each stage of the computation to create new molecules with progressively better properties. Even the most sophisticated predictive ensemble will fail to exceed the output of that same ensemble applied within Norachem’s generative design loop.
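The general shape of such a loop can be sketched as a small evolutionary search. This is a toy illustration under loud assumptions, not Norachem's implementation: "molecules" are strings of hypothetical fragment labels, and the three scoring functions are stand-ins we invented in place of a real predictive model, a drug-likeness score, and a structural filter.

```python
import random

FRAGMENTS = "ABCDEFGH"  # hypothetical building blocks, not real chemistry
LENGTH = 6              # every toy "molecule" is a string of six fragments


def predicted_potency(mol: str) -> float:
    """Toy stand-in for a predictive model: rewards fragment 'A'."""
    return mol.count("A") / len(mol)


def drug_likeness(mol: str) -> float:
    """Toy stand-in for a drug-likeness score: rewards fragment variety."""
    return len(set(mol)) / len(mol)


def has_alert(mol: str) -> bool:
    """Toy stand-in for a structural filter: 'HH' is a forbidden motif."""
    return "HH" in mol


def fitness(mol: str) -> float:
    """Combine the objectives; any structural alert zeroes the score."""
    if has_alert(mol):
        return 0.0
    return predicted_potency(mol) * drug_likeness(mol)


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Join the head of one parent to the tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(mol: str, rng: random.Random, rate: float = 0.1) -> str:
    """Replace each fragment with a random one with probability `rate`."""
    return "".join(
        rng.choice(FRAGMENTS) if rng.random() < rate else frag for frag in mol
    )


def generative_loop(library, generations=30, keep=10, seed=0):
    """Evaluate, select the best, recombine them, and repeat."""
    rng = random.Random(seed)
    population = list(library)
    for _ in range(generations):
        # Keep the top scorers unchanged, so the best score never decreases.
        parents = sorted(population, key=fitness, reverse=True)[:keep]
        children = []
        while len(parents) + len(children) < len(population):
            mother, father = rng.sample(parents, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = parents + children
    return sorted(population, key=fitness, reverse=True)


seed_rng = random.Random(1)
library = [
    "".join(seed_rng.choice(FRAGMENTS) for _ in range(LENGTH)) for _ in range(50)
]
ranked = generative_loop(library)
print("best in initial library:", max(library, key=fitness))
print("best after the loop:    ", ranked[0])
```

Because the two toy objectives pull in opposite directions (more 'A' fragments raise potency but lower variety), no single term can be maximised in isolation. The loop has to find a compromise, and it can assemble strings that were never present in the starting library.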

In order to demonstrate the superiority of generative design, we performed the exercise described in (Stokes et al., 2020). We invoked Chemprop from Norachem’s generative loop to design molecules to inhibit the growth of E. coli. In a short run that featured an initial library of only 300 molecules, Norachem was able to produce and recommend molecules with superior characteristics:

Figure 2: Molecules recommended by Norachem

We underscore the fact that these molecules were not present in the initial library of 300. They are entirely new molecules, created and optimised by Norachem to fit the goals of the exercise — i.e., the inhibition of E. coli while avoiding structural characteristics that are associated with adverse effects in humans. Table 2 highlights the success on this latter point:

Table 2: Some properties of the Norachem molecules

The molecules in Figure 2 violate neither the PAINS nor the Brenk filters, and they possess better drug-likeness.

The plot in Figure 3 summarises the differences between the two sets of molecules. Chemprop predicts that molecule 1 from Norachem will be as potent against E. coli as halicin. But only Norachem’s molecules fall within the green zone of high predicted potency and superior drug-like qualities.

Figure 3: Only the Norachem molecules inhabit the green zone
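A zone of this kind amounts to a pair of thresholds that a candidate must clear at the same time. The sketch below makes that explicit. The cut-off values, the Candidate type, and the three example entries are hypothetical placeholders chosen for illustration; they are not the values behind Figure 3.

```python
from dataclasses import dataclass

# Hypothetical cut-offs; the thresholds used for Figure 3 are not stated here.
MIN_POTENCY = 0.5
MIN_QED = 0.5


@dataclass(frozen=True)
class Candidate:
    name: str
    predicted_potency: float  # model score in [0, 1], higher is better
    qed: float                # drug-likeness in [0, 1], higher is better


def in_green_zone(candidate: Candidate) -> bool:
    """A candidate qualifies only if it clears both thresholds at once."""
    return candidate.predicted_potency >= MIN_POTENCY and candidate.qed >= MIN_QED


# Placeholder candidates, named after the case each one illustrates.
candidates = [
    Candidate("potent_but_not_drug_like", predicted_potency=0.9, qed=0.2),
    Candidate("drug_like_but_weak", predicted_potency=0.1, qed=0.8),
    Candidate("potent_and_drug_like", predicted_potency=0.8, qed=0.7),
]

green = [c.name for c in candidates if in_green_zone(c)]
print(green)
```

Ranking on predicted potency alone would put the first placeholder at the top of the list, which is why a joint criterion is needed.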

We can perform a similar exercise with any number of objectives. Norachem’s suite includes automatic docking and retrosynthetic analysis of every molecule that it creates and evaluates in its generative loop.

Conclusion

Norachem’s approach of using generative design is methodologically superior to virtual screening, the predominant computational method of competing platforms. By automating the evaluation, selection, and cross-pollination of the best molecular structures, Norachem can search the chemical space near the most favourable scaffolds much more rapidly and thoroughly.

The result is the production of lead molecules in hours or days, not months or years.

References

Walters, W. P. (2019). Virtual Chemical Libraries. Journal of Medicinal Chemistry 62 (3), 1116–1124. DOI: 10.1021/acs.jmedchem.8b01048

Stokes, J. M., Yang, K., Swanson, K., Jin, W., Cubillos-Ruiz, A., Donghia, N. M., MacNair, C. R., French, S., Carfrae, L. A., Bloom-Ackermann, Z., Tran, V. M., Chiappino-Pepe, A., Badran, A. H., Andrews, I. W., Chory, E. J., Church, G. M., Brown, E. D., Jaakkola, T. S., Barzilay, R., Collins, J. J. (2020). A Deep Learning Approach to Antibiotic Discovery. Cell 180 (4), 688–702. DOI: 10.1016/j.cell.2020.01.021

Lemonick, S. (2020, February 26). AI finds molecules that kill bacteria, but would they make good antibiotics? C&EN. cen.acs.org/physical-chemistry/computational-chemistry/AI-finds-molecules-kill-bacteria/98/web/2020/02