Introduction:

This tutorial is inspired by the paper "Chemoinformatic Analysis of Combinatorial Libraries, Drugs, Natural Products and Molecular Libraries Small Molecule Repository": https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2686115/

The paper compared the properties of molecules found in four different chemical libraries.

The authors did several types of analysis. First, they generated a number of features, or “descriptors”, for each molecule. A descriptor can be any property of the molecule, including simple ones such as molecular weight and charge. They then ran a principal components analysis on the descriptors and drew scatterplots of the molecules in each library against the set of FDA-approved drugs. The result: most of the molecules in these libraries occupy a region of chemical space quite distant from the approved drugs, which suggests they are less likely to be therapeutically active.
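To make the descriptor, PCA and scatterplot workflow concrete, here is a minimal sketch in base R. It is not the paper's analysis: the descriptor values, the column names (`mol_weight`, `logp`, `h_donors`) and the two groups are all invented for illustration.

```r
# Toy descriptor table: rows are molecules, columns are descriptors.
# All values are made up for illustration only.
set.seed(1)
desc <- data.frame(
  mol_weight = c(rnorm(20, mean = 350, sd = 40), rnorm(20, mean = 520, sd = 60)),
  logp       = c(rnorm(20, mean = 2.5, sd = 0.8), rnorm(20, mean = 4.5, sd = 1.0)),
  h_donors   = c(rpois(20, lambda = 2), rpois(20, lambda = 4))
)
group <- factor(rep(c("drugs", "library"), each = 20))

# Centre and scale so that no descriptor dominates because of its units
pca <- prcomp(desc, center = TRUE, scale. = TRUE)

# Scatterplot of the first two principal components, coloured by group
plot(pca$x[, 1], pca$x[, 2],
     col = as.integer(group), pch = 19,
     xlab = "PC1", ylab = "PC2")
legend("topright", legend = levels(group),
       col = seq_along(levels(group)), pch = 19)
```

With real data, `desc` would be replaced by a table of descriptors computed for each molecule, and the groups by the libraries being compared.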

Important databases:

* PubChem
* DrugBank
* ChemBank
* ChEMBL

Getting started

R packages: rcdk and rpubchem. rJava is needed for visualization.

install.packages(c("rJava", "rcdk")) # huge, takes a few mins to download
library(rcdk) # docs at http://cran.r-project.org/web/packages/rcdk/rcdk.pdf, paper at http://www.jstatsoft.org/v18/i05/paper

Create a molecule object

Next I sought to manually create a Java object for a molecule from its SMILES. SMILES is a systematic, unambiguous plain-text representation of molecular structure. I found anle138b's SMILES in its PubChem entry.

# parse.smiles accepts a vector of SMILES strings and returns a list
# whose items are Java references to IAtomContainer objects
# if you have just one molecule of interest, grab the first item with [[1]]
anle138b <- parse.smiles("C1OC2=C(O1)C=C(C=C2)C3=CC(=NN3)C4=CC(=CC=C4)Br")[[1]]
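Since parse.smiles takes a vector, several molecules can be parsed in one call. The sketch below assumes a working rcdk and Java installation; the second SMILES (aspirin) is only there for illustration and is not part of the original tutorial.

```r
library(rcdk)

# Two SMILES strings: anle138b (from above) and aspirin (illustrative extra)
smiles <- c(
  "C1OC2=C(O1)C=C(C=C2)C3=CC(=NN3)C4=CC(=CC=C4)Br",
  "CC(=O)OC1=CC=CC=C1C(=O)O"
)

# One list entry per input string
mols <- parse.smiles(smiles)
length(mols)

# Number of atoms stored in each molecule object
sapply(mols, get.atom.count)

# Write each molecule back out as SMILES to check the round trip
sapply(mols, get.smiles)
```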

Exercise: Parse acetaminophen

Search for acetaminophen in PubChem and copy its SMILES string: https://pubchem.ncbi.nlm.nih.gov/compound/acetaminophen. Then parse it in rcdk.

ac <- parse.smiles("CC(=O)NC1=CC=C(C=C1)O")[[1]]

View/draw it

The next thing I wanted to do is plot it. The view.molecule.2d function draws it in a separate Java window:

# draws it in a separate Java window
view.molecule.2d(anle138b)
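If you would rather draw the structure in the normal R graphics device than in a Java window, something along these lines should work. This is a sketch that assumes a recent rcdk in which view.image.2d takes a depictor created by get.depictor; older versions took width and height arguments directly, so check the help page for your installed version.

```r
library(rcdk)

anle138b <- parse.smiles("C1OC2=C(O1)C=C(C=C2)C3=CC(=NN3)C4=CC(=CC=C4)Br")[[1]]

# Render the molecule to an in-memory image (an array of pixel values)
depictor <- get.depictor(width = 500, height = 500)
img <- view.image.2d(anle138b, depictor)

# Draw the image on an empty plot with no margins or axes
op <- par(mar = c(0, 0, 0, 0))
plot(NA, NA, xlim = c(1, 10), ylim = c(1, 10),
     xaxt = "n", yaxt = "n", xlab = "", ylab = "", bty = "n")
rasterImage(img, 1, 1, 10, 10)  # xleft, ybottom, xright, ytop
par(op)                         # restore the previous margins
```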

Exercise: Draw acetaminophen

Get descriptors

The next thing I learned about rcdk is how to get these descriptors of molecules. rcdk comes with access to a whole list of descriptors which you can get like so:


descriptors <- get.desc.names(type = "all")

This gave me a vector of character strings: the names of about fifty descriptors that you can calculate for your molecule. I figured a good one to try out would be the number of strikes a molecule has against it as a drug according to the Rule of Five:

# get one descriptor, for instance Rule of Five descriptors http://en.wikipedia.org/wiki/Lipinski's_rule_of_five
eval.desc(anle138b, "org.openscience.cdk.qsar.descriptors.molecular.RuleOfFiveDescriptor")
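Descriptors are grouped into categories, and eval.desc also accepts a list of molecules, returning one row per molecule. The sketch below assumes rcdk is installed; aspirin is included only as a second, illustrative molecule.

```r
library(rcdk)

# Descriptor categories known to rcdk, and the names within one of them
categories <- get.desc.categories()
constitutional <- get.desc.names(type = "constitutional")

# Evaluate the Rule of Five descriptor for two molecules at once
mols <- parse.smiles(c(
  "C1OC2=C(O1)C=C(C=C2)C3=CC(=NN3)C4=CC(=CC=C4)Br",  # anle138b
  "CC(=O)OC1=CC=CC=C1C(=O)O"                          # aspirin (illustrative)
))
ro5 <- eval.desc(
  mols,
  "org.openscience.cdk.qsar.descriptors.molecular.RuleOfFiveDescriptor"
)

# A data frame with one row per molecule
ro5
```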

Exercise: Other properties

Try to get some properties of a molecule using rcdk’s base functions, like get.total.charge, get.volume, get.exact.mass

get.exact.mass(ac)
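The same property functions can be applied to several molecules with sapply to build a small comparison table. This sketch uses only functions named above and assumes rcdk is installed; the column names are my own.

```r
library(rcdk)

mols <- list(
  anle138b      = parse.smiles("C1OC2=C(O1)C=C(C=C2)C3=CC(=NN3)C4=CC(=CC=C4)Br")[[1]],
  acetaminophen = parse.smiles("CC(=O)NC1=CC=C(C=C1)O")[[1]]
)

# One row per molecule, one column per property
props <- data.frame(
  exact_mass   = sapply(mols, get.exact.mass),
  total_charge = sapply(mols, get.total.charge)
)
props
```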

Another way of getting drugs

Load it from an SDF file

# Load the structure of diclofenac (PubChem CID 3033), saved from PubChem in SDF format
# load.molecules returns a list of molecules, one per record in the file
dic <- load.molecules("data/Structure2D_CID_3033.sdf")

Evaluate the new target drug

sapply(descriptors, eval.desc, molecules = anle138b)
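As an alternative to sapply, eval.desc can take the whole vector of descriptor names in one call, which gives a single data frame with one row per molecule. The sketch below assumes rcdk is installed. Some descriptors may come back as NA (for example those that need 3D coordinates, which a molecule parsed from SMILES does not have), so those columns are dropped.

```r
library(rcdk)

anle138b <- parse.smiles("C1OC2=C(O1)C=C(C=C2)C3=CC(=NN3)C4=CC(=CC=C4)Br")[[1]]
descriptors <- get.desc.names(type = "all")

# One row for the molecule, one or more columns per descriptor
desc_table <- eval.desc(anle138b, descriptors)

# Keep only the columns that were computed successfully
has_na <- sapply(desc_table, function(column) any(is.na(column)))
desc_table <- desc_table[, !has_na, drop = FALSE]

dim(desc_table)
```

To evaluate a different molecule, such as one loaded from an SDF file, pass that molecule (or list of molecules) as the first argument instead.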