Monday, November 4, 2013

Comparing Fraggle to other Fingerprints


The Fraggle similarity algorithm from Jameed Hussain and Gavin Harper has been available in the RDKit since the 2013_09 release.
The algorithm, which is described here: https://raw.github.com/rdkit/UGM_2013/master/Presentations/Hussain.Fraggle.pdf, uses the similarity between fragments of the query molecule and the database molecule and is an interesting complement to standard fingerprint similarity.
Here I will take a look at Fraggle using the same tools I applied to the other fingerprinting methods in these two posts:
http://rdkit.blogspot.ch/2013/10/fingerprint-thresholds.html
http://rdkit.blogspot.ch/2013/10/comparing-fingerprints-to-each-other.html

TL;DR Summary

The baseline similarity values for Fraggle are quite high:
Fingerprint  Metric  90% level  95% level  99% level
Fraggle              0.483      0.538      0.650
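For readers who want to see what a "level" means here, this is a minimal sketch. It assumes the levels are percentiles of the similarity values computed for a set of random molecule pairs. The function name `similarity_levels` and the toy values are made up for illustration and are not taken from the notebook.

```python
def similarity_levels(sims, percents=(90, 95, 99)):
    """Return {percent: similarity value at or below which that percent of sims lies}.

    Uses the nearest-rank definition of a percentile (an assumption; the
    notebook may interpolate differently).
    """
    ordered = sorted(sims)
    n = len(ordered)
    levels = {}
    for pct in percents:
        # integer ceiling of pct * n / 100, clamped so the rank is at least 1
        rank = max(1, -(-pct * n // 100))
        levels[pct] = ordered[rank - 1]
    return levels


# Toy stand-in for the similarities of random pairs: 0.00, 0.01, ..., 0.99
toy_sims = [i / 100 for i in range(100)]
levels = similarity_levels(toy_sims)
for pct in sorted(levels):
    print(f"{pct}% level: {levels[pct]:.3f}")
```

Read this way, a 95% level of 0.538 would mean that only about 5% of random pairs have a Fraggle similarity above 0.538.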
As expected from the definition, Fraggle similarity tends to be higher than RDKit5 similarity.
This is a nice example of a case where the RDKit5 fingerprint says the molecules are quite dissimilar, but Fraggle provides the expected high similarity score:
mol1 mol2 Fraggle RDKit5 Fragment FragMol
15634 Mol Mol 0.927711 0.191693 [*]c1ncnc2[nH]cnc21 Mol
Another interesting point about Fraggle is that it pulls back compounds that are quite complementary to those found by the other methods we've looked at. To demonstrate, here is the fraction of overlap in the top 100 pairs found by Fraggle and a few other fingerprints:
Fingerprint 1  Fingerprint 2  Fraction in common (top 100)
Fraggle        AP             0.18
Fraggle        Avalon-1024    0.16
Fraggle        RDKit5         0.24
Fraggle        TT             0.21
AP             Avalon-1024    0.58
AP             RDKit5         0.69
AP             TT             0.86
Avalon-1024    RDKit5         0.56
Avalon-1024    TT             0.60
RDKit5         TT             0.70
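The "fraction in common" figures can be read as a set overlap. Here is a minimal sketch, assuming each method gives one similarity score per molecule pair and that the top N are simply the N highest-scoring pairs. The helper names and the toy scores are hypothetical and do not come from the notebook.

```python
def top_n_pairs(scores, n):
    """Return the set of the n highest-scoring pairs.

    scores maps a (mol_i, mol_j) tuple to a similarity value. Ties are
    broken by dictionary order because Python's sort is stable.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n])


def fraction_in_common(scores_a, scores_b, n=100):
    """Fraction of the top-n pairs of one method that are also top-n for the other."""
    return len(top_n_pairs(scores_a, n) & top_n_pairs(scores_b, n)) / n


# Toy scores for six pairs under two made-up similarity methods
method_a = {(0, 1): 0.90, (0, 2): 0.80, (0, 3): 0.10,
            (1, 2): 0.70, (1, 3): 0.20, (2, 3): 0.30}
method_b = {(0, 1): 0.85, (0, 2): 0.20, (0, 3): 0.60,
            (1, 2): 0.10, (1, 3): 0.70, (2, 3): 0.30}

overlap = fraction_in_common(method_a, method_b, n=3)
print(f"fraction in common (top 3): {overlap:.2f}")
```

With this definition a value of 0.18 for Fraggle and AP would mean that 18 of the 100 most similar pairs according to Fraggle are also among the 100 most similar pairs according to AP.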
Unfortunately, Blogger isn't up to the challenge of displaying the full post. It's available in the nbviewer here.
