Chemistry Development Kit
833 followers|66,403 views
Part II... 
This post follows up on the previous one to report some timings. I've checked all the code into GitHub (johnmay/efficient-bits/fp-idx) and it has some standalone programs that can be run from the command line. Currently there ar...
More information on the ECFP/FCFP implementation...
As of now, the latest version of the popular open source Chemistry Development Kit (CDK) has its own implementation of the highly regarded ECFP and FCFP classes of chemical structure fingerprints (s...
Changes in CDK 1.6 #3: Constructors that now require a builder
The advantage of the builders in the CDK is that code can be independent of data class implementations (and we have three of them in CDK 1.6, at this moment). Over the past years more and more code started using the approach, but that does mean that more and more class constructors take a ...
Paper in which +Ola Spjuth, +Arvid Berg, Sam Adams, and I outline how the InChI is integrated into the CDK and used in +Bioclipse.
Uses the CDK to predict a number of properties for compounds.
"STITCH is a database of protein–chemical interactions that integrates many sources of experimental and manually curated evidence with text-mining information and interaction predictions."

This paper uses Tanimoto calculations to remove similar compounds:

"To avoid biases, we first excluded highly similar chemicals, enforcing a maximum Tanimoto similarity of 0.9 using 2D chemical fingerprints calculated with the chemistry development kit."

BTW, much of the data in this database is available under some flavor of Creative Commons license.
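
The similarity filter described in the quote can be sketched roughly as follows. This is a minimal Python illustration, not the STITCH authors' procedure and not CDK code: fingerprints are assumed to be sets of on-bit indices, the names `tanimoto` and `filter_similar` are hypothetical, and the greedy keep-first strategy is an assumption, since the quote only states the 0.9 cutoff.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 1.0  # convention chosen for this sketch: two empty fingerprints are identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def filter_similar(fingerprints, max_similarity=0.9):
    """Greedy filter (an assumption): keep a compound only if its similarity
    to every compound kept so far does not exceed max_similarity."""
    kept = []
    for name, fp in fingerprints:
        if all(tanimoto(fp, kept_fp) <= max_similarity for _, kept_fp in kept):
            kept.append((name, fp))
    return kept


# Toy fingerprints, invented for illustration only.
compounds = [
    ("A", frozenset(range(10))),                             # bits 0..9
    ("B", frozenset(range(11))),                             # bits 0..10, Tanimoto to A is 10/11
    ("C", frozenset(range(5)) | frozenset(range(20, 25))),   # shares 5 bits with A
]

kept_names = [name for name, _ in filter_similar(compounds)]
print(kept_names)
```

With a greedy filter the result depends on input order, so a real pipeline would need to decide which member of a near-duplicate pair to keep.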
Some things are unpredictable. For example, the impact of PaDEL by Chun Wei Yap. While seemingly just providing a simple API around CDK's descriptor and fingerprint functionality, its impact is significant: higher than, for example, that of Bioclipse and AMBIT, which also provide GUIs around that functionality.

Web of Science lists 71 citations of this work, while Google Scholar guesstimates it at around 120. Well done!
PaDEL-Descriptor is a software for calculating molecular descriptors and fingerprints. The software currently calculates 797 descriptors (663 1D, 2D descriptors, and 134 3D descriptors) and 10 types of fingerprints. These descriptors and fingerprints are calculated mainly using The Chemistry ...
You can also read an SDfile more efficiently by repeated calls to MDLV2000Reader. The pattern is similar to BufferedReader.readLine() in a while loop.  
This post in a series about API changes in CDK 1.6 is about the iterating reader for SD files, which are basically a list of MDL molfiles (Symyx, ... I lost track) complemented with properties for each structure. Since the CDK IO readers have a representation of the file format in the class name, ...
Egon Willighagen's profile photo
 
But a MDL molfile doesn't have "> <FIELDS>"... ??
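
The record-at-a-time reading pattern discussed in this post can be sketched in plain Python. This is not the CDK API (which is Java); it only illustrates the idea of streaming an SD file one record at a time instead of loading it whole. The function names `iter_sdf_records` and `_split_record` are hypothetical, the sample records are toy data, and the parsing assumes the common layout of a molfile block ending in `M  END`, followed by `> <FIELD>` data items and a `$$$$` record delimiter.

```python
import io


def _split_record(lines):
    """Split one record into its molfile text and a dict of data items."""
    # The molfile block runs through the "M  END" line; data items follow it.
    end = next((i for i, l in enumerate(lines) if l.startswith("M  END")),
               len(lines) - 1)
    molfile = "\n".join(lines[:end + 1])
    props, name = {}, None
    for l in lines[end + 1:]:
        if l.startswith(">"):
            # Data header such as "> <NAME>": take the text between < and >.
            start, stop = l.find("<"), l.rfind(">")
            name = l[start + 1:stop] if 0 <= start < stop else None
            if name is not None:
                props[name] = []
        elif l.strip() and name is not None:
            props[name].append(l)
        else:
            name = None  # a blank line ends the current data item
    return molfile, {k: "\n".join(v) for k, v in props.items()}


def iter_sdf_records(handle):
    """Yield (molfile_text, properties) for each $$$$-delimited record."""
    lines = []
    for line in handle:
        line = line.rstrip("\n")
        if line.strip() == "$$$$":
            yield _split_record(lines)
            lines = []
        else:
            lines.append(line)
    if any(l.strip() for l in lines):  # tolerate a missing final delimiter
        yield _split_record(lines)


# Two toy records, invented for illustration only.
SAMPLE = (
    "methane\n  toy record\n\n"
    "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "M  END\n> <NAME>\nmethane\n\n$$$$\n"
    "water\n  toy record\n\n"
    "  1  0  0  0  0  0  0  0  0  0999 V2000\n"
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0\n"
    "M  END\n> <NAME>\nwater\n\n> <FORMULA>\nH2O\n\n$$$$\n"
)

for molfile, props in iter_sdf_records(io.StringIO(SAMPLE)):
    print(props.get("NAME"), len(molfile.splitlines()))
```

Because the generator holds only the lines of the current record, memory use stays roughly constant regardless of how many structures the file contains.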
CDK Release 1.5.6
Metabolomics paper by +Steffen Neumann and others in which +Rajarshi Guha's rcdk package is used to calculate Tanimoto similarities.
Mass spectrometry (MS) has become the analytical method of choice in plant metabolomics. Nevertheless, metabolite annotation remains a major challenge and implies the integration of structural searches in compound libraries with biological knowledge inferred from metabolite regulation studies. Here we propose a novel integrative approach to process and exploit the rich structural information contained in in-source fragmentation patterns of high-r...
This NanoQSAR paper uses the CDK to calculate molecular descriptors for coating components.
This paper adapts the CDK MACCS fingerprinter to study underlying structure-cytotoxicity patterns of ionic liquids (ILs):

"This modification consists of the computation of the original 166 bits MACCS key for each IL constituent species, the differentiation of anion’s and cation’s MACCS keys by summing to each bit position on the MACCS key of one of the constituent ionic species a constant value equal to the length of the bits string (166 in this case), and further concatenation of both (cation + anion) bits strings. The result is a concatenated fingerprint of 332 bits codifying the molecular structure of ILs."
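
The concatenation described in the quoted MACCS modification can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the paper's code and not the CDK fingerprinter: each ion's key is assumed to be a set of 0-based on-bit positions, the function name `concatenate_ion_keys` is hypothetical, and shifting the anion (rather than the cation) is an arbitrary choice, since the quote only says the offset is applied to "one of the constituent ionic species".

```python
MACCS_LENGTH = 166  # length of the original MACCS key, as stated in the quote


def concatenate_ion_keys(cation_bits, anion_bits, length=MACCS_LENGTH):
    """Combine two per-ion keys into one fingerprint of 2 * length bits.

    Cation bits keep their positions; anion bits are shifted by `length`
    (an assumption of this sketch) so the two halves never collide.
    """
    for bits in (cation_bits, anion_bits):
        if any(not 0 <= b < length for b in bits):
            raise ValueError("bit position outside the key length")
    return set(cation_bits) | {b + length for b in anion_bits}


# Toy on-bit positions, invented for illustration only.
cation = {0, 41, 165}
anion = {0, 7}

combined = concatenate_ion_keys(cation, anion)
print(sorted(combined))
```

Keeping the two halves in separate index ranges means a feature present in the cation is never confused with the same feature in the anion, which matches the differentiation the quote describes.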
People
In their circles
22 people
Have them in circles
833 people
Erhan Yazan's profile photo
Arvid Berg's profile photo
KhalifA Alkaabi's profile photo
Olaf Prause's profile photo
Soroosh Naghdi's profile photo
Oshan Promodhya Edirisinghe's profile photo
jose taveras's profile photo
Zhihong Liu's profile photo
방영배's profile photo
Story
Tagline
The Open Source Cheminformatics and Bioinformatics Toolkit
Introduction
This G+ Page will be used to share news around the CDK, like links to pages discussing new releases of software that uses the CDK, blog posts that analyze CDK functionality, etc.