ChemoHub tutorials:


ChemoHub (or Chemogene?) is an open platform for exploring chemical-gene/protein associations using state-of-the-art methods. Associations are not limited to physical binding; they also cover other types of interaction (toxicogenomics, pharmacogenomics). It is one of the applications of the Chem2Bio2RDF project, in which well-known public chemogenomics resources are integrated into a semantic format (i.e., RDF). In ChemoHub, the user can find the association between a chemical and a protein/gene (direct or indirect), the genes/proteins associated with a chemical, and the chemicals associated with a gene. If the input association does not exist, three predictive models (SEA, Naive Bayes, SLAP) allow the user to estimate the probability that the chemical and gene are associated. The literature supporting such an association can also be explored in this platform. Similar works are ChemProt, STITCH, and ChemBench. We promote Open: Open Data, Open Model, and Open Source. We welcome people to share their data (e.g., via a SPARQL endpoint) or their predictive models (e.g., via web services).

Input:

Compound: You can enter a PubChem compound ID (CID), a SMILES string, or a compound name (only drug names are supported).
Gene: You can enter a UniProt ID, a gene symbol, or a gene/protein name.

For SLAP:
If you fill in only the compound box, the network shows all the genes associated with that chemical.
If you fill in only the gene box, the network shows all the compounds associated with that gene.

The other models require both a compound and a gene.

After entering your input, click "Go" next to "Semantic Link Association Prediction" to get the network and, if no direct association exists, the predicted result.
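
The input rules above can be summarized in a few lines of Python. This is only an illustrative sketch: the function name available_models and the string labels are hypothetical and are not part of ChemoHub itself.

```python
from typing import List, Optional


def available_models(compound: Optional[str], gene: Optional[str]) -> List[str]:
    """Return the models that can run for the given input boxes.

    Hypothetical helper: it only mirrors the rules stated in this tutorial.
    """
    has_compound = bool(compound and compound.strip())
    has_gene = bool(gene and gene.strip())
    if has_compound and has_gene:
        # Both boxes filled: every model can be used.
        return ["SEA", "Naive Bayes", "SLAP"]
    if has_compound or has_gene:
        # Only one box filled: SLAP alone accepts this.
        return ["SLAP"]
    # Nothing entered: no model can run.
    return []


print(available_models("5991", None))     # compound only
print(available_models("5991", "NR1I2"))  # compound and gene
```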

Network Visualization

The network is visualized with the Cytoscape Web plugin, which provides many functions for exploring the network (let us know if you need new functions added). A node can be the URI of a compound, drug, gene, pathway, side effect, disease, tissue, or GO term. These are dereferenceable URIs, through which you can get all other information about the node. Each edge shows the link type. Click an edge to be directed to the original source; it also lets you go to one of our REST services to explore more relations between the two nodes.

The input nodes are shown in yellow; the others are in white.
You can also use "panZoomControl" to zoom, drag, and so on, on any node, edge, or the whole network.

Output - Predictive Model

SEA:

Similarity Ensemble Approach (SEA) was originally used to relate proteins based on the similarity of their ligand sets; it was later applied to drug target prediction. We implemented this method on our datasets, but we use a public fingerprint (MACCS) to measure compound similarity.
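
As an illustration of fingerprint-based compound similarity, the sketch below computes the Tanimoto coefficient between two fingerprints represented as sets of "on" bit positions. The Tanimoto coefficient is an assumption here (it is the measure commonly used with such fingerprints); the bit sets are made up and are not real MACCS keys, and this is not the ChemoHub implementation.

```python
from typing import Set


def tanimoto(bits_a: Set[int], bits_b: Set[int]) -> float:
    """Tanimoto coefficient: shared on-bits divided by all on-bits."""
    union = bits_a | bits_b
    if not union:
        # Two empty fingerprints: define the similarity as 0.0.
        return 0.0
    return len(bits_a & bits_b) / len(union)


# Made-up bit positions, for illustration only.
fingerprint_1 = {1, 2, 3}
fingerprint_2 = {2, 3, 4}
print(tanimoto(fingerprint_1, fingerprint_2))  # 2 shared bits out of 4 -> 0.5
```

SEA then aggregates such pairwise similarities between a compound and the ligand set of a target; that statistical step is not shown here.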

The E value indicates how likely the association is: the lower the value, the higher the probability of the association. In general, an E value < 1 is good, and an E value < 10^-10 indicates a very strong association.
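
The rule of thumb above can be written as a small helper. The function name and the label for values of 1 or more are assumptions made for this sketch; the text only names the "good" and "very strong" ranges.

```python
def interpret_e_value(e_value: float) -> str:
    """Map a SEA E value to the rough categories described in this tutorial."""
    if e_value < 1e-10:
        return "very strong"
    if e_value < 1:
        return "good"
    # Assumed label: the tutorial does not name this range.
    return "not significant"


for value in (1e-12, 0.5, 5.0):
    print(value, interpret_e_value(value))
```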

Naive Bayes

Naive Bayes is robust to noisy data, and its good performance in HTS has been described in a number of papers.

The result shows the probability along with the performance of the model. A higher probability indicates a stronger association.
more info of the model....

SLAP

We developed SLAP to capture direct and indirect associations based on a statistical model of the linked data. Basically, it assumes that two objects are related if they share related objects.

The first line of the results indicates the strength of the association, categorized as strong, weak, very weak, or unknown. The second line shows the P value: the smaller the value, the stronger the association.
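
The underlying assumption (objects are related if they share related objects) can be illustrated on a toy network. The sketch below only finds the shared neighbors of two nodes; it is not SLAP's statistical model, and all node names are placeholders rather than real data.

```python
from typing import Dict, Set


def shared_neighbors(graph: Dict[str, Set[str]], a: str, b: str) -> Set[str]:
    """Return the nodes linked to both a and b in an undirected graph."""
    return graph.get(a, set()) & graph.get(b, set())


# Toy undirected network with placeholder node names.
toy_graph: Dict[str, Set[str]] = {
    "compound_A": {"pathway_P", "disease_D", "side_effect_S"},
    "gene_X": {"pathway_P", "disease_D", "tissue_T"},
    "pathway_P": {"compound_A", "gene_X"},
    "disease_D": {"compound_A", "gene_X"},
    "side_effect_S": {"compound_A"},
    "tissue_T": {"gene_X"},
}

# compound_A and gene_X have no direct edge, but they share two neighbors,
# which hints at an indirect association.
print(sorted(shared_neighbors(toy_graph, "compound_A", "gene_X")))
```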

PubMed Occurrence

It lists all the PubMed URIs where the association occurs in the abstract or title.
The results displayed are ranked by 1) the sum of the frequencies with which the gene and the compound occur in the same article; 2) publication date when the frequencies are equal. Because in some cases the number of associations is huge (in the hundreds), we list at most the top 10 ranked links.

If no results are found or an exception occurs during processing, it shows "Failed".
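
The ranking just described can be sketched as a sort on two keys followed by a cut-off at 10. The Hit type, its field names, and the choice to put newer publications first on ties are assumptions for this example; the text does not state the direction of the date ordering. The URIs are placeholders.

```python
from datetime import date
from typing import List, NamedTuple


class Hit(NamedTuple):
    uri: str          # placeholder identifier, not a real PubMed URI
    frequency: int    # summed occurrences of the gene and the compound
    published: date   # publication date


def rank_hits(hits: List[Hit], limit: int = 10) -> List[Hit]:
    """Sort by frequency (highest first), then by date (newest first, assumed)."""
    ordered = sorted(hits, key=lambda h: (h.frequency, h.published), reverse=True)
    return ordered[:limit]


sample = [
    Hit("example:a", 3, date(2009, 1, 1)),
    Hit("example:b", 5, date(2008, 1, 1)),
    Hit("example:c", 3, date(2010, 6, 1)),
]
for hit in rank_hits(sample):
    print(hit.uri, hit.frequency, hit.published)
```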

Use cases:

1) Find Chemical Gene association
some examples

2) Find Gene associated Chemicals
Input NR1I2

3) Find Chemical associated genes
Input 5991

Main Contributors

Bin Chen
Qian Zhu