Introduction


SPARQL endpoints provide a way to query Chem2Bio2RDF; however, writing a query is difficult for a user who does not understand the structure of the RDF data. Furthermore, SPARQL alone does not support more advanced exploration of the data, such as showing the associations between two subjects. This project aims to provide a platform for interacting with and understanding the data through ontology and network analysis. Eventually, the project will allow the user to test a proposed hypothesis against Chem2Bio2RDF data and literature results.

Data representation:

ontology.jpg

Currently, the Chem2Bio2RDF data is represented in RDF format and connected as a network (figure above). We will build an ontology on top of the RDF data. With chem2bio2rdf as the root, the ontology divides into five parts (chemical, protein, systems, disease and side effect). Each part extends hierarchically to its child nodes, which are eventually linked to the instances in Chem2Bio2RDF. Any node in the ontology can therefore be resolved to its associated instances. For example, starting from inhibitors, we can move to the subclass kinase inhibitors, under which the child node VEGF inhibitor is associated with the instances Ranibizumab and Pegaptanib. Here are the links to the ontologies. We will have to integrate them into Chem2Bio2RDF.

infrastructure.JPG

The input can be any number of ontology nodes or instances. To look up information about a drug, the user enters the drug name or structure. To study why a drug results in a side effect, the user enters the drug name and the side effect name. The search engine then maps the inputs to specific instances based on the ontology. Most of the time the inputs are not directly connected but are linked through several other instances. For example, the drug doxazosin and the side effect necrosis are connected through the protein PTGS1 and the VEGF signaling pathway. The path generation step generates all possible paths between the inputs, and the output is visualized as a network (figure below). In addition, relations between the inputs can be searched for in the literature via PubMed. The platform thus enables the user to propose a hypothesis and then test it against the experimental data as well as the literature.

demo.JPG
The figure is a demo of the platform. In the network visualization, the user can further explore the data through interactive functions.

Input:

This step finds all instances of the inputs. For example, if the input is a drug name, it retrieves the drug's unique identifier (i.e., its URI). If the input is VEGF inhibitors, the ontology search yields two instances, the drugs Ranibizumab and Pegaptanib. Every instance is represented by its URI.
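The mapping from an ontology node to its instances can be sketched as a walk down the class hierarchy. This is only an illustration: the dictionaries below are hand-made, the placement of inhibitors under chemical is an assumption, and the example.org URIs and the function name resolve_instances are hypothetical placeholders rather than real Chem2Bio2RDF identifiers.

```python
# Hypothetical fragment of the ontology. Real class names and URIs would come
# from the Chem2Bio2RDF ontology, not from these hand-written dictionaries.
SUBCLASSES = {
    "chem2bio2rdf": ["chemical", "protein", "systems", "disease", "side effect"],
    "chemical": ["inhibitor"],  # assumption: inhibitors sit under "chemical"
    "inhibitor": ["kinase inhibitor"],
    "kinase inhibitor": ["VEGF inhibitor"],
}

# Instances attached to ontology nodes (example.org URIs are placeholders).
INSTANCES = {
    "VEGF inhibitor": [
        "http://example.org/drug/Ranibizumab",
        "http://example.org/drug/Pegaptanib",
    ],
}


def resolve_instances(node):
    """Collect the instance URIs of an ontology node and all of its descendants."""
    found = set()
    seen = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        found.update(INSTANCES.get(current, []))
        stack.extend(SUBCLASSES.get(current, []))
    return sorted(found)
```

Calling resolve_instances("inhibitor") walks through kinase inhibitor and VEGF inhibitor and returns the two placeholder drug URIs, while a node with no attached instances returns an empty list.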

After the instances are found, keywords for each instance are generated for the PubMed search. Keywords can typically be names, MeSH terms and so on.
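One possible way to turn such keywords into a PubMed search is sketched here. It assumes the search goes through the NCBI E-utilities ESearch endpoint and uses the standard PubMed field tags [Title/Abstract] and [MeSH Terms]; the helper names and the example keywords are hypothetical, and the code only builds the request URL without sending it.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def keyword_clause(names, mesh_terms=()):
    """OR together the names and MeSH terms that describe one instance."""
    parts = [f'"{name}"[Title/Abstract]' for name in names]
    parts += [f'"{term}"[MeSH Terms]' for term in mesh_terms]
    return "(" + " OR ".join(parts) + ")"


def pubmed_query(*clauses):
    """AND together one clause per input instance."""
    return " AND ".join(clauses)


def esearch_url(term, retmax=20):
    """Build (but do not send) an ESearch request for the given query term."""
    params = {"db": "pubmed", "term": term, "retmax": retmax}
    return ESEARCH_URL + "?" + urlencode(params)


# Illustrative keywords for the drug / side effect example in the text.
query = pubmed_query(
    keyword_clause(["doxazosin"], mesh_terms=["Doxazosin"]),
    keyword_clause(["necrosis"]),
)
```

Keeping one OR-clause per instance and joining the clauses with AND means that a hit has to mention every input, which matches the goal of finding literature on the relation between the inputs rather than on each input separately.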

Path generation

This step will implement a network path generation algorithm (e.g., breadth-first search). It differs from the standard setting because the network is stored as RDF data, so we may have to combine the Jena API with network analysis algorithms.
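A minimal sketch of breadth-first path generation over RDF-style triples is given here. It is written in plain Python rather than against the Jena API, treats every triple as an undirected edge, and bounds the path length with a hypothetical max_edges parameter; the predicates and node names in the toy triples are illustrative, not taken from Chem2Bio2RDF.

```python
from collections import defaultdict, deque

# Toy triples (subject, predicate, object); names are illustrative only.
TRIPLES = [
    ("drug:doxazosin", "binds", "protein:PTGS1"),
    ("protein:PTGS1", "participates_in", "pathway:VEGF_signaling"),
    ("pathway:VEGF_signaling", "associated_with", "side_effect:necrosis"),
    ("drug:doxazosin", "treats", "disease:hypertension"),
]


def build_adjacency(triples):
    """Index triples so each node lists (predicate, neighbour) pairs in both directions."""
    adjacency = defaultdict(list)
    for subject, predicate, obj in triples:
        adjacency[subject].append((predicate, obj))
        adjacency[obj].append((predicate, subject))
    return adjacency


def find_paths(triples, source, target, max_edges=4):
    """Breadth-first enumeration of all simple paths with at most max_edges edges.

    A path is a list that alternates nodes and predicates:
    [node, predicate, node, predicate, node, ...].
    """
    adjacency = build_adjacency(triples)
    paths = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            paths.append(path)
            continue
        if (len(path) - 1) // 2 >= max_edges:
            continue  # length limit reached, do not extend further
        for predicate, neighbour in adjacency[node]:
            if neighbour not in path[::2]:  # path[::2] holds the nodes visited so far
                queue.append(path + [predicate, neighbour])
    return paths
```

With the toy triples, find_paths(TRIPLES, "drug:doxazosin", "side_effect:necrosis") finds the single route through PTGS1 and the VEGF signaling pathway. Because paths are expanded level by level, shorter paths are always reported before longer ones.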

Network visualization

In the demo, we use the JUNG package to visualize the network; FLARE (a Flash ActionScript library) could also be used. The nodes can be instances or ontology nodes. We will implement the following functions:

Node expansion:

Parallel Expansion

Find similar nodes. For example, for a drug node, find the drugs with Tanimoto similarity > 0.95. Once the new nodes are selected, they are connected to the nodes already in the network.
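The similarity filter can be sketched as follows, assuming each drug is described by a binary fingerprint stored as the collection of its "on" bit positions. The function names and the toy fingerprints are hypothetical; in practice the fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as collections of 'on' bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def parallel_expansion(query, fingerprints, threshold=0.95):
    """Return (name, similarity) for every other node whose similarity exceeds the threshold."""
    hits = []
    for name, bits in fingerprints.items():
        if name == query:
            continue
        score = tanimoto(fingerprints[query], bits)
        if score > threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Toy fingerprints: drug:B shares 99 of 100 bits with drug:A, drug:C only 50.
FINGERPRINTS = {
    "drug:A": range(100),
    "drug:B": range(99),
    "drug:C": range(50),
}
```

Here parallel_expansion("drug:A", FINGERPRINTS) keeps only drug:B, since 99 shared bits out of a union of 100 is above the 0.95 threshold while 50 out of 100 is not.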

Vertical Expansion

If the node is in the ontology, we can list all of its instances. Conversely, if several instances belong to one ontology node, they can be folded into that single node.
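Folding can be illustrated on a plain edge set. The sketch below assumes undirected edges stored as (node, node) pairs; the fold function and the target and pathway names are hypothetical placeholders, not Chem2Bio2RDF data.

```python
def fold(edges, members, class_node):
    """Replace every member instance with class_node and merge duplicate edges."""
    members = set(members)
    folded = set()
    for u, v in edges:
        u2 = class_node if u in members else u
        v2 = class_node if v in members else v
        if u2 != v2:  # drop edges that would collapse into a self-loop
            folded.add((u2, v2))
    return folded


# Toy network: two instances of the same ontology node share a neighbour.
EDGES = {
    ("drug:Ranibizumab", "target:T1"),
    ("drug:Pegaptanib", "target:T1"),
    ("target:T1", "pathway:P1"),
}

FOLDED = fold(EDGES, ["drug:Ranibizumab", "drug:Pegaptanib"], "VEGF inhibitor")
```

After folding, the two drug nodes are replaced by one "VEGF inhibitor" node with a single edge to target:T1, and the rest of the network is unchanged.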

Node information:

List the basic information about the node, including its URI, sources and provenance, and list the literature that discusses the node.

Edge expansion

Two nodes may be connected by several edges. For a drug-target interaction, experiments may give different results under different conditions, thus yielding multiple edges between the two nodes.
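One simple way to model this is to keep every experimental record and group the records by node pair, so that a single displayed edge can be expanded into its underlying measurements. The record fields, values and source labels below are hypothetical and serve only to show the grouping.

```python
from collections import defaultdict

# Hypothetical interaction records; fields and values are illustrative only.
RECORDS = [
    {"drug": "drug:D1", "target": "protein:P1", "assay": "IC50", "value_nM": 12.0, "source": "source A"},
    {"drug": "drug:D1", "target": "protein:P1", "assay": "Ki", "value_nM": 30.0, "source": "source B"},
    {"drug": "drug:D1", "target": "protein:P2", "assay": "IC50", "value_nM": 450.0, "source": "source A"},
]


def group_edges(records):
    """Group records by (drug, target) so each node pair maps to all of its edges."""
    grouped = defaultdict(list)
    for record in records:
        grouped[(record["drug"], record["target"])].append(record)
    return dict(grouped)


def expand_edge(grouped, drug, target):
    """Return every individual edge (experimental record) behind one displayed edge."""
    return grouped.get((drug, target), [])


GROUPED = group_edges(RECORDS)
```

In this toy data the pair (drug:D1, protein:P1) expands into two edges, one per assay, each keeping its own source so that provenance stays attached to the individual measurement.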

Edge information:

Provenance, literature validation.

Case study:

Why do drug 1 and drug 2 have the same side effect but different structures? In the network, drug 1 and drug 2 may be connected by paths that could explain this.

Several chemicals are linked to one target and share one scaffold, so we can infer that this scaffold is associated with the target.
