

Recently there has been a lot of interest in using computational techniques to build networks of protein-protein interactions (PPIs), interacting genes and metabolic reactions. Many interesting and novel discoveries have been made using graph-based structures, in which nodes and links represent proteins or genes and the relationships between them. Analysis of these networks has revealed that genes and proteins cooperate in modules that perform specific functions, and that there is crosstalk or overlap between modules. In this paper we take these ideas further and build a network of related human diseases based on graph theory and the concept of overlap or shared function. We explore the hypothesis that many human diseases are linked by common genetic modules, so that a defect in any one of the cooperating genes in a module may lead to a specific disease or related symptom. We build our networks using data extracted from several online databases, along with supporting knowledge in the form of biological ontologies.

Previous studies have shown modular structures in PPI networks. More recently, many genome and metagenome investigations have focused on identifying modules in PPI networks.
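
The shared-module hypothesis can be illustrated with a small sketch that links two diseases whenever they share at least one associated gene. This is only a minimal illustration under stated assumptions: the function name `build_disease_network`, the `min_shared` threshold and the toy gene sets are hypothetical and are not taken from the paper or its data sources.

```python
from itertools import combinations


def build_disease_network(disease_genes, min_shared=1):
    """Return disease-disease edges keyed by (disease_a, disease_b).

    Two diseases are linked when their gene sets overlap by at least
    `min_shared` genes; the edge value is the set of shared genes.
    """
    edges = {}
    for a, b in combinations(sorted(disease_genes), 2):
        shared = disease_genes[a] & disease_genes[b]
        if len(shared) >= min_shared:
            edges[(a, b)] = shared
    return edges


# Hypothetical toy data: disease name -> associated gene identifiers.
disease_genes = {
    "disease_A": {"G1", "G2", "G3"},
    "disease_B": {"G3", "G4"},
    "disease_C": {"G5"},
}

network = build_disease_network(disease_genes)
for (a, b), shared in network.items():
    print(a, "--", b, "via", sorted(shared))
```

Raising `min_shared` gives a stricter notion of overlap, which is one simple way to explore how strongly two diseases must share function before they are treated as related.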
Recent advances in high-throughput technologies have been expanding the experimental scenario. This is producing an avalanche of data that is difficult to manage, turning the biological sciences from a data-poor discipline into a data-rich one. Furthermore, next-generation sequencing (NGS) technologies, created to sequence very long DNA pieces at low cost, are widely used to generate biological data. Unfortunately, bioinformatics tools have not adapted their algorithms and computational techniques to deal with this data explosion. Therefore, the integration of biological data produced by these technological advances is far from a solved task, although it is one of the most important and basic elements of bioinformatics research and systems biology projects. Hence, in this thesis, we developed a biological data integration framework (JBioWH) with a modular design for integrating the most important biological databases. The framework comprises a Java API for external use, a desktop client and a web services application. This system has been supplying integrative data for many bioinformatics projects. Also, a program (Taxoner) was developed to identify taxonomies by mapping NGS reads to a comprehensive sequence database. As a result of alterations to the indexing used, this pipeline is fast enough to run evaluations on a single PC and is highly sensitive; it can therefore be adapted to analysis problems such as detecting pathogens in human samples. Finally, a workflow for DNA sequence comparison is presented. This workflow is applied either to create a marker database for taxonomy binning or simply to obtain unique DNA segments among a group of target sequences. It is based on a set of in-house programs that includes JBioWH and Taxoner. All the programs developed are freely available through the Google Code platform.
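
The idea of obtaining DNA segments that are unique to one target among a group of sequences can be illustrated with a simple k-mer comparison. This is a sketch under stated assumptions and is not the method used by the workflow, JBioWH or Taxoner: the function names `kmers` and `unique_kmers`, the choice of fixed-length k-mers and the toy sequences are all hypothetical, and reverse complements are ignored for brevity.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(targets, k):
    """Map each target name to the k-mers found in no other target."""
    all_kmers = {name: kmers(seq, k) for name, seq in targets.items()}
    unique = {}
    for name, own in all_kmers.items():
        # Union of the k-mers of every other target sequence.
        others = set().union(
            *(s for other, s in all_kmers.items() if other != name)
        )
        unique[name] = own - others
    return unique


# Hypothetical toy sequences; real targets would be whole genomes.
targets = {"seq_a": "ACGTAC", "seq_b": "CGTACG"}
markers = unique_kmers(targets, k=4)
for name, found in markers.items():
    print(name, sorted(found))
```

Segments that survive this filter are candidates for a marker database, since a read matching one of them points to a single target in the group.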
Bioinformatics experiments usually require efficient computational systems that streamline the data processing.

Complex networks are a graph-theoretic method that can model genetic mutations, in particular single nucleotide polymorphisms (SNPs), which are genetic variations that occur at a single position in a DNA sequence. These can cause amino acids to be changed and may affect protein function and structural stability, which can contribute to the development of disease. We show how SNPs can be represented by complex graph structures, where the proteins are the nodes (vertices) and the interactions between them are the links (edges), and how the resulting connectivity patterns can be related to human diseases. Disruptions caused by mutations can be explained as a loss of connectivity, such as the deletion of nodes or edges in the network (hence the term edgetics). Furthermore, diseases appear to be interlinked, with hub genes causing multiple problems, and this has led to the concept of the human disease network, or diseasome. Edgetics is a relatively new concept that is proving effective for modelling the relationships between genes, diseases and drugs, which were previously considered intractable problems.
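
The edgetic view of mutations as node or edge deletions can be sketched on a toy interaction graph. This is a minimal illustration, assuming an undirected graph stored as an adjacency dictionary; the protein names `P1` to `P4`, the helper functions and the use of connected-component counts as a measure of lost connectivity are hypothetical choices, not the authors' implementation.

```python
from collections import deque


def build_graph(edges):
    """Build an undirected adjacency dict from (u, v) interaction pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def remove_node(graph, node):
    """Model a mutation that removes a protein and all its interactions."""
    return {u: nbrs - {node} for u, nbrs in graph.items() if u != node}


def remove_edge(graph, u, v):
    """Model an edgetic mutation that removes a single interaction."""
    new = {n: set(nbrs) for n, nbrs in graph.items()}
    new[u].discard(v)
    new[v].discard(u)
    return new


def count_components(graph):
    """Count connected components with a breadth-first search."""
    seen, count = set(), 0
    for start in graph:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for nbr in graph[current]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
    return count


# Hypothetical toy network: a triangle P1-P2-P3 with P4 attached to P3.
interactions = [("P1", "P2"), ("P2", "P3"), ("P3", "P1"), ("P3", "P4")]
ppi = build_graph(interactions)

print(count_components(ppi))                           # intact network
print(count_components(remove_node(ppi, "P3")))        # node deletion
print(count_components(remove_edge(ppi, "P3", "P4")))  # edge deletion
```

In this toy network, deleting the node `P3` or the edge between `P3` and `P4` splits the graph, whereas deleting the edge between `P1` and `P2` does not, because the two proteins stay connected through `P3`. This shows how the effect of a mutation depends on where it falls in the network.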
