B-Net: A database for Biochemical Networks
Functional genomic studies are beginning to combine microarray, proteomic, and metabolomic data. While genes and proteins can be related by their sequence similarity, metabolites cannot easily be related to genes or proteins. So far, the most effective way to do this is through reference databases that collect the known information about metabolic reactions, i.e., their substrates, products, enzymes, and genes. While several such biochemical databases already exist, none of them combines all the features necessary for the interpretation of metabolomics data. We have developed B-Net (for Biochemical Network), a relational database that serves as a reference for functional genomic studies. B-Net follows a simple schema that nevertheless reflects the relevant intricacies of biochemistry. Unlike most metabolic databases, B-Net is specific for each molecular species in each organism. Thus, it distinguishes between different isozymes, it contains each specific reaction rather than reactions between classes of compounds, and all proteins and genes are linked to specific biological species. Our curation policy favors quality: we prefer missing information to erroneous information. In importing data from other databases, we have developed stringent criteria for accepting records and have rejected all records that do not fulfill them.
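As an illustration only, the species-specific organization described above could be sketched as a small relational schema. The table names, column names, and placeholder rows below are assumptions made for this example and are not the actual B-Net schema. The sketch stores two isozymes as separate protein records, each tied to an organism, and both linked to one specific reaction (the glucokinase reaction, EC 2.7.1.2).

```python
import sqlite3

# Hypothetical table and column names; the real B-Net schema is not shown here.
SCHEMA = """
CREATE TABLE organism (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE gene (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    accession TEXT,
    organism_id INTEGER NOT NULL REFERENCES organism(id)
);
CREATE TABLE protein (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    ec_number TEXT,
    gene_id INTEGER REFERENCES gene(id),
    organism_id INTEGER NOT NULL REFERENCES organism(id)
);
CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE, formula TEXT);
CREATE TABLE reaction (id INTEGER PRIMARY KEY, ec_number TEXT);
CREATE TABLE reaction_compound (
    reaction_id INTEGER NOT NULL REFERENCES reaction(id),
    compound_id INTEGER NOT NULL REFERENCES compound(id),
    role TEXT NOT NULL CHECK (role IN ('substrate', 'product')),
    coefficient INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE catalysis (
    protein_id INTEGER NOT NULL REFERENCES protein(id),
    reaction_id INTEGER NOT NULL REFERENCES reaction(id)
);
"""


def build_demo() -> sqlite3.Connection:
    """Create an in-memory database holding placeholder rows (not real B-Net data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO organism VALUES (1, 'Example organism')")
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?, 1)",
        [(1, "geneA", None), (2, "geneB", None)],
    )
    # Two isozymes are two distinct protein records, each with its own gene.
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, '2.7.1.2', ?, 1)",
        [(1, "glucokinase isozyme A", 1), (2, "glucokinase isozyme B", 2)],
    )
    conn.executemany(
        "INSERT INTO compound (id, name) VALUES (?, ?)",
        [(1, "ATP"), (2, "D-glucose"), (3, "ADP"), (4, "D-glucose 6-phosphate")],
    )
    # One specific reaction: ATP + D-glucose = ADP + D-glucose 6-phosphate.
    conn.execute("INSERT INTO reaction VALUES (1, '2.7.1.2')")
    conn.executemany(
        "INSERT INTO reaction_compound VALUES (1, ?, ?, 1)",
        [(1, "substrate"), (2, "substrate"), (3, "product"), (4, "product")],
    )
    conn.executemany("INSERT INTO catalysis VALUES (?, 1)", [(1,), (2,)])
    return conn


def proteins_for_compound(conn: sqlite3.Connection, compound_name: str):
    """Return (protein name, organism name) pairs for reactions involving a compound."""
    rows = conn.execute(
        """
        SELECT DISTINCT p.name, o.name
        FROM compound c
        JOIN reaction_compound rc ON rc.compound_id = c.id
        JOIN catalysis k ON k.reaction_id = rc.reaction_id
        JOIN protein p ON p.id = k.protein_id
        JOIN organism o ON o.id = p.organism_id
        WHERE c.name = ?
        ORDER BY p.name
        """,
        (compound_name,),
    )
    return rows.fetchall()
```

Calling `proteins_for_compound(build_demo(), "D-glucose")` returns one row per isozyme. This is the kind of metabolite-to-protein link that sequence similarity alone cannot provide.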
B-Net is organized around four concepts: gene, protein, compound (metabolite), and reaction. These concepts act as logical organizers for the data to be entered and viewed. The gene section may be queried by gene name, GenBank accession number, and the EC number of its protein products; it contains gene name synonyms, the accession number, and the proteins that the gene encodes. Similarly, protein information may be queried by protein name, EC number, and the compounds that are substrates or products of the reactions the protein catalyzes; the information that can be retrieved includes name synonyms, EC number, systematic name, and the reactions the protein catalyzes. The compound table may be searched by name, and it gives the synonyms and formula of the query compound. Users may search biochemical reactions by substrate and/or product and by enzyme activity; the information returned by these queries includes the substrates, the products, their stoichiometric coefficients, and the EC number and name of the related enzymes.
Each specific fact supporting the information in the database is stored in an "evidence" table. Evidence is classified with the codes defined in the Gene Ontology™ (Consortium, 2001). Our policy is to archive every piece of evidence known for the data represented in the database, which allows users to judge how much confidence to place in the information. Each piece of evidence is also associated with a bibliographic reference or, alternatively, with a web site.
We currently use B-Net for a Medicago truncatula functional genomics project. For compounds, we used the data of the LIGAND database of the Kyoto group and TAIR, but eliminated entries that represent classes of compounds, polymers, proteins other than redox pairs, and adducts. We used the reaction and enzyme activity data from the IUBMB Nomenclature Committee, but retained only reactions whose compounds all appear in the previous list (i.e., we rejected reactions involving classes of compounds, etc.). We used protein data from SWISS-PROT and gene data from a variety of sequence databases. We are also working to create links between the objects in our reference database and those of the Gene Ontology™.
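The evidence policy and the import filter described above can be illustrated with a minimal sketch. The class names, the function name, and the placeholder reference string are hypothetical and do not come from B-Net. The evidence codes listed are only a subset of the Gene Ontology evidence codes. The filter keeps a reaction only when every substrate and product appears in the accepted compound list, so a reaction written for a class of compounds (such as the D-hexose form of EC 2.7.1.1) is rejected.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

# A subset of Gene Ontology evidence codes, used here for validation only.
GO_EVIDENCE_CODES = {
    "IDA", "IMP", "IGI", "IPI", "IEP", "ISS", "IEA", "TAS", "NAS", "IC", "ND",
}


@dataclass(frozen=True)
class Evidence:
    """One archived piece of evidence (hypothetical record layout)."""
    code: str    # Gene Ontology evidence code
    source: str  # bibliographic reference or web site

    def __post_init__(self) -> None:
        if self.code not in GO_EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {self.code!r}")


@dataclass
class Reaction:
    """A reaction with stoichiometric coefficients keyed by compound name."""
    ec_number: str
    substrates: Dict[str, int]
    products: Dict[str, int]
    evidence: List[Evidence] = field(default_factory=list)


def split_by_accepted_compounds(
    reactions: List[Reaction], accepted: Set[str]
) -> Tuple[List[Reaction], List[Reaction]]:
    """Return (kept, rejected); a reaction is kept only if all its compounds are accepted."""
    kept: List[Reaction] = []
    rejected: List[Reaction] = []
    for reaction in reactions:
        compounds = set(reaction.substrates) | set(reaction.products)
        if compounds <= accepted:
            kept.append(reaction)
        else:
            rejected.append(reaction)
    return kept, rejected


# Usage with placeholder evidence; "D-hexose" names a class, so it is not accepted.
accepted_compounds = {"ATP", "ADP", "D-glucose", "D-glucose 6-phosphate"}
candidates = [
    Reaction(
        "2.7.1.2",
        {"ATP": 1, "D-glucose": 1},
        {"ADP": 1, "D-glucose 6-phosphate": 1},
        [Evidence("TAS", "placeholder reference")],
    ),
    Reaction(
        "2.7.1.1",
        {"ATP": 1, "D-hexose": 1},
        {"ADP": 1, "D-hexose 6-phosphate": 1},
    ),
]
kept, rejected = split_by_accepted_compounds(candidates, accepted_compounds)
```

Returning the rejected records alongside the kept ones is a design choice of this sketch: it leaves a list that a curator could review by hand, in keeping with a policy that prefers missing information to erroneous information.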
