BioMeta is intended to be complementary to the KEGG Ligand database by focusing on the application of organic chemical knowledge to small compounds, thus ensuring that the compounds and implicitly the reactions are correct. Hundreds of molecular structures were corrected or improved.
Table 1 gives a breakdown of the validation results and the corrections made in the 12,815 molecule entries present in both BioMeta and the KEGG Ligand compound section of October 25, 2005. Note that the absence of a structure is not necessarily an error: the entry may be a generic compound such as "acceptor" or "phosphorylated protein". The validation program can detect only syntactic problems, e.g., valence violations, undefined enantiomers, or invalid stereochemistry. Some of these are real errors requiring correction, such as valence violations or ambiguously drawn stereocenters. Problems in the "undefined" categories suggest incomplete structural information, but not all such cases are necessarily incorrect; for instance, a drug that is a racemic compound would trigger the warning "unspecified enantiomer". Problems in the "incorrect" categories were not detected by the validation program because these errors are semantic rather than syntactic; they were found through visual inspection. In Table 1, the row "total (stereochemistry)" is not the sum of the preceding rows because a compound may have multiple problems. The rows with the totals do not add up because of the "unknown" entries; they would add up if those numbers were known.
|Type of Problem||# in KEGG||# in BioMeta||# Corrected|
|Undefined stereo double bond(s)||35||32||3|
|Invalid sp3 stereocenter(s)||70||47||23|
|Ambiguous sp3 stereocenter(s)||46||0||46|
|Undefined sp3 stereocenter(s)||1398||865||533|
|Undefined sp3 stereochemistry||554||366||188|
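In the rows above, the "# Corrected" column equals the KEGG count minus the BioMeta count. A minimal Python sketch of that consistency check follows, with the values transcribed from Table 1; the names `TABLE_1` and `inconsistent_rows` are illustrative and not part of BioMeta.

```python
# Rows transcribed from Table 1:
# (type of problem, # in KEGG, # in BioMeta, # corrected)
TABLE_1 = [
    ("Undefined stereo double bond(s)", 35, 32, 3),
    ("Invalid sp3 stereocenter(s)", 70, 47, 23),
    ("Ambiguous sp3 stereocenter(s)", 46, 0, 46),
    ("Undefined sp3 stereocenter(s)", 1398, 865, 533),
    ("Undefined sp3 stereochemistry", 554, 366, 188),
]


def inconsistent_rows(rows):
    """Return the rows where '# corrected' differs from KEGG minus BioMeta."""
    return [row for row in rows if row[1] - row[2] != row[3]]


# An empty list means every row is internally consistent.
print(inconsistent_rows(TABLE_1))
```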
A total of 1468 structures were corrected. The large majority of valence errors involved nitrogen atoms that were not trivalent. The most common of these were: 1) a nitrogen atom with one double bond and two single bonds but no charge (i.e., intended to be a pyridinium- or nitro-type nitrogen), which was corrected by removing an attached hydrogen or else by adding a positive charge, and 2) coordinative bonds from an imine-type nitrogen to a metal drawn as covalent bonds. Unfortunately, the molfile format does not support coordinative bonds, so these bonds had to be removed.
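The first nitrogen case can be illustrated with a small valence check. This is a simplified sketch, not the validation program used for BioMeta. The function name `nitrogen_valence_ok` and the rule itself (the allowed bond-order sum is 3 plus the formal charge) are assumptions chosen to mirror the two corrections described above. Hydrogens must be passed explicitly as single bonds.

```python
def nitrogen_valence_ok(bond_orders, charge=0):
    """Check a nitrogen atom against a simplified valence rule.

    bond_orders: orders of all bonds to the atom, hydrogens included
                 (1 = single, 2 = double, 3 = triple).
    charge:      formal charge on the nitrogen.

    Assumed rule (illustrative only): the bond-order sum must equal
    3 + charge, i.e. 3 for a neutral nitrogen and 4 for a +1 nitrogen.
    """
    return sum(bond_orders) == 3 + charge


# One double bond and two single bonds on a neutral nitrogen: flagged.
print(nitrogen_valence_ok([2, 1, 1], charge=0))  # False
# Correction 1: remove an attached hydrogen (one single bond fewer).
print(nitrogen_valence_ok([2, 1], charge=0))     # True
# Correction 2: keep the bonds and add a positive charge.
print(nitrogen_valence_ok([2, 1, 1], charge=1))  # True
```

A real validator would also have to handle implicit hydrogens, aromatic bond types, and other elements, none of which this sketch attempts.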
Table 2 gives a more detailed breakdown of the sp3 stereochemistry enhancements from Table 1 (in some places the numbers are slightly lower because double-bond stereochemistry is omitted). The numbers relate to the 12,815 molecule entries present in both BioMeta and the KEGG Ligand compound section of October 25, 2005, minus the 1,239 entries that had no structure in KEGG. The table lists 76 more entries for BioMeta than for KEGG because compounds with valence errors are not stereochemically analysed. The "unspecified enantiomer" cases from Table 1 are split here into two "relative" stereochemistry cases, incompletely defined and completely defined. Note again that not all "Completely defined - relative" cases are necessarily errors, since a number of drugs may be racemic compounds. All cases (including meso compounds) are listed so that the numbers add up.
|Stereochemistry||OK||# in KEGG||# in BioMeta||# Corrected|
|Undefined (i.e., left out)||−||554||366||188|
|Incompletely defined - meso||−||24||3||21|
|Incompletely defined - absolute||−||1080||691||389|
|Incompletely defined - relative||−||294||171||123|
|Completely defined - meso||+||56||89|
|Completely defined - absolute||+||3735||4823|
|Completely defined - relative||−||2032||1669||363|
|Total not OK||||3984||2900||1084|
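The "Total not OK" row can be reproduced by summing the rows marked "−" in the OK column. A short Python sketch follows, with the values transcribed from Table 2; the names `NOT_OK_ROWS` and `column_totals` are illustrative.

```python
# Rows marked as not OK in Table 2:
# (stereochemistry category, # in KEGG, # in BioMeta, # corrected)
NOT_OK_ROWS = [
    ("Undefined (i.e., left out)", 554, 366, 188),
    ("Incompletely defined - meso", 24, 3, 21),
    ("Incompletely defined - absolute", 1080, 691, 389),
    ("Incompletely defined - relative", 294, 171, 123),
    ("Completely defined - relative", 2032, 1669, 363),
]


def column_totals(rows):
    """Sum the KEGG, BioMeta, and corrected columns across all rows."""
    return tuple(sum(row[i] for row in rows) for i in (1, 2, 3))


print(column_totals(NOT_OK_ROWS))  # (3984, 2900, 1084)
```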