Guide to ptools-xml

The ptools-xml Tag

ptools-xml is an XML-based output format that closely follows the internal Pathway Tools object representation. For a description of web services that generate ptools-xml documents, see here.

Each ptools-xml document is enclosed in a ptools-xml tag. The ptools-version attribute indicates which version of Pathway Tools was used to generate the document. This is important because the Pathway Tools schema (or the ptools-xml format) can change from version to version, and the changes may not always be backwards-compatible.

Example:

<ptools-xml ptools-version="15.0" xml:base="http://biocyc.org/getxml?ecoli:EG11025">
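Because the format can change between versions, a client may want to read the version before doing anything else. The following is a minimal sketch in Python, assuming the standard-library ElementTree parser; the function name read_root_info and the one-line SAMPLE document are illustrative and not part of ptools-xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down document: a real one also contains
# a metadata section and class elements.
SAMPLE = (
    '<ptools-xml ptools-version="15.0" '
    'xml:base="http://biocyc.org/getxml?ecoli:EG11025"></ptools-xml>'
)

# ElementTree expands the predefined "xml:" prefix to this namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"


def read_root_info(xml_text):
    """Return (ptools-version, xml:base) from the enclosing ptools-xml tag."""
    root = ET.fromstring(xml_text)
    if root.tag != "ptools-xml":
        raise ValueError("not a ptools-xml document: %r" % root.tag)
    return root.get("ptools-version"), root.get(XML_NS + "base")


version, base = read_root_info(SAMPLE)
print(version, base)
```

A caller could compare the returned version string against the versions it has been tested with and warn on anything else.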

Metadata

Each ptools-xml document begins with a metadata section. In addition to information about the request that generated the document, this section includes a PGDB element for every Pathway/Genome Database that is referenced in the document. Most queries will return objects from only a single database, so there will be only a single PGDB element. However, it is possible to use BioVelo to construct queries that return objects from multiple databases, in which case a PGDB element will be generated for each database.

The PGDB tag contains the orgid attribute. It is this attribute that is used to link an object in the body of the document to the database to which it belongs. The orgid attribute can also be used when constructing further queries against this database. The PGDB element also contains elements that identify the organism it describes, including, where appropriate, a link to the NCBI Taxonomy Database.

The num_results element records the number of top-level object elements (including error elements) in the body of the document. An exception is made for queries that return PGDBs rather than objects (such as with the BioVelo dbs query). In that case, information about the returned PGDBs is included in the metadata section rather than in the document body, and those PGDBs are counted in the num_results field.

Sample Metadata section:

<metadata>
  <url>http://biocyc.org/</url>
  <service_name>getxml</service_name>
  <query>ecoli:EG11025</query>
  <num_results>1</num_results>
  <PGDB orgid='ECOLI' version='14.6'>
    <species datatype='string'>Escherichia coli</species>
    <strain datatype='string'>K-12 substr. MG1655</strain>
    <dblink>
      <dblink-db>NCBI-TAXONOMY-DB</dblink-db>
      <dblink-oid>511145</dblink-oid>
      <dblink-relationship>unification</dblink-relationship>
      <dblink-URL>http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&amp;id=511145</dblink-URL>
    </dblink>
  </PGDB>
</metadata>
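As an illustration of how a client might use this section, the sketch below collects the query, the result count, and one entry per PGDB element keyed by orgid. It assumes Python's standard-library ElementTree; the function name summarize_metadata and the shape of the returned dictionary are hypothetical, and the sample is abridged from the one above.

```python
import xml.etree.ElementTree as ET

# Abridged from the sample metadata section (the dblink element is omitted).
SAMPLE = """<ptools-xml ptools-version="15.0">
  <metadata>
    <url>http://biocyc.org/</url>
    <service_name>getxml</service_name>
    <query>ecoli:EG11025</query>
    <num_results>1</num_results>
    <PGDB orgid="ECOLI" version="14.6">
      <species datatype="string">Escherichia coli</species>
      <strain datatype="string">K-12 substr. MG1655</strain>
    </PGDB>
  </metadata>
</ptools-xml>"""


def summarize_metadata(xml_text):
    """Collect the query, result count, and per-database info from metadata."""
    root = ET.fromstring(xml_text)
    meta = root.find("metadata")
    if meta is None:
        raise ValueError("document has no metadata section")
    pgdbs = {}
    # There is one PGDB element per database referenced in the document.
    for pgdb in meta.findall("PGDB"):
        pgdbs[pgdb.get("orgid")] = {
            "version": pgdb.get("version"),
            "species": pgdb.findtext("species"),
            "strain": pgdb.findtext("strain"),
        }
    return {
        "query": meta.findtext("query"),
        "num_results": int(meta.findtext("num_results")),
        "pgdbs": pgdbs,
    }


summary = summarize_metadata(SAMPLE)
print(summary["num_results"], sorted(summary["pgdbs"]))
```

Keying the databases by orgid makes it straightforward to look up the database for any object in the body, since each object carries the same attribute.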

Class Elements

The body of the document is made up of a series of class elements, one for each top-level object in the query results. The class element for an object contains a list of slot elements, representing the attributes of the object. The values of the slots can either be simple data values (such as a string or a number), or references to other objects in the database. Some of these objects will be included in their entirety in the document. Others will just be referred to, requiring an additional query to retrieve their information.

The names of the class elements correspond roughly to the top-level classes in the Pathway Tools schema, using the following mapping (in general, ptools-xml element names are singularized versions of the plural Pathway Tools class names):

ptools-xml Element Name     Pathway Tools Class Name
Complex                     Complexes
Compound                    Compound
DNA-Binding-Site            DNA-Binding-Sites
Enzymatic-Reaction          Enzymatic-Reactions
Evidence-Code               Evidence
Feature                     Protein-Features
Gene                        All-Genes
Genetic-Element             Genetic-Elements
GO-Term                     Gene-Ontology-Terms
MRNA-Binding-Site           mRNA-Binding-Sites
Organism                    Organisms
Organization                Organizations
Pathway                     Pathways
Person                      People
Protein                     Proteins
Promoter                    Promoters
Publication                 Publications
Reaction                    Reactions
Regulation                  Regulation
RNA                         RNAs
Terminator                  Terminators
Transcription-Unit          Transcription-Units

Class elements all contain the orgid and frameid attributes. The orgid attribute should match the orgid attribute of one of the PGDB elements in the metadata section, and identifies the database to which the object belongs. The frameid attribute is the internal Pathway Tools unique identifier for the object. The combination of the orgid and frameid attributes is sufficient to uniquely identify the object.

If an element has been inlined, i.e. all its slot-value data is included within the element, then the element tag will include the ID attribute, which is simply a string of the form [orgid]:[frameid]. If instead the element does not include all its data, then the element tag will include the resource attribute, which provides a link for retrieving the object data. The link can point somewhere within the same document, referencing an element by its ID attribute, or to another URL.

The Pathway Tools schema distinguishes between objects that are classes and objects that are instances. A ptools-xml document can include both kinds of objects. Objects that are classes will have the class attribute set to true.

Examples of class element tags for assorted objects, including examples of both inlined elements and resource elements:

<!-- An inlined element for a gene object -->
<Gene ID='ECOLI:EG11025' orgid='ECOLI' frameid='EG11025'> ... </Gene>

<!-- An inlined element for a protein object -->
<Protein ID="ECOLI:TRYPSYN-BPROTEIN" orgid="ECOLI" frameid="TRYPSYN-BPROTEIN"> ... </Protein>

<!-- A resource element for the same protein object when the inlined element appears elsewhere in the same document -->
<Protein resource="#ECOLI:TRYPSYN-BPROTEIN" orgid="ECOLI" frameid="TRYPSYN-BPROTEIN"/>

<!-- A resource element for the same protein object when the inlined element is not in the same document -->
<Protein resource='getxml?ECOLI:TRYPSYN-BPROTEIN' orgid='ECOLI' frameid='TRYPSYN-BPROTEIN'/>

<!-- A resource element for a class object -->
<Gene resource='getxml?ECOLI:BC-1.5.1.15' orgid='ECOLI' frameid='BC-1.5.1.15' class='true'/>
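One way a client might tell these cases apart is sketched below in Python with the standard-library ElementTree. The function name classify and the labels "inlined", "local", and "remote" are hypothetical. Placing the three elements directly under the root is also an assumption made to keep the sample short; in a real document, resource elements would typically appear inside slot elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical document body built from the example tags above.
SAMPLE = """<ptools-xml ptools-version="15.0">
  <Protein ID="ECOLI:TRYPSYN-BPROTEIN" orgid="ECOLI" frameid="TRYPSYN-BPROTEIN"/>
  <Protein resource="#ECOLI:TRYPSYN-BPROTEIN" orgid="ECOLI" frameid="TRYPSYN-BPROTEIN"/>
  <Gene resource="getxml?ECOLI:BC-1.5.1.15" orgid="ECOLI" frameid="BC-1.5.1.15" class="true"/>
</ptools-xml>"""

root = ET.fromstring(SAMPLE)

# Index every inlined element in the document by its ID attribute.
index = {e.get("ID"): e for e in root.iter() if e.get("ID") is not None}


def classify(elem, index):
    """Return (kind, target) for a class element.

    kind is "inlined", "local" (resource points into this document),
    or "remote" (resource requires a further query).
    """
    if elem.get("ID") is not None:
        return "inlined", elem
    resource = elem.get("resource")
    if resource is None:
        raise ValueError("element has neither ID nor resource attribute")
    if resource.startswith("#"):
        # Same-document reference: strip '#' and look up the ID.
        return "local", index.get(resource[1:])
    return "remote", resource


kinds = [classify(e, index)[0] for e in root]
print(kinds)
```

For a "remote" result the target is the link itself, which the caller would resolve with a further request; for a "local" result it is the inlined element already present in the document.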

Slot Elements

Each slot (attribute) in an object has its own element type. If a slot has multiple values, then in some cases each value will be enclosed in its own slot element, whereas in other cases a single slot element encloses a sequence of all value elements (this can only be the case when all values are themselves objects in the database). Consult the XMLSchema Document to determine which is the case for any given slot.

In most cases, the name of the slot element is identical to the name of the corresponding slot in the Pathway Tools schema. There are a handful of cases, however, in which the names differ. These are listed in the following table:

ptools-xml Element Name         Pathway Tools Slot Name
alternative-cofactor            alternative-cofactors
alternative-substrate           alternative-substrates
author                          authors
citation                        citations
cofactor                        cofactors
cofactor-or-prosthetic-group    cofactors-or-prosthetic-groups
component                       components
dblink                          dblinks
evidence                        citations
has-feature                     features
has-go-term                     go-terms
intron-or-removed-segment       splice-form-introns
location                        locations
n-plus-1-name                   n+1-name
pathway-link                    pathway-links
pgdb-author                     pgdb-authors
prosthetic-group                prosthetic-groups
reaction-ordering               predecessors
synonym                         synonyms

One slot element, parent, is not actually a slot in the underlying schema at all. Many of the top-level classes in Pathway Tools organize their instances into a hierarchical ontology. Thus, a particular pathway might be represented with a Pathway element, but its actual direct parent might be the class of all cofactor biosynthesis pathways. In such a case, if an object is not a direct child of its element class (i.e. a direct child of the Pathways class, Proteins class, etc.), then the parent element will link to its actual parent class. Similarly, if we are retrieving the data for a class object that has its own subclasses and instances, the subclass and instance elements will link to those.

Other Miscellaneous Notes

In general, the ptools-xml output maps very closely to the underlying object representation within Pathway Tools. One exception to this, however, is for Compound objects. Rather than output the internal Pathway Tools representation of a compound structure, we instead use the more well-known CML format.

Some slot values in Pathway Tools are neither simple datatypes nor object references, but rather a structured list (or string) that the Pathway Tools software interprets in a particular way. Examples include the dblinks slot, whose values are structured lists giving the linked-to database, the object identifier in that database, and the relationship type, and the citations slot, whose values are strings that optionally incorporate a PubMed identifier and/or an evidence code and supporting information. In these cases, we create structured elements, specifically for ptools-xml, that directly encode the meaning of the different fields. Slots for which this is done are the following: alternative-cofactors, alternative-substrates, citations, credits, dblinks, isozyme-sequence-similarity, km, reaction-layout, pathway-links, predecessors, and splice-form-introns. Note that the citations slot can be represented by either a citation element or an evidence element, depending on whether or not an evidence code is associated with the citation.
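To make the idea of a structured element concrete, the sketch below reads the fields of a dblink element into a dictionary. The child element names are taken from the sample metadata section earlier in this guide. The function name parse_dblink and the dictionary keys are hypothetical, and the sketch assumes Python's standard-library ElementTree.

```python
import xml.etree.ElementTree as ET

# The dblink element from the sample metadata section.
SAMPLE = """<dblink>
  <dblink-db>NCBI-TAXONOMY-DB</dblink-db>
  <dblink-oid>511145</dblink-oid>
  <dblink-relationship>unification</dblink-relationship>
  <dblink-URL>http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&amp;id=511145</dblink-URL>
</dblink>"""


def parse_dblink(elem):
    """Map the child elements of a dblink element to a plain dictionary.

    Fields that are absent come back as None rather than raising.
    """
    return {
        "db": elem.findtext("dblink-db"),
        "oid": elem.findtext("dblink-oid"),
        "relationship": elem.findtext("dblink-relationship"),
        "url": elem.findtext("dblink-URL"),
    }


link = parse_dblink(ET.fromstring(SAMPLE))
print(link["db"], link["oid"], link["relationship"])
```

Because each field has its own element, the client does not need to know how Pathway Tools encodes the list internally.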

Errors

If a request is received for an object that does not exist (and note that object identifiers are case-sensitive), an Error element is generated instead of a class element. For example, requesting http://biocyc.org/getxml?ecoli:xyz (no object with the identifier "xyz" currently exists in EcoCyc) will generate a ptools-xml document with the normal metadata, containing one result, but the result will be an error element:

<Error orgid="ECOLI" frameid="xyz"/>
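A client that processes results in bulk may want to separate error elements from class elements and check the total against num_results. The following is a minimal sketch in Python with the standard-library ElementTree. The function name split_results and the abridged sample document are hypothetical, and the count check does not apply to queries that return PGDBs, which are reported in the metadata section instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abridged response for http://biocyc.org/getxml?ecoli:xyz
SAMPLE = """<ptools-xml ptools-version="15.0">
  <metadata>
    <query>ecoli:xyz</query>
    <num_results>1</num_results>
  </metadata>
  <Error orgid="ECOLI" frameid="xyz"/>
</ptools-xml>"""


def split_results(xml_text):
    """Return (declared count, class elements, error identifiers)."""
    root = ET.fromstring(xml_text)
    declared = int(root.findtext("metadata/num_results"))
    # Everything at the top level other than metadata is a result.
    body = [child for child in root if child.tag != "metadata"]
    errors = [(e.get("orgid"), e.get("frameid")) for e in body if e.tag == "Error"]
    objects = [e for e in body if e.tag != "Error"]
    return declared, objects, errors


declared, objects, errors = split_results(SAMPLE)
print(errors)
print(declared == len(objects) + len(errors))
```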

However, at this time, issuing a BioVelo query that generates an error (meaning a malformed query, not just a query that returns no results) may not cause a well-formed ptools-xml document to be generated; more typically it will redirect to a web page that describes the problem with the query.