Experience using OWL DL for the exchange of biological pathway information, OWL Experiences

Experience Using OWL DL for the Exchange of
Biological Pathway Information
Alan Ruttenberg1, Jonathan A. Rees2, Joanne S. Luciano3
1 Millennium Pharmaceuticals, Inc., Cambridge,
Massachusetts 02139 USA
alanr@pathways.mumble
2 CSAIL, Massachusetts Institute of Technology, Cambridge,
Massachusetts 02139 USA
jar@csail.mit.edu
3 Department of Genetics, Harvard Medical School, Boston,
Massachusetts 02214 USA
d.harvard.edu
Abstract. We report on experiences using OWL DL in the design of an exchange format for biological pathway information. Although the working group charged with this task was not initially very familiar with OWL and knew that the technology around OWL wasn't mature, they chose it because of its ability to express complex relationships in a formal and computable manner. The subsequent journey has not been smooth. Delightful discoveries about OWL have alternated with surprises about how difficult it is to operate correctly inside open world description logics and the Semantic Web generally. This paper highlights experience that may be of interest to the OWL community, including ontology developers, tool developers, and those interested in promoting the adoption of the Semantic Web.
1 Introduction
In 2001 the biomolecular pathways research community rallied around the idea of creating an open pathways resource akin to GenBank [1], the hugely successful community resource for genetics. The resource would collect pathway information, that is, information about interactions among biological entities and their effects on larger biological phenomena. Such a resource would require a common format for representation and transmission of pathway information, so a working group formed to develop such a format. The initial working group consisted mostly of representatives of diverse and already mature data curation and compilation efforts: BioCyc [2], WIT [3] (now Puma2), and BIND [4]. Later it grew to include several more pathway data sources, parties interested in biological knowledge representation, and users and integrators of pathway information [5].
For background, and to illustrate various issues involving OWL [6] use and adoption, we will discuss some activities of the working group. However, the views expressed in this paper are those of the authors and do not necessarily represent those of the working group.
Several design criteria guide the development of the exchange format. It is to be a machine computable formal representation to enhance the utility of the data and enable reasoning. It should interface with existing standards to enable interoperability. It should be extensible in order to have the capacity to evolve with scientific knowledge. It should support expressive new curation adequate to represent the pathway knowledge expressed in scientific papers. Finally, because each participating source of pathway information represents its descriptions using its own semantics and data format, the common format should be suitable as a translation target for existing data.
Few in the working group had any prior stake in RDF [7], OWL, description logics, standards projects, or the Semantic Web. The focus was on exchanging application-level information; for many, work on the exchange format was a necessary evil.
The group was not initially very familiar with OWL, but after a one-day tutorial [8] it was sufficiently impressed by OWL's merits to take it seriously as a specification vehicle. OWL's ability to express complex relationships and constraints was judged a match to the group's goals, and they chose OWL DL (over XML Schema) as its ontology framework. The decision was not without controversy; XML Schema was favored by some because of its already wide adoption and abundance of tools. In the end OWL won because of its expressiveness and the expectation that if adopted by the W3C [9] tools and wide acceptance would follow.
The group initially used OWL as if it were like any of the other schema definition tools such as relational databases and XML Schema. There were certain expectations taken from these tools, such as the closed world assumption. In particular, they expected to simply invent a new, federated schema that unified common elements of the schemas of the existing data sources and was similar in kind to the schemas of the existing data sources. Some use of OWL's new features was expected, even though it wasn't clear how, when, or whether to use them. But no one was considering radical change relative to the way the existing schemas had been built, such as mapping data records to classes instead of instances.
In this paper we document a variety of issues that we hope contribute to the ongoing discussion of the use of ontologies in the context of the Semantic Web.
2 Ambushed by the open world assumption
The open world assumption says that anything not known to be true or false might become so as a result of new information. There are positive and negative aspects of OWL's open world assumption with respect to the stated design criteria. On the positive side, the open world assumption seems particularly fitting in a domain that is characterized by information that is incomplete either because of limits in the state of knowledge or omissions inherent in curation processes. One can imagine a scenario in which this partial knowledge is augmented by subsequent contributors, in line with the goals of the Semantic Web.
On the negative side, the open world assumption has generated problems that were not anticipated:
No way to require that information be supplied. Sometimes information that is to be exchanged cannot, by its nature, be reconstructed or added to. Consider a reference to a paper. Currently this is represented by a pair of string valued properties: database name and database identifier. One wants to say that each of these properties needs to have a value if one is to make any sense of the reference. OWL can express something like this using minCardinality constraints. However, if one of the properties doesn't have a value, no OWL validator will complain, since under the open world assumption, the property could be asserted later. But consider the task of annotating that an interaction between two proteins was noted in a particular journal article. If one says that the database is PubMed but doesn't fill in the article identifier, one cannot identify the article. What one wants is the ability to express that within a given scope, certain restrictions must be verifiable with the assertions expressed. That way one could express, for instance, that within the assertions in a single file, or at a single URL, any reference to a publication must have values for both the publication and identifier properties. Note that while this corresponds to "closing the world" over the specified scope, there is no requirement that it stay closed, nor that it affect the semantics of the document outside the scope.
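In the abstract syntax used elsewhere in this paper, the intended constraint might be written as follows (the class and property names are illustrative, not the actual exchange format vocabulary):

Class(PublicationXref partial
  restriction(db minCardinality(1))
  restriction(id minCardinality(1)))

Under the open world assumption, an instance of PublicationXref with no asserted id is nevertheless consistent with this ontology, since the missing value might be supplied by a later assertion, so no standard OWL tool will flag it.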
No convenient way to assert that information is complete. On the other side of the open world assumption is the situation where we have a property whose value is completely known. For example, in the specification of an instance of some protein complexes, we want to assert that we have listed all the components of the complex. In order to do this we need to "close" the components role, making such a complex an instance of a restriction of a cardinality constraint on the components1. In Protégé, for instance, there is no convenient way to assert such a constraint.
Unique name assumption difficult to understand and maintain. Removing the unique name assumption is a useful idea on the greater Semantic Web, where it is likely that people can name the same concept in different ways. However, within a single source of information we generally know that different names name different objects. It is inconvenient to maintain all the differentFrom assertions as a document evolves. It is also tricky to assess the consequences of getting this wrong. This is another case where the concept of scope might be useful, specifically the ability to assert that all names within a scope represent different things.
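In OWL abstract syntax, pairwise distinctness can be stated with a DifferentIndividuals axiom, but the list must be kept in step with the document by hand (the individual names here are illustrative):

DifferentIndividuals(protein1 protein2 complex1)

Every time an individual is added, it must be appended to this axiom (or paired differentFrom assertions must be written), and omitting one silently weakens the inferences a reasoner can draw.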
Novices are confused about properties that are not asserted. For example, in the description of a chemical reaction there is a property for stoichiometry (multiplicity of a reactant). Since the most common case is that stoichiometry is 1, it was suggested that, in order to make the documents less verbose, an unasserted stoichiometry would be taken to mean 1. However, in OWL an unasserted property means that the value is unknown. This was a surprise to most of the group. While we would like to propose some technical fix for this, we can't think of one.
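A minimal sketch of the confusion (the class name is illustrative): given an individual with no stoichiometry asserted,

Individual(participant1 type(reactionParticipant))

the intended reading was "the stoichiometry is 1", but under OWL semantics a reasoner may conclude nothing about the value; it might equally well be 2, asserted in some other document.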
1 Closing a role by adding a property restriction type to an instance:
Individual(Instance type(complex) type(restriction(component cardinality(2))))
3 Using other ontologies
There has been substantial prior work on developing ontologies relevant to representing pathway information, and the exchange format would like to be able to take advantage of this work. For example, post-translational modifications are described in RESID [10] and portions of PSI MI [11], while cellular locations are described in portions of the Gene Ontology (GO) [12]. How are these external entities to be used in the ontology?
Few of these ontologies are provided as OWL DL. Currently, terms from such ontologies are represented as values of two properties, one giving a name for the vocabulary from which the term was taken, and the other identifying the term in the vocabulary. Unfortunately for the Semantic Web, neither the terms nor the names of the vocabularies are URIs. Moreover, external terms are not just meaningless data; some understanding of them is required for reasoning and validation. When terms are represented in this way, semantic relations, such as the containment relationship between cellular locations, are lost. Some properties should be restricted to particular classes in GO; a property denoting a cellular location cannot be filled with a term which is a subclass of molecular function.
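Schematically, the current pattern looks like the following (the class and property names are illustrative), with both values opaque strings rather than URIs, so a reasoner cannot connect the term to the GO class hierarchy:

Individual(location1 type(openControlledVocabulary)
  value(db "GO") value(id "GO:0005737"))

Nothing constrains the id to denote a cellular component; a molecular function identifier in the same position would pass any OWL check.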
An alternative approach would be to first create OWL versions of the needed ontologies and then import them, thereby making all available information directly accessible. As an experiment one of the authors (AR) wrote translators to convert the relevant portions of PSI MI to OWL DL. We first identified a portion of the vocabulary that would be used to annotate post-translational modifications, namely the terms in the hierarchy below MI:0120 other than MI:0179. The is_a relationships were translated into subclass relations in OWL. Annotation properties were used to record additional information about the terms, such as synonyms, English definitions, and identifiers.
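For a term whose is_a parent is MI:0120, the translation produces a subclass axiom carrying the extra information as annotations, along these lines (childTerm and the definition annotation property are placeholders for the generated names, not the translator's actual output):

Class(childTerm partial
  annotation(rdfs:label "term name from PSI MI")
  annotation(definition "English definition from PSI MI")
  MI_0120)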
Another question is the treatment of changes to the external ontology. If we choose to have references to terms in the external ontology, we may be left with incorrect identifiers in our documents when terms in the external ontology are deleted or deprecated. This is particularly an issue with the rapidly changing Gene Ontology. However, if we translate and import an external ontology, new and changed terms will not be available for use until we update our translation. On the other hand, a user of our ontology will benefit from the stability of knowing the potential term set in advance.

4 Getting validation
The typical software engineer expects a rapid edit-compile-debug development cycle, and on starting to work with OWL, one expects to be able to iterate in a similar manner. In place of compilation one would like to check that the file is formatted correctly, that the definitions make sense, and that the inferences one expects to make can in fact be made. Unfortunately one is immediately hindered by the inability to reliably do so.
Checking that the file is formatted correctly and that the definitions make sense is the role of a validator. One expects a validator to assess whether the file complies
with the specification and to generate specific detailed reports when it doesn’t. Not having such a tool makes it difficult for data providers to check whether their code is generating correct OWL.
Checking that the inferences one expects to make can in fact be made is the function of a reasoner. For OWL DL we expect that a reasoner is able to test whether the ontology (including both classes and instances) is consistent, to respond to queries asking for equivalences, superclasses and subclasses of a given class, what instances are members of a class, what the classes of an instance are, and what the values of properties are. Not having a reasoner makes it difficult for the novice ontologist to check whether they understand the implications of their modeling choices. Without a reasoner one cannot build clients of the exchange format that can take advantage of the promised expressiveness of OWL. Since a good validator must make some use of a reasoner, lack of a reasoner hinders efforts to build a robust validator.
In trying to find reasoners and validators we first checked the OWL test site [13], which was not reassuring. Based on the test results presented there, it seems that a reasoner that is complete with respect to OWL DL does not yet exist.
We reviewed some of the available tools, using the most recent versions available when we did the evaluation in mid July, 2005: Protégé [14], SWOOP [15] with the Pellet [16] reasoner, Racer Pro [17] (both as a DIG [18] server for Protégé and as a standalone application), the BBN OWL Validator (vOWLidator) [19], and FaCT [20,21]. In response to reviewer's comments we also reviewed the OWL API [22], the WonderWeb OWL Ontology Validator [23] and the Pellet reasoner (standalone) using the versions available at the beginning of October, 2005. All these systems had issues.
First we explored validation and reasoning using Protégé. Protégé does some reasoning on its own and also provides an interface to external DIG reasoners. Protégé's native validation and reasoning support is spotty. It doesn't do subsumption reasoning. It does do some role reasoning, such as inferring the values of properties when subProperty values are asserted, but it doesn't mark inferred values distinctly in the interface, and doesn't serialize them to the saved OWL file. We think this patchwork approach to reasoning support will be confusing to the general user.
Using external reasoners from Protégé is unsatisfactory because the DIG protocol doesn't support some constructs available in OWL DL, so one gets many spurious warnings, leading one to question the completeness of the validation. In fact it isn't complete. Consider the following ontology:
DatatypeProperty(Property1 range(xsd:string))
Class (Class1 partial)
Class (Class2 partial Class1 restriction(Property1 minCardinality(1)))
Class (Class3 partial Class2 restriction(Property1 maxCardinality(0)))
When we check ontology consistency we get the message Not able to convert datatype property cardinality restrictions to DIG (the language used to communicate with the reasoner). Ignoring this restriction and attempting to continue. Because of this, Protégé is not able to detect that Class3 is inconsistent. We tested this both with FaCT++ and the Pellet reasoner in DIG mode in late September, 2005. The Pellet web form, which accepts OWL directly, correctly notes the inconsistency.
SWOOP was not particularly robust. Enabling the reasoner while working on an ontology with an inconsistency often caused application errors that could not be recovered from. The debugging alpha version that we used did supply us, in one case, with a chain of assertions that supposedly led to an inconsistency. However, it was difficult to follow the logic, and as the inconsistency was not noted by either Racer or FaCT, we assumed that it was spurious. More detail can be found on the BioPAX wiki [24].
The Pellet reasoner, used as a standalone tool, looks very promising. In a recent test we found it useful in validating and debugging a large set of instances (several megabytes), issuing informative comments describing problems. It is not without limitations. In the days before finishing this paper, we identified two issues. To the credit of the developers, these were promptly fixed. However we are still able to find examples which provoke incorrect behavior. The following example is incorrectly classified as OWL DL. It is OWL Full because of the cardinality constraint on the transitive property part_of.
ObjectProperty(part_of Transitive domain(Class1) range(Class1))
Class(Class1 partial restriction(part_of cardinality(1)))
vOWLidator does not recognize oneOf dataRange restrictions, and so it generates many spurious complaints that need to be examined and filtered out in order to find useful warnings. It doesn't check certain RDF/XML requirements such as the need for a data type on a property value whenever the property’s range is restricted to a certain data type. When errors or warnings are reported, the notes often refer to the internal identifiers of blank nodes, which makes it difficult to find the source of the error in the ontology.
Racer Pro seemed robust and reliable at detecting inconsistencies and errors in some large OWL files. However, whereas it was able to detect inconsistencies (even in a property data type), it didn't report anything more than that the file was inconsistent. This made it difficult to find the source of the error. As we were trying to check an 18MB file containing the pathway content from HumanCyc [25], this wasn't very useful. Finally, Racer Pro is a commercial product. While some free licenses are available, they come under terms that were not satisfied by all members of the group.
FaCT is listed as an OWL DL reasoner. We downloaded the open source Common Lisp implementation hoping to use that. However, it has no defined OWL support, nor were we able to find a publication that showed how to translate even OWL DL TBox (class) reasoning into the API used by FaCT. We used Wilbur [26] to read the OWL RDF and wrote code (probably buggy) that translated the OWL primitives into the FaCT API and did get some useful information from it – the detection of an unsatisfiable class caused by multiple inheritance from two disjoint classes. Since we had access to the source, we were able to turn on debugging switches to more easily identify the source of the problem. However, since FaCT only supports TBox reasoning, we were unable to use it to validate any of our pathway data, which primarily consists of instances.
Ian Horrocks pointed us to FaCT++ [27] as the current incarnation of FaCT, suitable for OWL reasoning. However, FaCT++ is described as a reasoner for OWL Lite,
