[Semantic Web] Powerful (Soft) Semantics
Posted by Lee on 2006/5/29 10:28:36
The statistical analysis of data allows the exploration of relationships that are not explicitly stated. Statistical techniques give us great insight into a corpus of documents or a large collection of data in general, when a program exists that can actually “pose the right questions to the data,” that is, analyze the data according to our needs. All derived relationships are statistical in nature, and we only have an idea or a likelihood of their validity.
The above-mentioned formal knowledge representation techniques give us certainty that the derived knowledge is correct, provided the explicitly stated knowledge was correct in the first place: deduction is truth preserving. Another positive aspect of a formal representation is its universal usability. Every system that adheres to a given representation of knowledge can understand it, and a well-founded formal semantics guarantees that the expressed statements are interpreted the same way on every system. Restricting expressiveness to a subset of first-order logic (FOL) also allows a system to verify the consistency of its knowledge.
But here also lies the crux of this approach. Even though a consistent knowledge base is desirable, consistency becomes impractical to maintain as the knowledge base grows or as knowledge from many sources is added. It is rare for human experts in a scientific domain to be in full agreement; in such cases it becomes more desirable that the system can deal with inconsistencies.
Sometimes it is useful to look at a knowledge base as a map. This map can be partitioned according to different criteria, for example by the source of the facts or by their domain. While the knowledge on such a map is usually locally consistent, maintaining global consistency is practically infeasible; experience in developing the Cyc ontology demonstrated this challenge. Hence, a system must be able to identify sources of inconsistency and deal with contradictory statements in such a way that it can still produce reliable derivations.
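The contrast between local and global consistency can be sketched with a toy knowledge base of signed literals, partitioned by source (all facts and source names here are invented for illustration):

```python
# Toy knowledge base partitioned by source. A literal prefixed with "-"
# denotes the negation of the corresponding fact.
kb = {
    "source_A": {"bird(tweety)", "flies(tweety)"},
    "source_B": {"penguin(tweety)", "-flies(tweety)"},
}

def is_consistent(facts):
    """A set of literals is consistent if no fact appears with its negation."""
    return not any(("-" + f) in facts for f in facts if not f.startswith("-"))

# Each partition is locally consistent ...
assert all(is_consistent(facts) for facts in kb.values())

# ... but merging the sources yields a global contradiction about flies(tweety).
merged = set().union(*kb.values())
assert not is_consistent(merged)
```

A real system would additionally track which sources contribute to each contradiction, so that reasoning can proceed on the still-consistent remainder.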
In traditional bivalent-logic-based formalisms, we (that is, the users or the systems) have to make a decision: once two contradictory statements are identified, one has to be chosen as the right one. While this is possible in domains that are fully axiomatized or explored, or in which statements are true by definition, it is not possible for most scientific domains. In the life sciences, for instance, hypotheses have to be evaluated, and contradictory statements may each have supporting data. Decisions have to be deferred until enough data is available to either verify or falsify the hypothesis.
Nevertheless, it is desirable to express these hypotheses formally, both to evaluate them computationally and to exchange them between different systems. To enable this sort of reasoning, the expressiveness of the formalism needs to be increased. It is well known that increasing the expressive power of a KR language causes problems for computability; this has been the main reason for limiting the expressive power of KR languages. The real power behind human reasoning, however, is the ability to reason in the face of imprecision, uncertainty, inconsistency, partial truth, and approximation. Attempts have been made in the past at building KR languages with such expressive power.
Major approaches to reasoning with imprecision are: (1) probabilistic reasoning, (2) possibilistic reasoning (Dubois, Lang, & Prade, 1994), and (3) fuzzy reasoning. Zadeh (2002) proposed a formalism that combines fuzzy logic with probabilistic reasoning to exploit the merits of both approaches.
Other formalisms have focused on resolving local inconsistencies in knowledge bases, for instance the work of Blair, Kifer, Lukasiewicz, Subrahmanian, and others on annotated logic and paraconsistent logic (see Kifer & Subrahmanian, 1992; Blair & Subrahmanian, 1989). Lukasiewicz (2004) proposes a weak probabilistic logic and addresses the problem of inheritance. Cao (2000) proposed an annotated fuzzy logic approach that is able to handle inconsistencies and imprecision; Straccia (e.g., 1998, 2004) has done extensive work on fuzzy description logics. With P-CLASSIC, Koller, Levy, and Pfeffer (1997) presented an early approach to probabilistic description logics implemented with Bayesian networks.
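The paraconsistent idea behind such annotated logics can be illustrated with Belnap-style four-valued truth annotations, where contradictory support is flagged rather than allowed to collapse the whole knowledge base (a simplified sketch, not a rendering of any of the cited formalisms):

```python
def annotate(evidence_for, evidence_against):
    """Map independent evidence to one of four truth values:
    'true', 'false', 'both' (contradiction), or 'unknown' (no information)."""
    if evidence_for and evidence_against:
        return "both"      # contradictory support: flagged, not fatal
    if evidence_for:
        return "true"
    if evidence_against:
        return "false"
    return "unknown"

# A statement with contradictory sources is annotated 'both';
# derivations over the other statements remain reliable.
assert annotate(True, True) == "both"
assert annotate(True, False) == "true"
assert annotate(False, False) == "unknown"
```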
Other probabilistic description logics have been proposed by Heinsohn (1994) and Jaeger (1994). Early research on Bayesian-style inference over OWL was done by Ding and Peng (2004); in their formalism, OWL is augmented to represent prior probabilities.
However, the problem of inconsistencies arising through inheritance of probability values (see Lukasiewicz, 2004) is not taken into account. The combination of probabilistic and fuzzy knowledge under one representation mechanism proposed in Zadeh (2002) appears to be a very promising approach. Zadeh argues that fuzzy logics and probability theory are “complementary rather than competitive.” Under the assumption that humans tend to linguistically categorize a continuous world into discrete classes, but in fact still perceive it as continuous, fuzzy set theory classifies objects into sets with fuzzy boundaries and gives objects degrees of set membership in different sets.
Hence it is a way of dealing with a multitude of sets in a computationally tractable way that also follows the human perception of the world. Fuzzy logic allows us to blur artificially imposed boundaries between different sets. The other powerful tool in soft computing is probabilistic reasoning. Definitely in the absence of complete knowledge of a domain and probably even in its presence, there is a degree of uncertainty or randomness in the ways we see real-world entities interact. OWL as a description language is meant to explicitly represent knowledge and to deductively derive implicit knowledge. In order to use a similar formalism as a basis for tools that help in the derivation of new knowledge, we need to give this formalism the ability to be used in abductive or inductive reasoning.
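As a concrete illustration of fuzzy membership, a simple piecewise-linear membership function assigns a degree in [0, 1] instead of a crisp yes/no. The "tall person" thresholds below are arbitrary values chosen for illustration:

```python
def tall_membership(height_cm):
    """Degree to which a person counts as 'tall': 0 at or below 160 cm,
    rising linearly, and 1 at or above 190 cm."""
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / 30.0

# The boundary between 'tall' and 'not tall' is blurred, not bivalent:
assert tall_membership(150) == 0.0
assert tall_membership(175) == 0.5
assert tall_membership(195) == 1.0
```

The same object can simultaneously hold partial membership in several fuzzy sets (e.g., "tall" and "average"), which is exactly the blurring of artificially imposed set boundaries described above.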
Bayesian-type reasoning is a way to do abduction in a logically feasible way by virtue of applying probabilities. In order to use these mechanisms, the chosen formalism needs to express probabilities in a meaningful way, that is, a reasoner must be able to meaningfully interpret the probabilistic relationships between classes and between instances. The same holds for the representation of fuzziness. The formalism must give a way of defining classes by their membership functions.
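Abduction via Bayes' rule can be sketched as follows: given prior probabilities over candidate hypotheses and the likelihood of an observation under each, we infer the most probable explanation. All numbers here are invented for illustration:

```python
def posterior(priors, likelihoods):
    """Bayes' rule: P(h | obs) = P(obs | h) * P(h) / P(obs)."""
    joint = {h: priors[h] * likelihoods[h] for h in priors}
    evidence = sum(joint.values())
    return {h: p / evidence for h, p in joint.items()}

# Two candidate explanations for an observed symptom (fever).
priors = {"flu": 0.10, "cold": 0.90}          # P(h)
likelihoods = {"flu": 0.90, "cold": 0.20}     # P(fever | h)

post = posterior(priors, likelihoods)
# joint: flu = 0.09, cold = 0.18; posteriors: flu = 1/3, cold = 2/3
assert max(post, key=post.get) == "cold"
assert abs(post["flu"] - 1/3) < 1e-9
```

Here "cold" remains the best explanation despite the stronger likelihood for "flu", because the prior dominates; this is the meaningful interpretation of probabilistic relationships that the formalism must support.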
A major drawback of logics dealing with uncertainties is the required assignment of prior probabilities and/or fuzzy membership functions. Obviously, there are two ways of doing that — manual assignment by domain experts and automatic assignment using techniques such as machine learning. Manual assignments require the domain expert to assign these values to every class and every relationship. This assignment will be arbitrary, even if the expert has profound knowledge of the domain.
Automatic assignments of prior values require a large and representative dataset of annotated instances, and finding or agreeing on what is a representative set is difficult or at times impossible. Annotating instances instead of categorizing them in a top-down approach is tedious and time consuming. Often, however, the probability values for relationships can be obtained from the dataset using statistical methods, thus we categorize these relationships as implicit semantics.
Another major problem here is that machine learning usually deals with flat categories rather than with hierarchical categorizations. Algorithms that take these hierarchies into account need to be developed. Such an algorithm needs to change the prior values of the superclasses according to the changes in the subclasses, when necessary. Most likely, the best way will be a combination of both, when the domain expert assigns prior values that have to be validated and refined using a testing set from the available data.
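The required propagation can be sketched as recomputing a superclass prior whenever one of its subclass priors is revised. This assumes disjoint and exhaustive subclasses, a simplification that real ontologies rarely satisfy; the class names are invented:

```python
hierarchy = {"animal": ["bird", "mammal"]}   # superclass -> disjoint subclasses
priors = {"bird": 0.20, "mammal": 0.30, "animal": 0.50}

def revise(priors, cls, new_value, hierarchy):
    """Update a subclass prior, then recompute every superclass that
    contains it as the sum of its subclass priors."""
    priors = dict(priors)                    # leave the original untouched
    priors[cls] = new_value
    for parent, children in hierarchy.items():
        if cls in children:
            priors[parent] = sum(priors[c] for c in children)
    return priors

# Refining the 'bird' prior from data updates 'animal' automatically.
updated = revise(priors, "bird", 0.25, hierarchy)
assert abs(updated["animal"] - 0.55) < 1e-9
```

A production algorithm would also propagate through multiple levels of the hierarchy and handle overlapping subclasses.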
In the end, powerful semantics will combine the benefits of both worlds: hierarchical composition of knowledge and statistical analysis; reasoning on available information, but with the advantage over statistical methods that it can be formalized in a common language and that general purpose reasoners can utilize it, and with the advantage over traditional formal DL representation that it allows abduction as well as induction in addition to deduction.
It might be argued that more powerful formalisms are already under development, such as SWRL (Horrocks et al., 2004), which works on top of OWL. These languages extend OWL with a function-free subset of first-order logic, allowing the definition of new rules in the form of Horn clauses. The paradigm, however, is still that of bivalent FOL, and the lack of function symbols makes it impossible to define functions that compute probability values. Furthermore, SWRL is undecidable. We believe that the abilities to express probabilities and fuzzy membership functions, as well as to cope with inconsistencies, are important. It is desirable (and some would say necessary) that the inference mechanism be sound and complete with respect to the semantics of the formalism and that the language be decidable. Straccia (1998) proves this for a restricted fuzzy DL; Giugno and Lukasiewicz (2002) prove soundness and completeness for the probabilistic description logic P-SHOQ(D).
So far, this powerful semantic and soft computing research has not been utilized in the context of developing the Semantic Web. In our opinion, for this vision to become a reality, it will be necessary to go beyond RDFS and OWL, and work towards standardized formalisms that support powerful semantics.