Our vision of semantic Web search

whfcarter (#1):

We have proposed a layer cake for semantic Web search. It comprises four layers, from bottom to top, as follows:

Knowledge Engineering Layer focuses on how to create semantic data. It includes knowledge annotation, knowledge extraction and knowledge fusion. In particular, we investigate collaborative annotation based on Wiki technologies. Moreover, we pay much attention to automatically extracting semantic data from Web 2.0 social corpora (e.g. Wikipedia, Del.icio.us).

Indexing and Search Layer focuses on semantic data management. It includes scalable triple store design for the data Web. It further considers building suitable indices on top of those triple stores for fast lookup and query processing. Additionally, it integrates database and information retrieval perspectives to build efficient and effective search engines.
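As an illustration of the triple-store indexing mentioned above, here is a minimal in-memory sketch (not any of the systems described below; all names and data are invented for the example): three permutation indexes (SPO, POS, OSP) let any triple pattern with one or two bound positions be answered by direct lookup.

```python
# Toy triple store sketch: three permutation indexes over (s, p, o) triples.
from collections import defaultdict

class TripleStore:
    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))  # s -> p -> {o}
        self.pos = defaultdict(lambda: defaultdict(set))  # p -> o -> {s}
        self.osp = defaultdict(lambda: defaultdict(set))  # o -> s -> {p}

    def add(self, s, p, o):
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Yield (s, p, o) triples matching a pattern; None is a wildcard."""
        if s is not None and p is not None:
            for o2 in self.spo[s][p]:
                if o is None or o == o2:
                    yield (s, p, o2)
        elif p is not None and o is not None:
            for s2 in self.pos[p][o]:
                yield (s2, p, o)
        elif o is not None and s is not None:
            for p2 in self.osp[o][s]:
                yield (s, p2, o)
        else:  # at most one position bound: scan one index
            for s2, po in self.spo.items():
                if s is not None and s2 != s:
                    continue
                for p2, objs in po.items():
                    if p is not None and p2 != p:
                        continue
                    for o2 in objs:
                        if o is None or o2 == o:
                            yield (s2, p2, o2)
```

Real triple stores persist these permutations on disk and compress them; the point here is only that every access pattern maps to one index.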

Query Interface and User Interaction Layer focuses on the usability of semantic search. It includes adapting different query interfaces (e.g. keyword interfaces, natural language interfaces) to semantic search, aiming to interpret user queries into candidate system queries with respect to the underlying semantic data. Furthermore, it involves faceted browsing to ease the expression of complex information needs by end users.

These basic infrastructures enable us to build more intelligent applications. For example, we can provide semantic services for Wikipedia. We can likewise exploit semantic technologies for e-tourism, semantic portals, the life sciences and personal information management.

In the Knowledge Engineering Layer, we have published the following work (2007-2008):

    Making More Wikipedians: Facilitating Semantics Reuse for Wikipedia Authoring
    Published in the 6th International Semantic Web Conference (ISWC 2007)

    Abstract
Wikipedia, a killer application of Web 2.0, has embraced the power of collaborative editing to harness collective intelligence. It can also serve as an ideal Semantic Web data source due to its abundance, influence, high quality and good structure. However, the heavy burden of building up and maintaining such an enormous and ever-growing online encyclopedic knowledge base still rests on a very small group of people. Many casual users may still find it difficult to write high-quality Wikipedia articles. In this paper, we use RDF graphs to model the key elements in Wikipedia authoring and propose an integrated solution, based on RDF graph matching, to make Wikipedia authoring easier, with the aim of making more Wikipedians. Our solution facilitates semantics reuse and provides users with: 1) a link suggestion module that suggests internal links between Wikipedia articles; 2) a category suggestion module that helps the user place her articles in the correct categories. A prototype system has been implemented, and experimental results show significant improvements over existing solutions to the link and category suggestion tasks. The proposed enhancements can be applied to attract more contributors and relieve the burden on professional editors, thus enhancing the current Wikipedia and making it an even better Semantic Web data source.
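The category-suggestion module above is based on RDF graph matching; a much-simplified sketch of the underlying intuition (illustrative only, with invented data, and using plain Jaccard similarity in place of the paper's matching model) is to represent each article by its set of internal links and let the most similar existing articles vote for categories:

```python
# Toy category suggestion: neighbours by link-set similarity vote for categories.
def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def suggest_categories(new_links, articles, top_n=2):
    """articles: {title: (link_set, category_set)} -> ranked category list."""
    votes = {}
    for title, (links, cats) in articles.items():
        sim = jaccard(new_links, links)
        for c in cats:
            votes[c] = votes.get(c, 0.0) + sim
    return [c for c, _ in sorted(votes.items(), key=lambda kv: -kv[1])][:top_n]
```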

    PORE: Positive-Only Relation Extraction from Wikipedia Text
    Published in the 6th International Semantic Web Conference (ISWC 2007)

    Abstract
Extracting semantic relations is of great importance for the creation of Semantic Web content. It is of great benefit to semi-automatically extract relations from the free text of Wikipedia using the structured content readily available in it. Pattern-matching methods that exploit information redundancy cannot work well, since there is not much redundant information in Wikipedia compared to the Web. Multi-class classification methods are not reasonable, since no classification of relation types is available in Wikipedia. In this paper, we propose PORE (Positive-Only Relation Extraction) for relation extraction from Wikipedia text. The core algorithm, B-POL, extends a state-of-the-art positive-only learning algorithm using bootstrapping, strong negative identification, and transductive inference to work with fewer positive training examples. We conducted experiments on several relations with different amounts of training data. The experimental results show that B-POL works effectively given only a small number of positive training examples, and that it significantly outperforms the original positive-only learning approaches and a multi-class SVM. Furthermore, although PORE is applied in the context of Wikipedia, the core algorithm B-POL is a general approach for ontology population and can be adapted to other domains.

    An Unsupervised Model for Exploring Hierarchical Semantics from Social Annotations
    Published in the 6th International Semantic Web Conference (ISWC 2007)

    Abstract
This paper deals with the problem of exploring hierarchical semantics from social annotations. Recently, social annotation services have become more and more popular on the Web. They allow users to arbitrarily annotate Web resources, thus largely lowering the barrier to cooperation. Furthermore, by providing abundant metadata, social annotation might become a key to the development of the Semantic Web. On the other hand, however, social annotation has its own apparent limitations, for instance: 1) ambiguity and synonymy, and 2) lack of hierarchical information. In this paper, we propose an unsupervised model to automatically derive hierarchical semantics from social annotations. Using the social bookmarking service Del.icio.us as an example, we demonstrate that the derived hierarchical semantics can compensate for those shortcomings. We further apply our model to another data set, from Flickr, to verify its applicability in different environments. The experimental results demonstrate our model's efficiency.
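One common unsupervised heuristic for deriving broader/narrower tag relations from co-occurrence (a sketch of the general idea only, not the paper's actual model; data and threshold are invented) treats tag A as broader than tag B when most resources tagged with B also carry A, but not the other way round:

```python
# Toy tag-hierarchy heuristic via asymmetric co-occurrence.
from collections import defaultdict

def broader_pairs(tagged, threshold=0.8):
    """tagged: {resource: set(tags)} -> [(broader, narrower)] pairs."""
    docs = defaultdict(set)
    for res, tags in tagged.items():
        for t in tags:
            docs[t].add(res)
    pairs = []
    for a in docs:
        for b in docs:
            if a == b:
                continue
            overlap = len(docs[a] & docs[b])
            # most b-resources carry a, but not most a-resources carry b
            if (overlap / len(docs[b]) >= threshold
                    and overlap / len(docs[a]) < threshold):
                pairs.append((a, b))
    return pairs
```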

    Catriple: Extracting Triples from Wikipedia Categories
    Published in the 3rd Asian Semantic Web Conference (ASWC 2008)

    Abstract
As an important step towards bootstrapping the Semantic Web, many efforts have been made to extract triples from Wikipedia because of its wide coverage, good organization and rich knowledge. One important kind of triple concerns Wikipedia articles and their non-isa properties, e.g. (Beijing, country, China). Previous work has tried to extract such triples from Wikipedia infoboxes, article text and categories. The infobox-based and text-based extraction methods depend on the infoboxes and suffer from low article coverage. In contrast, the category-based extraction methods exploit the widespread categories. However, they rely on predefined properties, which is labor-intensive and explores only very limited knowledge in the categories. This paper automatically extracts properties and triples from the less explored Wikipedia categories so as to achieve wider article coverage with less manual effort. We realize this goal by utilizing the syntax and semantics brought by super-sub category pairs in Wikipedia. Our prototype implementation outputs about 10M triples at 12 confidence levels ranging from 47.0% to 96.4%, which cover 78.2% of Wikipedia articles. Among them, 1.27M triples have a confidence of 96.4%. Applications can use triples with a suitable confidence on demand.
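A drastically simplified sketch of the super-sub category idea (illustrative only; the actual Catriple pipeline and its confidence model are far more elaborate): a super-category named like "Songs by artist" reveals the property, and a sub-category's name minus the head noun yields the value.

```python
# Toy category-pattern extraction from "<Head> by <property>" super-categories.
import re

def extract_triples(super_cat, sub_cats):
    """sub_cats: {sub_category_name: [articles]} -> [(article, prop, value)]."""
    m = re.match(r"(.+) by (.+)", super_cat)
    if not m:
        return []
    head, prop = m.group(1), m.group(2)
    triples = []
    for sub, articles in sub_cats.items():
        # value = sub-category name with the head noun stripped
        value = sub.replace(head.lower(), "").replace(head, "").strip()
        for a in articles:
            triples.append((a, prop, value))
    return triples
```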

In the Indexing and Search Layer, we have published the following work (2007-2008):

    SOR: a practical system for ontology storage, reasoning and search
    Published in the 33rd International Conference on Very Large Data Bases (VLDB 2007)
    Abstract
Ontology, an explicit specification of a shared conceptualization, has been increasingly used to define formal data semantics and improve data reusability and interoperability in enterprise information systems. In this paper, we present and demonstrate SOR (Scalable Ontology Repository), a practical system for ontology storage, reasoning, and search. SOR uses a relational DBMS to store ontologies, performs inference over them, and supports the SPARQL query language. Furthermore, a faceted search with relationship navigation is designed and implemented for ontology search. This demonstration shows how to efficiently solve three key problems in practical ontology management in an RDBMS, namely storage, reasoning, and search. Moreover, we show how the SOR system is used for semantic master data management.

    Effective and Efficient Semantic Web Data Management on DB2
    Published in the 27th International Conference on Management of Data (SIGMOD 2008)
    Abstract
With the fast growth of the Semantic Web, more and more RDF data and ontologies are created and widely used in Web applications and enterprise information systems. It is reported that the W3C Linking Open Data community project comprises over two billion RDF triples, which are interlinked by about three million RDF links. Recently, efficient RDF data management on top of relational databases has gained particular attention from both the Semantic Web community and the database community. In this paper, we present effective and efficient Semantic Web data management over DB2, including an efficient schema and index design for storage, practical ontology reasoning support, and an effective SPARQL-to-SQL translation method for RDF queries. Moreover, we show the performance and scalability of our system by an evaluation against well-known RDF stores and discuss future work.
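To illustrate the flavor of SPARQL-to-SQL translation over a relational store (a toy sketch on a single three-column triples table; the schema and translation in the paper are more sophisticated): each triple pattern becomes one alias of the table, shared variables become join conditions, and constants become selections.

```python
# Toy basic-graph-pattern to SQL translation over a triples(s, p, o) table.
import sqlite3

def bgp_to_sql(patterns):
    """patterns: list of (s, p, o); strings starting with '?' are variables."""
    froms, wheres, seen = [], [], {}
    for i, (s, p, o) in enumerate(patterns):
        alias = f"t{i}"
        froms.append(f"triples {alias}")
        for col, term in (("s", s), ("p", p), ("o", o)):
            if term.startswith("?"):
                if term in seen:                      # repeated variable: join
                    wheres.append(f"{alias}.{col} = {seen[term]}")
                else:                                 # first occurrence: bind
                    seen[term] = f"{alias}.{col}"
            else:                                     # constant: selection
                wheres.append(f"{alias}.{col} = '{term}'")
    select = ", ".join(seen.values())
    return f"SELECT {select} FROM {', '.join(froms)} WHERE {' AND '.join(wheres)}"
```

A real translator must also handle literals, OPTIONAL, FILTER, and SQL quoting/injection; this only shows the join-generation core.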

    CE2 – Towards a Large Scale Hybrid Search Engine with Integrated Ranking Support
    Published in the 17th Conference on Information and Knowledge Management (CIKM 2008)

    Abstract
The Web contains a large number of documents and, increasingly, also semantic data in the form of RDF triples. Many of these triples are annotations associated with documents. While structured queries are the principal means to retrieve semantic data, keyword queries are typically used for document retrieval. Clearly, a form of hybrid search that seamlessly integrates these formalisms to query both documents and semantic data can address more complex information needs. In this paper, we present CE2, an integrated solution that leverages mature database and information retrieval technologies to tackle the challenges of hybrid search at large scale. For scalable storage, CE2 integrates databases with inverted indices. Hybrid query processing is supported in CE2 through novel algorithms and data structures, which allow advanced ranking schemes to be integrated more tightly into the process. Experiments conducted on DBpedia and Wikipedia show that CE2 provides good performance in terms of both effectiveness and efficiency.
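The hybrid-search idea of combining an inverted index with triple annotations can be sketched as follows (illustrative names and data; CE2's storage and ranking are far richer): the keyword part is answered from the inverted index, and the structured part filters the candidates through their annotations.

```python
# Toy hybrid search: keyword hits intersected with a triple-annotation filter.
from collections import defaultdict

def hybrid_search(docs, annotations, keyword, pattern):
    """docs: {id: text}; annotations: {id: set of (p, o)}; pattern: (p, o)."""
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            inverted[term].add(doc_id)
    keyword_hits = inverted[keyword.lower()]
    return sorted(d for d in keyword_hits if pattern in annotations.get(d, set()))
```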

    Semplore: An IR Approach to Scalable Hybrid Query of Semantic Web Data
    Published in the 6th International Semantic Web Conference (ISWC 2007)
    Abstract
As an extension of the current Web, the Semantic Web will contain not only structured data with machine-understandable semantics but also textual information. While structured queries can be used to find information more precisely on the Semantic Web, keyword searches are still needed to help exploit textual information. It thus becomes very important to combine precise structured queries with imprecise keyword searches into a hybrid query capability. In addition, due to the huge volume of information on the Semantic Web, hybrid queries must be processed in a very scalable way. In this paper, we define such a hybrid query capability that combines unary tree-shaped structured queries with keyword searches. We show how existing information retrieval (IR) index structures and functions can be reused to index Semantic Web data and its textual information, and how hybrid queries are evaluated on the index structure using IR engines in an efficient and scalable manner. We implemented this IR approach in an engine called Semplore. Comprehensive experiments on its performance show that it is a promising approach, leading us to believe that it may be possible to evolve current Web search engines to query and search the Semantic Web. Finally, we briefly describe how Semplore is used for searching Wikipedia and an IBM customer's product information.

    Efficient Index Maintenance for Frequently Updated Semantic Data
    Published in the 3rd Asian Semantic Web Conference (ASWC 2008)
    Abstract
Nowadays, the demand for querying and searching the Semantic Web is increasing. Some systems have adopted IR (Information Retrieval) approaches to index and search Semantic Web data, due to their capability to handle Web-scale data and their efficiency in query answering. Additionally, the huge volumes of data on the Semantic Web are frequently updated, which further requires effective update mechanisms for these systems. However, existing update approaches focus only on documents. It remains a big challenge to update an IR index specially designed for semantic data in the form of finer-grained structured objects rather than unstructured documents. In this paper, we present a well-designed update mechanism for an IR index over triples. Our approach provides a flexible and effective update mechanism by dividing the index into blocks, which reduces the number of update operations during the insertion of triples. At the same time, it preserves the efficiency of query processing and the capability to handle large-scale semantic data. Experimental results show that the index update time is only a fraction of that of complete reconstruction, proportional to the share of inserted triples, while the query response time is not notably affected. The approach is thus capable of making newly arrived semantic data immediately searchable.
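The block-based update idea can be sketched as an index split into an immutable main part plus a small delta that absorbs insertions (a minimal illustration, not the paper's actual data structure); lookups merge both parts, so newly inserted data is searchable immediately without rebuilding the main index.

```python
# Toy blocked inverted index: cheap inserts into a delta, occasional merges.
from collections import defaultdict

class BlockedIndex:
    def __init__(self):
        self.main = defaultdict(set)    # large block, rebuilt only on merge
        self.delta = defaultdict(set)   # small block absorbing insertions

    def insert(self, term, triple_id):
        self.delta[term].add(triple_id)

    def merge(self):
        """Fold the delta into the main block (the expensive, rare operation)."""
        for term, ids in self.delta.items():
            self.main[term] |= ids
        self.delta.clear()

    def lookup(self, term):
        return self.main[term] | self.delta[term]
```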

In the Query Interface and User Interaction Layer, we have published the following work (2007-2008):

    PANTO: A Portable Natural Language Interface to Ontologies
    Published in the 4th European Semantic Web Conference (ESWC 2007)

    Abstract
Providing a natural language interface to ontologies will not only offer ordinary users the convenience of acquiring needed information from ontologies, but consequently also expand the influence of ontologies and the Semantic Web. This paper presents PANTO, a Portable nAtural laNguage inTerface to Ontologies, which accepts generic natural language queries and outputs SPARQL queries. Based on a special treatment of nominal phrases, it adopts a triple-based data model to interpret the parse trees output by an off-the-shelf parser. Complex modifications in natural language queries, such as negations, superlatives and comparatives, are investigated. The experiments have shown that PANTO provides state-of-the-art results.

    SPARK: Adapting Keyword Query to Semantic Search
    Published in the 6th International Semantic Web Conference (ISWC 2007)

    Abstract
Semantic search promises to provide more accurate results than present-day keyword search. However, progress with semantic search has been delayed by the complexity of its query languages. In this paper, we explore a novel approach of adapting keywords to querying the Semantic Web: the approach automatically translates keyword queries into formal logic queries, so that end users can use familiar keywords to perform semantic search. A prototype system named SPARK has been implemented in light of this approach. Given a keyword query, SPARK outputs a ranked list of SPARQL queries as the translation result. The translation in SPARK consists of three major steps: term mapping, query graph construction and query ranking. Specifically, a probabilistic query ranking model is proposed to select the most likely SPARQL query. In the experiments, SPARK achieved encouraging translation results.
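The first two translation steps, term mapping and query construction, can be sketched roughly as follows (a toy illustration with an invented vocabulary; SPARK's actual mapping and probabilistic ranking are much richer): each keyword is mapped to an ontology class or property, and the mapped terms are assembled into triple patterns around a shared variable.

```python
# Toy keyword-to-SPARQL translation: map keywords, emit triple patterns.
def keywords_to_sparql(keywords, term_map):
    """term_map: keyword -> ('class', C) or ('property', P); returns a query string."""
    patterns, var = [], "?x"
    for kw in keywords:
        kind, term = term_map[kw.lower()]
        if kind == "class":
            patterns.append(f"{var} rdf:type {term} .")
        else:
            patterns.append(f"{var} {term} ?v_{kw.lower()} .")
    return "SELECT * WHERE { " + " ".join(patterns) + " }"
```

A real system would generate and rank many candidate graphs per keyword set instead of a single query.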

    Q2Semantic: A Lightweight Keyword Interface to Semantic Search
    Published in the 5th European Semantic Web Conference (ESWC 2008)

    Abstract
The increasing amount of data on the Semantic Web offers opportunities for semantic search. However, formal queries hinder casual users in expressing their information needs, as they might not be familiar with the query syntax or the underlying ontology. Because keyword interfaces are easier for casual users to handle, many approaches aim to translate keywords to formal queries. However, these approaches so far feature only very basic query ranking and do not scale to large repositories. We tackle the scalability problem by proposing a novel clustered-graph structure that corresponds to a summary of the original ontology. This reduced data space is then explored to compute the top-k queries. Additionally, we adopt several mechanisms for query ranking, which can consider many factors, such as the query length, the relevance of ontology elements w.r.t. the query, and the importance of ontology elements. Experimental results obtained with our implemented system, Q2Semantic, show that we achieve good performance on many datasets of different sizes.

    Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data
    Published in the 25th International Conference on Data Engineering (ICDE 2009)

    Abstract
Keyword queries enjoy widespread usage as they represent an intuitive way of specifying information needs. Recently, answering keyword queries on graph-structured data has emerged as an important research topic. The prevalent approaches build on dedicated indexing techniques as well as search algorithms aimed at finding substructures that connect the data elements matching the keywords. In this paper, we introduce a novel keyword search paradigm for graph-structured data, focusing in particular on the RDF data model. Instead of computing answers directly, as in previous approaches, we first compute queries from the keywords, allow the user to choose the appropriate query, and finally process the query using the underlying database engine. Thereby, the full range of database optimization techniques can be leveraged for query processing. For the computation of queries, we propose a novel algorithm for the exploration of top-k matching subgraphs. While related techniques search for the best answer trees, our algorithm is guaranteed to compute all k subgraphs with the lowest costs, including cyclic graphs. By performing the exploration only on a summary data structure derived from the data graph, we achieve promising performance improvements compared to other approaches.
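The exploration step can be illustrated with a toy top-k search (invented graph and costs; the paper's algorithm explores a summary graph and computes subgraphs, not just paths): partial results are expanded in cost order with a priority queue, so the k cheapest connections between keyword-matched elements are emitted first.

```python
# Toy top-k exploration: cheapest-first expansion with a priority queue.
import heapq

def top_k_paths(graph, start, targets, k):
    """graph: {node: [(neighbor, cost)]}; returns the k cheapest (cost, path)."""
    results, heap = [], [(0, [start])]
    while heap and len(results) < k:
        cost, path = heapq.heappop(heap)
        if path[-1] in targets and len(path) > 1:
            results.append((cost, path))
        if len(path) > len(graph):        # crude depth bound against cycles
            continue
        for nxt, c in graph.get(path[-1], []):
            heapq.heappush(heap, (cost + c, path + [nxt]))
    return results
```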

    Snippet Generation for Semantic Web Search Engines
    Published in the 3rd Asian Semantic Web Conference (ASWC 2008)

    Abstract
With the development of the Semantic Web, more and more ontologies are available for exploitation by semantic search engines. However, while semantic search engines support the retrieval of candidate ontologies, the final selection of the most appropriate ontology is still difficult for end users. In this paper, we extend existing work on ontology summarization to support the presentation of ontology snippets. The proposed solution leverages a new semantic similarity measure to generate snippets that are biased towards the given query. Experimental results have shown the potential of our solution in this largely unexplored problem domain.
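A toy sketch of query-biased snippet generation (bare term overlap stands in for the paper's semantic similarity measure; data is invented): rank an ontology's triples by their overlap with the query and keep the top few as the snippet.

```python
# Toy query-biased snippet: keep the triples that best overlap the query terms.
def snippet(triples, query, size=2):
    q_terms = set(query.lower().split())
    def score(t):
        terms = set(" ".join(t).lower().replace("_", " ").split())
        return len(terms & q_terms)
    return sorted(triples, key=score, reverse=True)[:size]
```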

Putting them together as a whole:

    SearchWebDB: Searching the Billion Triples!
    Published in the 7th International Semantic Web Conference (ISWC 2008)

    Abstract
In recent years, the amount of structured data available on the Web in the form of triples has been increasing rapidly and has passed one billion. In this paper, we propose an infrastructure for searching these billion triples, called SearchWebDB, that integrates data sources publicly available on the Web so that users can query the billion triples through a single interface. Approximate mappings between schemata as well as data elements are computed and stored in several indices. These indices are exploited by a query engine to perform query routing and result combination in an efficient way. As opposed to a standard distributed query engine, which requires the use of formal languages, users can pose keyword queries to SearchWebDB. These keywords are translated into possible interpretations, presented as structured queries. Thus, complex information needs can be addressed without imposing too much of a burden on casual users.

Please find attached a document with a poster for each piece of work. We hope you enjoy it.




Posted 2009/1/13 20:51
     
whfcarter (#2):
This post is rather long and, a bit lazily, I wrote all of it in English, so it may not be easy for some beginners to read. I will split it into separate posts for discussion, and I hope it helps everyone. It is a summary of our group's work from 2007 to 2008, a first-round implementation of the semantic search framework we proposed. Although a lot has been done, you will notice that much has not yet been touched. You can also download Prof. 俞勇's slides from the CSWS2008 homepage, which give a fuller picture of the extensions to the framework and the initial plan for the second-round implementation.
Posted 2009/1/14 9:10
     
Humphrey (#3):
Fifteen papers in all? And within two years? Perhaps because I have only been in this area for a short time, I cannot yet see how these papers relate to one another as a series of studies, but I can feel the weight this team carries in the semantic search field. I hope to see your further explanations, and I hope to have more chances to exchange ideas with you. Thank you!

    ----------------------------------------------
    鸿丰

Posted 2009/1/14 10:06
     
whfcarter (#4):
Following up on a chat with iamwym, let me explain our semantic search a bit further. Narrowly understood, semantic search studies how to semantify existing search. We, however, consider the whole search life cycle: crawling -> pre-processing -> indexing -> search. In semantic search we therefore have not only crawling but also extracting, because semantic data is still in obvious short supply compared to documents, so to popularize semantic search (whether semantic data search, semantic-based document search, or even hybrid search) this bootstrapping problem has to be addressed. For pre-processing, one used to clean HTML code, parse it to obtain a DOM tree, and extract hyperlinks and various features (e.g. title, headings, ...); now we need to clean semantic data, parse RDF or even OWL, possibly apply offline reasoning to obtain more facts, and perform data integration (at both the schema level and the data level). For indexing, one used to index terms and documents and store snippets generated from each document; now we need to store the schema, store the data, and build indices that support access by various graph patterns (the most basic being the triple). Search itself needs little elaboration: from document search to object or data search. Finally, the query interface and user interaction layer should be relatively clear: support interfaces other than SPARQL, such as natural language and keyword search; support faceted browsing for user interaction; and consider result presentation, e.g. snippet generation.
Posted 2009/1/14 21:45
     
Humphrey (#5):
Semantic search considered globally is indeed like that, but I still have some doubts: as you say, semantic search has a broad and a narrow sense, yet "semantifying search" and considering the whole semantic search process (life cycle? may I read it as a process?) do not seem to differ that much. It is just that "semantifying search" can use statistical methods to approximate a computer's "understanding" of semantics, whereas the search cycle you describe amounts to parsing document data and reasoning over queries, right?


Posted 2009/1/15 8:46
     
whfcarter (#6):
"The search cycle you describe amounts to parsing document data and reasoning over queries, right?" That is only one part, which I placed under pre-processing in my description above. As for your point that "semantifying search can use statistical methods to approximate a computer's understanding of semantics", I find it a bit narrow. There are many ways of semantifying. One can use semantic data to add metadata and even reasoning, starting from the data and exploiting smart data without changing the existing search algorithm; the classic example of this approach is SearchMonkey. Another is to use statistical or NLP techniques to improve the understanding of the documents themselves and of the queries; the classic application here is Hakia, which belongs to the smart application category. Of course the two can also be combined; a classic example is Powerset, which uses NLP plus Freebase data. What I meant in my earlier post is that we care about all the processes the search architecture involves (both offline and online), not merely search as a single online step.
Posted 2009/1/15 10:16
     
Humphrey (#7):
First of all, thank you for the reply. Do you mean that natural language processing (NLP) is not a purely statistical method, and also that NLP does not count as a reasoning-style way of realizing semantics?
"Using semantic data to add metadata" is the first time I have heard of this. Isn't RDF/RDF Schema itself data that expresses semantics? And it should additionally be turned into metadata? How large would one piece of metadata then have to be! I really cannot follow. Could you explain a bit more when you have time? Thank you!


Posted 2009/1/15 11:46
     
whfcarter (#8):
NLP is certainly not purely statistical; much of it is rule-based or based on knowledge representation and reasoning. The classic application is the conceptual graph (CG) originally proposed by Sowa. Statistical methods have become popular in recent years, but they are only one branch of NLP, known as computational linguistics. Next, about metadata: as the name suggests, it is data that describes data. An intuitive reading of semantic search is semantifying, or improving the performance of, existing Web search, whose main task today is document search. Using semantic data as metadata, then, means attaching semantic data as annotations that additionally describe a document; that semantic data is the document's metadata. This is what Yahoo's MicroSearch and SearchMonkey do, and it is a very natural treatment. Of course, metadata need not be semantic data: metadata means a description of data, a knowledge representation used for modeling. So metadata and semantic data are terms used at different levels or in different contexts, and I do not think it is necessary to get bogged down in them.
Posted 2009/1/15 13:29
     
Humphrey (#9):
So NLP actually has two branches: one uses statistical methods, computing term frequencies and grouping similar terms to obtain search/retrieval results that are as complete as possible; the other uses reasoners and knowledge bases to perform logical inference over the keywords in queries or documents. And in the Semantic Web area the main technique for realizing semantics is NLP. Is that right?
Using semantic data as metadata looks a lot like the cataloguing of papers in CNKI (中国知网), where each paper has keywords, an abstract, authors, related and cited literature, and links to them. That is in fact metadata too, but using "semantic data" as metadata feels to me like describing those data in a dedicated RDF/XML document format. I am not trying to split hairs on this point; I just feel that making metadata too large and complex (semantic metadata must express semantics, so presumably it has to follow a description format and be rather fine-grained) might still drag down retrieval efficiency. It is hard to have it both ways!


[This post was edited by the author at 2009-1-15 15:10:52]


Posted 2009/1/15 13:48
     
whfcarter (#10):
If you want a deeper understanding of NLP, you might follow some sessions of ACL, NLP's top conference, and read some introductions to NLP. What you describe, such as the authors of a paper, is the most typical metadata. What I do not quite understand, though, is why you assume that semantic data used as metadata must become very complex and large. First, semantic data represented in RDF/OWL was designed with the Web's characteristics in mind: you can describe anything, anytime, anywhere, so using some semantic data to annotate a document's attributes is a very natural thing. As for standards, have a look at microformats, eRDF, or RDFa; they are dialects whose underlying model is still RDF, or which partially support it. Also, from your description you seem to regard semantic data as something very complex. Not necessarily: it can be data containing many axioms, supporting complex reasoning with strong expressiveness, as in the life sciences; or it can be something as simple as an RDF data graph structure, as in FOAF for social networks. Semantics can be heavyweight or lightweight; what matters is that it serves the application at hand.
Posted 2009/1/15 15:24
     