Trip report for SIGMOD 2009
I finally have some time to write my trip report for SIGMOD 2009 and would like to share some of my experiences with you. This year's SIGMOD/PODS was held in Providence, Rhode Island, USA, from June 29 to July 2. Note that PODS is always held together with SIGMOD; it focuses mainly on theoretical database research.
This was my first time attending SIGMOD. Although the program schedule was a bit tight, I heard a lot of interesting talks and saw some nice presentations and great demos during the main conference.
Here are some valuable findings and new trends worth tracking:
1) New hardware for DB?
You can refer to the invited talk "Storage Class Memory: Technology, Systems and Applications" by Richard F. Freitas (IBM). There was also a separate session named "Databases on Modern Hardware". I am sure this topic will continue to be popular, since the characteristics of new hardware (especially SCM) break several old constraints and assumptions of traditional data management while also bringing new challenges.
2) Cloud computing
In my opinion, cloud computing is the business realization of parallel computing, grid computing and distributed computing, and it maximizes the business value of these technologies. For the DB community, a new cross-field research topic has emerged: data management in the cloud. It felt as if almost everyone was talking about the state of the art and the future of cloud computing. Newcomers also showed interest and raised interesting questions with the experts during coffee breaks. If you are interested, you can refer to "Distributed Data-Parallel Computing Using a High-Level Programming Language" by MSR. You can also have a look at "A Comparison of Approaches to Large-Scale Data Analysis" from the "Large-Scale Data Analysis" session. It proposes a benchmark (including several tasks drawn from both parallel DBMS and map-reduce workloads) to compare the map-reduce framework with parallel DBMSs (considering both row-oriented and column-oriented systems). It tries to tell us in which cases map-reduce is the right choice and in which cases a parallel DBMS will be better. Although it is a bit biased and only shows preliminary results on up to 100 nodes, it is worth a closer look to understand the differences between the two choices.
3) Keyword search
Although keyword search has been studied for a long time, it is still a hot topic for DB. If you want to learn the basics of keyword search over structured data (DB, XML and graph data) and get a clear understanding of the existing work, I recommend the half-day tutorial "Keyword Search on Structured and Semi-Structured Data". You can find the slides on Wei Wang's homepage. You can also have a look at the separate research session on keyword search and at the paper "Combining Keyword Search and Forms for Ad Hoc Querying of Databases" from the Data on the Web session.
4) Data fusion
With the growth of data, data fusion is still an open problem. This year there were two sessions related to data fusion: Data Integration and Entity Resolution. The former is concerned with schema-level mapping, while the latter focuses on the data level. In fact, the topic is not limited to data fusion; you can also consider service composition or lightweight approaches (e.g., mashups). The work in this area also emphasizes the ability to handle data changes and inconsistency at a large scale.
5) Semantic Web related
Semantic Web work is still not mainstream at SIGMOD. However, since I am from the SW community, I paid close attention to the related work presented there. There were three kinds of work: ontology matching ("A Gauss Function based Approach for Unbalanced Ontology Matching" by Tsinghua University and IBM China Research Lab), RDF triple stores ("Scalable Join Processing on Very Large RDF Graphs" by MPI, Germany), and semantic search ("Hermes: A Travel through Semantics on the Data Web" by us, Shanghai Jiao Tong University). If you are interested, you can have a look.
I do not plan to describe each interesting topic in detail. You can also track the development of column stores. This year, cross-field research topics were recognized as a trend for DB: Computer Human Interaction (CHI) and Information Visualization (IV) for DB (see the invited talk "Transforming Data Access Through Public Visualization"), data management in online games (see the tutorial "Database Research in Computer Games"), and new hardware for DB (see the tutorial "FPGA (Field Programmable Gate Array): What's in it for a Database?").
I also liked the social events at this year's SIGMOD. I attended the "new researcher symposium" and learned how to plan a research career. I also attended the "Relational Data model 40 years celebration" and heard interesting stories about Edgar F. Codd. The business meeting and closing ceremony were also engaging. Daniel Abadi won this year's Jim Gray Dissertation Award for his excellent work on C-Store (a column-oriented store). The SIGMOD Edgar F. Codd Innovations Award went to Masaru Kitsuregawa for his contributions to hash join and parallel computing.