The Knowledge Management Lab
University of Toronto

Software Reengineering for Network-Centric Computing
Schema Mapping and Data Integration

Different program comprehension and reengineering tools represent their data in many different ways. This heterogeneity in how data is represented and understood is by no means limited to program comprehension and reengineering tools; it is ubiquitous. For different tools or information sources to understand each other, they need to know how their data schemas map to each other.

One area of research into schema mapping, the Clio project, involves the development of a novel framework for generating mappings semi-automatically between any combination of XML and relational schemas. The framework translates high-level, user-specified mappings into semantically meaningful queries that transform source data into the target representation. The approach works in two phases. In the first phase, the high-level mappings, expressed as a set of inter-schema correspondences, are converted into a set of mappings that capture the design choices made in the source and target schemas. These design choices include the hierarchical organization of the data specified in XML Schemas, as well as schema constraints such as foreign key constraints. In the second phase, those mappings are translated into queries over the source schemas that populate the target schema. An important feature of the mapping algorithm is that it takes target schema constraints into consideration, so the approach is guaranteed to generate data that does not violate the integrity of the target schema. Furthermore, the mapping algorithm can generate data values for target schema elements that are important in the target but for which the user has not specified how they should be populated. In the case of a relational source schema, the output of the algorithm is a set of SQL queries, while in the case of XML Schemas or DTDs it may be a set of XQuery queries or XSLT transformation scripts. Moreover, the algorithm is complete in that it produces all the mappings that are consistent with the schema constraints.
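The two-phase idea can be illustrated with a deliberately small sketch. It is not Clio's algorithm: the schema, the table and column names, and the helper `build_insert_query` are all hypothetical, and the sketch handles only a flat relational source and target. Phase 1 groups the user's correspondences with the source foreign keys that link the tables they touch; phase 2 turns that grouping into a single SQL statement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:
    """Source constraint: table.column references ref_table.ref_column."""
    table: str
    column: str
    ref_table: str
    ref_column: str


@dataclass(frozen=True)
class Correspondence:
    """User-drawn arrow from a source column to a target column."""
    source: str  # "table.column" in the source schema
    target: str  # "table.column" in the target schema


def build_insert_query(target_table, correspondences, foreign_keys):
    # Phase 1 (toy version): keep the correspondences that feed this
    # target table and find the foreign keys linking their source tables.
    relevant = [c for c in correspondences
                if c.target.split(".")[0] == target_table]
    source_tables = []
    for c in relevant:
        table = c.source.split(".")[0]
        if table not in source_tables:
            source_tables.append(table)
    joins = [fk for fk in foreign_keys
             if fk.table in source_tables and fk.ref_table in source_tables]

    # Phase 2 (toy version): emit one SQL statement for the grouping.
    target_cols = ", ".join(c.target.split(".")[1] for c in relevant)
    select_cols = ", ".join(c.source for c in relevant)
    sql = (f"INSERT INTO {target_table} ({target_cols}) "
           f"SELECT {select_cols} FROM {', '.join(source_tables)}")
    if joins:
        conditions = " AND ".join(
            f"{fk.table}.{fk.column} = {fk.ref_table}.{fk.ref_column}"
            for fk in joins)
        sql += f" WHERE {conditions}"
    return sql


# Hypothetical schemas: employee(name, dept_id), department(id, dept_name)
# in the source; staff(full_name, division) in the target.
query = build_insert_query(
    "staff",
    [Correspondence("employee.name", "staff.full_name"),
     Correspondence("department.dept_name", "staff.division")],
    [ForeignKey("employee", "dept_id", "department", "id")],
)
print(query)
```

Without the foreign key, the two correspondences would pair every employee with every department; using the constraint is what makes the generated query semantically meaningful. Target-side constraints, value generation for unmapped target elements, and XML nesting are all outside this sketch.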

In a second line of research, a new approach to data semantics is used to solve problems in schema mapping, data translation and migration, and the semantic web. For example, the correspondence between different schemas is mediated through a common semantics.
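As a rough illustration of mediation through a common semantics, the sketch below annotates the elements of two schemas with concepts from a shared vocabulary and derives correspondences wherever two elements denote the same concept. The schemas, the concept names, and the function `derive_correspondences` are hypothetical and chosen only for illustration; they are not taken from the project.

```python
# Hypothetical annotations: each schema element is linked to a concept
# in a shared vocabulary (the "common semantics").
SCHEMA_A = {
    "emp.name": "Person.name",
    "emp.dept": "Organization.unit",
}
SCHEMA_B = {
    "staff.full_name": "Person.name",
    "staff.division": "Organization.unit",
    "staff.badge": "Person.id",
}


def derive_correspondences(schema_a, schema_b):
    """Pair elements of the two schemas that denote the same concept."""
    by_concept = {}
    for element, concept in schema_b.items():
        by_concept.setdefault(concept, []).append(element)
    return [(a_element, b_element)
            for a_element, concept in schema_a.items()
            for b_element in by_concept.get(concept, [])]


pairs = derive_correspondences(SCHEMA_A, SCHEMA_B)
print(pairs)
```

Here `staff.badge` receives no correspondence because nothing in the first schema is annotated with `Person.id`; neither schema needs to know about the other, only about the shared concepts.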


  The Knowledge Management Lab is now part of the Bell University Labs

 

  The Knowledge Management Lab - Department of Computer Science - University of Toronto