Schema Mapping and Data Integration
Program comprehension and reengineering tools represent their data
in many different ways. This heterogeneity in how data is represented
and understood is by no means limited to such tools; it is ubiquitous.
For different tools or information sources to understand each other,
they need to know how their data schemas map to each other.
One area of research into schema mapping, the Clio project, involves
the development of a novel framework for generating mappings semi-automatically
between any combination of XML and relational schemas. The framework
can translate high-level, user-specified mappings into semantically
meaningful queries that transform source data into the target representation.
The approach works in two phases. In the first phase, the high-level
mappings, expressed as a set of inter-schema correspondences, are
converted into a set of mappings that capture the design choices
made in the source and target schemas. The design choices include
the hierarchical organization of the data specified in XML Schemas,
as well as schema constraints (e.g., foreign key constraints). During
the second phase, those mappings are translated into queries over the
source schemas that populate the target schema. An important feature
of the mapping algorithm is that it takes into consideration target
schema constraints. That way, the approach is guaranteed to generate
data that will not violate the integrity of the target schema. Furthermore,
the mapping algorithm can generate data values for target schema
elements that are important in the target but for which the user
has not specified how they should be populated. In the case of a
relational source schema, the output of the algorithm is a set of
SQL queries, while in the case of XML schemas or DTDs it may be a
set of XQueries or XSLT transformation scripts. Moreover, the algorithm
is complete in that it produces all the mappings that are consistent
with the schema constraints.
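
The two phases described above can be illustrated with a minimal
sketch. It is not Clio's algorithm or API: the names ForeignKey,
logical_mapping, to_sql and demo, as well as the emp, dept and staff
tables, are hypothetical, and the sketch handles only a flat relational
source and target. Phase one groups the user's correspondences with the
foreign-key joins that connect the source tables involved; phase two
emits a SQL query that populates the target. A target column without a
correspondence is filled with NULL here, a simple stand-in for the
value generation mentioned above.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:
    """Hypothetical description of a source-schema foreign key."""
    table: str
    column: str
    ref_table: str
    ref_column: str


def logical_mapping(correspondences, foreign_keys):
    """Phase 1: find the source tables used by the correspondences and
    the foreign-key joins that associate them."""
    tables = sorted({table for (table, _column) in correspondences})
    joins = [fk for fk in foreign_keys
             if fk.table in tables and fk.ref_table in tables]
    return tables, joins


def to_sql(target_table, target_columns, correspondences, foreign_keys):
    """Phase 2: translate the logical mapping into one SQL statement."""
    tables, joins = logical_mapping(correspondences, foreign_keys)
    by_target = {tgt: src for src, tgt in correspondences.items()}
    select = []
    for col in target_columns:
        if col in by_target:
            table, column = by_target[col]
            select.append(f"{table}.{column} AS {col}")
        else:
            # Assumption: unmapped target columns get NULL in this sketch.
            select.append(f"NULL AS {col}")
    sql = (f"INSERT INTO {target_table} ({', '.join(target_columns)}) "
           f"SELECT {', '.join(select)} FROM {', '.join(tables)}")
    if joins:
        conditions = [f"{fk.table}.{fk.column} = {fk.ref_table}.{fk.ref_column}"
                      for fk in joins]
        sql += " WHERE " + " AND ".join(conditions)
    return sql


def demo():
    """Run the generated query against a tiny in-memory database."""
    correspondences = {("emp", "name"): "staff_name",
                       ("dept", "dname"): "unit"}
    foreign_keys = [ForeignKey("emp", "dept_id", "dept", "id")]
    sql = to_sql("staff", ["staff_name", "unit", "office"],
                 correspondences, foreign_keys)
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dept (id INTEGER PRIMARY KEY, dname TEXT);
        CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT,
                          dept_id INTEGER REFERENCES dept(id));
        CREATE TABLE staff (staff_name TEXT, unit TEXT, office TEXT);
        INSERT INTO dept VALUES (1, 'Research');
        INSERT INTO emp VALUES (10, 'Ada', 1);
    """)
    conn.execute(sql)
    rows = conn.execute(
        "SELECT staff_name, unit, office FROM staff").fetchall()
    conn.close()
    return rows
```

Calling demo() copies the single employee into the hypothetical staff
table, joined to its department through the foreign key. A fuller
treatment would also have to respect target-side constraints and nested
XML structure, which this sketch leaves out.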
In a second line of research, a new approach to data semantics is
used to solve problems in schema mapping, data translation and
migration, and the semantic web. For example, the correspondence
between different schemas is mediated through a common semantics.
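
One way to picture mediation through a common semantics is the sketch
below. It is only an illustration under stated assumptions, not the
method used in that line of research: each schema element is annotated
with a concept from a shared vocabulary, and a correspondence between
two schemas is derived whenever a source element and a target element
share a concept. The function derive_correspondences and the field and
concept names are hypothetical.

```python
def derive_correspondences(source_semantics, target_semantics):
    """Pair source and target fields annotated with the same concept.

    Both arguments map a schema field to a concept name drawn from a
    shared (hypothetical) vocabulary.
    """
    concept_to_target = {concept: field
                         for field, concept in target_semantics.items()}
    return {field: concept_to_target[concept]
            for field, concept in source_semantics.items()
            if concept in concept_to_target}


# Hypothetical annotations of two schemas against one vocabulary.
source_semantics = {"emp.name": "Person.fullName",
                    "emp.dept_id": "Person.worksIn"}
target_semantics = {"staff.staff_name": "Person.fullName",
                    "staff.phone": "Person.phone"}

correspondences = derive_correspondences(source_semantics, target_semantics)
```

Here only emp.name and staff.staff_name are paired, because they are
the only fields annotated with the same concept. The two schemas never
refer to each other directly.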