Solutions and query rewriting in data Exchange.
Revista : Information and ComputationVolumen : 228229
Páginas : 2861
Tipo de publicación : ISI Ir a publicación
Abstract
Data exchange is the problem of taking data structured under a source schema and creatingan instance of a target schema. Given a source instance, there may be many solutions target instances that satisfy the constraints of the data exchange problem. Previous workhas identified two classes of desirable solutions: canonical universal solutions, and theircores. Query answering in data exchange amounts to rewriting a query over the targetschema to another query that, over a materialized target instance, gives the result that issemantically consistent with the source (specifically, the certain answers). Basic questionsare then: (1) how do these solutions compare in terms of query rewriting? and (2) howcan we determine whether a query is rewritable over a particular solution?Our goal is to answer these questions. Our first main result is that, in terms of rewritabilityby relational algebra queries, the core is strictly less expressive than the canonical universalsolution, which in turn is strictly less expressive than the source. To develop techniquesfor proving queries non-rewritable, we establish structural properties of solutions; in factthey are derived from the technical machinery developed in the rewritability proofs. Oursecond result is that both the canonical universal solution and the core preserve the localstructure of the data, and that every target query rewritable over any of these solutionscannot distinguish tuples whose neighborhoods in the source are similar. This gives usa first simple tool for checking whether a query is non-rewritable over the canonicaluniversal solution or over the core. We also show that these tools generalize to arbitrarytransformations that preserve the local structure of the data, and investigate an alternativesemantics of query answering in data exchange.