Monday, February 23, 2009

Making Warehouse Data Self-maintainable

Abstract

Data in a warehouse can be seen as materialized views derived from multiple underlying data sources. Materialized views speed up query processing over large amounts of data, but they must be maintained as the source data is updated. This is often done incrementally, using techniques that read data from the underlying sources. In a data warehousing setting, however, the base relations are distributed across different sources, so accessing them can be difficult, and some sources may be unavailable altogether. For these reasons, self-maintainability, the ability to maintain a view without querying its base relations, is an important issue in data warehousing. In this talk we show that warehouse views can be made self-maintainable by materializing some additional information, called auxiliary relations, derived from the intermediate results of the view computation. We give an algorithm for determining which auxiliary relations must be materialized to make a given view self-maintainable. We then discuss an efficient, self-maintainable incremental algorithm that computes the updates to both the materialized view and the auxiliary relations. The primary objective is to minimize the overall maintenance cost of the view together with its auxiliary relations. An important feature of our algorithm is that it derives the "exact change" to every materialized relation, the auxiliary relations and the view alike, without accessing the view itself. This feature is important for ensuring that updates to views defined by aggregate functions are correct, and in active database applications where triggers fire on updates to the view. Finally, we compare the maintenance cost of our incremental algorithm with that of the counting algorithm and of recomputing the view from scratch (the naive algorithm).
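The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea, not the method from the talk. It is a minimal Python sketch under assumed, hypothetical schemas: two remote sources Sales(item_id, amount) and Items(item_id, category), and a view giving SUM(amount) and COUNT(*) per category over their join. The class name SelfMaintainableSumView and both auxiliary relations (a projection of Items, and Sales pre-aggregated per item_id as an intermediate result) are assumptions chosen for simplicity. Each delta method computes the exact change to the view from the incoming source update and the auxiliary relations only, never by reading the sources or the view.

```python
from collections import defaultdict


class SelfMaintainableSumView:
    """Hypothetical sketch: SUM/COUNT per category over Sales JOIN Items.

    Assumed sources (illustrative only, not from the talk):
      Sales(item_id, amount):   many rows per item
      Items(item_id, category): item_id is a key
    Deletions are assumed to refer to rows that actually exist.
    """

    def __init__(self):
        # Auxiliary relation 1: Items projected onto (item_id, category).
        self.aux_items = {}
        # Auxiliary relation 2: Sales pre-aggregated per item_id -> [sum, count],
        # an intermediate result of the view computation.
        self.aux_sales = defaultdict(lambda: [0, 0])
        # The materialized view itself: category -> (sum, count).
        self.view = {}

    def _apply_to_view(self, changes):
        # Apply exact changes. Keeping COUNT next to SUM lets a group be
        # removed exactly when its last contributing row is deleted.
        for category, d_sum, d_count in changes:
            s, c = self.view.get(category, (0, 0))
            s, c = s + d_sum, c + d_count
            if c == 0:
                self.view.pop(category, None)
            else:
                self.view[category] = (s, c)

    def sales_delta(self, sign, item_id, amount):
        """sign=+1 inserts a Sales row, sign=-1 deletes one."""
        agg = self.aux_sales[item_id]
        agg[0] += sign * amount
        agg[1] += sign
        if agg[1] == 0:
            del self.aux_sales[item_id]
        # The exact change is derived from aux_items only, not from the view.
        changes = []
        category = self.aux_items.get(item_id)
        if category is not None:
            changes.append((category, sign * amount, sign))
        self._apply_to_view(changes)
        return changes

    def items_delta(self, sign, item_id, category):
        """sign=+1 inserts an Items row, sign=-1 deletes one."""
        if sign > 0:
            self.aux_items[item_id] = category
        else:
            del self.aux_items[item_id]
        # The item's pre-aggregated sales join in (or drop out) as one change.
        s, c = self.aux_sales.get(item_id, (0, 0))
        changes = [(category, sign * s, sign * c)] if c else []
        self._apply_to_view(changes)
        return changes


if __name__ == "__main__":
    w = SelfMaintainableSumView()
    w.items_delta(+1, "i1", "books")
    w.sales_delta(+1, "i1", 10)
    w.sales_delta(+1, "i1", 5)
    w.sales_delta(+1, "i2", 7)        # item i2 unknown yet: only aux_sales changes
    w.items_delta(+1, "i2", "books")  # its pre-aggregated sales now join in
    print(w.view)                     # {'books': (22, 3)}
    w.sales_delta(-1, "i1", 10)
    print(w.view)                     # {'books': (12, 2)}
```

The returned change lists are the kind of "exact change" a trigger could consume: for example, deleting the last remaining row of a group yields a single negative delta and removes the group, rather than leaving a stale SUM behind. A real design would choose auxiliary relations per view definition and would usually reduce them (for instance, keeping only items that can ever join) to cut storage and maintenance cost.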
