Friday, February 20, 2009

Profile-Driven Data Management


INTRODUCTION

Large amounts of data populate the Internet, yet this data is largely unmanaged. Access to it requires a wide array of network resources, including server capacity, bandwidth, and cache space. Applications compete for these resources with little overarching support for their intelligent allocation. The database community has seen this problem before. Data management techniques (e.g., indexing, data organization, buffer management) have long been used in DBMSs to offset the demands that competing applications' requests for data place on limited resources. Such techniques are application-driven: a database administrator (DBA) consults with users to determine their data needs and tunes the knobs of data management tools accordingly. But the Internet environment, global in scale and consisting of autonomous data sources, cannot be tamed by a DBA. We take the view that in this environment the DBA role must be automated, and the specification of application-level data requirements must be made formal by way of processible user profiles.

Challenges of Data Management in Pervasive Environments

If each entity in a pervasive environment is treated as being capable of both posing and answering queries, we can describe this model as a type of mobile distributed database. However, it is far more complex than the conventional client-proxy-server model. We can illustrate this by classifying our environment along four orthogonal axes: autonomy, distribution, heterogeneity, and mobility ([16, 9]). The system is highly autonomous, since there is no centralized control of the individual client databases. It is heterogeneous, as we assume only that entities can “speak” to each other in some neutral format. The system is clearly distributed, as parts of the data reside on different computers, and there is some replication as entities cache data and metadata. Mobility is of course a given: in ad-hoc networking environments, devices can change their locations, and no fixed set of entities is “always” accessible to a given device. This is distinct from the disconnection management that traditional work in mobile databases addresses. In those systems, disconnections of mobile devices from the network are viewed as temporary events; upon reconnection, any ongoing transactions between the mobile device and the server either continue from where they left off before the disconnection or are rolled back.
As devices move, their neighborhood changes dynamically. Hence, depending on the specific location and time at which a particular query is posed, the originator may obtain different answers or none at all. Moreover, unlike in traditional distributed database systems, the querying device cannot depend on a global catalog that would route its query to the proper data source.
In addition, there is no guarantee that the device will be able to access information that resides on neighboring devices under high-mobility conditions. In other words, data is pervasive: it is not stored in a single repository but is distributed and/or replicated in an unknown manner among a large number of devices, only some of which are accessible at any given time. Querying is, by similar reasoning, serendipitous: if one asks a question whose answer is stored in the vicinity, then the query succeeds. Such a situation seems to leave too much to chance. To improve the chances of getting an answer to a question no matter when it is asked, each device should have the option to cache the metadata (e.g., who has what data) and perhaps even the data obtained from neighbors in its current vicinity.
To further complicate matters, each data source may have its own schema. Not all possible schema mappings can be done a priori, and some devices will be unable to translate on the fly due to their limited computational capabilities. In addition, cooperation among information sources cannot be guaranteed.
The issues of privacy and trust will clearly be very important in a pervasive environment, where random entities interact in random transactions. There may be an entity that has reliable information but refuses to make it available to others. There may be an entity in the ad-hoc environment that is willing to share information, but whose information is unreliable. Lastly, when an entity makes information available to another entity, questions arise regarding its provenance, as well as the protection of that information against future changes and further sharing.
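The metadata caching suggested above ("who has what data") can be pictured with a small sketch. Everything in it is an assumption made for illustration and is not taken from the original text: the class name MetadataCache, the notion of a "topic", and the time-to-live expiry rule that discards advertisements from devices not heard from recently.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataCache:
    """Hypothetical per-device cache of 'who has what data' entries."""
    ttl: float  # seconds an advertisement stays valid (assumed policy)
    entries: dict = field(default_factory=dict)  # topic -> {device_id: time_seen}

    def record(self, topic, device_id, now):
        """Remember that device_id advertised data on topic at time now."""
        self.entries.setdefault(topic, {})[device_id] = now

    def lookup(self, topic, now):
        """Return devices whose advertisement for topic has not yet expired."""
        seen = self.entries.get(topic, {})
        return sorted(d for d, t in seen.items() if now - t <= self.ttl)


cache = MetadataCache(ttl=60.0)
cache.record("hospital", "dev-a", now=0.0)
cache.record("hospital", "dev-b", now=50.0)
fresh = cache.lookup("hospital", now=70.0)  # dev-a's entry is older than ttl
```

An expiry rule of this kind is one simple way to reflect the fact that a neighbor seen a while ago may no longer be reachable; a real system might instead weight entries by observed mobility.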
For pervasive systems to succeed in general, much of the interaction between devices must happen in the background, often without explicit human intervention. Instead, such interactions should be executed based on information in the profile. For instance, a diabetic user’s profile can say “Always keep track of the nearest hospital”, and this will influence what data the InforMa will seek to obtain and which information sources it will interact with. Of course, this brings up the question of what exactly a profile should contain. As we mentioned, perhaps the best work on profile-driven data management is due to Franklin, Cherniack, and Zdonik [6]. We argue that their profiles, which explicitly enumerate data and its utility, are not sufficient. A profile must contain a user’s “beliefs”, “desires”, and “intentions”, an idea that has been explored in multi-agent interactions [4]. These, along with contextual parameters such as location, time, battery power, and storage space, allow the InforMa of each entity to determine what data to obtain and its relative worth.
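To make the shape of such a profile concrete, here is a minimal sketch. The field layout, the names Profile, Context, and worth, and above all the scoring rule are hypothetical; the text says only that beliefs, desires, intentions, and context together determine what data to obtain and its relative worth, not how.

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    """Contextual parameters named in the text (representation is assumed)."""
    location: str
    battery_fraction: float  # 0.0 (empty) to 1.0 (full)
    free_storage_kb: int


@dataclass
class Profile:
    beliefs: dict = field(default_factory=dict)     # facts the user holds true
    desires: dict = field(default_factory=dict)     # topic -> importance weight
    intentions: list = field(default_factory=list)  # standing goals, as text


def worth(profile, context, topic, size_kb):
    """Illustrative score only: zero if the item cannot be stored,
    otherwise the topic's importance scaled by remaining battery."""
    if size_kb > context.free_storage_kb:
        return 0.0
    return profile.desires.get(topic, 0.0) * context.battery_fraction


profile = Profile(
    beliefs={"diabetic": True},
    desires={"hospital": 1.0, "restaurant": 0.2},
    intentions=["Always keep track of the nearest hospital"],
)
ctx = Context(location="downtown", battery_fraction=0.5, free_storage_kb=100)
hospital_score = worth(profile, ctx, "hospital", size_kb=10)
restaurant_score = worth(profile, ctx, "restaurant", size_kb=10)
```

Under these assumed weights, hospital data outranks restaurant data, and any item larger than the free storage scores zero regardless of topic. A real InforMa would presumably derive the weights from the beliefs and intentions rather than have them listed by hand.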
