Pages

Wednesday, October 29, 2008

DATA COMPRESSION

DATA COMPRESSION
Abstract
It is very interesting that data can be compressed because it insinuates that information that we generally pass on to each other can be said in shorter information units (infos). Instead of saying ‘yes’ a simple nod can do the same work, but by transmitting lesser data. In his 1948 paper, “A Mathematical Theory of Communication”, Shannon established that there is a fundamental limit to lossless data compression. This limit, called the entropy rate, is denoted by H. The exact value of H depends on the information source --- more specifically, the statistical nature of the source. It is possible to compress the source, in a lossless manner, with compression rate close to H. It is mathematically impossible to do better than H.
Shannon also developed the theory of lossy data compression. This is better known as rate-distortion theory. In lossy data compression, the decompressed data does not have to be exactly the same as the original data. Instead, some amount of distortion, D, is tolerated. Shannon showed that, for a given source (with all its statistical properties known) and a given distortion measure; there is a function, R (D), called the rate-distortion function. The theory says that if D is the tolerable amount of distortion, then R (D) is the best possible compression rate.
When the compression is lossless (i.e., no distortion or D=0), the best possible compression rate is R(0)=H (for a finite alphabet source). In other words, the best possible lossless compression rate is the entropy rate. In this sense, rate-distortion theory is a generalization of lossless data compression theory, where we went from no distortion (D=0) to some distortion (D>0).
Lossless data compression theory and rate-distortion theory are known collectively as source coding theory. Source coding theory sets fundamental limits on the performance of all data compression algorithms. The theory, in itself, does not specify exactly how to design and implement these algorithms. It does, however, provide some hints and guidelines on how to achieve optimal performance.

No comments:

Post a Comment