Unit 1 Introduction to Data Compression

Lecture Notes on Introduction to
Data Compression
for
Open Educational Resource
on
Data Compression (CA209)
by
Dr. Piyush Charan
Assistant Professor
Department of Electronics and Communication Engg.
Integral University, Lucknow

Content
• UNIT-I: Introduction to Compression Techniques: Lossless
compression, Lossy Compression, Measures of performance,
Modeling and coding, Mathematical Preliminaries for Lossless
compression.
• Introduction to Information Theory and Models: Physical
models, Probability models, Markov models.
2 February 2021
Dr. Piyush Charan, Dept. of ECE, Integral University, Lucknow

What is Data Compression?
• Data Compression = Modeling + Coding
• Data compression consists of taking a stream of symbols and
transforming it into codes. If the compression is
effective, the resulting stream of codes will be smaller than
the original stream of symbols.
• The decision to output a certain code for a certain symbol or
set of symbols is based on a model.
• The model is simply a collection of data and rules used to
process input symbols and determine which code(s) to
output.
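The split between model and coder can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the code table MODEL is hypothetical and chosen by hand (it is not derived from any real data), and the names encode and decode are introduced here for the example.

```python
# Hypothetical model: a fixed, hand-picked table of prefix-free codes in which
# symbols assumed to be frequent ("a") receive shorter codes.
MODEL = {"a": "0", "b": "10", "c": "110", "d": "111"}


def encode(symbols, model=MODEL):
    """Coding step: look up each input symbol in the model and emit its code."""
    return "".join(model[s] for s in symbols)


def decode(bits, model=MODEL):
    """Reverse the coding step by matching prefix-free codes left to right."""
    inverse = {code: sym for sym, code in model.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # a complete code word has been read
            out.append(inverse[current])
            current = ""
    return "".join(out)


message = "aaabacad"
bits = encode(message)
print(bits, len(bits))        # 13 bits in total
print(decode(bits) == message)
```

With four distinct symbols, a fixed-length code would need 2 bits per symbol, or 16 bits for this 8-symbol message; the model above uses 13 because the most frequent symbol has the shortest code.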

Other Definitions
• Data compression is the process of converting an input data stream
(the source stream or the original raw data) into another data stream
(the output, the bitstream, or the compressed stream) that has a
smaller size. A stream is either a file or a buffer in memory.
• The field of data compression is often called source coding. We
imagine that the input symbols (such as bits, ASCII codes, bytes,
audio samples, or pixel values) are emitted by a certain information
source and have to be coded before being sent to their destination.
The source can be memoryless, or it can have memory.

Need for Compression
• Why Data Compression?
– There are two practical motivations for compression:
• Make optimal use of limited storage space (Reduction of storage
requirements)
• Save time and help to optimize resources
– If compression and decompression are done in the I/O processor,
less time is required to move data to or from the storage
subsystem, freeing the I/O bus for other work
– When sending data over a communication line: less time to transmit
and less storage needed at the host

Data Compression
• Data compression, source coding, or bit-rate reduction is the process of
encoding information using fewer bits than the original representation. Any
particular compression is either lossy or lossless.
• Lossless compression reduces bits by identifying and eliminating statistical
redundancy. No information is lost in lossless compression.
• Lossy compression reduces bits by removing unnecessary or less important
information.
• Typically, a device that performs data compression is referred to as an
encoder, and one that performs the reversal of the process (decompression)
as a decoder.

Data Compression contd…
• When we speak of a compression technique or compression
algorithm, we are actually referring to two algorithms.
• There is the compression algorithm that takes an input
and generates a representation that requires fewer
bits, and there is a reconstruction algorithm
(decompression algorithm) that operates on the
compressed representation to generate the
reconstruction.

Data Compression contd…
Fig.1. Compression and Reconstruction

Process of Data Compression

• Based on the requirements of reconstruction, data
compression schemes can be divided into two
broad classes:
• lossless compression schemes, in which the reconstruction is
identical to the original input, and
• lossy compression schemes, which generally
provide much higher compression than lossless
compression but allow the reconstruction to be different from the original input.
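The two classes can be contrasted with a short Python sketch. The lossless half uses the standard-library zlib module; the lossy half is a deliberately crude illustration (dropping the low 4 bits of each 8-bit sample) and is an assumption made for this example, not a method taken from these notes.

```python
import zlib

# Lossless: the reconstruction is identical to the original input.
original = b"ABABABABABABABAB" * 64  # highly redundant sample input
compressed = zlib.compress(original)
reconstruction = zlib.decompress(compressed)
print(len(original), len(compressed), reconstruction == original)

# Lossy (illustrative only): keep the upper 4 bits of each 8-bit sample,
# then reconstruct each value at the midpoint of its 16-wide bin.
samples = [12, 200, 37, 255, 130]
quantized = [s >> 4 for s in samples]       # 4 bits per sample instead of 8
approx = [(q << 4) + 8 for q in quantized]  # close to, but not equal to, samples
print(samples, approx)
```

The lossy path halves the number of bits per sample regardless of content, at the cost of a reconstruction that only approximates the input.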

Types of Data Compression
• Data compression is about storing and sending a smaller number of bits.
• There are two major categories for methods to compress data: lossless and lossy
methods.

Lossless Compression Methods
• In lossless methods, original data and the data
after compression and decompression are exactly
the same.
• Redundant data is removed during compression and
restored during decompression.
• Lossless methods are used when we can’t afford
to lose any data: legal and medical documents,
computer programs.
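One simple way to see redundancy being removed and then restored is run-length encoding, sketched below in Python. Run-length encoding is chosen here only as an illustration; the notes above do not name a specific lossless method, and the function names are introduced for this example.

```python
def rle_encode(text):
    """Replace each run of repeated characters with a (character, count) pair."""
    runs = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))              # start a new run
    return runs


def rle_decode(runs):
    """Restore the removed redundancy by expanding every run."""
    return "".join(ch * count for ch, count in runs)


data = "AAAABBBCCD"
encoded = rle_encode(data)
print(encoded)                      # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print(rle_decode(encoded) == data)  # True: nothing was lost
```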

Lossy Compression Methods
• Used for compressing images, audio, and video files (our eyes
and ears cannot distinguish subtle changes, so some loss of data is
acceptable).
• These methods are cheaper and require less time and space.
• Several methods:
– JPEG: compress pictures and graphics
– MPEG: compress video
– MP3: compress audio

Measure of Performance
• A compression algorithm can be evaluated in a
number of different ways.
• We could measure:
– the relative complexity of the algorithm,
– the memory required to implement the algorithm,
– how fast the algorithm performs on a given machine,
– the amount of compression, and
– how closely the reconstruction resembles the original.

1. Compression Ratio
• A very logical way of measuring how well a compression
algorithm compresses a given set of data is to look at the ratio
of the number of bits required to represent the data before
compression to the number of bits required to represent the
data after compression. This ratio is called the compression
ratio.

Example
• Suppose storing an image made up of a square array of
256×256 pixels requires 65,536 bytes. The image is
compressed and the compressed version requires 16,384 bytes.
• The compression ratio for the above compression is given by:
Compression Ratio = Original Size / Compressed Size
Compression Ratio = 65,536 / 16,384 = 4:1

• We can also represent the compression ratio by expressing the
reduction in the amount of data required as a percentage of the size of
the original data.
• Total compression in percentage = (Original − Compressed) / Original × 100%
= (65,536 − 16,384) / 65,536 × 100%
= 75%
• In this particular example, the compression ratio calculated in this
manner would be 75%.
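Both ways of reporting the amount of compression can be checked with a small Python sketch using the image sizes from the example. The function names are introduced here for illustration only.

```python
def compression_ratio(original_size, compressed_size):
    """Ratio of the size before compression to the size after compression."""
    return original_size / compressed_size


def percent_reduction(original_size, compressed_size):
    """Reduction in size expressed as a percentage of the original size."""
    return (original_size - compressed_size) / original_size * 100


ratio = compression_ratio(65536, 16384)
saving = percent_reduction(65536, 16384)
print(f"{ratio:.0f}:1")   # 4:1
print(f"{saving:.0f}%")   # 75%
```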

2. Rate of Compression
• Compression performance can also be reported by providing
the average number of bits required to represent a single
sample.
• This is generally referred to as the rate.
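• For example, in the case of the compressed image described
previously, if we assume 8 bits per byte and one byte per pixel in the
original, the compressed image uses 16,384 × 8 = 131,072 bits for
65,536 pixels, so the average number of bits per pixel in the
compressed representation is 2.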
• Thus, we would say that the rate is 2 bits per pixel.
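The rate calculation can be written out as a short Python sketch. It assumes, as in the example, 8 bits per byte and one sample per pixel; the function name is introduced here for illustration.

```python
def rate_bits_per_sample(compressed_bytes, num_samples, bits_per_byte=8):
    """Average number of bits used to represent one sample after compression."""
    return compressed_bytes * bits_per_byte / num_samples


# 256 x 256 image compressed to 16,384 bytes
rate = rate_bits_per_sample(16384, 256 * 256)
print(rate)  # 2.0 bits per pixel
```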

3. Distortion
• In lossy compression, the reconstruction differs from the
original data.
• Therefore, to judge how well a lossy
compression algorithm performs, we need some way of
quantifying the difference.
• The difference between the original and the reconstruction is
often called the distortion.
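One common way to put a number on distortion is the mean squared error between the original and the reconstruction. The notes do not name a specific measure, so the choice of mean squared error and the sample values below are assumptions made for this sketch.

```python
def mean_squared_error(original, reconstruction):
    """Average of the squared differences between corresponding samples."""
    if len(original) != len(reconstruction):
        raise ValueError("sequences must have the same length")
    squared = [(x - y) ** 2 for x, y in zip(original, reconstruction)]
    return sum(squared) / len(squared)


# Hypothetical 8-bit samples and a lossy reconstruction of them
original = [12, 200, 37, 255, 130]
reconstruction = [8, 200, 40, 248, 136]
print(mean_squared_error(original, reconstruction))  # 22.0
print(mean_squared_error(original, original))        # 0.0 for a lossless scheme
```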
