unit 1.pptx for advanced cloud computing unit

19PCSPC101
ADVANCED DATA STRUCTURES
AND ALGORITHMS
UNIT-I

HASHING
Dictionaries: Definition, Dictionary Abstract Data
Type, Implementation of Dictionaries. Hashing:
Review of Hashing, Hash Function, Collision
Resolution Techniques in Hashing, Separate
Chaining, Open Addressing, Linear Probing,
Quadratic Probing, Double Hashing, Rehashing,
Extendible Hashing.

Data Structure
• A data structure is a specialized format for
organizing, processing, retrieving and storing
data.
• While there are several basic and advanced
structure types, any data structure is designed
to arrange data to suit a specific purpose so
that it can be accessed and worked with in
appropriate ways.

Data Structure
• In computer programming, a data structure
may be selected or designed to store data for
the purpose of working on it with various
algorithms.
• Each data structure contains information about
the data values, relationships between the data
and functions that can be applied to the data.

Data Structure
• A data structure is a technique for organizing and storing different types of data items in computer memory.
• It covers not only the storage of data elements but also the maintenance of the logical relationships between individual data elements.
• A data structure can also be defined as a mathematical or logical model of a particular organization of different data elements.

Data Structure
• Data:
– Data is the basic unit of fact that is used in calculation or manipulation.
– The way the data is organized, together with the operations performed on it, is called a data structure.
Data structure = organized data + operations
– Operations
• Insertion
• Deletion
• Searching
• Traversing

Data Structure
• The organization must be convenient for users.
• Data structures are used in real-world situations such as the following:
– Car park
– File storage
– Machinery
– Shortest path
– Sorting
– Networking
– Evaluation of expressions

Data Structure
• Specification of data structure :
– Data structures are considered as the main building
blocks of a computer program.
• Organization of data
• Accessing methods
• Degree of associativity
• Processing alternatives for information

Data Structure
• When selecting a data structure, two requirements should be met so that the choice is efficient enough to solve the problem:
– The data structure must be powerful enough to handle the different relationships existing between the data.
– The data structure should also be simple, so that the data can be processed efficiently when required.

Characteristics of data structures
• Linear or non-linear: This characteristic describes whether the data items are arranged in a sequential order, such as with an array, or in a non-sequential arrangement, such as with a graph.

• Homogeneous or non-homogeneous: This
characteristic describes whether all data items
in a given repository are of the same type or of
various types.

• Static or dynamic: This characteristic
describes how the data structures are compiled.
Static data structures have fixed sizes,
structures and memory locations at compile
time.
• Dynamic data structures have sizes, structures
and memory locations that can shrink or
expand depending on the use.

Types of data structures

• Primitive data structure :
– The primitive data structures are known as basic data structures.
– These data structures are directly operated upon by the machine instructions.
– The primitive data structures have different representations on different computers.

• Non-Primitive data structure :
– The non-primitive data structures are highly developed, complex data structures.
– Basically, these are developed from the primitive data structures.
– The non-primitive data structure is responsible for organizing groups of homogeneous and heterogeneous data elements.

• Data structure types are determined by what
types of operations are required or what kinds
of algorithms are going to be applied.
• Arrays-
– An array stores a collection of items at adjoining
memory locations.
– Items that are the same type get stored together so
that the position of each element can be calculated
or retrieved easily.
– Arrays can be fixed or flexible in length.

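• Stacks-
– A stack stores a collection of items in a linear order; the operation order can only be last in, first out.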

• Queues-
– A queue stores a collection of items similar to a
stack; however the operation order can only be
first in first out.

• Linked lists-
– A linked list stores a collection of items in a linear
order. Each element or node in a linked list
contains a data item as well as a reference or link
to the next item in the list.

• Trees-
– A tree stores a collection of items in an abstract hierarchical
way.
– Each node is linked to other nodes and can have multiple sub-
values also known as children.

• A tree has the following characteristics :
– The top item in the hierarchy of a tree is referred to as the root of the tree.
– The remaining data elements are partitioned into a number of mutually exclusive subsets, each of which is itself a tree and is known as a subtree.
– Unlike natural trees, trees in data structures always grow downwards, from the root towards the bottom.

• Graphs-
– A graph stores a collection of items in a non-linear fashion.
– Graphs are made up of a finite set of nodes also known as
vertices and lines that connect them also known as edges.
– These are useful for representing real-life systems
such as computer networks.

• The different types of Graphs are :
– Directed Graph
– Non-directed Graph
– Connected Graph
– Non-connected Graph
– Simple Graph
– Multi-Graph

• Tries-
– A trie, or keyword tree, is a tree-based data structure that stores strings as data items; strings with a common prefix share the same path from the root.

• Hash tables-
– A hash table, or hash map, stores a collection of items in an associative array that maps keys to values.
– A hash table uses a hash function to convert a key into an index into an array of buckets that contain the desired data item.
– Hashing was introduced to overcome the drawbacks of linear data structures.

• Files :
– Files contain data or information stored permanently on a secondary storage device such as a hard disk or floppy disk.
– A file is useful when we have to store and process a large amount of data.
– A file stored on a storage device is always identified by a file name such as HELLO.DAT or TEXTNAME.TXT.

• Files :
– A file name normally contains a primary and a secondary name, which are separated by a dot (.).

Fundamentals of data structures:
• Fundamental Data Structures
– The following four data structures are used ubiquitously in
the description of algorithms and serve as basic building
blocks for realizing more complex data structures.
• Sequences (also called as lists)
• Dictionaries
• Priority Queues
• Graphs
– Dictionaries and priority queues can be classified under a
broader category called dynamic sets.
– Binary and general trees are very popular building blocks for implementing dictionaries and priority queues.

Dictionaries
• A dictionary is a general-purpose data
structure for storing a group of objects.
• A dictionary has a set of keys and each key has
a single associated value.
• When presented with a key the dictionary will
return the associated value.
• A dictionary is also called a hash, a map, a
hashmap in different programming languages.

Dictionaries
• For example, the results of a classroom test could be represented as a dictionary with pupils' names as keys and their scores as the values:
• results = { 'Detra' : 17,
'Nova' : 84,
'Charlie' : 22,
'Henry' : 75,
'Roxanne' : 92,
'Elsa' : 29 }
• Instead of using a numerical index into the data, we can use the dictionary keys (the names) to return values:
• >>> results['Nova']
84
• >>> results['Elsa']
29

Dictionaries
• The keys in a dictionary must be simple types (such as integers or strings), while the values can be of any type.
• Different languages enforce different type restrictions on keys and values in a dictionary.
• Dictionaries are often implemented as hash tables.
• Keys in a dictionary must be unique; an attempt to create a duplicate key will typically overwrite the existing value for that key.

Dictionaries
• Dictionary is an abstract data structure that
supports the following operations:
– search(K key) (returns the value associated with the given
key)
– insert(K key, V value)
– delete(K key)
• Each element stored in a dictionary is identified by a
key of type K.
• Dictionary represents a mapping from keys to values.
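
The three ADT operations above can be sketched with the simplest possible backing store, an unsorted sequence of key/value pairs. This is only an illustration under assumptions: the class name SequenceDictionary, the choice to return None for a missing key, and the sample scores are not part of the slides.

```python
class SequenceDictionary:
    """Dictionary ADT backed by an unsorted list of [key, value] pairs."""

    def __init__(self):
        self._items = []

    def search(self, key):
        """Return the value associated with key, or None if the key is absent."""
        for k, v in self._items:
            if k == key:
                return v
        return None

    def insert(self, key, value):
        """Add a key/value pair; an existing key has its value overwritten."""
        for pair in self._items:
            if pair[0] == key:
                pair[1] = value
                return
        self._items.append([key, value])

    def delete(self, key):
        """Remove the pair stored under key; return True if one was removed."""
        for i, pair in enumerate(self._items):
            if pair[0] == key:
                del self._items[i]
                return True
        return False


scores = SequenceDictionary()
scores.insert("Nova", 84)
scores.insert("Elsa", 29)
scores.insert("Nova", 90)          # same key again: the value is overwritten
nova = scores.search("Nova")
removed = scores.delete("Elsa")
elsa = scores.search("Elsa")
print(nova, removed, elsa)
```

Every operation here scans the whole sequence, so each one costs O(n); that cost is what the hash-table implementation later in the unit is meant to avoid.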

Dictionaries
• Dictionaries have numerous applications.
– contact book
• key: name of person; value: telephone number
– table of program variable identifiers
• key: identifier; value: address in memory
– property-value collection
• key: property name; value: associated value
– natural language dictionary
• key: word in language X; value: word in language Y
– etc

Operations on dictionaries
• Dictionaries typically support several operations:
– retrieve a value (depending on language, attempting to
retrieve a missing key may give a default value or throw an
exception)
– insert or update a value (typically, if the key does not
exist in the dictionary, the key-value pair is inserted; if the
key already exists, its corresponding value is overwritten
with the new one)
– remove a key-value pair
– test for existence of a key
• Note that items in a dictionary are in general unordered, so loops over dictionaries may return items in an arbitrary order (some languages, such as Python 3.7 and later, preserve insertion order).
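
As a concrete illustration of the four operations listed above, here is how they look with Python's built-in dict. The variable names are chosen for this sketch, and the scores reuse part of the classroom example.

```python
results = {"Detra": 17, "Nova": 84, "Charlie": 22}

# Retrieve a value: indexing raises KeyError for a missing key,
# while get() returns a default value instead.
nova_score = results["Nova"]
missing_score = results.get("Henry", 0)

# Insert or update a value.
results["Henry"] = 75      # new key: the pair is inserted
results["Nova"] = 90       # existing key: the value is overwritten

# Remove a key-value pair.
del results["Charlie"]

# Test for the existence of a key.
has_detra = "Detra" in results
has_charlie = "Charlie" in results

print(nova_score, missing_score, has_detra, has_charlie)
print(sorted(results.items()))
```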

Implementations of dictionaries
• simple implementations: sorted or unsorted sequences, direct addressing
• hash tables
• binary search trees (BST)
• AVL trees
• self-organising BST
• red-black trees
• (a,b)-trees (in particular: 2-3-trees)
• B-trees and others

The Dictionary ADT
• The abstract data type that corresponds to the
dictionary metaphor is known by several names.
• Other terms for keyed containers include the
names map, table, search table, associative array,
or hash.
• Whatever it is called, the idea is a data structure
optimized for a very specific type of search.
• Elements are placed into the dictionary in
key/value pairs.

The Dictionary ADT
• To do a retrieval, the user supplies a key, and
the container returns the associated value.
• Each key identifies one entry; that is, each key
is unique.
• Data is removed from a dictionary by specifying the key for the data value to be deleted.

Dictionary Implementation with Hash-Table
• A hash table is a data structure which stores data in an associative manner.
• In a hash table, data is stored in an array format where each data value has its own unique index value.
• Access to data becomes very fast if we know the index of the desired data.
• It is thus a data structure in which insertion and search operations are very fast on average, irrespective of the size of the data.
• A hash table uses an array as the storage medium and uses a hashing technique to generate the index at which an element is to be inserted or located.

Hash-Table
• Hashing is a technique to convert a range of key values into a range of indexes of an array.
• We use the modulo operator to map the key values to a range of array indexes.
• Consider an example of a hash table of size 20, in which the following items are to be stored.
• Items are in (key, value) format.
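
A minimal sketch of the modulo mapping for a table of size 20. The (key, value) items are hypothetical stand-ins, because the items on the original slide are not reproduced in this text.

```python
SIZE = 20  # table size from the example above


def hash_index(key):
    """Map an integer key to an array index in the range 0 .. SIZE - 1."""
    return key % SIZE


# Hypothetical (key, value) items, chosen only for illustration.
items = [(1, 20), (2, 70), (42, 80), (4, 25), (12, 44)]

indexes = {key: hash_index(key) for key, _ in items}
for key, index in indexes.items():
    print(f"key {key:2d} -> index {index}")
```

Keys 2 and 42 both map to index 2, which is the situation that linear probing has to handle.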

Hash-Table
• Linear Probing
• The hashing technique may generate an index of the array that is already in use.
• In such a case, we can search for the next empty location in the array by looking into the next cell until we find an empty cell.
• This technique is called linear probing.

Hash-Table
• The basic operations of a hash table are the following:
– Search − search for an element in a hash table.
– Insert − insert an element in a hash table.
– Delete − delete an element from a hash table.
• DataItem: Define a data item having some data and a key, based on which the search is to be conducted in the hash table.
struct DataItem {
int data;
int key;
};

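Hash-Table
• Hash Method: Define a hashing method to compute the hash code of the key of the data item.
#define SIZE 20 /* table size, as in the size-20 example */
/* Map a key to an array index in the range 0 .. SIZE - 1. */
int hashCode(int key)
{
return key % SIZE;
}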

Hash-Table
• Insert Operation: Whenever an element is to be inserted:
• Compute the hash code of the key passed and use that hash code as the index in the array.
• If an element is already present at the computed index, use linear probing to find the next empty location.
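
The insert steps above might be sketched as follows. The function names and the sample items are hypothetical, the wrap-around at the end of the array is an assumption of this sketch, and keys are assumed to be distinct.

```python
SIZE = 20


def hash_code(key):
    return key % SIZE


def insert(table, key, data):
    """Store (key, data) in the first empty slot at or after hash_code(key)."""
    index = hash_code(key)
    for _ in range(SIZE):                 # visit each slot at most once
        if table[index] is None:
            table[index] = (key, data)
            return index
        index = (index + 1) % SIZE        # linear probing, wrapping around
    raise OverflowError("hash table is full")


table = [None] * SIZE
slots = [insert(table, key, data)
         for key, data in [(2, 70), (42, 80), (22, 15), (19, 5), (39, 6)]]
print(slots)
```

Keys 42 and 22 collide with key 2 and move on to indexes 3 and 4; key 39 collides with key 19 at the last index and wraps around to index 0.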

Hash-Table
• Delete Operation: Whenever an element is to be deleted:
• Compute the hash code of the key passed and locate the index using that hash code as the index in the array.
• Use linear probing to move ahead if the element is not found at the computed hash code.
• When found, store a dummy item there to keep the probe sequences of the hash table intact.

Hash-Table
• Search Operation: Whenever an element is to be searched for:
• Compute the hash code of the key passed and locate the element using that hash code as the index in the array.
• Use linear probing to move ahead if the element is not found at the computed hash code.
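
The search and delete operations can be sketched together, which also shows why delete stores a dummy item instead of emptying the cell. The names DUMMY, find_slot, search and delete are hypothetical; keys are assumed to be distinct, non-negative integers.

```python
SIZE = 20
DUMMY = (-1, -1)                 # marker left behind by delete


def hash_code(key):
    return key % SIZE


def insert(table, key, data):
    index = hash_code(key)
    for _ in range(SIZE):
        if table[index] is None or table[index] is DUMMY:
            table[index] = (key, data)
            return
        index = (index + 1) % SIZE
    raise OverflowError("hash table is full")


def find_slot(table, key):
    """Return the index that holds key, or None; probing stops at an empty cell."""
    index = hash_code(key)
    for _ in range(SIZE):
        cell = table[index]
        if cell is None:                       # truly empty: key is not stored
            return None
        if cell is not DUMMY and cell[0] == key:
            return index
        index = (index + 1) % SIZE             # occupied or dummy: keep probing
    return None


def search(table, key):
    slot = find_slot(table, key)
    return None if slot is None else table[slot][1]


def delete(table, key):
    slot = find_slot(table, key)
    if slot is None:
        return False
    table[slot] = DUMMY                        # keep later probe sequences intact
    return True


table = [None] * SIZE
for key, data in [(2, 70), (42, 80), (22, 15)]:   # all three hash to index 2
    insert(table, key, data)
delete(table, 42)                                  # leaves a dummy at index 3
found_22 = search(table, 22)
found_42 = search(table, 42)
print(found_22, found_42)
```

If delete had reset index 3 to empty, the search for key 22 would stop there and wrongly report that the key is missing.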

What is Hashing
• Hashing is widely used in many kinds of computer software, particularly for associative arrays, database indexing, caches, and sets.

Hash Functions
• A good hash function should:
– be simple and fast to compute,
– avoid collisions,
– distribute keys evenly among the cells.
• With such a function, a hash table supports insert, erase, and find in O(1) time on average.
• A perfect hash function is a one-to-one mapping between keys and hash values, so no collision occurs.

characteristics of a good hash
function
• The characteristics of a good hash function are
as follows.
– It avoids collisions.
– It tends to spread keys evenly in the array.
– It is easy to compute (i.e., computational time of a
hash function should be O(1)).

Collision Resolution
• Collision: when two keys map to the
same location in the hash table.
• Collisions occur when two keys, k1 and k2, are
not equal, but h(k1) = h(k2).
• Two ways to resolve collisions:
– Separate Chaining (open hashing)
– Open Addressing (closed hashing): linear probing, quadratic probing, double hashing

Several approaches for dealing with
collisions are
• Example: K = {0, 1, ..., 199}, M = 10, for
each key k in K, f(k) = k % M
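
A small check of this example: with 200 keys and only M = 10 addresses, collisions are unavoidable. The sketch counts how many keys f sends to each address; the variable names are chosen here for illustration.

```python
from collections import Counter

M = 10
keys = range(200)                      # K = {0, 1, ..., 199}


def f(k):
    return k % M


bucket_sizes = Counter(f(k) for k in keys)
colliding_with_7 = [k for k in keys if f(k) == f(7)]

print(dict(bucket_sizes))
print(colliding_with_7[:5])
```

Each of the 10 addresses receives 20 keys, so every key collides with 19 others and a collision resolution technique is required.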

Collision Resolution Methods
• Three methods in open addressing are linear probing, quadratic probing, and double hashing.
• These methods are based on the division hashing method because the hash function is f(k) = k % M.
• Some other hashing methods are the middle-square hashing method, the multiplication hashing method, the Fibonacci hashing method, and so on.
Linear Probing Method
• The hash table in this case is implemented
using an array containing M nodes, each node
of the hash table has a field k used to contain
the key of the node.
• M can be any positive integer but M is often
chosen to be a prime number.
• When the hash table is initialized, all fields k
are assigned to -1.
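
One way to see why M is often chosen to be a prime number is to hash keys that share a common factor with a non-prime M. The key set below is hypothetical (20 multiples of 10), and the helper name used_addresses is an assumption of this sketch.

```python
def used_addresses(keys, M):
    """Return the set of addresses k % M that the given keys occupy."""
    return {k % M for k in keys}


keys = range(0, 200, 10)                 # hypothetical keys: 0, 10, 20, ..., 190

composite_count = len(used_addresses(keys, 10))   # M = 10 (not prime)
prime_count = len(used_addresses(keys, 11))       # M = 11 (prime)
print(composite_count, prime_count)
```

With M = 10 all twenty keys land on address 0, while with M = 11 they spread over all 11 addresses.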

• When a node with the key k needs to be added into the hash table, the hash function
f(k) = k % M
will specify the address i = f(k) (i.e., an index of an array) within the range [0, M - 1].

• If there is no conflict, then this node is added into the hash table at the address i.
• If a conflict takes place, then the hash function rehashes a first time (f_1) to consider the next address (i.e., i + 1).
• If a conflict occurs again, then the hash function rehashes a second time (f_2) to examine the next address (i.e., i + 2).
• This process repeats until an available address is found; the node is then added at this address.

• The rehash function at time t (i.e., the collision number t = 1, 2, ...) is presented as follows:
f_t(k) = (f(k) + t) % M
• When searching for a node, the hash function f(k) will identify the address i (i.e., i = f(k)) falling between 0 and M - 1.

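• Let us consider a simple hash function "key mod 7" and the sequence of keys 50, 700, 76, 85, 92, 73, 101.
Step-01:
Draw the hash table.
For the given hash function, the possible range of hash values is [0, 6]. So, draw an empty hash table consisting of 7 buckets.

Step-02:
Insert the given keys in the hash table one by one. The first key to be inserted in the hash table = 50.
Bucket of the hash table to which key 50 maps = 50 mod 7 = 1. So, key 50 will be inserted in bucket-1 of the hash table.

Step-03:
The next key to be inserted in the hash table = 700.
Bucket of the hash table to which key 700 maps = 700 mod 7 = 0. So, key 700 will be inserted in bucket-0 of the hash table.

Step-04:
The next key to be inserted in the hash table = 76.
Bucket of the hash table to which key 76 maps = 76 mod 7 = 6. So, key 76 will be inserted in bucket-6 of the hash table.

Step-05:
The next key to be inserted in the hash table = 85.
Bucket of the hash table to which key 85 maps = 85 mod 7 = 1. Since bucket-1 is already occupied, a collision occurs.
To handle the collision, the linear probing technique keeps probing linearly until an empty bucket is found.
The first empty bucket is bucket-2.
So, key 85 will be inserted in bucket-2 of the hash table.

Step-06:
The next key to be inserted in the hash table = 92.
Bucket of the hash table to which key 92 maps = 92 mod 7 = 1. Buckets 1 and 2 are already occupied; the first empty bucket is bucket-3.
So, key 92 will be inserted in bucket-3 of the hash table.

Step-07:
The next key to be inserted in the hash table = 73.
Bucket of the hash table to which key 73 maps = 73 mod 7 = 3. Bucket-3 is already occupied; the first empty bucket is bucket-4.
So, key 73 will be inserted in bucket-4 of the hash table.

Step-08:
The next key to be inserted in the hash table = 101.
Bucket of the hash table to which key 101 maps = 101 mod 7 = 3. Buckets 3 and 4 are already occupied; the first empty bucket is bucket-5.
So, key 101 will be inserted in bucket-5 of the hash table.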
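
The walk-through above can be replayed with a short sketch of linear probing. The function name linear_probe_insert is hypothetical; the empty marker -1 follows the initialisation described for this method.

```python
EMPTY = -1


def linear_probe_insert(table, key):
    """Insert key at the first free address (key % M + t) % M, t = 0, 1, 2, ..."""
    M = len(table)
    for t in range(M):
        address = (key % M + t) % M
        if table[address] == EMPTY:
            table[address] = key
            return address
    raise OverflowError("hash table is full")


table = [EMPTY] * 7
for key in [50, 700, 76, 85, 92, 73, 101]:
    bucket = linear_probe_insert(table, key)
    print(f"key {key} -> bucket-{bucket}")
print(table)
```

The final table is [700, 50, 85, 92, 73, 101, 76], which agrees with the buckets reached in the steps.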

• Example: insert keys 32, 53, 22, 92, 17, 34, 24, 37, and 56 into a hash table of size M = 10
1. Insert key 32 into a hash table of size M = 10 (indexes 0 to M - 1 = 9).
The hash function distributes keys to locations in the hash table. It is applied to the integer value 32 so that it maps to a value between 0 and M - 1, where M is the table size; modulo hashing is used.
Here k = 32, M = 10
f(k) = k % M
f(k) = 32 % 10 = 2
This specifies the address i = f(k) (i.e., an index of the array). Index position i = 2, so insert 32 at index 2 (the third position).
Hash table: index 2 = 32; all other indexes are empty.

2. Insert key 53.
Here k = 53, M = 10
f(k) = k % M
f(k) = 53 % 10 = 3
Index position i = 3, so insert 53 at index 3.
Hash table: index 2 = 32, index 3 = 53; all other indexes are empty.

3. Insert key 22.
Here k = 22, M = 10
f(k) = k % M
f(k) = 22 % 10 = 2
Index position i = 2 is already occupied by 32, so a conflict takes place (shown as 32/22 at index 2).
If a conflict takes place, then the hash function rehashes a first time (f_1) to consider the next address, i + 1 = 3, which is occupied by 53. It then rehashes a second time (f_2) to consider i + 2 = 4, which is empty.
So the key must probe (move) twice to find an empty slot, and 22 is inserted at index 4.
Hash table: index 2 = 32, index 3 = 53, index 4 = 22; all other indexes are empty.
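
To follow the remaining keys of this example, the sketch below records, for every key, the address it ends up in and how many extra probes it needed. The function name and the trace dictionary are assumptions made for this illustration.

```python
def insert_counting_probes(table, key):
    """Insert key by linear probing; return (address, number of extra probes)."""
    M = len(table)
    home = key % M
    for t in range(M):
        address = (home + t) % M
        if table[address] is None:
            table[address] = key
            return address, t
    raise OverflowError("hash table is full")


table = [None] * 10
trace = {}
for key in [32, 53, 22, 92, 17, 34, 24, 37, 56]:
    trace[key] = insert_counting_probes(table, key)
    print(key, trace[key])
print(table)
```

The later keys need more and more probes because the occupied cells form one long run (clustering), which is the weakness that quadratic probing tries to reduce.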

Quadratic probing
• Quadratic probing operates by taking the original hash index and adding successive values of an arbitrary quadratic polynomial until an open slot is found.
• An example sequence using quadratic probing is:
H + 1^2, H + 2^2, H + 3^2, ..., H + k^2

Quadratic probing
• It better avoids the clustering problem that can occur with linear probing.
• Let h(k) be a hash function that maps an element k to an integer in [0, m - 1], where m is the size of the table.
• Let the i-th probe position for a value k be given by the function
h(k, i) = (h(k) + i^2) mod m

Quadratic probing
• When a node with the key k needs to be added into the hash table, the hash function
f(k) = k % M
will specify the address i within the range [0, M - 1] (i.e., i = f(k)).

Quadratic probing
• If there is no conflict, then this node is added into the hash table at the address i.
• If a conflict takes place, then the hash function rehashes a first time (f_1) to consider the address (f(k) + 1^2) % M.
• If a conflict occurs again, then the hash function rehashes a second time (f_2) to examine the address (f(k) + 2^2) % M.
• This process repeats until an available address is found; the node is then added at this address.

Quadratic probing
• The rehash function at time t (i.e., the collision number t = 1, 2, ...) is presented as follows:
f_t(k) = (f(k) + t^2) % M
• When searching for a node, the hash function f(k) will identify the address i (i.e., i = f(k)) falling between 0 and M - 1.

Quadratic probing
• Example: insert the keys 76, 40, 48, 5, 20 using the hash function "key mod 7".
Draw the hash table.
For the given hash function, the possible range of hash values is [0, 6]. So, draw an empty hash table consisting of 7 buckets.

Step-01:
Insert the given keys in the hash table one by one. The first key to be inserted in the hash table = 76.
76 % 7 = 6, so key 76 is inserted in bucket-6.
Hash table: bucket-6 = 76.

Step-02:
The next key to be inserted in the hash table = 40.
40 % 7 = 5, so key 40 is inserted in bucket-5.
Hash table: bucket-5 = 40, bucket-6 = 76.

Step-03:
The next key to be inserted in the hash table = 48.
48 % 7 = 6. Since bucket-6 is already occupied, a collision occurs.
To handle the collision, the quadratic probing technique keeps probing until an empty bucket is found.

Step-04:
(48 + 1^2) % 7 = 49 % 7 = 0. Bucket-0 is empty, so key 48 is inserted in bucket-0.
Hash table: bucket-0 = 48, bucket-5 = 40, bucket-6 = 76.

Step-05:
The next key to be inserted in the hash table = 5.
Bucket of the hash table to which key 5 maps = 5 mod 7 = 5. Since bucket-5 is already occupied, a collision occurs.
To handle the collision, the quadratic probing technique keeps probing until an empty bucket is found.
5 % 7 = 5 (occupied)
(5 + 1^2) % 7 = 6 % 7 = 6 (occupied)
(5 + 2^2) % 7 = 9 % 7 = 2 (empty)
So, key 5 is inserted in bucket-2.
Hash table: bucket-0 = 48, bucket-2 = 5, bucket-5 = 40, bucket-6 = 76.

Step-06:
The next key to be inserted in the hash table = 20.
Bucket of the hash table to which key 20 maps = 20 mod 7 = 6. Since bucket-6 is already occupied, a collision occurs.
To handle the collision, the quadratic probing technique keeps probing until an empty bucket is found.
20 % 7 = 6 (occupied)
(20 + 1^2) % 7 = 21 % 7 = 0 (occupied)
(20 + 2^2) % 7 = 24 % 7 = 3 (empty)
So, key 20 is inserted in bucket-3.
Hash table: bucket-0 = 48, bucket-2 = 5, bucket-3 = 20, bucket-5 = 40, bucket-6 = 76; buckets 1 and 4 are empty.

Quadratic probing
• Example: insert the keys 10, 15, 16, 20, 30, 25, 26, and 36 into a hash table of size M = 10
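
Both quadratic probing examples can be checked with a short sketch that uses the probe addresses (key % M + t^2) % M for t = 0, 1, 2, ... The function names are hypothetical, and limiting the search to M probes is an assumption of this sketch.

```python
def quadratic_probe_insert(table, key):
    """Insert key at the first free address (key % M + t*t) % M, t = 0, 1, 2, ..."""
    M = len(table)
    for t in range(M):
        address = (key % M + t * t) % M
        if table[address] is None:
            table[address] = key
            return address
    raise OverflowError("no free bucket found on the probe sequence")


def build(keys, M):
    table = [None] * M
    for key in keys:
        quadratic_probe_insert(table, key)
    return table


table_7 = build([76, 40, 48, 5, 20], 7)
table_10 = build([10, 15, 16, 20, 30, 25, 26, 36], 10)
print(table_7)
print(table_10)
```

Unlike linear probing, a quadratic probe sequence does not necessarily visit every bucket, so an insertion can fail even when the table still has free buckets; this is why the sketch raises an error after M unsuccessful probes.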

Extensible hashing
• It is a technique for handling a large amount of data.
• The position of a data item in the hash table is determined by extracting a certain number of bits from its hash value.
• Extensible hashing grows and shrinks in a way similar to B-trees.
• In extensible hashing, the elements are placed in buckets with reference to the size of the directory.

Extensible hashing
• Extendible hashing uses a directory to access its buckets.
• This directory is usually small enough to be kept in main memory and has the form of an array with 2^d entries, each entry storing a bucket address (pointer to a bucket).
• The variable d is called the global depth of the directory.

Extensible hashing
• Multiple directory entries may point to
the same bucket.
• Every bucket has a local depth ≤ d.
• The difference between local depth and global
depth affects overflow handling.

Extensible hashing
• Suppose that the global depth g = 2 and the bucket size = 3.
• Suppose that we have records with these keys
and hash function h(key) = key mod 64:
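
The hash values used in this example can be tabulated with a short sketch. The keys are the ones inserted on the slides that follow; taking the last g bits of the hash value as the directory index is an assumption of this sketch, and the helper names are hypothetical.

```python
GLOBAL_DEPTH = 2                       # g = 2


def h(key):
    return key % 64                    # a 6-bit hash value


def directory_index(key, depth=GLOBAL_DEPTH):
    """Assumed convention: the last `depth` bits of h(key) select the entry."""
    return h(key) & ((1 << depth) - 1)


keys = [1111, 3333, 1235, 2378, 1212, 1456, 2134, 2345, 8231, 2222, 9999]
bits = {key: format(h(key), "06b") for key in keys}
for key in keys:
    print(key, h(key), bits[key], format(directory_index(key), "02b"))
```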

Extensible hashing
• Insert 1111 i.e 010111 (1111 mod 64 = 23)

Extensible hashing
• Insert 3333 i.e 000101

Extensible hashing
• Insert 1235 i.e 010011

Extensible hashing
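• Insert 2378 i.e 001010 (2378 mod 64 = 10)
[Figure: directory and buckets, indexed by the last two bits of the hash value. Buckets shown: 00: 1212; 01: 3333; 10: 2378; 11: 1111, 1235]
Prepared by P. Venkateswarlu, Dept. of IT, JNTUK-UCEV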

Extensible hashing
• Insert 1212 i.e 111100
[Figure: buckets shown: 00: 1212; 01: 3333; 10: 2378; 11: 1111, 1235]

Extensible hashing
• Insert 1456 i.e 110000
[Figure: buckets shown: 00: 1212, 1456; 01: 3333; 10: 2378; 11: 1111, 1235]

Extensible hashing
• Insert 2134 i.e 010110
[Figure: buckets shown: 00: 1212, 1456; 01: 3333; 10: 2378, 2134; 11: 1111, 1235]

Extensible hashing
• Insert 2345 i.e 101001
[Figure: buckets shown: 00: 1212, 1456; 01: 3333, 2345; 10: 2378, 2134; 11: 1111, 1235]

Extensible hashing
• Insert 1111 i.e 010111
[Figure: buckets shown: 00: 1212, 1456; 01: 3333, 2345; 10: 2378, 2134; 11: 1111, 1235, 1111]

Extensible hashing
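• Insert 8231 i.e 100111
[Figure: buckets shown before the insertion: 00: 1212, 1456; 01: 3333, 2345; 10: 2378, 2134; 11: 1111, 1235, 1111]
Bucket 11 already holds 3 entries, so inserting 8231 makes it overflow: the bucket is split and the directory is doubled.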

Extensible hashing
• Insert 2222 i.e 101110

Extensible hashing
• Insert 9999 i.e 001111
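
The whole insertion sequence can be replayed with a compact sketch of extendible hashing. It rests on several assumptions that the slides do not state explicitly: the directory is indexed by the last d bits of h(key), the four initial buckets all start with local depth 2, and the class and method names are hypothetical.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth
        self.keys = []


class ExtendibleHashTable:
    MAX_DEPTH = 6                      # h(key) = key mod 64 has only 6 bits

    def __init__(self, global_depth=2, bucket_size=3):
        self.global_depth = global_depth
        self.bucket_size = bucket_size
        # Start with one bucket per directory entry.
        self.directory = [Bucket(global_depth) for _ in range(2 ** global_depth)]

    @staticmethod
    def h(key):
        return key % 64

    def _index(self, key):
        # Directory index = last `global_depth` bits of the hash value.
        return self.h(key) & ((1 << self.global_depth) - 1)

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        while len(bucket.keys) >= self.bucket_size:
            self._split(bucket)
            bucket = self.directory[self._index(key)]
        bucket.keys.append(key)

    def _split(self, bucket):
        if bucket.local_depth == self.MAX_DEPTH:
            raise OverflowError("more equal hash values than a bucket can hold")
        if bucket.local_depth == self.global_depth:
            # Double the directory: entry i + 2^d starts as a copy of entry i.
            self.directory = self.directory + self.directory
            self.global_depth += 1
        bucket.local_depth += 1
        sibling = Bucket(bucket.local_depth)
        new_bit = 1 << (bucket.local_depth - 1)
        # Entries whose newly significant bit is 1 now point to the sibling.
        for i, entry in enumerate(self.directory):
            if entry is bucket and i & new_bit:
                self.directory[i] = sibling
        # Redistribute the old keys between the bucket and its sibling.
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            self.directory[self._index(k)].keys.append(k)


table = ExtendibleHashTable(global_depth=2, bucket_size=3)
for key in [1111, 3333, 1235, 2378, 1212, 1456, 2134, 2345, 1111, 8231, 2222, 9999]:
    table.insert(key)

print("global depth:", table.global_depth)
for i, bucket in enumerate(table.directory):
    print(format(i, f"0{table.global_depth}b"), bucket.keys)
```

Under these assumptions, inserting 8231 doubles the directory to depth 3 and inserting 9999 doubles it again to depth 4, while the untouched buckets keep their smaller local depth and are shared by several directory entries.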

Extensible hashing
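• Each bucket can hold a fixed number of entries (in the example that follows, 2).
• If an insertion would put more entries in a bucket than it can hold, split the bucket; if the bucket's local depth already equals the global depth, double the directory as well.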

Extensible hashing
• Suppose we have to insert 1, 4, 5, 7, 8, 10, and assume each page can hold 2 data entries (2 is the depth).
• Step 1: insert 1, 4

Extensible hashing
• Step 2: insert 5. The bucket is full, hence double the directory.

Extensible hashing
• Step 3: insert 7. The bucket is full, so 7 cannot be inserted here; double the directory and split the bucket.

Extensible hashing
• After the insertion of 7, consider the last two bits.

Extensible hashing
• Step 4: insert 8 i.e 1000

Extensible hashing
• Step 5: insert 10 i.e 1010
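
As a quick check of this second example, the sketch below groups the six keys by their last two bits, the directory index used once the global depth has reached 2. It does not model the splitting steps themselves, and the variable names are chosen for this illustration.

```python
from collections import defaultdict

keys = [1, 4, 5, 7, 8, 10]

buckets = defaultdict(list)
for key in keys:
    last_two_bits = format(key & 0b11, "02b")
    buckets[last_two_bits].append(key)

for bits in sorted(buckets):
    print(bits, buckets[bits], [format(k, "04b") for k in buckets[bits]])
```

No group holds more than 2 keys, so with a page capacity of 2 a directory of global depth 2 is enough to store all six keys.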
