Presentation1

1. Example (figure: graph on vertices 0 to 9 with its adjacency list, Visited table (T/F), Pred table, and the source marked). Recursive calls: RDFS(2), RDFS(8), RDFS(9), RDFS(1), RDFS(3), RDFS(5), RDFS(6). Mark 6 as visited and mark Pred[6]; visit 7 -> RDFS(7).

2. Example (same figure). Recursive calls: RDFS(2), RDFS(8), RDFS(9), RDFS(1), RDFS(3), RDFS(5), RDFS(6), RDFS(7). Mark 7 as visited and mark Pred[7]. RDFS(7) -> Stop: no more unvisited neighbors.

3. Example (same figure). Recursive calls: RDFS(2), RDFS(8), RDFS(9), RDFS(1), RDFS(3), RDFS(5), RDFS(6) -> Stop.

4. Example (same figure). Recursive calls: RDFS(2), RDFS(8), RDFS(9), RDFS(1), RDFS(3), RDFS(5) -> Stop.

5. Example (same figure). Recursive calls: RDFS(2), RDFS(8), RDFS(9), RDFS(1), RDFS(3) -> Stop.

6. Example (same figure). Recursive calls: RDFS(2), RDFS(8), RDFS(9), RDFS(1) -> Stop.

7. Example (same figure). Recursive calls: RDFS(2), RDFS(8), RDFS(9) -> Stop.

8. Example (same figure). Recursive calls: RDFS(2), RDFS(8) -> Stop.

9. Example (same figure). Recursive calls: RDFS(2) -> Stop.

10. Example (same figure). Check our paths: does DFS find valid paths? Yes. Try some examples: Path(0) ->, Path(6) ->, Path(7) ->
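The recursive walkthrough above can be sketched in a few lines of Python. The adjacency list `ADJ` is an assumption: it was chosen so that the chain of active calls matches the slides (RDFS(2), RDFS(8), RDFS(9), RDFS(1), RDFS(3), RDFS(5), RDFS(6), RDFS(7)), but it is not recovered from the original figure. The helper names `dfs` and `path` are also illustrative.

```python
# Hypothetical undirected graph on vertices 0..9 (assumed, not from the slides).
ADJ = {
    0: [8],
    1: [3, 7, 9, 2],
    2: [8, 1, 4],
    3: [4, 5, 1],
    4: [2, 3],
    5: [3, 6],
    6: [7, 5],
    7: [1, 6],
    8: [2, 0, 9],
    9: [1, 8],
}


def dfs(adj, source):
    """Recursive DFS; returns the Visited table and the Pred table."""
    visited = {v: False for v in adj}
    pred = {v: None for v in adj}

    def rdfs(v):
        visited[v] = True              # mark v as visited
        for w in adj[v]:
            if not visited[w]:
                pred[w] = v            # w was discovered from v
                rdfs(w)                # recursive call

    rdfs(source)
    return visited, pred


def path(pred, v):
    """Follow Pred links from v back to the source, then reverse."""
    out = []
    while v is not None:
        out.append(v)
        v = pred[v]
    return out[::-1]


visited, pred = dfs(ADJ, 2)
print(path(pred, 0))
print(path(pred, 6))
print(path(pred, 7))
```

Under the assumed graph, every vertex is reached and each reported path follows edges that exist in `ADJ`, which is what "valid path" means here.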

11. Time Complexity of DFS (using adjacency list): We never visited a vertex more than once. We had to examine all edges of the vertices. We know that the sum of degree(v) over all vertices v equals 2m, where m is the number of edges. So the running time of DFS is proportional to the number of edges plus the number of vertices (same as BFS): O(n + m). You will also see this written as O(|V| + |E|), where |V| = number of vertices (n) and |E| = number of edges (m).

12. DFS Tree: Resulting DFS-tree. Notice it is much "deeper" than the BFS tree. It captures the structure of the recursive calls: when we visit a neighbor w of v, we add w as a child of v.

13. Whenever DFS returns from a vertex v, we climb up in the tree from v to its parent.

Hashing (COMP171, Fall 2005)

14. Hash table: supports the following operations

15. Find

16. Insert

17. Delete. (deletions may be unnecessary in some applications)

18. Unlike binary search tree, AVL tree and B+-tree, the following functions cannot be done:

19. Minimum and maximum

20. Successor and predecessor

21. Report data within a given range

22. List out the data in order

Unrealistic solution: Each position (slot) corresponds to a key in the universe of keys. T[k] corresponds to an element with key k. If the set contains no element with key k, then T[k] = NULL.

23. Unrealistic solution: insert, delete and find all take O(1) (worst-case) time. Problem: the scheme wastes too much space if the universe is too large compared with the actual number of elements to be stored. E.g. student IDs are 8-digit integers, so the universe size is 10^8, but we only have about 7000 students.

24. Hashing: Usually, m << N. h(Ki) = an integer in [0, …, m-1], called the hash value of Ki.

25. Example applications: Compilers use hash tables (symbol tables) to keep track of declared variables. On-line spell checkers: after prehashing the entire dictionary, one can check each word in constant time and print out the misspelled words in order of their appearance in the document. Hashing is also useful in applications where the input keys come in sorted order. This is a bad case for a binary search tree; AVL trees and B+-trees are harder to implement and are not necessarily more efficient.

26. Hashing: With hashing, an element of key k is stored in T[h(k)]

27. h: hash function

28. maps the universe U of keys into the slots of a hash table T[0,1,...,m-1]

29. an element of key k hashes to slot h(k)

30. h(k) is the hash value of key k

Hashing
Problem: collision

31. two keys may hash to the same slot

32. can we ensure that any two distinct keys get different cells?

33. No, if |U|>m, where m is the size of the hash table

34. Design a good hash function

35. that is fast to compute and

36. can minimize the number of collisions

37. Design a method to resolve the collisions when they occur

Hash Function
The division method

38. h(k) = k mod m

39. e.g. m=12, k=100, h(k)=4. Requires only a single division operation (quite fast). Certain values of m should be avoided

40. e.g. if m=2^p, then h(k) is just the p lowest-order bits of k; the hash function does not depend on all the bits

41. Similarly, if the keys are decimal numbers, m should not be a power of 10

42. It’s a good practice to set the table size m to be a prime number

43. Good values for m: primes not too close to exact powers of 2
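A small sketch of the division method makes the warning about m = 2^p concrete. The function name `h` follows the slides; the three sample keys are made up for illustration and share their four lowest-order bits.

```python
def h(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


# The slide's example: m = 12, k = 100.
print(h(100, 12))

# Illustrative keys (assumed): all end in the bits 0101.
keys = [0b10110101, 0b00000101, 0b11110101]

# With m = 2**4 the hash is just the 4 lowest-order bits, so all keys collide.
print([h(k, 2 ** 4) for k in keys])

# With a prime m such as 701 the higher-order bits also matter.
print([h(k, 701) for k in keys])
```

With m = 16 all three keys land in slot 5, while m = 701 keeps them apart.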

44. e.g. the hash table is to hold 2000 numbers, and we don’t mind an average of 3 numbers being hashed to the same entry

45. choose m=701

Hash Function...
Can the keys be strings?

46. Most hash functions assume that the keys are natural numbers

47. if keys are not natural numbers, a way must be found to interpret them as natural numbers

48. Method 1

49. Add up the ASCII values of the characters in the string

50. Problems:

51. Different permutations of the same set of characters would have the same hash value

52. If the table size is large, the keys are not distributed well. e.g. Suppose m=10007 and all the keys are eight or fewer characters long. Since ASCII value <= 127, the hash function can only assume values between 0 and 127*8=1016

Hash Function...
Method 2

53. If the first 3 characters are random and the table size is 10,007 => a reasonably equitable distribution

54. Problem

55. English is not random

56. Only 28 percent of the table can actually be hashed to (assuming a table size of 10,007)

57. Method 3

58. computes

59. involves all characters in the key and can be expected to distribute well. (Figure annotations: a,…,z and space; 27^2)
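The two problems listed for Method 1 (summing ASCII values) can be checked with a short sketch. The function name `ascii_sum_hash` and the sample strings are assumptions for illustration; the table size 10007 is the one used in the slides.

```python
def ascii_sum_hash(key, m):
    """Method 1: add up the ASCII values of the characters, then take mod m."""
    return sum(ord(c) for c in key) % m


M = 10007

# Problem 1: permutations of the same characters collide.
print(ascii_sum_hash("stop", M), ascii_sum_hash("pots", M), ascii_sum_hash("tops", M))

# Problem 2: with keys of at most 8 ASCII characters the sum never exceeds
# 127 * 8 = 1016, so most of a table of size 10007 is never used.
largest = ascii_sum_hash(chr(127) * 8, M)
print(largest)
```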

60. Collision Handling: (1) Separate Chaining. Instead of a hash table, we use a table of linked lists: keep a linked list of the keys that hash to the same value. Example: h(K) = K mod 10

61. Separate Chaining: To insert a key K

62. Compute h(K) to determine which list to traverse

63. If T[h(K)] contains a null pointer, initialize this entry to point to a linked list that contains K alone.

64. If T[h(K)] is a non-empty list, we add K at the beginning of this list.

65. To delete a key K

66. compute h(K), then search for K within the list at T[h(K)]. Delete K if it is found.

Separate Chaining: Assume that we will be storing n keys. Then we should make m the next prime number larger than n. If the hash function works well, the number of keys in each linked list will be a small constant.

67. Therefore, we expect that each search, insertion, and deletion can be done in constant time.
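The insert, find and delete steps for separate chaining can be sketched as follows. This is a minimal illustration, not the course's implementation: Python lists stand in for linked lists, `None` plays the role of the null pointer, the class name `ChainedHashTable` is hypothetical, and the keys are made up. The hash function h(K) = K mod 10 is the one from the slides.

```python
class ChainedHashTable:
    """Separate chaining with h(K) = K mod m (integer keys assumed)."""

    def __init__(self, m=10):
        self.m = m
        self.table = [None] * m        # None stands in for a null pointer

    def _h(self, key):
        return key % self.m

    def insert(self, key):
        slot = self._h(key)
        if self.table[slot] is None:
            self.table[slot] = [key]           # new list containing key alone
        else:
            self.table[slot].insert(0, key)    # add at the beginning of the list

    def find(self, key):
        chain = self.table[self._h(key)]
        return chain is not None and key in chain

    def delete(self, key):
        chain = self.table[self._h(key)]
        if chain is not None and key in chain:
            chain.remove(key)
            return True
        return False


t = ChainedHashTable()
for k in (1, 81, 64):      # 1 and 81 both hash to slot 1
    t.insert(k)
print(t.table[1], t.find(64), t.delete(1), t.table[1])
```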

68. Disadvantage: Memory allocation in linked list manipulation will slow down the program.

69. Advantage: deletion is easy.

Collision Handling: (2) Open Addressing
Open addressing:

70. relocate the key K to be inserted if it collides with an existing key. That is, we store K at an entry different from T[h(K)].

71. Two issues arise

72. what is the relocation scheme?

73. how to search for K later?

74. Three common methods for resolving a collision in open addressing

75. Linear probing

76. Quadratic probing

77. Double hashing

Open Addressing: To insert a key K, compute h0(K). If T[h0(K)] is empty, insert it there. If a collision occurs, probe alternative cells h1(K), h2(K), ... until an empty cell is found. hi(K) = (hash(K) + f(i)) mod m, with f(0) = 0. f: collision resolution strategy

78. Linear Probing: f(i) = i. Cells are probed sequentially (with wraparound): hi(K) = (hash(K) + i) mod m. Insertion: Let K be the new key to be inserted. We compute hash(K). For i = 0 to m-1, compute L = (hash(K) + i) mod m; if T[L] is empty, then we put K there and stop. If we cannot find an empty entry to put K, it means that the table is full and we should report an error.

79. Linear Probing: hi(K) = (hash(K) + i) mod m. E.g., inserting keys 89, 18, 49, 58, 69 with hash(K) = K mod 10. To insert 58, probe T[8], T[9], T[0], T[1]. To insert 69, probe T[9], T[0], T[1], T[2].

80. Primary Clustering: We call a block of contiguously occupied table entries a cluster. On average, when we insert a new key K, we may hit the middle of a cluster. Therefore, the time to insert K would be proportional to half the size of a cluster. That is, the larger the cluster, the slower the performance. Linear probing has the following disadvantages: Once h(K) falls into a cluster, this cluster will definitely grow in size by one. Thus, this may worsen the performance of insertion in the future. If two clusters are separated by only one entry, then inserting one key into a cluster can merge the two clusters together. Thus, the cluster size can increase drastically by a single insertion. This means that the performance of insertion can deteriorate drastically after a single insertion. Large clusters are easy targets for collisions.
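The linear probing example can be replayed with a short sketch. The function name `insert_linear` is illustrative, and `None` is assumed to mark an empty cell; the keys and hash(K) = K mod 10 come from the slides.

```python
def insert_linear(table, key):
    """Insert key using h_i(K) = (hash(K) + i) mod m; return the probed cells."""
    m = len(table)
    home = key % m                      # hash(K) = K mod m
    probes = []
    for i in range(m):
        slot = (home + i) % m           # sequential probing with wraparound
        probes.append(slot)
        if table[slot] is None:
            table[slot] = key
            return probes
    raise OverflowError("hash table is full")


table = [None] * 10
history = {k: insert_linear(table, k) for k in (89, 18, 49, 58, 69)}
print(history[58])   # cells probed while inserting 58
print(history[69])   # cells probed while inserting 69
print(table)
```

The probe lists for 58 and 69 match the sequences given in the slide, and the five keys end up in one contiguous cluster (cells 8, 9, 0, 1, 2).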

81. Quadratic Probing: f(i) = i^2, so hi(K) = (hash(K) + i^2) mod m. E.g., inserting keys 89, 18, 49, 58, 69 with hash(K) = K mod 10. To insert 58, probe T[8], T[9], T[(8+4) mod 10]. To insert 69, probe T[9], T[(9+1) mod 10], T[(9+4) mod 10].

82. Quadratic Probing: Two keys with different home positions will have different probe sequences
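The same five keys can be inserted with quadratic probing in a short sketch. The function name `insert_quadratic` is illustrative and `None` is assumed to mark an empty cell. Note that m = 10 is not prime, so the loop may fail to find an empty cell even when the table is not full; the sketch simply reports an error in that case.

```python
def insert_quadratic(table, key):
    """Insert key using h_i(K) = (hash(K) + i^2) mod m; return the probed cells."""
    m = len(table)
    home = key % m                      # hash(K) = K mod m
    probes = []
    for i in range(m):
        slot = (home + i * i) % m
        probes.append(slot)
        if table[slot] is None:
            table[slot] = key
            return probes
    raise OverflowError("no empty cell found within m probes")


table = [None] * 10
history = {k: insert_quadratic(table, k) for k in (89, 18, 49, 58, 69)}
print(history[58])   # 8, 9, then (8 + 4) mod 10
print(history[69])   # 9, (9 + 1) mod 10, then (9 + 4) mod 10
print(table)
```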

83. e.g. m=101, h(k1)=30, h(k2)=29

84. probe sequence for k1: 30,30+1, 30+4, 30+9

85. probe sequence for k2: 29, 29+1, 29+4, 29+9

86. If the table size is prime, then a new key can always be inserted if the table is at least half empty (see proof in text book)

87. Secondary clustering

88. Keys that hash to the same home position will probe the same alternative cells

89. Simulation results suggest that it generally causes less than an extra half probe per search

90. To avoid secondary clustering, the probe sequence needs to be a function of the original key value, not the home position

Double Hashing: To alleviate the problem of clustering, the sequence of probes for a key should be independent of its primary position => use two hash functions: hash() and hash2(). f(i) = i * hash2(K). E.g. hash2(K) = R - (K mod R), where R is a prime smaller than m

91. Double Hashing: hi(K) = (hash(K) + f(i)) mod m; hash(K) = K mod m; f(i) = i * hash2(K); hash2(K) = R - (K mod R). Example: m=10, R=7 and insert keys 89, 18, 49, 58, 69. To insert 49, hash2(49)=7, 2nd probe is T[(9+7) mod 10]. To insert 58, hash2(58)=5, 2nd probe is T[(8+5) mod 10]. To insert 69, hash2(69)=1, 2nd probe is T[(9+1) mod 10].
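The double hashing example can be checked with the sketch below. The function name `insert_double` is illustrative and `None` is assumed to mark an empty cell; m = 10 and R = 7 are the values from the slide. Because m = 10 is not prime, hash2(K) is not always relatively prime to m (for instance hash2(58) = 5), so this table size is only suitable for the small worked example.

```python
def hash2(key, r=7):
    """Second hash function: R - (K mod R); never evaluates to zero."""
    return r - (key % r)


def insert_double(table, key, r=7):
    """Insert key using h_i(K) = (hash(K) + i * hash2(K)) mod m."""
    m = len(table)
    home = key % m                      # hash(K) = K mod m
    step = hash2(key, r)
    for i in range(m):
        slot = (home + i * step) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise OverflowError("no empty cell found within m probes")


table = [None] * 10
placed = {k: insert_double(table, k) for k in (89, 18, 49, 58, 69)}
print(hash2(49), hash2(58), hash2(69))
print(placed)
print(table)
```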

92. Choice of hash2(): hash2() must never evaluate to zero

93. For any key K, hash2(K) must be relatively prime to the table size m. Otherwise, we will only be able to examine a fraction of the table entries.

94. E.g., if hash(K) = 0 and hash2(K) = m/2, then we can only examine the entries T[0], T[m/2], and nothing else!
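The effect of a step size that shares a factor with m can be seen by listing the cells a probe sequence reaches. The helper name `probe_cells` and the sample sizes (m = 10 and m = 11) are assumptions for illustration.

```python
from math import gcd


def probe_cells(home, step, m):
    """Distinct cells visited by (home + i * step) mod m for i = 0..m-1."""
    return sorted({(home + i * step) % m for i in range(m)})


# hash(K) = 0 and hash2(K) = m/2 with m = 10: only two cells are ever examined.
print(probe_cells(0, 5, 10))

# With a prime table size every nonzero step smaller than m is relatively
# prime to m, so the probe sequence reaches the whole table.
print(probe_cells(0, 5, 11))

# In general the number of reachable cells is m / gcd(step, m).
print(len(probe_cells(0, 4, 10)), 10 // gcd(4, 10))
```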

95. One solution is to make m prime, choose R to be a prime smaller than m, and set hash2(K) = R - (K mod R)

Quadratic probing, however, does not require the use of a second hash function

96. likely to be simpler and faster in practice

Deletion in open addressing: Actual deletion cannot be performed in open addressing hash tables; otherwise this will isolate records further down the probe sequence. Solution: add an extra bit to each table entry, and mark a deleted slot by storing a special value DELETED (tombstone).
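A minimal sketch of tombstone deletion, assuming linear probing, integer keys that are inserted at most once, and hash(K) = K mod m. Instead of an extra bit per entry, this sketch uses a sentinel object `DELETED`, which is one possible way to realise the special value; the class name `LinearProbingTable` is hypothetical.

```python
EMPTY = None
DELETED = object()    # tombstone: the slot was occupied and later deleted


class LinearProbingTable:
    def __init__(self, m=10):
        self.m = m
        self.slots = [EMPTY] * m

    def _probe(self, key):
        home = key % self.m
        for i in range(self.m):
            yield (home + i) % self.m

    def insert(self, key):
        # Assumes key is not already present; tombstones may be reused.
        for s in self._probe(key):
            if self.slots[s] is EMPTY or self.slots[s] is DELETED:
                self.slots[s] = key
                return s
        raise OverflowError("hash table is full")

    def find(self, key):
        for s in self._probe(key):
            if self.slots[s] is EMPTY:
                return None             # a truly empty cell ends the search
            if self.slots[s] is not DELETED and self.slots[s] == key:
                return s                # tombstones are skipped, not stopped at
        return None

    def delete(self, key):
        s = self.find(key)
        if s is None:
            return False
        self.slots[s] = DELETED         # mark the slot, do not empty it
        return True


t = LinearProbingTable()
for k in (89, 18, 49, 58, 69):
    t.insert(k)
t.delete(49)                            # 49 sat in cell 0, on the probe path of 58
print(t.find(58), t.find(69), t.find(49))
```

If cell 0 were simply emptied when 49 is deleted, the search for 58 (probing cells 8, 9, 0, 1) would stop at cell 0 and wrongly report that 58 is absent.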
