Assume all elements’ keys are different and are from $$U={0, 1, …, m-1}$$. Then the direct-address table is $$T[0..m-1]$$, that is to say, the key is the index.

return T[k]
T[x.key]=x
T[x.key]=NIL

2. Hash tables

The actual keys set $$K$$ is so small relative to the universe of keys set $$U$$. So a hash function should be used.

$$h: U\rightarrow \left \{0,1,…,m-1 \right \}$$

$$h(k)$$ is the hash value of $$k$$

Collision: Two keys may hash to the same slot.

Collision resolution by chaining

CHAINED-HASH-INSERT(T,x)
insert x at the head of list T[h(x,key)]
CHAINED-HASH-SEARCH(T,k)
search for an element with key k in list T[h(k)]
CHAINED-HASH-DELETE(T,x)
delete x from the list T[h(x,key)]

Load Factor $$\alpha=n/m$$

Insert: $$O(1)$$

Search:$$O(n)$$

Delete:$$O(1)$$

Simple Uniform Hashing:

For$$j=0,1,…,m-1$$, denote the length of $$T[j]$$ by  $$n_j$$, so that $$n=n_0+n_1+…+n_(m-1)$$. Then $$E[n_j]=\alpha=n/m$$.

If we assume simple uniform hashing, then the average-case time of search is $$\theta(1+\alpha)$$.

3. Hash Functions

Good hash function: simple uniform hashing

Assume the keys are natural numbers. If the keys are not natural numbers, we should interpret them as natural numbers.

3.1 The division method

$$h(k)=k\ mod\ m$$

m: a prime not too close to an exact power of 2 is often a good choice

3.2 The multiplication method

$$h(k)=\left \lfloor m(kA\ mod\ 1)\right \rfloor$$

m: typically a power of 2

The optimal choice of A is$$A\approx(\sqrt(5)-1)/2\approx=0.618$$

3.3 Universal hashing

Select a hash function randomly.

universal hash function collection: for different $$k,l \in U$$, the number of hash function $$h \in H$$ is not larger than $$\left|H\right|/m$$.

For $$h \in H$$,$$E[n_{h(k)}]$$is at most $$1+\alpha$$

A universal class of hash functions:

$$H_{pm}=\left \{h_ab:a \in Z^*_p\ and\ b \in Z_p\right \}$$

$$h_{ab}(k)=((ak+b)\ mod\ p)\ mod\ m$$

Open addressing avoids pointers. All the elements are stored inside the hash table.

So one key may hash to more than one slot. The hash function becomes a probe sequence.

$$\left \{h(k,0), h(k,1),…,h(k,m-1)\right \}$$

HASH-INSERT(T,k)
i=0
repeat j←h(k,i)
j=h(k,i)
if T[j]==NIL
T[j]=k
return j
else i=i+1
until i==m
error "hash table overflow"
HASH-SEARCH(T,k)
i←0
repeat j←h(k,i)
if T[j]==k
then return j
i←i+1
until T[j]=NIL or i=m
return NIL

But it is not easy to delete.

Uniform hashing: The probe sequence of each key is equally likely to be any of the $$m!$$ permutations of $$<0,1,..,m-1>$$.

Linear Probing

$$h(k,i)=(h^{‘}(k)+i)\ mod\ m,i=0,1,…,m-1$$

It has a problem known as primary clustering. The average search time increases.

$$h(k,i)=(h^{‘}(k)+c_1i+c_2i^2)\ mod\ m$$

Secondary clustering: a milder form of clustering

Double Hashing

$$h(k,i)=(h_1(k)+ih_2(k))\ mod\ m$$

$$\alpha=n/m<1$$, expected numbers of probe in an unsuccessful search or an insert is at most $$1/{1-\alpha}$$

5. Perfect Hashing

If the keys are static, then the worst case time of searching can be $$O(1)$$, which is known as perfect hashing.

Perfect hashing has two levels.

The first level is same as chaining.

But the second level is a secondary hash table.

Reference

 Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to algorithms. MIT press.