1. Direct-address tables

Assume every element has a distinct key drawn from the universe \(U=\{0, 1, \dots, m-1\}\). The direct-address table is then an array \(T[0..m-1]\) with one slot for each possible key; that is, the key itself is the index.

DIRECT-ADDRESS-SEARCH(T,k)
    return T[k]
DIRECT-ADDRESS-INSERT(T,x)
    T[x.key]=x
DIRECT-ADDRESS-DELETE(T,x)
    T[x.key]=NIL
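
The three operations are trivial under these assumptions; here is a minimal sketch in Python (the dictionary-shaped elements and the table size 10 are illustrative assumptions, not part of the original pseudocode):

# Minimal direct-address table sketch (illustrative).
class DirectAddressTable:
    def __init__(self, m):
        self.T = [None] * m          # T[0..m-1], one slot per possible key

    def search(self, k):             # DIRECT-ADDRESS-SEARCH
        return self.T[k]

    def insert(self, x):             # DIRECT-ADDRESS-INSERT
        self.T[x["key"]] = x

    def delete(self, x):             # DIRECT-ADDRESS-DELETE
        self.T[x["key"]] = None

T = DirectAddressTable(10)
T.insert({"key": 3, "value": "a"})
print(T.search(3))                   # {'key': 3, 'value': 'a'}
print(T.search(7))                   # None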

2. Hash tables

In practice the set \(K\) of keys actually stored is small relative to the universe \(U\), so a direct-address table would waste most of its space. Instead, a hash function maps keys into a table of \(m\) slots:

$$h: U\rightarrow \left \{0,1,…,m-1 \right \}$$

\(h(k)\) is the hash value of key \(k\).

 

Collision: Two keys may hash to the same slot.

 

Collision resolution by chaining

CHAINED-HASH-INSERT(T,x)
    insert x at the head of list T[h(x.key)]
CHAINED-HASH-SEARCH(T,k)
    search for an element with key k in list T[h(k)]
CHAINED-HASH-DELETE(T,x)
    delete x from the list T[h(x.key)]
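
The chained operations can be sketched in Python as follows (a minimal sketch; Python lists stand in for the linked lists, and the division-method hash \(k \bmod m\) is an assumed choice):

# Hashing with chaining, sketched in Python (illustrative).
class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.T = [[] for _ in range(m)]   # each slot holds a chain (a Python list)

    def _h(self, k):
        return k % self.m                 # assumed hash function (division method)

    def insert(self, x):                  # CHAINED-HASH-INSERT: insert at head of chain
        self.T[self._h(x["key"])].insert(0, x)

    def search(self, k):                  # CHAINED-HASH-SEARCH: scan the chain T[h(k)]
        for x in self.T[self._h(k)]:
            if x["key"] == k:
                return x
        return None

    def delete(self, x):                  # CHAINED-HASH-DELETE
        self.T[self._h(x["key"])].remove(x)

T = ChainedHashTable(7)
T.insert({"key": 10, "value": "a"})
T.insert({"key": 3, "value": "b"})        # 10 and 3 collide: both hash to slot 3
print(T.search(10)["value"])              # a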

Load factor: \(\alpha=n/m\), where \(n\) is the number of stored elements and \(m\) is the number of slots.

Insert: worst-case \(O(1)\) (insert at the head of the list)

Search: worst-case \(\Theta(n)\) (all keys could land in one list); the average case is analyzed below

Delete: \(O(1)\) if the lists are doubly linked

 

Simple uniform hashing: any given key is equally likely to hash into any of the \(m\) slots, independently of where the other keys hash.

For \(j=0,1,\dots,m-1\), denote the length of list \(T[j]\) by \(n_j\), so that \(n=n_0+n_1+\dots+n_{m-1}\). Then \(E[n_j]=\alpha=n/m\).

If we assume simple uniform hashing, the average-case time of a search (successful or unsuccessful) is \(\Theta(1+\alpha)\).

 

3. Hash Functions

A good hash function comes close to satisfying the assumption of simple uniform hashing.

 

Most hash functions assume the keys are natural numbers; keys that are not (character strings, for example) are first interpreted as natural numbers.

 

3.1 The division method

$$h(k)=k \bmod m$$

m: a prime not too close to an exact power of 2 is often a good choice
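
A one-line sketch in Python (the prime 701 is only an illustrative choice of \(m\)):

# Division method: h(k) = k mod m, with m a prime not too close to a power of 2.
m = 701                     # illustrative prime

def h_division(k):
    return k % m

print(h_division(123456))   # 80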

 

3.2 The multiplication method

$$h(k)=\left\lfloor m(kA \bmod 1)\right\rfloor$$

m: typically a power of 2

 

A good choice is \(A\approx(\sqrt{5}-1)/2\approx 0.618\) (the value suggested by Knuth).
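
A sketch in Python (the power-of-two table size \(m=1024\) and the use of floating point are simplifications for illustration; real implementations usually prefer fixed-point integer arithmetic):

import math

# Multiplication method: h(k) = floor(m * (k*A mod 1)), with m a power of 2.
m = 1024
A = (math.sqrt(5) - 1) / 2          # ~0.618, the constant suggested above

def h_multiplication(k):
    frac = (k * A) % 1.0            # fractional part of k*A
    return int(m * frac)            # floor, since the value is non-negative

print(h_multiplication(123456))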

 

3.3 Universal hashing

Choose the hash function at random, at run time, from a carefully designed family of functions, independently of the keys that will be stored.

Universal family of hash functions: for every pair of distinct keys \(k,l \in U\), the number of hash functions \(h \in H\) for which \(h(k)=h(l)\) is at most \(\left|H\right|/m\).

 

If \(h\) is chosen at random from a universal family \(H\), then for any key \(k\) the expected length \(E[n_{h(k)}]\) of the list containing \(k\) is at most \(1+\alpha\) (and at most \(\alpha\) if \(k\) is not in the table).

 

A universal class of hash functions, where \(p\) is a prime large enough that every possible key \(k\) satisfies \(0 \le k \le p-1\):

$$H_{pm}=\left\{h_{ab}: a \in Z^*_p\ \text{and}\ b \in Z_p\right\}$$

$$h_{ab}(k)=((ak+b) \bmod p) \bmod m$$
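
Drawing one function from this family can be sketched in Python (the prime \(p=2^{31}-1\) and \(m=701\) are illustrative; \(p\) only needs to exceed every key):

import random

# Universal hashing sketch: h_ab(k) = ((a*k + b) mod p) mod m,
# with a drawn uniformly from Z*_p and b from Z_p.
p = 2**31 - 1               # a prime larger than any key we intend to store
m = 701

def random_hash_function():
    a = random.randint(1, p - 1)    # a in Z*_p = {1, ..., p-1}
    b = random.randint(0, p - 1)    # b in Z_p = {0, ..., p-1}
    return lambda k: ((a * k + b) % p) % m

h = random_hash_function()          # chosen at random, independently of the keys
print(h(42), h(123456))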

 

4. Open addressing

Open addressing avoids pointers. All the elements are stored inside the hash table.

To insert or search for a key we may therefore have to examine (probe) more than one slot, so the hash function is extended to produce a probe sequence for each key:

$$\left\langle h(k,0), h(k,1),\dots,h(k,m-1)\right\rangle$$

HASH-INSERT(T,k)
    i=0
    repeat
        j=h(k,i)
        if T[j]==NIL
            T[j]=k
            return j
        else i=i+1
    until i==m
    error "hash table overflow"
HASH-SEARCH(T,k)
    i=0
    repeat
        j=h(k,i)
        if T[j]==k
            return j
        i=i+1
    until T[j]==NIL or i==m
    return NIL
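
A sketch of open addressing in Python with linear probing and a DELETED tombstone (the probe function, the table size, and the tombstone are illustrative assumptions; the tombstone is one common way around the deletion problem discussed next):

# Open addressing with linear probing, sketched in Python (illustrative).
# DELETED is a tombstone: a deleted slot cannot simply be reset to None,
# or searches that probed past it would stop too early.
DELETED = object()

class OpenAddressTable:
    def __init__(self, m):
        self.m = m
        self.T = [None] * m

    def _h(self, k, i):
        return (k + i) % self.m              # linear probing: (h'(k) + i) mod m

    def insert(self, k):
        for i in range(self.m):
            j = self._h(k, i)
            if self.T[j] is None or self.T[j] is DELETED:
                self.T[j] = k
                return j
        raise OverflowError("hash table overflow")

    def search(self, k):
        for i in range(self.m):
            j = self._h(k, i)
            if self.T[j] == k:
                return j
            if self.T[j] is None:            # empty slot: k cannot appear later
                return None
        return None

    def delete(self, k):
        j = self.search(k)
        if j is not None:
            self.T[j] = DELETED

T = OpenAddressTable(11)
T.insert(7)
T.insert(18)                 # 18 collides with 7 (both mod 11 = 7) and lands in slot 8
T.delete(7)
print(T.search(18))          # 8: still found, thanks to the tombstone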

 

Deletion is awkward, however: simply resetting a slot to NIL would break searches that probed past it, so deleted slots are usually marked with a special DELETED value instead (as in the sketch above).

 

Uniform hashing: The probe sequence of each key is equally likely to be any of the \(m!\) permutations of \(\langle 0,1,\dots,m-1\rangle\).

 

Linear Probing

$$h(k,i)=(h'(k)+i) \bmod m,\quad i=0,1,\dots,m-1$$

Linear probing suffers from primary clustering: long runs of occupied slots build up, and the average search time increases.

 

Quadratic Probing

$$h(k,i)=(h'(k)+c_1 i+c_2 i^2) \bmod m$$

Secondary clustering: a milder form of clustering, since keys with the same initial probe position have identical probe sequences.

 

Double Hashing

$$h(k,i)=(h_1(k)+i\,h_2(k)) \bmod m$$

Assuming uniform hashing and \(\alpha=n/m<1\), the expected number of probes in an unsuccessful search (and hence in an insertion) is at most \(1/(1-\alpha)\); for example, at most 10 probes when \(\alpha=0.9\).
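
The three probe sequences above can be written directly; a minimal sketch in Python (the auxiliary hashes \(h_1, h_2\) and the constants \(c_1, c_2\) are assumed choices):

# Probe-sequence sketches for linear probing, quadratic probing and
# double hashing (auxiliary functions and constants are illustrative).
m = 701                                  # prime table size

def h1(k):
    return k % m                         # auxiliary hash h'(k) / h1(k)

def h2(k):
    return 1 + (k % (m - 1))             # never 0, and relatively prime to m,
                                         # so the whole table can be probed

def linear_probe(k, i):
    return (h1(k) + i) % m

def quadratic_probe(k, i, c1=1, c2=3):
    return (h1(k) + c1 * i + c2 * i * i) % m

def double_hash_probe(k, i):
    return (h1(k) + i * h2(k)) % m

print([double_hash_probe(42, i) for i in range(5)])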

5. Perfect Hashing

 

If the set of keys is static (it never changes once stored), the worst-case search time can be made \(O(1)\); this is known as perfect hashing.

Perfect hashing uses two levels of hashing, with universal hashing at each level.

The first level is essentially hashing with chaining: the \(n\) keys are hashed into \(m=n\) slots.

Instead of a linked list, however, each slot \(j\) containing \(n_j\) keys points to a small secondary hash table of size \(m_j=n_j^2\), whose hash function is chosen so that it causes no collisions among those keys.
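
A minimal two-level sketch in Python, reusing the universal family from above (the prime \(p\), the retry loops, and the example key set are illustrative; a full FKS-style construction also bounds the total space used by the secondary tables):

import random

# Two-level (FKS-style) perfect hashing sketch for a static set of keys
# (illustrative; p = 2**31 - 1 and the retry loops are assumptions).
p = 2**31 - 1

def universal_hash(m):
    a = random.randint(1, p - 1)
    b = random.randint(0, p - 1)
    return lambda k: ((a * k + b) % p) % m

def build_perfect_table(keys):
    n = len(keys)
    h = universal_hash(n)                    # first level: n slots, as in chaining
    buckets = [[] for _ in range(n)]
    for k in keys:
        buckets[h(k)].append(k)

    secondary = []
    for bucket in buckets:
        nj = len(bucket)
        mj = nj * nj                         # secondary table of size n_j^2
        while True:                          # retry until this bucket has no collisions
            hj = universal_hash(mj) if mj > 0 else None
            table = [None] * mj
            ok = True
            for k in bucket:
                j = hj(k)
                if table[j] is not None:     # collision: draw a new secondary function
                    ok = False
                    break
                table[j] = k
            if ok:
                break
        secondary.append((hj, table))
    return h, secondary

def perfect_search(h, secondary, k):
    hj, table = secondary[h(k)]
    if hj is None:                           # empty bucket
        return False
    return table[hj(k)] == k                 # worst case O(1): two hash evaluations

keys = [10, 22, 37, 40, 52, 60, 70, 72, 75]
h, secondary = build_perfect_table(keys)
print(all(perfect_search(h, secondary, k) for k in keys))   # True
print(perfect_search(h, secondary, 99))                     # False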

 

Reference

[1] Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to algorithms. MIT press.
