Week06 Hashing Notes

Hashing

Hash

A dish of chopped meat, potatoes, and sometimes vegetables, usually browned.

#

Hash

Hashing in Java

public class Object {

    public Object() {}

    public final native Class<?> getClass();

    public native int hashCode();

    public boolean equals(Object obj) {
        return (this == obj);
    }

    protected native Object clone() throws CloneNotSupportedException;

    public String toString() {
        return getClass().getName() + "@" + Integer.toHexString(hashCode());
    }

    // ...

}

Calculate the hashcode of a compound object, used by Objects.hash(Object ...)

    public static int hashCode(Object a[]) {
        if (a == null)
            return 0;

        int result = 1;

        for (Object element : a)
            result = 31 * result + (element == null ? 0 : element.hashCode());

        return result;
    }

The default hashcode of an object in Java is its memory address

Benefits of coprime

result = 31 * result + element.hashCode();

Why we are using a prime number here ?

We like the hash code distribute as evenly as possible after MOD so that the chance of two hash code fall in one bucket is low.

Assume the data is periodical, for a division hash code, the bucket id repeats after the Least Common Multiple of the bucket size and the period of the data.

Bad Case (Common Factor)

Suppose:

  • Bucket size = 6

  • Hash codes generated are multiples of 3 (e.g., 3, 6, 9, …)

  • hash % 6 will always yield {0, 3} → only 1/3 of the buckets are used!

0
1
2
3
4
5

0

3

6

9

12

15

Good Case (Coprime)

  • Bucket size = 5 (a prime number)

  • Hash codes generated as before (multiples of 3)

  • hash % 5 results in a more even spread across all buckets.

0
1
2
3
4

0

3

6

9

12

15

Prime Numbers and the Magic Cicada

Periodical Cicadas A collective term for the seven species of cicadas in the genus Magicicada (magic cicadas) of the Cicadidae family, distributed in the eastern part of North America.

Periodical Cicadas
Nymph skin shedded

Cicadas spend most of their lives underground, only coming up to the surface to mate and lay eggs. They crawl up from the ground as nymphs (shown here). The back of the nymph splits open, the old exoskeleton falls away, and a winged, fully mature adult emerges.

Period

17-year

13-year

After hatching, the nymphs burrow underground, spending the majority of their lives beneath the surface, surviving by feeding on tree root sap.

A hypothesis about their prime-numbered life cycles : help them evade predators.

If cicada predators have specific breeding cycles, such as 2 years, 3 years, 4 years, 5 years, or 6 years, cicadas with prime-numbered cycles, such as the 13-year and 17-year cicadas, can avoid predators more effectively. For instance, if periodical cicadas initially encounter a certain predator, they will only meet predators with 1-year and 17-year cycles again after 17 years. To encounter a predator with a 2-year cycle, they would need to wait 34 years, by which time the predator population may have significantly declined.

Data Structures used in Buckets

Linked list

Red-Black Tree

Further Reading

Implementing Hash Tables in C

A very lately progress on theoretic Hash table performance

Let x be the inverse of (1 - fullness) of a hash table, e.g. for 99% fullness, x = 100, for 99.9% fullness, x = 1000

"In a 1985 paper(opens a new tab), the computer scientist Andrew Yao(opens a new tab), who would go on to win the A.M. Turing Award, asserted that among hash tables with a specific set of properties, the best way to find an individual element or an empty spot is to just go through potential spots randomly — an approach known as uniform probing. He also stated that, in the worst-case scenario, where you’re searching for the last remaining open spot, you can never do better than x. for 40 years, most computer scientists assumed that Yao’s conjecture was true.

Krapivin was not held back by the conventional wisdom for the simple reason that he was unaware of it. “I did this without knowing about Yao’s conjecture,” he said. His explorations with tiny pointers led to a new kind of hash table — one that did not rely on uniform probing. And for this new hash table, the time required for worst-case queries and insertions is proportional to (log x)^2 — far faster than x. This result directly contradicted Yao’s conjecture. Farach-Colton and Kuszmaul helped Krapivin show that (log x)^2 is the optimal, unbeatable bound for the popular class of hash tables Yao had written about."

"In this paper, we revisit one of the simplest problems in data structures: the task of inserting elements into an open-addressed hash table so that elements can later be retrieved with as few probes as possible. We show that, even without reordering elements over time, it is possible to construct a hash table that achieves far better expected probe complexities (both amortized and worst-case) than were previously thought possible. Along the way, we disprove the central conjecture left by Yao in his seminal paper “Uniform Hashing is Optimal” "

Last updated