初步了解 non-cryptographic hash
hash 方法的定义 1
分类
0) cryptographic hash 1) non-cryptographic hash
好的标准是什么
A good hash function satisfies two basic properties:
1) it should be very fast to compute;
2) it should minimize duplication of output values (collisions).
常用算法
1) hashing by division
h(key) = key mod table_size
- Since it requires only a single division operation, hashing by division is quite fast.
- When using the division method, we usually avoid certain values of table_size like table_size should not be a power of a number suppose r, since if table_size = r^p, then h(key) is just the p lowest-order bits of key.
- A prime not too close to an exact power of 2 is often good choice for table_size.
2) hashing by multiplication
h(k) = floor (m * frac (k * c))
- floor(x), yields the integer part of the real number x, and frac(x) yields the fractional part.
- An advantage of the multiplication method is that the value of m is not critical, we typically choose it to be a power of 2 (m = 2p for some integer p).
- the word size of the machine is w bits and that key fits into a single word.
- c to be a fraction of the form s / (2w), where s is an integer in the range 0 < s < 2w.
实例 2
Suppose k = 123456, p = 14,
m = 2^14 = 16384, and w = 32.
Adapting Knuth’s suggestion, c to be fraction of the form s / 2^32.
Then key * s = 327706022297664 = (76300 * 2^32) + 17612864,
The 14 most significant bits of r0 yield the value h(key) = 67.
17612864 = 0x010CC040 =
0000 0001 0000 1100 1100 0000 0100 0000
Most significant 14 bits of that is
0000 0001 0000 11
测试下
设计完后, 用 smhasher
测试一下。 3
-
A hash function is any function that can be used to map data of arbitrary size to fixed-size values. The values returned by a hash function are called hash values, hash codes, digests, or simply hashes. The values are usually used to index a fixed-size table called a hash table. Use of a hash function to index a hash table is called hashing or scatter storage addressing. ↩