Author: Tyan
Blog: noahsnail.com | CSDN | Jianshu
Item 9: Always override hashCode when you override equals
A common source of bugs is the failure to override the hashCode method. You must override hashCode in every class that overrides equals. Failure to do so will result in a violation of the general contract for Object.hashCode, which will prevent your class from functioning properly in conjunction with all hash-based collections, including HashMap, HashSet, and Hashtable.
Here is the contract, copied from the Object specification [JavaSE6]:
Whenever it is invoked on the same object more than once during an execution of an application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
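The second and third provisions can be illustrated with java.lang.String, whose equals and hashCode methods are consistent with each other. This is only a small sketch; the class name HashContractDemo is made up for the illustration.

```java
// Illustration only: HashContractDemo is a hypothetical name.
class HashContractDemo {
    public static void main(String[] args) {
        // Second provision: two distinct but equal instances share a hash code.
        String a = new String("hello");
        String b = new String("hello");
        System.out.println(a != b);                        // true: different instances
        System.out.println(a.equals(b));                   // true
        System.out.println(a.hashCode() == b.hashCode());  // true

        // Third provision: unequal objects are allowed to collide.
        // "Aa" and "BB" both hash to 2112 under String.hashCode.
        System.out.println("Aa".equals("BB"));                   // false
        System.out.println("Aa".hashCode() == "BB".hashCode());  // true
    }
}
```

A collision between unequal objects is legal; it only costs performance when many such keys land in the same bucket.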
The key provision that is violated when you fail to override hashCode is the second one: equal objects must have equal hash codes. Two distinct instances may be logically equal according to a class’s equals method, but to Object’s hashCode method, they’re just two objects with nothing much in common. Therefore Object’s hashCode method returns two seemingly random numbers instead of two equal numbers as required by the contract.
For example, consider the following simplistic PhoneNumber class, whose equals method is constructed according to the recipe in Item 8:
public final class PhoneNumber {
    private final short areaCode;
    private final short prefix;
    private final short lineNumber;

    public PhoneNumber(int areaCode, int prefix, int lineNumber) {
        rangeCheck(areaCode, 999, "area code");
        rangeCheck(prefix, 999, "prefix");
        rangeCheck(lineNumber, 9999, "line number");
        this.areaCode = (short) areaCode;
        this.prefix = (short) prefix;
        this.lineNumber = (short) lineNumber;
    }

    private static void rangeCheck(int arg, int max, String name) {
        if (arg < 0 || arg > max)
            throw new IllegalArgumentException(name + ": " + arg);
    }

    @Override
    public boolean equals(Object o) {
        if (o == this)
            return true;
        if (!(o instanceof PhoneNumber))
            return false;
        PhoneNumber pn = (PhoneNumber) o;
        return pn.lineNumber == lineNumber && pn.prefix == prefix
                && pn.areaCode == areaCode;
    }

    // Broken - no hashCode method!
    ... // Remainder omitted
}
Suppose you attempt to use this class with a HashMap:
Map<PhoneNumber, String> m = new HashMap<PhoneNumber, String>();
m.put(new PhoneNumber(707, 867, 5309), "Jenny");
At this point, you might expect m.get(new PhoneNumber(707, 867, 5309)) to return "Jenny", but it returns null. Notice that two PhoneNumber instances are involved: one is used for insertion into the HashMap, and a second, equal, instance is used for (attempted) retrieval. The PhoneNumber class’s failure to override hashCode causes the two equal instances to have unequal hash codes, in violation of the hashCode contract. Therefore the get method is likely to look for the phone number in a different hash bucket from the one in which it was stored by the put method. Even if the two instances happen to hash to the same bucket, the get method will almost certainly return null, as HashMap has an optimization that caches the hash code associated with each entry and doesn’t bother checking for object equality if the hash codes don’t match.
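The failure can be reproduced with a self-contained sketch. BrokenPhoneNumber is a hypothetical, trimmed-down copy of the class (range checks omitted) that overrides equals but inherits Object.hashCode. The lookup with the original instance always succeeds; the lookup with an equal second instance almost always fails, though the exact outcome depends on the identity hash codes the VM happens to assign.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical trimmed-down class: equals is overridden, hashCode is not.
final class BrokenPhoneNumber {
    private final short areaCode;
    private final short prefix;
    private final short lineNumber;

    BrokenPhoneNumber(int areaCode, int prefix, int lineNumber) {
        this.areaCode = (short) areaCode;
        this.prefix = (short) prefix;
        this.lineNumber = (short) lineNumber;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this)
            return true;
        if (!(o instanceof BrokenPhoneNumber))
            return false;
        BrokenPhoneNumber pn = (BrokenPhoneNumber) o;
        return pn.lineNumber == lineNumber && pn.prefix == prefix
                && pn.areaCode == areaCode;
    }
    // No hashCode override: Object.hashCode is inherited.
}

class BrokenLookupDemo {
    public static void main(String[] args) {
        Map<BrokenPhoneNumber, String> m = new HashMap<>();
        BrokenPhoneNumber stored = new BrokenPhoneNumber(707, 867, 5309);
        BrokenPhoneNumber probe = new BrokenPhoneNumber(707, 867, 5309);
        m.put(stored, "Jenny");

        System.out.println(stored.equals(probe)); // true: logically equal
        System.out.println(m.get(stored));        // Jenny: same instance, same hash code
        System.out.println(m.get(probe));         // almost certainly null
    }
}
```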
Fixing this problem is as simple as providing a proper hashCode method for the PhoneNumber class. So what should a hashCode method look like? It’s trivial to write one that is legal but not good. This one, for example, is always legal but should never be used:
// The worst possible legal hash function - never use!
@Override
public int hashCode() {
    return 42;
}
It’s legal because it ensures that equal objects have the same hash code. It’s atrocious because it ensures that every object has the same hash code. Therefore, every object hashes to the same bucket, and hash tables degenerate to linked lists. Programs that should run in linear time instead run in quadratic time. For large hash tables, this is the difference between working and not working.
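A small sketch shows what "legal but atrocious" means in practice. ConstantHashKey is a hypothetical class whose hashCode always returns 42: a HashSet holding such keys still gives correct answers, but every key collides with every other, so only performance suffers. Timings are deliberately not shown, since they depend on the VM and the HashMap implementation.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key type with the constant hash function.
final class ConstantHashKey {
    private final int id;

    ConstantHashKey(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ConstantHashKey && ((ConstantHashKey) o).id == id;
    }

    @Override
    public int hashCode() {
        return 42; // legal, but every instance collides
    }
}

class ConstantHashDemo {
    // Builds a set of n keys with ids 0..n-1.
    static Set<ConstantHashKey> build(int n) {
        Set<ConstantHashKey> set = new HashSet<>();
        for (int i = 0; i < n; i++)
            set.add(new ConstantHashKey(i));
        return set;
    }

    public static void main(String[] args) {
        Set<ConstantHashKey> set = build(1000);
        System.out.println(set.size());                              // 1000
        System.out.println(set.contains(new ConstantHashKey(999)));  // true
        System.out.println(set.contains(new ConstantHashKey(1000))); // false
    }
}
```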
A good hash function tends to produce unequal hash codes for unequal objects. This is exactly what is meant by the third provision of the hashCode contract. Ideally, a hash function should distribute any reasonable collection of unequal instances uniformly across all possible hash values. Achieving this ideal can be difficult. Luckily it’s not too difficult to achieve a fair approximation. Here is a simple recipe:
1. Store some constant nonzero value, say, 17, in an int variable called result.
2. For each significant field f in your object (each field taken into account by the equals method, that is), do the following:
   a. Compute an int hash code c for the field:
      i. If the field is a boolean, compute (f ? 1 : 0).
      ii. If the field is a byte, char, short, or int, compute (int) f.
      iii. If the field is a long, compute (int)(f^(f>>>32)).
      iv. If the field is a float, compute Float.floatToIntBits(f).
      v. If the field is a double, compute Double.doubleToLongBits(f), and then hash the resulting long as in step 2.a.iii.
      vi. If the field is an object reference and this class’s equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If a more complex comparison is required, compute a “canonical representation” for this field and invoke hashCode on the canonical representation. If the value of the field is null, use 0 (or some other constant, but 0 is traditional).
      vii. If the field is an array, treat it as if each element were a separate field. That is, compute a hash code for each significant element by applying these rules recursively, and combine these values per step 2.b. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
   b. Combine the hash code c computed in step 2.a into result as follows: result = 31 * result + c;
3. Return result.
4. When you are finished writing the hashCode method, ask yourself whether equal instances have equal hash codes. Write unit tests to verify your intuition! If equal instances have unequal hash codes, figure out why and fix the problem.
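The recipe can be sketched on a class that has one field of each kind. Sample and all of its field names are hypothetical, chosen only to exercise steps 2.a.i through 2.a.vii; the equals method compares floating-point fields with Float.compare and Double.compare so that it stays consistent with the bit-based hash codes.

```java
import java.util.Arrays;

// Hypothetical class with one field per case of the recipe.
final class Sample {
    private final boolean active;
    private final int count;
    private final long id;
    private final float ratio;
    private final double weight;
    private final String name;  // object reference, may be null
    private final int[] scores; // array, every element significant

    Sample(boolean active, int count, long id, float ratio,
           double weight, String name, int[] scores) {
        this.active = active;
        this.count = count;
        this.id = id;
        this.ratio = ratio;
        this.weight = weight;
        this.name = name;
        this.scores = scores.clone(); // defensive copy; scores must not be null
    }

    @Override
    public boolean equals(Object o) {
        if (o == this)
            return true;
        if (!(o instanceof Sample))
            return false;
        Sample s = (Sample) o;
        return active == s.active && count == s.count && id == s.id
                && Float.compare(ratio, s.ratio) == 0
                && Double.compare(weight, s.weight) == 0
                && (name == null ? s.name == null : name.equals(s.name))
                && Arrays.equals(scores, s.scores);
    }

    @Override
    public int hashCode() {
        int result = 17;                                             // step 1
        result = 31 * result + (active ? 1 : 0);                     // 2.a.i
        result = 31 * result + count;                                // 2.a.ii
        result = 31 * result + (int) (id ^ (id >>> 32));             // 2.a.iii
        result = 31 * result + Float.floatToIntBits(ratio);          // 2.a.iv
        long w = Double.doubleToLongBits(weight);                    // 2.a.v
        result = 31 * result + (int) (w ^ (w >>> 32));
        result = 31 * result + (name == null ? 0 : name.hashCode()); // 2.a.vi
        result = 31 * result + Arrays.hashCode(scores);              // 2.a.vii
        return result;                                               // step 3
    }
}

class SampleDemo {
    public static void main(String[] args) {
        Sample a = new Sample(true, 3, 10000000000L, 1.5f, 2.5, "x", new int[] {1, 2});
        Sample b = new Sample(true, 3, 10000000000L, 1.5f, 2.5, "x", new int[] {1, 2});
        // Step 4: equal instances must have equal hash codes.
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```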
You may exclude redundant fields from the hash code computation. In other words, you may ignore any field whose value can be computed from fields included in the computation. You must exclude any fields that are not used in equals comparisons, or you risk violating the second provision of the hashCode contract.
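As a sketch of the second rule, consider a hypothetical Session class in which equals compares only userId. The mutable lastAccessMillis field is not part of equality, so it must stay out of hashCode; otherwise two equal sessions could report different hash codes.

```java
// Hypothetical class: only userId takes part in equals.
final class Session {
    private final String userId;   // used by equals
    private long lastAccessMillis; // NOT used by equals, so excluded from hashCode

    Session(String userId, long lastAccessMillis) {
        if (userId == null)
            throw new NullPointerException("userId");
        this.userId = userId;
        this.lastAccessMillis = lastAccessMillis;
    }

    void touch(long nowMillis) {
        lastAccessMillis = nowMillis;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this)
            return true;
        if (!(o instanceof Session))
            return false;
        return userId.equals(((Session) o).userId);
    }

    @Override
    public int hashCode() {
        int result = 17;
        result = 31 * result + userId.hashCode();
        // lastAccessMillis is deliberately left out.
        return result;
    }
}

class SessionDemo {
    public static void main(String[] args) {
        Session a = new Session("alice", 1000L);
        Session b = new Session("alice", 2000L);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
        a.touch(3000L);
        System.out.println(a.hashCode() == b.hashCode()); // still true
    }
}
```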
A nonzero initial value is used in step 1 so the hash value will be affected by initial fields whose hash value, as computed in step 2.a, is zero. If zero were used as the initial value in step 1, the overall hash value would be unaffected by any such initial fields, which could increase collisions. The value 17 is arbitrary.
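The effect of the initial value can be checked with a few lines of arithmetic. The helper combine below is a hypothetical function that applies step 2.b to a sequence of per-field hash codes. With an initial value of 0, leading zero-valued hash codes contribute nothing; with 17 they change the result.

```java
// Hypothetical helper for illustrating the choice of initial value.
class InitialValueDemo {
    // Applies result = 31 * result + c to each per-field hash code in order.
    static int combine(int initial, int... fieldHashes) {
        int result = initial;
        for (int c : fieldHashes)
            result = 31 * result + c;
        return result;
    }

    public static void main(String[] args) {
        // Initial value 0: the two leading zeros vanish.
        System.out.println(combine(0, 0, 0, 5));  // 5
        System.out.println(combine(0, 5));        // 5

        // Initial value 17: the leading zeros now affect the result.
        System.out.println(combine(17, 0, 0, 5)); // 506452
        System.out.println(combine(17, 5));       // 532
    }
}
```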
The multiplication in step 2.b makes the result depend on the order of the fields, yielding a much better hash function if the class has multiple similar fields. For example, if the multiplication were omitted from a String hash function, all anagrams would have identical hash codes. The value 31 was chosen because it is an odd prime. If it were even and the multiplication overflowed, information would be lost, as multiplication by 2 is equivalent to shifting. The advantage of using a prime is less clear, but it is traditional. A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically.
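Both claims are easy to check. In the sketch below, additiveHash is a hypothetical hash function that simply sums the characters, so it cannot tell anagrams apart, whereas String.hashCode, which multiplies by 31 at each step, can. The last lines verify the shift identity, which holds for every int because Java integer arithmetic wraps around on overflow.

```java
// Hypothetical demo of order sensitivity and the shift identity.
class MultiplierDemo {
    // Order-insensitive hash: sums the characters without multiplying.
    static int additiveHash(String s) {
        int result = 0;
        for (int i = 0; i < s.length(); i++)
            result += s.charAt(i);
        return result;
    }

    // Checks 31 * i == (i << 5) - i for one value.
    static boolean shiftIdentityHolds(int i) {
        return 31 * i == (i << 5) - i;
    }

    public static void main(String[] args) {
        // Anagrams collide when the multiplication is omitted.
        System.out.println(additiveHash("ab") == additiveHash("ba")); // true (both 195)

        // String.hashCode multiplies by 31, so order matters.
        System.out.println("ab".hashCode()); // 3105
        System.out.println("ba".hashCode()); // 3135

        System.out.println(shiftIdentityHolds(123456789));         // true
        System.out.println(shiftIdentityHolds(Integer.MAX_VALUE)); // true, even with overflow
    }
}
```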
Let’s apply the above recipe to the PhoneNumber class. There are three significant fields, all of type short:
@Override public int hashCode() {
    int result = 17;
    result = 31 * result + areaCode;
    result = 31 * result + prefix;
    result = 31 * result + lineNumber;
    return result;
}
Because this method returns the result of a simple deterministic computation whose only inputs are the three significant fields in a PhoneNumber instance, it is clear that equal PhoneNumber instances have equal hash codes. This method is, in fact, a perfectly good hashCode implementation for PhoneNumber, on a par with those in the Java platform libraries. It is simple, reasonably fast, and does a reasonable job of dispersing unequal phone numbers into different hash buckets.
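Putting the pieces together, the following self-contained sketch repeats the HashMap experiment with a PhoneNumber class that overrides both equals and hashCode. It is a compact restatement of the code in this item, with a demo class (FixedLookupDemo, a made-up name) added for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class PhoneNumber {
    private final short areaCode;
    private final short prefix;
    private final short lineNumber;

    PhoneNumber(int areaCode, int prefix, int lineNumber) {
        rangeCheck(areaCode, 999, "area code");
        rangeCheck(prefix, 999, "prefix");
        rangeCheck(lineNumber, 9999, "line number");
        this.areaCode = (short) areaCode;
        this.prefix = (short) prefix;
        this.lineNumber = (short) lineNumber;
    }

    private static void rangeCheck(int arg, int max, String name) {
        if (arg < 0 || arg > max)
            throw new IllegalArgumentException(name + ": " + arg);
    }

    @Override
    public boolean equals(Object o) {
        if (o == this)
            return true;
        if (!(o instanceof PhoneNumber))
            return false;
        PhoneNumber pn = (PhoneNumber) o;
        return pn.lineNumber == lineNumber && pn.prefix == prefix
                && pn.areaCode == areaCode;
    }

    @Override
    public int hashCode() {
        int result = 17;
        result = 31 * result + areaCode;
        result = 31 * result + prefix;
        result = 31 * result + lineNumber;
        return result;
    }
}

// Hypothetical demo class.
class FixedLookupDemo {
    public static void main(String[] args) {
        Map<PhoneNumber, String> m = new HashMap<>();
        m.put(new PhoneNumber(707, 867, 5309), "Jenny");
        // A second, equal instance now finds the entry.
        System.out.println(m.get(new PhoneNumber(707, 867, 5309))); // Jenny
    }
}
```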
If a class is immutable and the cost of computing the hash code is significant, you might consider caching the hash code in the object rather than recalculating it each time it is requested. If you believe that most objects of this type will be used as hash keys, then you should calculate the hash code when the instance is created. Otherwise, you might choose to lazily initialize it the first time hashCode is invoked (Item 71). It is not clear that our PhoneNumber class merits this treatment, but just to show you how it’s done:
// Lazily initialized, cached hashCode
private volatile int hashCode; // (See Item 71)

@Override
public int hashCode() {
    int result = hashCode;
    if (result == 0) {
        result = 17;
        result = 31 * result + areaCode;
        result = 31 * result + prefix;
        result = 31 * result + lineNumber;
        hashCode = result;
    }
    return result;
}
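For comparison, the other option mentioned above, computing the hash code when the instance is created, might look like the following sketch. CachedPhoneNumber is a hypothetical variant with range checks omitted for brevity; because the class is immutable, the cached value can live in a final field and needs no volatile.

```java
// Hypothetical variant: hash code computed eagerly in the constructor.
final class CachedPhoneNumber {
    private final short areaCode;
    private final short prefix;
    private final short lineNumber;
    private final int hashCode; // computed once, at construction time

    CachedPhoneNumber(int areaCode, int prefix, int lineNumber) {
        this.areaCode = (short) areaCode;
        this.prefix = (short) prefix;
        this.lineNumber = (short) lineNumber;

        int result = 17;
        result = 31 * result + this.areaCode;
        result = 31 * result + this.prefix;
        result = 31 * result + this.lineNumber;
        this.hashCode = result;
    }

    @Override
    public boolean equals(Object o) {
        if (o == this)
            return true;
        if (!(o instanceof CachedPhoneNumber))
            return false;
        CachedPhoneNumber pn = (CachedPhoneNumber) o;
        return pn.lineNumber == lineNumber && pn.prefix == prefix
                && pn.areaCode == areaCode;
    }

    @Override
    public int hashCode() {
        return hashCode; // no recomputation on each call
    }
}
```

The trade-off is an extra int per instance and a little work in the constructor, which pays off only if most instances really are used as hash keys.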
While the recipe in this item yields reasonably good hash functions, it does not yield state-of-the-art hash functions, nor do the Java platform libraries provide such hash functions as of release 1.6. Writing such hash functions is a research topic, best left to mathematicians and theoretical computer scientists. Perhaps a later release of the platform will provide state-of-the-art hash functions for its classes and utility methods to allow average programmers to construct such hash functions. In the meantime, the techniques described in this item should be adequate for most applications.
Do not be tempted to exclude significant parts of an object from the hash code computation to improve performance. While the resulting hash function may run faster, its poor quality may degrade hash tables’ performance to the point where they become unusably slow. In particular, the hash function may, in practice, be confronted with a large collection of instances that differ largely in the regions that you’ve chosen to ignore. If this happens, the hash function will map all the instances to a very few hash codes, and hash-based collections will display quadratic performance. This is not just a theoretical problem. The String hash function implemented in all releases prior to 1.2 examined at most sixteen characters, evenly spaced throughout the string, starting with the first character. For large collections of hierarchical names, such as URLs, this hash function displayed exactly the pathological behavior noted here.
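The pathology can be imitated with a deliberately simplified shortcut. The prefixHash function below is hypothetical: it hashes only the first 16 characters of a string and is not the historical String algorithm, but it fails in the same way when the keys are URL-like strings that differ only near the end.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo: a hash function that ignores part of its input.
class PartialHashDemo {
    // Shortcut for illustration only: examines at most the first 16 characters.
    static int prefixHash(String s) {
        int result = 17;
        int limit = Math.min(s.length(), 16);
        for (int i = 0; i < limit; i++)
            result = 31 * result + s.charAt(i);
        return result;
    }

    // URL-like keys that share a long prefix and differ only in the suffix.
    static String url(int i) {
        return "http://www.example.com/items/" + i;
    }

    // Number of distinct prefixHash values among the first n URLs.
    static int distinctPrefixHashes(int n) {
        Set<Integer> codes = new HashSet<>();
        for (int i = 0; i < n; i++)
            codes.add(prefixHash(url(i)));
        return codes.size();
    }

    // Number of distinct String.hashCode values among the first n URLs.
    static int distinctFullHashes(int n) {
        Set<Integer> codes = new HashSet<>();
        for (int i = 0; i < n; i++)
            codes.add(url(i).hashCode());
        return codes.size();
    }

    public static void main(String[] args) {
        // Every URL shares its first 16 characters, so all 1000 keys collide.
        System.out.println(distinctPrefixHashes(1000)); // 1
        // Hashing the whole string spreads the keys over many hash codes.
        System.out.println(distinctFullHashes(1000));
    }
}
```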
Many classes in the Java platform libraries, such as String, Integer, and Date, include in their specifications the exact value returned by their hashCode method as a function of the instance value. This is generally not a good idea, as it severely limits your ability to improve the hash function in future releases. If you leave the details of a hash function unspecified and a flaw is found or a better hash function discovered, you can change the hash function in a subsequent release, confident that no clients depend on the exact values returned by the hash function.
方法返回的確定值。這通常不是一個好注意,因為它嚴重限制了你在將來版本中改進哈希函數的能力。如果沒有指定哈希函數的細節,當發現有缺陷或一個更好的哈希函數時,你可以在接下來的版本中改變哈希函數,確信沒有用戶依賴哈希函數返回的確定值。