From cnblogs yangecnu / cnblogs 有心故我在 / Wikipedia "2–3 tree"
Definition
Like the binary search tree, the 2-3 tree is a search tree, but it is designed to solve the problem of unbalanced trees.
Imagine storing your data in a binary search tree. The worst possible case is that the data arrives in sorted order: every new key becomes the right child of the previous one, and the tree looks like this:
This tree has essentially turned into a linked list. With a tree this unbalanced, all the advantages of a binary search tree disappear: searching is slow and cumbersome (it takes time proportional to the number of keys), and memory is wasted on the empty left-child pointers.
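A minimal Python sketch of this worst case: a plain, unbalanced binary search tree receives its keys in ascending order and ends up with as many levels as it has keys. The names `BSTNode`, `bst_insert`, and `height` are hypothetical and serve only as illustration.

```python
class BSTNode:
    """Plain binary search tree node (no balancing)."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def bst_insert(root, key):
    """Standard BST insertion, written iteratively so a long chain cannot hit the recursion limit."""
    new = BSTNode(key)
    if root is None:
        return new
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = new
                return root
            cur = cur.left
        else:
            if cur.right is None:
                cur.right = new
                return root
            cur = cur.right

def height(root):
    """Number of levels, counted breadth-first."""
    levels = 0
    frontier = [root] if root is not None else []
    while frontier:
        levels += 1
        frontier = [c for n in frontier for c in (n.left, n.right) if c is not None]
    return levels

root = None
for k in range(1, 1001):        # keys arrive in sorted order
    root = bst_insert(root, k)
print(height(root))             # 1000: one node per level, like a linked list
```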
In computer science, a 2–3 tree is a tree data structure in which every node with children (internal node) has either two children (a 2-node) and one data element, or three children (a 3-node) and two data elements. According to Knuth, "a B-tree of order 3 is a 2-3 tree." Nodes on the outside of the tree (leaf nodes) have no children and one or two data elements. 2–3 trees were invented by John Hopcroft in 1970.
[Figure: a 2-node and a 3-node]
The 2-3 tree addresses this with a different node structure and slightly different insertion and deletion procedures that keep the tree balanced. Its biggest drawback is that it needs more storage space than an ordinary binary search tree.
One big difference in the 2-3 tree is that each node can hold up to two data fields. In such a node, the three children sit to the left of, between, and to the right of the two data fields.
2–3 trees are balanced, meaning that each right, center, and left subtree contains the same or close to the same amount of data.
From Wikipedia:
We say that an internal node is a 2-node if it has one data element and two children.
We say that an internal node is a 3-node if it has two data elements and three children.
We say that T is a 2–3 tree if and only if one of the following statements hold:
- T is empty. In other words, T does not have any nodes.
- T is a 2-node with data element a. If T has left child L and right child R, then
  - L and R are non-empty 2–3 trees of the same height;
  - a is greater than each element in L; and
  - a is less than or equal to each data element in R.
- T is a 3-node with data elements a and b, where a < b. If T has left child L, middle child M, and right child R, then
  - L, M, and R are non-empty 2–3 trees of equal height;
  - a is greater than each data element in L and less than or equal to each data element in M; and
  - b is greater than each data element in M and less than or equal to each data element in R.
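The definition above translates almost line by line into a recursive checker. This is a minimal Python sketch and makes two assumptions of its own: a node is a hypothetical `Node` with a sorted `keys` list and a `children` list (empty for a leaf), and the function name `check_23` is illustrative. It returns the tree's height, counted in levels, or raises `ValueError` at the first rule that fails.

```python
class Node:
    """Hypothetical node: keys holds [a] (2-node) or [a, b] (3-node);
    children is empty for a leaf, otherwise len(keys) + 1 subtrees."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)

def check_23(t, lo=None, hi=None):
    """Return the number of levels of t (0 for the empty tree) if it satisfies the
    definition, otherwise raise ValueError. Every key x must satisfy lo <= x < hi
    (None = unbounded): 'greater than everything on the left,
    less than or equal to everything on the right'."""
    if t is None:
        return 0                                   # T is empty
    k = t.keys
    if len(k) not in (1, 2) or (len(k) == 2 and not k[0] < k[1]):
        raise ValueError(f"keys {k} are neither [a] nor [a, b] with a < b")
    for x in k:
        if (lo is not None and x < lo) or (hi is not None and x >= hi):
            raise ValueError(f"key {x} breaks the ordering rule")
    if not t.children:
        return 1                                   # leaf
    if len(t.children) != len(k) + 1 or any(c is None for c in t.children):
        raise ValueError("an internal node needs len(keys) + 1 non-empty subtrees")
    bounds = [lo, *k, hi]                          # child i holds keys in [bounds[i], bounds[i+1])
    heights = {check_23(c, bounds[i], bounds[i + 1]) for i, c in enumerate(t.children)}
    if len(heights) != 1:
        raise ValueError("subtrees have different heights")
    return heights.pop() + 1

good = Node([10, 20], [Node([5]), Node([12, 15]), Node([25, 30])])
print(check_23(good))  # 2

unbalanced = Node([10], [Node([5]), Node([20], [Node([15]), Node([25])])])
try:
    check_23(unbalanced)
except ValueError as err:
    print(err)         # subtrees have different heights
```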
A 2-3 tree has the following properties:
- A node contains one or two keys.
- Every internal node has 2 children (if it contains one key) or 3 children (if it contains two keys).
- All leaves are on the same level, so the tree is always height-balanced.
- For every node, all descendants in its left subtree have values less than the node's first key;
- all descendants in its middle subtree have values greater than or equal to the first key and less than the second key;
- if there is a right subtree, all its descendants have values greater than or equal to the second key.
Another way to describe it:
- A 2-node holds one key with its value, plus two links to a left and a right node. The left node is also a 2-3 node whose values are all smaller than the key; the right node is also a 2-3 node whose values are all larger than the key.
- A 3-node holds two keys with their values, plus three links to a left, a middle, and a right node. The left node is also a 2-3 node whose values are all smaller than the smaller of the two keys; the middle node is also a 2-3 node whose keys lie between the two keys of its parent; the right node is also a 2-3 node whose keys are all larger than the larger of the two keys.
An in-order traversal of a 2-3 search tree yields the keys in sorted order. In a perfectly balanced 2-3 search tree, every path from the root to an empty link has the same length.
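Both claims can be checked with a minimal Python sketch. It assumes a hypothetical `Node` with a sorted `keys` list and a `children` list (empty for a leaf). `inorder` interleaves the children with the keys (child 0, key 0, child 1, key 1, ..., last child), and `leaf_depths` collects the depth of every leaf. Both names are illustrative.

```python
class Node:
    """Hypothetical 2-3 node: 1 or 2 sorted keys; children empty (leaf) or len(keys) + 1."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)

def inorder(t):
    """Yield keys in order: child 0, key 0, child 1, key 1, ..., last child."""
    if t is None:
        return
    for i, key in enumerate(t.keys):
        if t.children:
            yield from inorder(t.children[i])
        yield key
    if t.children:
        yield from inorder(t.children[-1])

def leaf_depths(t, depth=0):
    """Depths of all leaves; a balanced 2-3 tree yields a single value."""
    if not t.children:
        return {depth}
    return set().union(*(leaf_depths(c, depth + 1) for c in t.children))

t = Node([10, 20], [Node([5]), Node([12, 15]), Node([25, 30])])
print(list(inorder(t)))   # [5, 10, 12, 15, 20, 25, 30]
print(leaf_depths(t))     # {1}
```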
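Search
Before looking at how a 2-3 tree is kept balanced, assume it is already balanced and look at the basic search operation.
Search in a 2-3 tree is similar to search in a binary search tree. To determine whether a key is in the tree, first compare it with the key(s) in the root. If it matches, the search succeeds. Otherwise, recurse into the left, middle, or right subtree according to the comparison. If the search reaches an empty link, the key is not in the tree; otherwise the matching node is returned. The search process is shown in the figure below: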
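Here is a minimal Python sketch of that search. The text describes it recursively, and a loop performs the same walk. `Node` (sorted `keys`, `children` empty for a leaf) and `contains` are hypothetical names. `bisect.bisect_left` counts how many keys in the node are smaller than the target, and that count selects the left (0), middle (1), or right (2) child.

```python
import bisect

class Node:
    """Hypothetical 2-3 node: 1 or 2 sorted keys; children empty (leaf) or len(keys) + 1."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)

def contains(t, key):
    """Return True if key is stored in the 2-3 tree rooted at t."""
    while t is not None:
        if key in t.keys:
            return True                                # hit
        if not t.children:
            return False                               # fell off the bottom: miss
        t = t.children[bisect.bisect_left(t.keys, key)]
    return False                                       # empty tree

t = Node([10, 20], [Node([5]), Node([12, 15]), Node([25, 30])])
print(contains(t, 15), contains(t, 13))  # True False
```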
Insertion

1. Inserting into a 2-node
Inserting into a 2-3 tree starts like inserting into a binary search tree: first search for the key, then attach the new element where the unsuccessful search ended. What lets a 2-3 tree guarantee good worst-case performance is that it stays balanced after every insertion. If the search ends at a 2-node, insertion is easy: put the new key into that 2-node, turning it into a 3-node. If the search ends at a 3-node, more work is needed.
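The easy case as a minimal Python sketch, assuming distinct keys and a hypothetical `Node` with a sorted `keys` list and a `children` list that is empty for leaves. `find_leaf` and `insert_at_2node` are illustrative names. The sketch deliberately refuses the 3-node case, which the next subsections cover.

```python
import bisect

class Node:
    """Hypothetical 2-3 node: 1 or 2 sorted keys; children empty (leaf) or len(keys) + 1."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)

def find_leaf(t, key):
    """Follow the search path for `key` down to a leaf (assumes key is not present)."""
    while t.children:
        t = t.children[bisect.bisect_left(t.keys, key)]
    return t

def insert_at_2node(root, key):
    """Case 1 only: the search ends at a 2-node leaf, which absorbs the key."""
    leaf = find_leaf(root, key)
    if len(leaf.keys) != 1:
        raise NotImplementedError("search ended at a 3-node: needs a split")
    bisect.insort(leaf.keys, key)    # [a] becomes [a, key] or [key, a]: now a 3-node
    return leaf

root = Node([10], [Node([5]), Node([20])])
insert_at_2node(root, 15)
print(root.children[1].keys)  # [15, 20]
```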
2. Inserting into a 3-node
Inserting a new key into a 3-node can lead to several different situations. We start with the simplest: a tree that consists of a single 3-node.
(1) A tree containing only one 3-node
As shown in the figure above, suppose the 2-3 tree contains only one 3-node. This node already has two keys and no room for a third. The natural approach is to let the node temporarily hold three keys, making it a 4-node (which would have four child links). Then we move the middle key of the 4-node up into a new node, with the left key as its left child and the right key as its right child. The insertion is complete, the result is a balanced 2-3 search tree, and its height grows from 0 to 1.
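A minimal Python sketch of case (1). It assumes distinct keys and a hypothetical `Node` with a sorted `keys` list and a `children` list, and `insert_into_lone_3node` is an illustrative name. The temporary 4-node exists only as a three-element list, and the function returns the new root.

```python
import bisect

class Node:
    """Hypothetical 2-3 node: 1 or 2 sorted keys; children empty (leaf) or len(keys) + 1."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)

def insert_into_lone_3node(root, key):
    """Case (1): the whole tree is one 3-node leaf [a, b]."""
    if len(root.keys) != 2 or root.children:
        raise ValueError("expected a tree made of a single 3-node")
    keys = list(root.keys)
    bisect.insort(keys, key)              # temporary 4-node: three sorted keys
    left, mid, right = keys
    # promote the middle key; the outer keys become its left and right children
    return Node([mid], [Node([left]), Node([right])])

root = insert_into_lone_3node(Node([10, 20]), 15)
print(root.keys, [c.keys for c in root.children])  # [15] [[10], [20]]
```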
(2) The node is a 3-node and its parent is a 2-node
As in the first case, insert the new key into the 3-node to form a temporary 4-node. Then move the middle key of that node up into its parent, the 2-node, which becomes a 3-node. Finally, attach the remaining left and right keys (now two 2-nodes) at the appropriate positions under that 3-node. The operation is shown in the figure below:
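A minimal Python sketch of case (2). It assumes distinct keys and a hypothetical `Node` (sorted `keys`, `children` empty for leaves), and the function name `insert_under_2node_parent` is illustrative. The middle key is inserted into the parent at the same index as the child it came from, and that child is replaced by its two halves.

```python
import bisect

class Node:
    """Hypothetical 2-3 node: 1 or 2 sorted keys; children empty (leaf) or len(keys) + 1."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)

def insert_under_2node_parent(parent, key):
    """Case (2): parent is a 2-node whose target child is a 3-node leaf."""
    i = bisect.bisect_left(parent.keys, key)   # which child the search descends into
    child = parent.children[i]
    if len(parent.keys) != 1 or len(child.keys) != 2 or child.children:
        raise ValueError("expected a 2-node parent over a 3-node leaf")
    bisect.insort(child.keys, key)             # child is now a temporary 4-node
    left, mid, right = child.keys
    parent.keys.insert(i, mid)                 # middle key moves up: parent becomes a 3-node
    parent.children[i:i + 1] = [Node([left]), Node([right])]  # child replaced by two 2-nodes

parent = Node([30], [Node([10, 20]), Node([40])])
insert_under_2node_parent(parent, 15)
print(parent.keys, [c.keys for c in parent.children])  # [15, 30] [[10], [20], [40]]
```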
(3) The node is a 3-node and its parent is also a 3-node
When the node we insert into is a 3-node, we split it and move its middle key up into the parent. Here, however, the parent is also a 3-node, so after the insertion it becomes a 4-node. We then move that node's middle key up into its own parent, and continue in this way until we reach a parent that is a 2-node. That parent becomes a 3-node, and no further splitting is needed.
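A minimal Python sketch of the upward cascade in cases (2) and (3). It assumes distinct keys and a hypothetical `Node` (sorted `keys`, `children` empty for leaves), and `insert_with_cascade` is an illustrative name. On the way down the function records the search path. On the way back up it splits each temporary 4-node and pushes the middle key into the parent, stopping at the first ancestor that still has room. A 4-node root is deliberately left unsplit here, because that is case (4).

```python
import bisect

class Node:
    """Hypothetical 2-3 node: sorted keys (3 = temporary 4-node); children empty for a leaf."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)

def insert_with_cascade(root, key):
    """Insert at the bottom, then split temporary 4-nodes upward until a parent has room.
    If the root itself ends with 3 keys it is returned unsplit (case 4 handles that)."""
    path = []                                  # (parent, child index) pairs on the way down
    node = root
    while node.children:
        i = bisect.bisect_left(node.keys, key)
        path.append((node, i))
        node = node.children[i]
    bisect.insort(node.keys, key)              # may create a temporary 4-node
    while len(node.keys) == 3 and path:
        parent, i = path.pop()
        left = Node(node.keys[:1], node.children[:2])    # a 4-node has 4 children (or none)
        right = Node(node.keys[2:], node.children[2:])
        parent.keys.insert(i, node.keys[1])    # middle key moves up
        parent.children[i:i + 1] = [left, right]
        node = parent                          # the parent may now be a 4-node in turn
    return root

# leaf [5, 10] and its parent [20, 30] are both 3-nodes; the root [50] is a 2-node
root = Node([50], [Node([20, 30], [Node([5, 10]), Node([25]), Node([40])]),
                   Node([70], [Node([60]), Node([80])])])
insert_with_cascade(root, 7)
print(root.keys, [c.keys for c in root.children])  # [20, 50] [[7], [30], [70]]
```

Both splits happen without changing the height. The cascade stops at the root only because the root was a 2-node with room for the promoted key 20.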
(4) Splitting the root
When every node on the path from the root down to the leaf is a 3-node, inserting a new key at the leaf causes splits all the way up to the root. In the last step the root itself becomes a 4-node. It then has to be split into three 2-nodes: the middle key becomes the new root, and the left and right keys become its two children. The height of the tree increases by 1. The process is shown below:
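Putting the cases together gives a compact recursive Python sketch. It assumes distinct keys, and `Node`, `_insert`, `insert`, and `height` are hypothetical names. `_insert` returns the promoted middle key and the two halves whenever a node overflows, and a parent absorbs them in the same way. `insert` handles case (4): if the split reaches the root, a new root is created and the height grows by one.

```python
import bisect

class Node:
    """Hypothetical 2-3 node: sorted keys (3 = temporary 4-node); children empty for a leaf."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)

def _insert(node, key):
    """Insert below `node`; return (mid, left, right) if node overflowed, else None."""
    if node.children:
        i = bisect.bisect_left(node.keys, key)
        split = _insert(node.children[i], key)
        if split is None:
            return None
        mid, left, right = split
        node.keys.insert(i, mid)               # absorb the promoted key
        node.children[i:i + 1] = [left, right]
    else:
        bisect.insort(node.keys, key)
    if len(node.keys) < 3:
        return None                            # still a 2-node or 3-node
    # temporary 4-node: split into two 2-nodes and promote the middle key
    return (node.keys[1],
            Node(node.keys[:1], node.children[:2]),
            Node(node.keys[2:], node.children[2:]))

def insert(root, key):
    """Insert key and return the (possibly new) root."""
    if root is None:
        return Node([key])
    split = _insert(root, key)
    if split is None:
        return root
    mid, left, right = split
    return Node([mid], [left, right])          # case (4): new root, height + 1

def height(t):
    """Edges from the root to a leaf (all leaves share this depth)."""
    h = 0
    while t.children:
        t, h = t.children[0], h + 1
    return h

root, heights = None, []
for k in [10, 20, 30, 40, 50, 60, 70]:
    root = insert(root, k)
    heights.append(height(root))
print(heights)  # [0, 0, 1, 1, 1, 1, 2]
```

The height changes only at 30, when a lone 3-node root splits, and at 70, when the path root [20, 40] → leaf [50, 60] consists entirely of 3-nodes and the split reaches the root.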
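(5) Local transformations
Splitting a 4-node into 2-3 nodes involves 6 possible cases. The 4-node may be the root, the left or right child of a 2-node, or the left, middle, or right child of a 3-node. All of these transformations are local: no other part of the tree needs to be examined or modified, so each transformation takes only a constant number of operations.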
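A minimal Python sketch of the locality claim. It uses a hypothetical `Node` with a sorted `keys` list and a `children` list. A single illustrative function, `split_child`, covers the five non-root cases. It reads and writes only the parent and the overflowing child, so its cost does not depend on the size of the tree. A second illustrative function, `split_root`, covers the sixth case. The loop checks that every split preserves the in-order key sequence.

```python
class Node:
    """Hypothetical 2-3 node: sorted keys (3 = temporary 4-node); children empty for a leaf."""
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)

def split_child(parent, i):
    """Split the temporary 4-node parent.children[i] in place.
    Only `parent` and that one child are touched, so the cost is constant."""
    c = parent.children[i]
    parent.keys.insert(i, c.keys[1])                    # middle key moves up
    parent.children[i:i + 1] = [Node(c.keys[:1], c.children[:2]),
                                Node(c.keys[2:], c.children[2:])]

def split_root(root):
    """The sixth case: a 4-node root becomes three 2-nodes (height + 1)."""
    return Node([root.keys[1]], [Node(root.keys[:1], root.children[:2]),
                                 Node(root.keys[2:], root.children[2:])])

def keys_in_order(t):
    out = []
    for j, k in enumerate(t.keys):
        if t.children:
            out += keys_in_order(t.children[j])
        out.append(k)
    if t.children:
        out += keys_in_order(t.children[-1])
    return out

cases = [  # (parent keys, child key lists, index of the 4-node child)
    ([50], [[10, 20, 30], [60]], 0),            # left child of a 2-node
    ([50], [[10], [60, 70, 80]], 1),            # right child of a 2-node
    ([40, 80], [[10, 20, 30], [50], [90]], 0),  # left child of a 3-node
    ([40, 80], [[10], [50, 60, 70], [90]], 1),  # middle child of a 3-node
    ([40, 80], [[10], [50], [90, 95, 99]], 2),  # right child of a 3-node
]
for pkeys, kids, i in cases:
    p = Node(pkeys, [Node(k) for k in kids])
    before = keys_in_order(p)
    split_child(p, i)
    assert keys_in_order(p) == before           # key order is preserved
    print(p.keys, [c.keys for c in p.children])

r = split_root(Node([10, 20, 30]))
print(r.keys, [c.keys for c in r.children])     # [20] [[10], [30]]
```

When the parent was a 3-node, it holds three keys after the split. That makes it a temporary 4-node, which is split one level up as in case (3).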
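(6) Properties
These local transformations preserve the balance of the 2-3 tree. When a 4-node is transformed into 2-3 nodes, the height of the tree is the same before and after the transformation. The height increases by one only when the root is a 4-node. This is illustrated below: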
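Analysis
A perfectly balanced 2-3 search tree is shown below; every path from the root to a leaf has the same length:
The search performance of a 2-3 tree depends directly on the height of the tree.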
- In the worst case, when every node is a 2-node, the search cost is lg N.
- In the best case, when every node is a 3-node, the search cost is log₃ N ≈ 0.631 lg N.
For example, a 2-3 tree with one million keys has a height between 12 and 20, and one with one billion keys has a height between 18 and 30.
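These figures follow from log₃ N (all 3-nodes) and lg N (all 2-nodes). A quick Python check is shown below. It is illustrative only and ignores the small +1 from counting levels instead of edges.

```python
import math

# log3 N = lg N * (ln 2 / ln 3), which is where the 0.631 factor comes from
ratio = math.log(2) / math.log(3)
print(round(ratio, 3))                          # 0.631

bounds = {}
for n in (10**6, 10**9):
    bounds[n] = (math.log(n, 3), math.log2(n))  # (all 3-nodes, all 2-nodes)
    print(n, *(round(h, 1) for h in bounds[n]))
# 1000000 12.6 19.9
# 1000000000 18.9 29.9
```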
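For insertion, each split takes only a constant number of operations, because it modifies only the nodes linked to the node being split and never needs to examine the rest of the tree. Because an insertion performs at most one split per level, its cost is proportional to the height of the tree, as with search. The efficiency of the 2-3 search tree is summarized below: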
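Implementation
Implementing a 2-3 tree directly is fairly complicated, because:
1. different node types have to be handled, which is tedious;
2. moving down the tree takes several comparisons per node;
3. splitting a 4-node requires moving back up the tree;
4. there are many different cases for splitting a 4-node.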
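The 2-3 search tree is complicated to implement, and in some cases the rebalancing after an insertion can lower its efficiency. The red-black tree is an improvement built on the 2-3 search tree: it is efficient and also simpler to implement than the 2-3 search tree.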