Introduction to data.tree
The data.tree package lets you create hierarchies, called data.tree structures.
traversal, search, and sort operations; recursive tree programming
decorate Nodes with your own fields and methods, and extend the package to your needs
neat printing and plotting of trees
conversion to and from data.frames, lists, and other tree structures such as dendrogram, phylo objects from the ape package, igraph, and others
data.tree structures are bi-directional, ordered trees.
Node
#attribute: an active, a field, or a method (see ?Get)
#active: (property) a field on a node like node$position (a field the node has built in)
#field: a named value on a Node like node$cost <- 2500 (a user-defined field)
#method: a function acting on a node, like node$Revert() (OO style) or Revert(node) (traditional style)
#inheritance: a child Node inherits an attribute from one of its ancestors (see ?Get, ?SetNodeStyle)
#Tree creation
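When building a tree, note that $AddChild can only create leaves on the next level down; you cannot skip levels.
When you create a leaf, you can assign it to a variable and then call $AddChild on that variable to build the next level of leaves.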
library(data.tree)
acme <- Node$new("Acme")
accounting <- acme$AddChild("Accounting")
software <- accounting$AddChild("software")
print(acme)
Each Node is identified by its name.
The name needs to be unique among siblings, so that paths to Nodes are unambiguous.
If you do not set a Node's name separately, the string passed to AddChild() is used as its name.
acme$AddChild("data")
acme$Get("name")
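Another way to get a name
acme$children[[1]]$name
Output as strings: Acme Accounting software data
As a vector: "Acme" "Accounting" "software" "data"
So I can write an algorithm that first renames all Nodes that share the same name, changing only the name and not the string values.
Already done.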
Building a tree from a data.frame
The key point is to define the pathString for the rows of the data frame
library(data.tree)
library(treemap)
data(GNI2014)
head(GNI2014)
This step is the key one
GNI2014$pathString <- paste("world",GNI2014$continent,GNI2014$country,sep="/")
population <- as.Node(GNI2014)
Optional arguments you can pass to print
print(population,"iso3","population","GNI",limit=30)
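The pathString step can be tried without the treemap data. The sketch below uses a small hand-made data.frame; the column values are made-up illustrative numbers, not real statistics, and the variable names are assumptions for this example only.

```r
library(data.tree)

# Illustrative data only: the population values are invented for this sketch.
df <- data.frame(
  continent  = c("Europe", "Europe", "Asia"),
  country    = c("France", "Spain", "Japan"),
  population = c(67, 47, 125),
  stringsAsFactors = FALSE
)

# Each row describes one leaf; pathString gives its full path from the root.
df$pathString <- paste("world", df$continent, df$country, sep = "/")

# as.Node() builds the intermediate nodes (world, Europe, Asia) from the paths
# and attaches the remaining columns as fields on the leaves.
tree <- as.Node(df)
print(tree, "population")
```

The root and the continent nodes are never listed as rows; they exist only because they appear in the paths.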
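Node exhibits reference semantics: a variable holding a Node is a reference to it, so changes made through that variable modify the tree in place (most other R objects are copied on modification).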
OO-style actives: Node$isRoot
OO-style methods: Node$Prune(pruneFun)
Classical R methods: Clone(node)
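A minimal sketch of what reference semantics means in practice, using only Node$new, AddChild, and Clone; the variable names are illustrative.

```r
library(data.tree)

a <- Node$new("root")
b <- a                  # b refers to the same Node as a; nothing is copied
b$AddChild("child")     # so this also changes the tree seen through a
a$count                 # 1

d <- Clone(a)           # Clone() makes an independent deep copy
d$AddChild("other")     # only the copy gets the second child
a$count                 # still 1
d$count                 # 2
```

This is why it is safer to Clone a tree before pruning or otherwise modifying it if the original is still needed.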
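So: first clone the tree, then write a pruning function that removes every node whose position within its level is greater than 10,
then add an ellipsis node to each level, and then plot
paste(rep(".",3),collapse = "")
Show only 15 items
print(population, limit = 15)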
Actives
#population$isRoot
#population$height
#number of children of the node; this can be used as a check: if a node has more than 10 children, call the pruning function
population$count
#total number of nodes in the tree
population$totalCount
#population$fields
#population$fieldsAll
#population$averageBranchFactor
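To see what these actives return, here is a small sketch on a regular tree built with CreateRegularTree (3 levels, 2 children per node); the tree itself is just an assumption for illustration.

```r
library(data.tree)

# Root "1", two children, four grandchildren.
tree <- CreateRegularTree(height = 3, branchingFactor = 2)

tree$isRoot       # TRUE
tree$height       # 3: number of levels, counting the root
tree$count        # 2: direct children only
tree$totalCount   # 7: every node in the tree, including the root
```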
#OO-style Methods
population$Get("population",filterFun = isLeaf)
population$Prune(pruneFun = function(x) !x$isLeaf || x$population>10000)
So the pruning function I need to define removes, under every node, the children at positions greater than 10
population$Get("population",filterFun = isLeaf)
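A possible sketch of that pruning function, tried on a regular tree with 12 children per node rather than on the GNI data. It assumes that keeping a node is decided only by its position among its siblings; the variable names are illustrative.

```r
library(data.tree)

# Test tree: root, 12 children, 144 grandchildren (157 nodes in total).
big <- CreateRegularTree(height = 3, branchingFactor = 12)

# Work on a copy so the original tree stays intact.
small <- Clone(big)

# pruneFun returns TRUE for nodes to keep: only the first 10 siblings survive.
small$Prune(pruneFun = function(node) node$position <= 10)

big$totalCount     # 157
small$totalCount   # 111 = 1 + 10 + 100
```

Removing a sibling can only shift later siblings to positions that are still above 10, so the result does not depend on the order in which Prune visits the children.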
Traditional R methods
popClone <- Clone(acme)
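#tree navigation
(the examples in this section assume the sample tree loaded with data(acme), printed below)
by path: index level by level
acme$IT$Outsource
by position
acme$children[[1]]$children[[2]]$name
by fields
navigate by name, or by other fields, with the Climb method.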
                          levelName
1  Acme Inc.
2   |--Accounting
3   |   |--New Software
4   |   °--New Accounting Standards
5   |--Research
6   |   |--New Product Line
7   |   °--New Labs
8   °--IT
9       |--Outsource
10      |--Go agile
11      °--Switch to R
acme$Climb(position=1,name="New Software")$path
and as a shortcut, you can climb multiple levels with a single argument
tree <- CreateRegularTree(5,5)
tree$Climb(position=c(2,3,4))$name
Here is an example that deepens the understanding: first find the child at position 2 of the root, then that child's child at position 3,
then the child named 1.2.3.4, and finally the child named 1.2.3.4.5
tree$Climb(position=c(2,3),name=c("1.2.3.4","1.2.3.4.5"))$path
Custom fields
#we can add any custom field to any Node in a data.tree structure
software$cost <- 1000000
software$p <- 0.5
print(acme, "cost", "p")
#and there is a list of reserved names you cannot use as Node fields
#custom fields in constructor
#assign custom fields in the constructor or in the Node$AddChild method
birds <- Node$new("Aves", vulgo = "Bird")
birds$AddChild("Neognathae", vulgo = "New Jaws", species = 10000)
birds$AddChild("Palaeognathae", vulgo = "Old Jaws", species = 60)
print(birds, "vulgo", "species")
#custom fields as function
#setting a function as a field
#compute the sum of this field over the child nodes
birds$species <- function(self) sum(sapply(self$children, function(x) x$species))
print(birds, "species")
#data.tree maps the self argument to the Node at hand
#combined with the Set method and recursion, this becomes a very powerful tool
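As a sketch of how recursion fits in, the example below sums a leaf field up the tree with a plain recursive function and stores the result on every node using Do. The tree, the cost values, and the names Cost and totalCost are assumptions made up for this illustration.

```r
library(data.tree)

# Small illustrative tree; only the leaves carry a cost.
root <- Node$new("project")
dev  <- root$AddChild("dev")
dev$AddChild("backend",  cost = 300)
dev$AddChild("frontend", cost = 200)
root$AddChild("docs", cost = 50)

# Recursive helper (hypothetical name): a leaf returns its own cost,
# an inner node returns the sum over its children.
Cost <- function(node) {
  if (node$isLeaf) return(node$cost)
  sum(sapply(node$children, Cost))
}

# Store the aggregated value as a custom field on every node.
root$Do(function(node) node$totalCost <- Cost(node))
print(root, "cost", "totalCost")
```

Recomputing Cost for every node repeats work on large trees; it is kept this way here only because it is the shortest form of the idea.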
#Printing
On the left you have the hierarchy; then you have one column per variable you want to print
print(acme,"cost","p")
Formatters
You can use formatters to output a variable in a certain way.
First, you can set them on a Node using SetFormat;
the formatter is then picked up as the default formatter for that Node and its descendants,
so you can overwrite a formatter for a sub-tree.
SetFormat(acme,"p",formatFun = FormatPercent)
SetFormat(acme,"cost",formatFun = function(x) FormatFixedDecimal(x,digits=2))
print(acme,"p","cost")
Printing using Get
Formatting with the Get method overwrites any formatters found along the path
data.frame(cost=acme$Get("cost",format=function(x) FormatFixedDecimal(x,2)),
           p=acme$Get("p",format=FormatPercent))
Plotting
plot(acme)
#the most urgent issue with plot right now is to spread the leaves apart; otherwise they are crammed together and look quite ugly
#igraph
library(igraph)
plot(as.igraph(acme,directed=TRUE,direction="climb"))