
R Language Basics 9: String processing (basic handling + stringr & stringi)

String processing is closely tied to regular expressions; see the companion post "Regular expressions in R".

1. Basic string handling

Create a character vector

x <- c('huake','wuda')

1.1 nchar(): count the characters in each string

nchar(x)
# [1] 5 4

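Note the difference between nchar() and length(): length(x) returns 2, the number of strings in the vector. str_length() from the stringr package behaves like nchar().

length(x)
# [1] 2
library(stringr)  # str_length() comes from stringr
str_length(x)
# [1] 5 4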

1.2 Case conversion

toupper(): lower case to upper case
tolower(): upper case to lower case

toupper('huake')
# [1] "HUAKE"
tolower('WUDA')
# [1] "wuda"

1.3 `paste()` and `paste0()`: joining strings

paste()

stringa <- LETTERS[1:5]
STRINGB <- 1:5
paste(stringa,STRINGB)
# [1] "A 1" "B 2" "C 3" "D 4" "E 5"

# sep sets the separator placed between the pasted arguments
paste(stringa,STRINGB,sep='-')
# [1] "A-1" "B-2" "C-3" "D-4" "E-5"

# collapse joins all the pasted elements into a single string, using the given separator
paste(stringa,STRINGB,collapse ='-')
# [1] "A 1-B 2-C 3-D 4-E 5"

paste0() (the 0 means the arguments are joined with no separator)

paste0(stringa,STRINGB)
# [1] "A1" "B2" "C3" "D4" "E5"

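# paste0() has no sep argument, so sep='-' is treated as one more string and appended to each element
paste0(stringa,STRINGB,sep='-')
# [1] "A1-" "B2-" "C3-" "D4-" "E5-"
# collapse still works and joins the elements into a single string
paste0(stringa,STRINGB,collapse ='-')
# [1] "A1-B2-C3-D4-E5"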

Calling paste() with sep="" gives the same result as paste0().
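
The equivalence can be checked directly. The following is a minimal sketch in base R; the vectors are redefined so the block runs on its own, and the names with_sep and with_paste0 are only illustrative.

```r
# Vectors as defined earlier in this section
stringa <- LETTERS[1:5]
STRINGB <- 1:5

# paste() with an empty separator behaves like paste0()
with_sep <- paste(stringa, STRINGB, sep = "")
with_paste0 <- paste0(stringa, STRINGB)
identical(with_sep, with_paste0)
# [1] TRUE

# sep joins the arguments element-wise; collapse then joins the resulting elements
combined <- paste(stringa, STRINGB, sep = "-", collapse = "+")
combined
# [1] "A-1+B-2+C-3+D-4+E-5"
```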

1.4 Splitting strings with `strsplit()`

Splitting returns a list

stringC <- paste(stringa,STRINGB,sep='/')
stringC
# [1] "A/1" "B/2" "C/3" "D/4" "E/5"

M <- strsplit(stringC,split = '/')
M
# [[1]]
# [1] "A" "1"

# [[2]]
# [1] "B" "2"

# [[3]]
# [1] "C" "3"

# [[4]]
# [1] "D" "4"

# [[5]]
# [1] "E" "5"

class(M)
# [1] "list"
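
Because strsplit() returns a list, a common next step is to pull one piece out of every element or to stack the pieces into a matrix. This is a sketch in base R; letters_part and parts are illustrative names, not part of the original example.

```r
stringC <- c("A/1", "B/2", "C/3", "D/4", "E/5")
M <- strsplit(stringC, split = "/")

# First piece of every list element, returned as a character vector
letters_part <- vapply(M, function(p) p[1], character(1))
letters_part
# [1] "A" "B" "C" "D" "E"

# Stack the pieces into a character matrix
# (this works here because every element splits into exactly two pieces)
parts <- do.call(rbind, M)
dim(parts)
# [1] 5 2
```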

1.5 Extracting substrings with substr()

# extract characters 2 to 4
stringd <- c('python','java','ruby','php','huazhongda')
sub_str <- substr(stringd,start = 2,stop = 4)
sub_str
# [1] "yth" "ava" "uby" "hp"  "uaz"

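# substr() also supports assignment: replace characters 2 to 4 with 'aaa' (the string length never changes, so 'php' becomes 'paa')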
substr(stringd,start = 2,stop = 4) <- 'aaa'
stringd
# [1] "paaaon"     "jaaa"       "raaa"       "paa"        "haaahongda"

1.6 `grep()` and `grepl()`

These functions handle more complex string matching

# Syntax
grep(pattern, x, ignore.case = FALSE, perl = FALSE, value = FALSE,
     fixed = FALSE, useBytes = FALSE, invert = FALSE)
grepl(pattern, x, ignore.case = FALSE, perl = FALSE,
      fixed = FALSE, useBytes = FALSE)

Create a vector

seq_names <- c('EU_FRA02_C1_S2008','AF_COM12_B0_2004','AF_COM17_F0_S2008',
               'AS_CHN11_C3_2004','EU-FRA-C3-S2007','NAUSA02E02005',
               'AS_CHN12_N0_05','NA_USA03_C2_S2007','NA USA04 A3 2004',
               'EU_UK01_A0_2009','eu_fra_a2_s98','SA/BRA08/B0/1996')
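# The names mix upper and lower case, slashes and underscores, confirmed and estimated years, and so on.
# In the first element, EU is Europe, FRA is France, 02 means the second sequence from France, C1 is the sequence subtype, 2008 is the collection year, and the S prefix marks 2008 as an estimated rather than confirmed year.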

Use grep() to pick out the French sequences

fra_seq <- grep(pattern = 'FRA|fra',x=seq_names)
fra_seq
# [1]  1  5 11
seq_names[fra_seq]
# [1] "EU_FRA02_C1_S2008" "EU-FRA-C3-S2007"   "eu_fra_a2_s98" 

# setting value = TRUE returns the matching elements instead of their indices
fra_seq <- grep(pattern = 'FRA|fra',x=seq_names,value = TRUE)
fra_seq
# [1] "EU_FRA02_C1_S2008" "EU-FRA-C3-S2007"   "eu_fra_a2_s98"

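# set ignore.case = TRUE to ignore case, so a single spelling of the pattern is enough
grep(pattern = 'fra',x=seq_names,value = TRUE,ignore.case = TRUE)
# [1] "EU_FRA02_C1_S2008" "EU-FRA-C3-S2007"   "eu_fra_a2_s98" 
# the | in 'FRA|fra' above is regular-expression alternation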

grepl() returns TRUE or FALSE for each element

grepl(pattern = 'FRA|fra',x=seq_names)
# [1]  TRUE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
# subset with []
seq_names[grepl(pattern = 'FRA|fra',x=seq_names)]
# [1] "EU_FRA02_C1_S2008" "EU-FRA-C3-S2007"   "eu_fra_a2_s98"

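Exercise: extract the sequences in the vector above that have a confirmed collection year.
(Approach: find the sequences with an uncertain year, i.e. those with an s or S before the year, then negate the match.)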

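spe_seq <- seq_names[!grepl(pattern = '[sS][0-9]{2,4}\\b',seq_names)]
spe_seq
# [1] "AF_COM12_B0_2004" "AS_CHN11_C3_2004" "NAUSA02E02005"    "AS_CHN12_N0_05"  
# [5] "NA USA04 A3 2004" "EU_UK01_A0_2009"  "SA/BRA08/B0/1996"

# \\ is the escape; \\b matches a word boundary, and placing it on the right anchors the match to the end of a word.
# [sS] matches a single s or S (inside brackets no | is needed; a | there would be matched literally), [0-9] matches one digit, and {2,4} right after [0-9] repeats the digit 2 to 4 times.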
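
The pieces of this pattern can be tried on a few short strings. This is a small base R sketch; the vector ids is made up for illustration and isn't part of the original data.

```r
# Hypothetical identifiers used only to exercise the pattern
ids <- c("C1_S2008", "B0_2004", "a2_s98", "S2008x")

# s or S, then 2 to 4 digits, then a word boundary
hits <- grepl("[sS][0-9]{2,4}\\b", ids)
hits
# [1]  TRUE FALSE  TRUE FALSE

# Inside a character class, | is a literal character rather than alternation
pipe_literal <- grepl("[s|S]", "a|b")   # matches the literal |
pipe_absent  <- grepl("[sS]", "a|b")    # nothing to match
c(pipe_literal, pipe_absent)
# [1]  TRUE FALSE
```

The last element fails because the digits are followed by a letter, so there is no word boundary after them.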

1.7 `gsub()` and `sub()`

# Syntax
sub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
    fixed = FALSE, useBytes = FALSE)
gsub(pattern, replacement, x, ignore.case = FALSE, perl = FALSE,
     fixed = FALSE, useBytes = FALSE)

money <- c('$1888','$2888','$3888')

# because of the dollar sign, as.numeric() cannot be applied directly
as.numeric(money)
# [1] NA NA NA

gsub()

# $ is a regex metacharacter (end of string), so it must be escaped as \\$; once it's removed, as.numeric() works.
money1 <- gsub('\\$',replacement = '',money)
money1
# [1] "1888" "2888" "3888"
as.numeric(money1)
# [1] 1888 2888 3888

gsub() replaces every match it finds in each string
sub() replaces only the first match in each string

sub('\\$',replacement = '',money)
# [1] "1888" "2888" "3888"

money <- c('$1888 $2888 $3888')
sub('\\$',replacement = '',money)
# [1] "1888 $2888 $3888"
gsub('\\$',replacement = '',money)
# [1] "1888 2888 3888"
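
The same idea extends to stripping several characters at once. This is a sketch in base R, assuming prices that also contain thousands separators; the vector prices is hypothetical.

```r
# Hypothetical prices with a currency symbol and thousands separators
prices <- c("$1,888", "$12,888.50")

# Inside [ ], $ is a literal character, so no escape is needed
cleaned <- gsub("[$,]", "", prices)
cleaned
# [1] "1888"     "12888.50"

amounts <- as.numeric(cleaned)
sum(amounts)
# [1] 14776.5
```

sub() wouldn't be enough here: it would remove only the leading $ and leave the commas in place.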

1.8 `regexpr()`, `gregexpr()` and `regexec()`

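The three are closely related: regexpr() reports the first match in each string, gregexpr() reports every match, and regexec() reports the first match together with its capture groups.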

# Syntax
regexpr(pattern, text, ignore.case = FALSE, perl = FALSE,
        fixed = FALSE, useBytes = FALSE)
gregexpr(pattern, text, ignore.case = FALSE, perl = FALSE,
         fixed = FALSE, useBytes = FALSE)
regexec(pattern, text, ignore.case = FALSE, perl = FALSE,
        fixed = FALSE, useBytes = FALSE)

Taking regexpr() as an example:

# find where 'pp' occurs in each element of test_string
test_string <- c('happy','apple','application','apolotoc')
regexpr('pp',test_string)
# [1]  3  2  2 -1
# attr(,"match.length")
# [1]  2  2  2 -1
# attr(,"index.type")
# [1] "chars"
# attr(,"useBytes")
# [1] TRUE
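# The result 3 2 2 -1 means 'pp' starts at position 3 in the first string and at position 2 in the second and third; it isn't found in the last string, so -1 is returned.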
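
The match positions are usually passed on to regmatches() to get the matched text itself. This sketch in base R reuses test_string; the patterns "p" and "(a)(p+)" are chosen only for illustration.

```r
test_string <- c("happy", "apple", "application", "apolotoc")

# regexpr(): first match per element; regmatches() drops elements with no match
first_hit <- regexpr("pp", test_string)
matched <- regmatches(test_string, first_hit)
matched
# [1] "pp" "pp" "pp"

# gregexpr(): every match, returned as one list element per input string
all_p <- as.integer(gregexpr("p", "application")[[1]])
all_p
# [1] 2 3

# regexec(): the full match followed by each capture group
groups <- regmatches("happy", regexec("(a)(p+)", "happy"))[[1]]
groups
# [1] "app" "a"   "pp"
```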

1.9 agrep() and agrepl()

Taking agrep() as an example

string1 <- c('I need a favour','my favorite sport','you made an error')
agrep('favor',string1)
# [1] 1 2

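agrep() performs approximate (fuzzy) matching based on edit distance, so the British spelling 'favour' is matched as well as the American 'favor'.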
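
How much mismatch is tolerated is controlled by max.distance. This short sketch in base R compares exact matching with the default tolerance; the variable names are illustrative.

```r
string1 <- c("I need a favour", "my favorite sport", "you made an error")

# max.distance = 0 allows no edits, so only an exact occurrence of 'favor' matches
exact_only <- agrep("favor", string1, max.distance = 0)
exact_only
# [1] 2

# The default (0.1, a fraction of the pattern length) allows one edit for a 5-letter pattern
fuzzy <- agrep("favor", string1, value = TRUE)
fuzzy
# [1] "I need a favour"   "my favorite sport"
```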

2. The stringr and stringi packages

stringr and stringi offer similar functionality. stringi is the more powerful of the two, but it leans more heavily on regular expressions.

# list the functions in the two packages
library(stringr)
library(stringi)
ls('package:stringr')
ls('package:stringi')
# when this was written, stringr had 52 functions and stringi had 252; the counts vary by package version.

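2.1 The stringr package

Common functions	Purpose
str_split() / str_c()	split and join strings
str_length()	string length
str_sub()	extract characters by position
str_dup()	repeat (duplicate) a string
str_trim()	remove leading and trailing whitespace
str_to_upper() / str_to_lower() / str_to_title()	case conversion
str_locate()	locate a match
str_detect(x, "h")	detect a match; returns a logical vector
str_extract() / str_extract_all()	extract matches
str_remove() / str_remove_all()	remove matches
str_replace() / str_replace_all()	replace matches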

2.1.1 str_c() and str_split()
str_c() is similar to paste()

library(stringr)
str_c('a','b')
# [1] "ab"
str_c('a','b',sep='-')
# [1] "a-b"

str_split()

x <- "The birch canoe slid on the smooth planks."
x
# [1] "The birch canoe slid on the smooth planks."
str_split(x," ") # returns a list
# [[1]]
# [1] "The"     "birch"   "canoe"   "slid"    "on"     
# [6] "the"     "smooth"  "planks."
str_split(x," ")[[1]]  # take the first list element to get a character vector
# [1] "The"     "birch"   "canoe"   "slid"    "on"     
# [6] "the"     "smooth"  "planks."

y = c("john 150","mike 140","lucy 152")
str_split(y," ")
# [[1]]
# [1] "john" "150" 

# [[2]]
# [1] "mike" "140" 

# [[3]]
# [1] "lucy" "152" 
str_split(y," ",simplify = TRUE) # simplify = TRUE returns a character matrix
#      [,1]   [,2] 
# [1,] "john" "150"
# [2,] "mike" "140"
# [3,] "lucy" "152"

2.1.2 str_length()
Counts the characters in each string, like nchar()
2.1.3 str_sub(): extract characters by position

aaa <- 'huake tongji cardio'
str_sub(aaa,c(1,4,8),c(2,7,11))
# [1] "hu"   "ke t" "ongj"
# the first result is characters 1-2, the second is characters 4-7, the third is characters 8-11

2.1.4 str_dup

fruit <- c('apple','pear','banana')
str_dup(fruit,2) # 2 means each string is repeated twice
# [1] "appleapple"   "pearpear"     "bananabanana"

str_dup(fruit,2:4)
# [1] "appleapple"               "pearpearpear"             "bananabananabananabanana"

2.1.5 str_trim: remove leading and trailing whitespace

string <- c('  Huake is good    ')
string
# [1] "  Huake is good    "
str_trim(string,side = 'both')
# [1] "Huake is good"

2.1.6 str_locate: locate a match within a string

fruit <- c("apple", "banana", "pear", "pineapple")
str_locate(fruit, "$")
#      start end
# [1,]     6   5
# [2,]     7   6
# [3,]     5   4
# [4,]    10   9
str_locate(fruit, "a")
#      start end
# [1,]     1   1
# [2,]     2   2
# [3,]     3   3
# [4,]     5   5
str_locate(fruit, c("a", "b", "p", "p"))
#      start end
# [1,]     1   1
# [2,]     1   1
# [3,]     1   1
# [4,]     1   1

2.1.7 str_detect: detect a match

fruit <- c("apple", "banana", "pear", "pinapple")
str_detect(fruit, "a")
# [1] TRUE TRUE TRUE TRUE
str_detect(fruit, "^a")
# [1]  TRUE FALSE FALSE FALSE
str_detect(fruit, "a$")
# [1] FALSE  TRUE FALSE FALSE
str_detect(fruit, "b")
# [1] FALSE  TRUE FALSE FALSE
str_detect(fruit, "[aeiou]")
# [1] TRUE TRUE TRUE TRUE

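Tip: combining str_detect() with ifelse() splits strings into two classes according to whether they contain a given substring. This is common in analyses of GEO and similar datasets, where the sample name indicates whether a sample is normal or a case (e.g. tumor).

Usage:

ifelse(str_detect(colnames(a), 'tumor'), 'tumor', 'normal')
# Returns 'tumor' for each column name of data frame a that contains 'tumor', and 'normal' otherwise.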
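
A runnable version of this pattern might look like the sketch below. The sample names are made up to stand in for the column names of a real expression matrix, and base R's grepl() is used in place of str_detect() so the block runs without extra packages; with stringr loaded, str_detect(sample_names, "tumor") gives the same logical vector.

```r
# Hypothetical sample names standing in for the column names of data frame a
sample_names <- c("tumor_01", "normal_01", "tumor_02", "normal_02")

# Label each sample by whether its name contains 'tumor'
group <- ifelse(grepl("tumor", sample_names, fixed = TRUE), "tumor", "normal")
group
# [1] "tumor"  "normal" "tumor"  "normal"

# A factor is usually what downstream modelling functions expect
group_factor <- factor(group, levels = c("normal", "tumor"))
table(group_factor)
```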

2.1.8 str_extract and str_extract_all

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
str_extract(shopping_list, "\\d")
# [1] "4" NA  NA  "2"
str_extract(shopping_list, "[a-z]+")
# [1] "apples" "bag"    "bag"    "milk"  
str_extract(shopping_list, "[a-z]{1,4}")
# [1] "appl" "bag"  "bag"  "milk"
str_extract(shopping_list, "\\b[a-z]{1,4}\\b")
# [1] NA     "bag"  "bag"  "milk"

str_extract_all(shopping_list, "[a-z]+")
# [[1]]
# [1] "apples" "x"     

# [[2]]
# [1] "bag"   "of"    "flour"

# [[3]]
# [1] "bag"   "of"    "sugar"

# [[4]]
# [1] "milk" "x"

2.1.9 str_remove() and str_remove_all()

fruits <- c("one apple", "two pears", "three bananas")
str_remove(fruits, "[aeiou]")
# [1] "ne apple"     "tw pears"     "thre bananas"
str_remove_all(fruits, "[aeiou]")
# [1] "n ppl"    "tw prs"   "thr bnns"

2.1.10 str_replace() and str_replace_all()

fruits <- c("one apple", "two pears", "three bananas")
str_replace(fruits, "[aeiou]", "-")
# [1] "-ne apple"     "tw- pears"     "thr-e bananas"
str_replace_all(fruits, "[aeiou]", "-")
# [1] "-n- -ppl-"     "tw- p--rs"     "thr-- b-n-n-s"

2.2 The stringi package

2.2.1 stri_join: joining strings

stri_join(1:13, letters)
#  [1] "1a"  "2b"  "3c"  "4d"  "5e"  "6f"  "7g"  "8h"  "9i"  "10j" "11k"
# [12] "12l" "13m" "1n"  "2o"  "3p"  "4q"  "5r"  "6s"  "7t"  "8u"  "9v" 
# [23] "10w" "11x" "12y" "13z"
stri_join(1:13, letters, sep=',')
#  [1] "1,a"  "2,b"  "3,c"  "4,d"  "5,e"  "6,f"  "7,g"  "8,h"  "9,i"  "10,j"
# [11] "11,k" "12,l" "13,m" "1,n"  "2,o"  "3,p"  "4,q"  "5,r"  "6,s"  "7,t" 
# [21] "8,u"  "9,v"  "10,w" "11,x" "12,y" "13,z"
stri_join(1:13, letters, collapse='; ')
# [1] "1a; 2b; 3c; 4d; 5e; 6f; 7g; 8h; 9i; 10j; 11k; 12l; 13m; 1n; 2o; 3p; 4q; 5r; 6s; 7t; 8u; 9v; 10w; 11x; 12y; 13z"

2.2.2 stri_cmp_eq and stri_cmp_neq

stri_cmp_eq tests whether two strings are identical
stri_cmp_neq tests whether two strings differ

stri_cmp_eq('AB','AB')
# [1] TRUE
stri_cmp_eq('AB','aB')
# [1] FALSE
stri_cmp_neq('AB','aB')
# [1] TRUE

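2.2.3 stri_cmp_lt and stri_cmp_gt
stri_cmp_lt: less than
stri_cmp_gt: greater than
Strings are compared character by character in collation order rather than by numeric value: '121' sorts before '221' because '1' comes before '2', and letters follow alphabetical order, with later letters counting as greater.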

stri_cmp_lt('121','221')
# [1] TRUE
stri_cmp_lt('a121','b221')
# [1] TRUE
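
The comparison is lexicographic, which matters once the digit strings have different lengths. This is a small sketch, assuming the stringi package is installed; the numeric option is passed through to stringi's collator settings.

```r
library(stringi)

# Plain collation compares character by character, so "9" is not less than "10"
plain <- stri_cmp_lt("9", "10")
plain
# [1] FALSE

# With numeric collation enabled, runs of digits are compared as numbers
as_numbers <- stri_cmp_lt("9", "10", numeric = TRUE)
as_numbers
# [1] TRUE
```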

2.2.4 stri_count

s <- 'Lorem ipsum dolor sit amet, consectetur adipisicing elit.'
stri_count(s, fixed='dolor')
# [1] 1
stri_count(s, regex='\\p{L}+')
# [1] 8

2.2.5 stri_dup

stri_dup('a', 1:5)
# [1] "a"     "aa"    "aaa"   "aaaa"  "aaaaa"
stri_dup(c('a', NA, 'ba'), 4)
# [1] "aaaa"     NA         "babababa"
stri_dup(c('abc', 'pqrst'), c(4, 2))
# [1] "abcabcabcabc" "pqrstpqrst"

2.2.6 stri_detect_fixed

stri_detect_fixed(c('stringi R', 'R STRINGI', '123'), c('i', 'R', '0'))
# [1]  TRUE  TRUE FALSE

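The search is vectorized: each string in the first argument is searched for the corresponding pattern in the second, returning TRUE if it's found and FALSE otherwise.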

2.2.7 stri_detect_regex

# find strings that start with 'ab' and strings that end with 't'
stri_detect_regex(c('above','abort','about','abnormal','abandon'),'^ab')
# [1] TRUE TRUE TRUE TRUE TRUE
stri_detect_regex(c('above','abort','about','abnormal','abandon'),'t\\b')
# [1] FALSE  TRUE  TRUE FALSE FALSE

# case_insensitive=TRUE ignores case
stri_detect_regex(c('ABove','abort','About','aBnormal','abandon'),'^ab',case_insensitive=TRUE)
# [1] TRUE TRUE TRUE TRUE TRUE

2.2.8 stri_startswith_fixed: test whether a string starts with a given substring

stri_startswith_fixed(c('a1','a2','b3','a4','c5'),'a1')
# [1]  TRUE FALSE FALSE FALSE FALSE
stri_startswith_fixed(c('abaDc','asdfh','abiude'),'ba',from=2)
# [1]  TRUE FALSE FALSE
# from sets the character position at which matching starts

2.2.9 stri_endswith_fixed: test whether a string ends with a given substring

stri_endswith_fixed(c('abaDc','asdfh','abiudba'),'ba')
# [1] FALSE FALSE  TRUE
stri_endswith_fixed(c('abaDc','asdfh','abiudba'),'ba',to=3)
# [1]  TRUE FALSE FALSE
# to sets the character position at which the match must end

2.2.10 stri_extract_all

stri_extract_all('XaaaaX', regex=c('\\p{Ll}', '\\p{Ll}+', '\\p{Ll}{2,3}', '\\p{Ll}{2,3}?'))
# [[1]]
# [1] "a" "a" "a" "a"

# [[2]]
# [1] "aaaa"

# [[3]]
# [1] "aaa"

# [[4]]
# [1] "aa" "aa"

stri_extract_all('Bartolini', coll='i')
# [[1]]
# [1] "i" "i"

stri_extract_all('stringi is so good!', charclass='\\p{Zs}') # all white-spaces
# [[1]]
# [1] " " " " " "

2.2.11 stri_extract_all_fixed
With overlap=TRUE, matches are allowed to overlap one another

stri_extract_all_fixed('abaBAba', 'Aba', case_insensitive=TRUE)
# [[1]]
# [1] "aba" "Aba"
stri_extract_all_fixed('abaBAba', 'Aba', case_insensitive=TRUE, overlap=TRUE)
# [[1]]
# [1] "aba" "aBA" "Aba"

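2.2.12 stri_extract_all_boundaries: split a string at text boundaries
With the default settings the text is split at break opportunities, which here fall after the spaces, so the extracted pieces keep their trailing spaces.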

stri_extract_all_boundaries('stringi: THE string processing package 123.48...')
# [[1]]
# [1] "stringi: "   "THE "        "string "    
# [4] "processing " "package "    "123.48..."

2.2.13 stri_extract_all_words: extract words

stri_extract_all_words('stringi: THE string processing package 123.48...')
# [[1]]
# [1] "stringi"    "THE"        "string"     "processing"
# [5] "package"    "123.48"

2.2.14 stri_isempty: test whether each string is empty
Note: a string containing only a space doesn't count as empty

stri_isempty(letters[1:3])
# [1] FALSE FALSE FALSE
stri_isempty(c(',', '', 'abc', '123', '\u0105\u0104'))
# [1] FALSE  TRUE FALSE FALSE FALSE
stri_isempty(character(1))
# [1] TRUE

2.2.15 stri_locate_all: find the positions of every match within a string

stri_locate_all('Bartolini', fixed='i')
# [[1]]
#      start end
# [1,]     7   7
# [2,]     9   9
