Learn These Three Packages and Master Regex in R

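Regular expressions have tormented all of us for long enough. It is especially frustrating when a string contains exactly the piece of data you want and you cannot manage to extract it. This post introduces three R packages that take some of the pain out of regex; hopefully they will be useful to you.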

regexplain

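RegExplain is an RStudio addin for regular expressions. It lets you build a regexp interactively, check the output of common string-matching functions, consult interactive help pages, or learn regular expressions from the included resources.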

GitHub: https://github.com/gadenbuie/regexplain

Installation

# install.packages("remotes")
remotes::install_github("gadenbuie/regexplain")

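After installation, several new entries appear under Addins. The most commonly used are RegExplain Selection and RegExplain File, which are the two ways of importing text into RegExplain.

Select an object name, a line of text, or a line of code in the RStudio window, then run RegExplain Selection.

To import text from a file, use RegExplain File to load the text you want to process with regular expressions. On import, RegExplain automatically reduces the text to unique entries and limits the number of lines.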


A cheat sheet is also included for looking up basic regex syntax.


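RegExplain also provides the function view_regex(), which you can use as a replacement for stringr::str_view(). Besides highlighting the matched portion of the text, view_regex() colors the capture groups and attempts to color the regex itself.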

library(regexplain)

text <- c("breakfast=eggs;lunch=pizza",
          "breakfast=bacon;lunch=spaghetti",
          "no food here")
pattern <- "((\\w+)=)(\\w+).+(ch=s?p)"

# Highlight the matches and color each capture group
view_regex(text, pattern)
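
When the addin is not at hand, the capture groups of the same pattern can also be inspected in plain base R. The following is only a sketch of that idea: it relies on base R's regexec() and regmatches(), not on anything from RegExplain, and perl = TRUE is an assumption made here so that the greedy matching behaves as in most regex tutorials.

```r
text <- c("breakfast=eggs;lunch=pizza",
          "breakfast=bacon;lunch=spaghetti",
          "no food here")
pattern <- "((\\w+)=)(\\w+).+(ch=s?p)"

# One list element per input string: the full match first, then one entry
# per capture group. Strings without a match give character(0).
groups <- regmatches(text, regexec(pattern, text, perl = TRUE))

groups[[1]]  # full match followed by capture groups 1 to 4
groups[[3]]  # character(0), because "no food here" does not match
```

For the first string this returns the full match "breakfast=eggs;lunch=p" followed by "breakfast=", "breakfast", "eggs" and "ch=p", which are the pieces view_regex() colors.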


See the GitHub repository for more detailed examples.

stringr.plus

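GitHub: https://github.com/johncassil/stringr.plus

stringr.plus provides additional string-handling functions that stringr does not have. They work well alongside the tidyverse and are particularly useful for URLs and file paths, where you need to extract a specific piece of text from a string.

Examples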

#install.packages('stringr.plus')
#remotes::install_github("johncassil/stringr.plus")
library(stringr.plus)
url <- 'www.carfax.com/vehicle/3GCPKTE77DG348900'

## Extract only the base URL
str_extract_before(string = url, pattern = '/')
#> [1] "www.carfax.com"

## Extract everything after the base URL
str_extract_after(string = url, pattern = '/')
#> [1] "vehicle/3GCPKTE77DG348900"

## Extract only the last part
str_extract_after(string = url, pattern = 'vehicle/')
#> [1] "3GCPKTE77DG348900"

## Extract the first 5 characters of the last part
str_extract_after(string = url, pattern = 'vehicle/', num_char = 5)
#> [1] "3GCPK"

## Use which = "last" or "first" (the default) to choose the occurrence
str_extract_after(string = url, pattern = '/', which = "last")
#> [1] "3GCPKTE77DG348900"
str_extract_before(string = url, pattern = '/', which = "last")
#> [1] "www.carfax.com/vehicle"

## Extract the text between two patterns
file_path <- "C:/Users/pingu/Downloads/a-very-poorly-named-file_08_09_2020.csv"
str_extract_between(string = file_path, pattern1 = '_', pattern2 = ".csv")
#> [1] "08_09_2020"
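
To make the "first" versus "last" behavior concrete, here is a minimal base R sketch of the same idea. The helpers extract_before() and extract_after() and their occurrence argument are hypothetical names invented for this illustration; they are not the stringr.plus implementation, they treat the separator as a literal string, and they handle a single string only.

```r
url <- "www.carfax.com/vehicle/3GCPKTE77DG348900"

# Hypothetical helper: position of the first or last literal `sep` in `x`,
# or NA if `sep` does not occur.
find_sep <- function(x, sep, occurrence = c("first", "last")) {
  occurrence <- match.arg(occurrence)
  hits <- gregexpr(sep, x, fixed = TRUE)[[1]]
  if (hits[1] == -1) return(NA_integer_)
  if (occurrence == "first") hits[1] else hits[length(hits)]
}

# Hypothetical helper: text before the chosen occurrence of `sep`.
extract_before <- function(x, sep, occurrence = "first") {
  pos <- find_sep(x, sep, occurrence)
  if (is.na(pos)) return(NA_character_)
  substr(x, 1, pos - 1)
}

# Hypothetical helper: text after the chosen occurrence of `sep`.
extract_after <- function(x, sep, occurrence = "first") {
  pos <- find_sep(x, sep, occurrence)
  if (is.na(pos)) return(NA_character_)
  substr(x, pos + nchar(sep), nchar(x))
}

extract_before(url, "/")                      # "www.carfax.com"
extract_after(url, "/", occurrence = "last")  # "3GCPKTE77DG348900"
```

The package functions cover more cases (vectors, a num_char limit); the sketch is only meant to show what the pattern position decides.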

Sometimes it helps to check whether a string contains several patterns at once. str_detect_multiple() is typically used for filtering, with either "and" or "or" logic.

############### AND mode #######
# str_detect_multiple() with method = 'and' checks that every pattern occurs in the string
file_path <- "C:/Users/pingu/Downloads/a-very-poorly-named-file_08_09_2020.csv"
str_detect_multiple(string = file_path, patterns = c("pingu", "2020"), method = 'and')
#> [1] TRUE

# A shorter alias for the same check: str_detect_multiple_and()
str_detect_multiple_and(string = file_path, patterns = c("Downloads", "csv"))
#> [1] TRUE

############### OR mode #######
# str_detect_multiple() with method = 'or' checks that at least one pattern occurs in the string
str_detect_multiple(string = file_path, patterns = c("very", "purple"), method = 'or')
#> [1] TRUE
#It is also aliased with str_detect_multiple_or()
str_detect_multiple_or(string = file_path, patterns = c("large", "file"))
#> [1] TRUE
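
The "and"/"or" logic itself is easy to reproduce in base R, which can help when reading the results above. The sketch below is an illustration only: detect_multiple() is a hypothetical helper, not the stringr.plus code, and it assumes the patterns are literal strings rather than regular expressions.

```r
# Hypothetical helper: test each string in `x` against several literal
# patterns at once.
detect_multiple <- function(x, patterns, method = c("and", "or")) {
  method <- match.arg(method)
  # One column per pattern, one row per string in `x`.
  hits <- vapply(patterns, function(p) grepl(p, x, fixed = TRUE),
                 logical(length(x)))
  hits <- matrix(hits, nrow = length(x))
  n_hits <- rowSums(hits)
  # "and": every pattern matched; "or": at least one pattern matched.
  if (method == "and") n_hits == length(patterns) else n_hits > 0
}

file_path <- "C:/Users/pingu/Downloads/a-very-poorly-named-file_08_09_2020.csv"
detect_multiple(file_path, c("pingu", "2020"), method = "and")   # TRUE
detect_multiple(file_path, c("very", "purple"), method = "and")  # FALSE
detect_multiple(file_path, c("very", "purple"), method = "or")   # TRUE
```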

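Understanding the context around a match is also important. str_extract_context() widens a match to include up to a given number of characters before and after the pattern.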

## Use the window_size argument to control the number of characters
## str_extract_context() returns the first match
sentence <- "I have spread my dreams under your feet; Tread softly because you tread on my dreams."
str_extract_context(string = sentence, pattern = "my", window_size = 15)
#> [1] "I have spread my dreams under y"

## str_extract_context_all() returns all matches
str_extract_context_all(string = sentence, pattern = "my", window_size = 15)
#>      [,1]                             
#> [1,] "I have spread my dreams under y"
#> [2,] "e you tread on my dreams."
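
The windowing can be pictured as simple position arithmetic: find where the pattern starts and ends, then step window_size characters outward in both directions. A minimal base R sketch under that assumption follows; extract_context_all() is a hypothetical helper written for this post, not the package's code, and it matches the pattern literally in a single string.

```r
# Hypothetical helper: return every literal match of `pattern` in `x`,
# padded with up to `window_size` characters on each side.
extract_context_all <- function(x, pattern, window_size) {
  m <- gregexpr(pattern, x, fixed = TRUE)[[1]]
  if (m[1] == -1) return(character(0))
  match_start <- as.integer(m)
  match_end <- match_start + attr(m, "match.length") - 1L
  # Clamp the left edge at 1; substring() already stops at the end of `x`.
  substring(x, pmax(1L, match_start - window_size), match_end + window_size)
}

sentence <- "I have spread my dreams under your feet; Tread softly because you tread on my dreams."
extract_context_all(sentence, "my", window_size = 15)
```

On this sentence the sketch gives the same two strings as str_extract_context_all() above, returned as a plain character vector instead of a matrix.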

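inferregex

The goal of inferregex is to infer a regular expression (regex) that identifies a string, along with other characteristics of that string, which is rather clever.

GitHub: https://github.com/daranzolin/inferregex

Example 1: a single string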

remotes::install_github("daranzolin/inferregex")
library(inferregex)
s <- "abcd-9999-ab9"
infer_regex(s)$regex
#> [1] "^[a-z]{4}-\\d{4}-[a-z]{2}\\d$"

Example 2: a batch of strings

library(purrr)
(regex_df <- map_dfr(rownames(mtcars), infer_regex))
#>                 string                                         regex
#> 1            Mazda RX4                 ^[A-Z][a-z]{4}\\s[A-Z]{2}\\d$
#> 2        Mazda RX4 Wag ^[A-Z][a-z]{4}\\s[A-Z]{2}\\d\\s[A-Z][a-z]{2}$
#> 3           Datsun 710                      ^[A-Z][a-z]{5}\\s\\d{3}$
#> 4       Hornet 4 Drive         ^[A-Z][a-z]{5}\\s\\d\\s[A-Z][a-z]{4}$
#> 5    Hornet Sportabout               ^[A-Z][a-z]{5}\\s[A-Z][a-z]{9}$
#> 6              Valiant                               ^[A-Z][a-z]{6}$
#> 7           Duster 360                      ^[A-Z][a-z]{5}\\s\\d{3}$
#> 8            Merc 240D                 ^[A-Z][a-z]{3}\\s\\d{3}[A-Z]$
#> 9             Merc 230                      ^[A-Z][a-z]{3}\\s\\d{3}$
#> 10            Merc 280                      ^[A-Z][a-z]{3}\\s\\d{3}$
#> 11           Merc 280C                 ^[A-Z][a-z]{3}\\s\\d{3}[A-Z]$
#> 12          Merc 450SE              ^[A-Z][a-z]{3}\\s\\d{3}[A-Z]{2}$
#> 13          Merc 450SL              ^[A-Z][a-z]{3}\\s\\d{3}[A-Z]{2}$
#> 14         Merc 450SLC              ^[A-Z][a-z]{3}\\s\\d{3}[A-Z]{3}$
#> 15  Cadillac Fleetwood               ^[A-Z][a-z]{7}\\s[A-Z][a-z]{8}$
#> 16 Lincoln Continental              ^[A-Z][a-z]{6}\\s[A-Z][a-z]{10}$
#> 17   Chrysler Imperial               ^[A-Z][a-z]{7}\\s[A-Z][a-z]{7}$
#> 18            Fiat 128                      ^[A-Z][a-z]{3}\\s\\d{3}$
#> 19         Honda Civic               ^[A-Z][a-z]{4}\\s[A-Z][a-z]{4}$
#> 20      Toyota Corolla               ^[A-Z][a-z]{5}\\s[A-Z][a-z]{6}$
#> ... (remaining rows omitted)

all(map2_lgl(regex_df$string, regex_df$regex, ~grepl(.y, .x)))
#> [1] TRUE
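
To get a feel for how such a regex can be derived, one simple approach is to map every character to a character class and then collapse runs of the same class into a quantifier. The sketch below does exactly that in base R. It is a simplified illustration, not the algorithm inferregex actually uses: infer_simple_regex() is a hypothetical helper, and it only knows ASCII letters, digits, spaces and literal characters.

```r
# Hypothetical helper: build an anchored regex from the character classes
# found in a single string `s`.
infer_simple_regex <- function(s) {
  chars <- strsplit(s, "", fixed = TRUE)[[1]]
  meta <- strsplit(".\\|()[]{}^$*+?", "", fixed = TRUE)[[1]]
  classes <- vapply(chars, function(ch) {
    if (ch %in% letters) "[a-z]"                # ASCII lowercase only
    else if (ch %in% LETTERS) "[A-Z]"           # ASCII uppercase only
    else if (ch %in% as.character(0:9)) "\\d"
    else if (ch == " ") "\\s"
    else if (ch %in% meta) paste0("\\", ch)     # escape regex metacharacters
    else ch                                     # any other literal, e.g. "-"
  }, character(1), USE.NAMES = FALSE)
  # Collapse consecutive identical classes into class{n}.
  runs <- rle(classes)
  parts <- ifelse(runs$lengths > 1,
                  paste0(runs$values, "{", runs$lengths, "}"),
                  runs$values)
  paste0("^", paste(parts, collapse = ""), "$")
}

infer_simple_regex("abcd-9999-ab9")
infer_simple_regex("Mazda RX4")
```

For these two inputs the sketch yields the same patterns as the infer_regex() output shown above, and each pattern matches the string it was built from when checked with grepl().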

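As you can see, once you are familiar with these three R packages (the latter two are probably the most helpful), you can easily extract data in situations like these, or hand a string to a function and let it work out the regex for you. That both speeds up your analysis and teaches you something about writing regex: the best of both worlds.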
