Background: checking whether a field contains one or more specific keywords.
1. Single-keyword match
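Sample: 2476401041436860373259
Requirement: from the sample above, extract the substring that starts at 0414, i.e. everything from 0414 through the trailing 3259 (here 041436860373259).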
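str_pattern = '0414'
m = []
for order_id in data_set['old_order_id']:
    order_id = str(order_id)  # make sure find() is called on a string
    # find() returns the index of the first occurrence, or -1 if the pattern is absent
    start_point = order_id.find(str_pattern)
    # Slice from the match to the end; keep an empty string when there is no match
    # (slicing from -1 would otherwise return just the last character)
    m.append(order_id[start_point:] if start_point != -1 else '')
# Assign the truncated values back to the original DataFrame
data_set['new_orderId'] = m
data_set.head()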
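The slicing step can be pulled out into a small pure function and checked against the sample value without a DataFrame. This is a sketch only. The name `substring_from` is hypothetical. Returning an empty string when the prefix is missing is an assumption, because the original loop does not say what should happen in that case.

```python
def substring_from(value: str, pattern: str) -> str:
    """Return the part of `value` starting at the first occurrence of `pattern`.

    Assumption: an absent pattern yields "" rather than value[-1:]
    (the last character), which is what slicing with a raw -1 would give.
    """
    start = value.find(pattern)  # index of the first match, or -1
    return value[start:] if start != -1 else ""


sample = "2476401041436860373259"
print(substring_from(sample, "0414"))        # 041436860373259
print(repr(substring_from(sample, "9999")))  # ''
```

With pandas, the same function could be applied column-wise, for example `data_set['old_order_id'].astype(str).apply(lambda s: substring_from(s, '0414'))`. This avoids building the intermediate list by hand.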
2. Multiple-keyword match
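Sample: 這是一個天朗氣清的早餐,風和日麗,適合郊游 (roughly: "A clear, bright breakfast, with a gentle breeze and sunshine; good weather for an outing")
Requirement: tag each row by keyword group. A row is type 1 if it contains any of 早 / 風和 / 郊游, otherwise type 2 if it contains any of 天亮了 / 月亮, otherwise type 0.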
import re

pattern_v1 = "早|風和|郊游"  # type-1 keywords, joined with the regex alternation operator |
pattern_v2 = "天亮了|月亮"   # type-2 keywords
# Type classification: type-1 keywords take priority over type-2 keywords
risk_type = []
for text in data_set['contents']:
    result_v1 = re.findall(pattern_v1, text)
    result_v2 = re.findall(pattern_v2, text)
    if len(result_v1) > 0:
        risk_type.append(1)
    elif len(result_v2) > 0:
        risk_type.append(2)
    else:
        risk_type.append(0)
# Assign the results back to the original DataFrame
data_set['type'] = risk_type
data_set.head(1)
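The loop only needs to know whether a keyword occurs, so `re.search` is enough: it returns the first match object or `None`. By contrast, `re.findall` builds the full list of matches. The sketch below precompiles both patterns and wraps the if/elif chain in a function. The names `classify`, `PATTERN_V1` and `PATTERN_V2` are hypothetical, and the 1/2/0 encoding mirrors the loop above.

```python
import re

# Keyword groups from the requirement above; the first group takes priority.
PATTERN_V1 = re.compile("早|風和|郊游")
PATTERN_V2 = re.compile("天亮了|月亮")


def classify(text: str) -> int:
    """Return 1 for a type-1 keyword, 2 for a type-2 keyword, 0 otherwise."""
    # search() stops at the first hit; only presence matters here.
    if PATTERN_V1.search(text):
        return 1
    if PATTERN_V2.search(text):
        return 2
    return 0


print(classify("這是一個天朗氣清的早餐,風和日麗,適合郊游"))  # 1
print(classify("月亮出來了"))                                # 2
print(classify("下雨天"))                                    # 0
```

Assuming the `contents` column holds strings, the DataFrame version would be `data_set['type'] = data_set['contents'].apply(classify)`.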
Key takeaways:
- Python regular expressions: the re module (re.findall)
- The str.find method
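The two tools in the takeaways behave differently. The short comparison below runs each of them on the numeric sample from section 1. It is illustrative only: `str.find` does plain substring search, while the `re` functions interpret their first argument as a regex pattern. The last line shows why the -1 case from `find` needs explicit handling before slicing.

```python
import re

sample = "2476401041436860373259"

# str.find: index of the first occurrence, or -1 if absent (no regex syntax)
print(sample.find("0414"))  # 7
print(sample.find("9999"))  # -1

# re.search: first match object (or None); .start() gives its position
m = re.search("0414", sample)
print(m.start() if m else None)  # 7

# re.findall: list of all non-overlapping matches; | means "either pattern"
print(re.findall("0414|3259", sample))  # ['0414', '3259']

# Pitfall: slicing from a -1 result silently returns the last character
print(sample[sample.find("9999"):])  # 9
```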