Simply put, LabelEncoder assigns consecutive integer codes (0, 1, 2, ...) to non-contiguous numbers or text labels:
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit([1,5,67,100])
le.transform([1,1,100,67,5])
Output: array([0, 0, 3, 2, 1])
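Since the note above mentions text as well as numbers, here is a minimal sketch of the same idea with string labels. The city names are made up for illustration; the point is that the codes follow the sorted order of the fitted classes and that `inverse_transform` maps the codes back.

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical text labels, used only for illustration
le = LabelEncoder()
le.fit(["paris", "tokyo", "amsterdam"])

# classes_ holds the fitted labels in sorted order; the index is the code
classes = list(le.classes_)

# Encode labels to integer codes, then decode them back
codes = le.transform(["tokyo", "tokyo", "paris"])
decoded = le.inverse_transform(codes)

print(classes)
print(codes)
print(decoded)
```

Because the codes come from sorted order, they carry no meaning beyond identity, which is why one-hot encoding is often applied for models that would otherwise read the integers as magnitudes.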
OneHotEncoder expands categorical data into extra dimensions, one binary column per category:
from sklearn.preprocessing import OneHotEncoder
ohe = OneHotEncoder()
ohe.fit([[1],[2],[3],[4]])
ohe.transform([[2],[3],[1],[4]]).toarray()  # input must be a single 2-D array, one row per sample
Output: [[0., 1., 0., 0.], [0., 0., 1., 0.], [1., 0., 0., 0.], [0., 0., 0., 1.]]
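The column order of the one-hot output follows the fitted categories, which can be inspected via `categories_`. The sketch below also sets `handle_unknown="ignore"`, an optional setting not used in the original snippet, to show what happens with a value that was not seen during `fit` (here the value 5, chosen only for illustration).

```python
from sklearn.preprocessing import OneHotEncoder

# handle_unknown="ignore" is an assumption for this sketch: unseen
# categories are encoded as all zeros instead of raising an error
ohe = OneHotEncoder(handle_unknown="ignore")
ohe.fit([[1], [2], [3], [4]])

# One array of categories per input column, in sorted order
fitted_categories = [list(c) for c in ohe.categories_]

# 2 is a known category; 5 was never seen during fit
encoded = ohe.transform([[2], [5]]).toarray()

print(fitted_categories)
print(encoded)
```

With the default `handle_unknown="error"`, the same `transform` call would raise a `ValueError` for the unseen value.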
Dummy variable conversion:
import pandas as pd
# df_type6 is an existing DataFrame with a 'model_id' column
model_dummy = pd.get_dummies(df_type6['model_id'])
model_dummy.head()
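For a self-contained illustration of the dummy conversion, the sketch below builds a small stand-in DataFrame. The values in `model_id` and the `prefix="model"` argument are assumptions for the example, not taken from the original data.

```python
import pandas as pd

# Hypothetical stand-in for df_type6; the model_id values are invented
df = pd.DataFrame({"model_id": ["a", "b", "a", "c"]})

# One indicator column per distinct value; prefix keeps the names readable
dummies = pd.get_dummies(df["model_id"], prefix="model")

# Attach the dummy columns to the original frame
df_encoded = pd.concat([df, dummies], axis=1)

print(dummies.astype(int))
print(list(df_encoded.columns))
```

Depending on the pandas version the indicator columns are boolean or integer typed, so the sketch casts to `int` before printing.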