Generating an abbreviation dictionary with at most 5 candidate words per abbreviation

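The program is written in SAS. I originally planned to write it in Python, but I ran into trouble and switched to SAS (SAS was full of trouble too; it took me a whole day to get it working ╮(╯_╰)╭).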

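First, read in the data. The program uses macro variables and macro programs; the macro variables are defined at the top of the program so they are easy to change.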


libname aa "d:\english_word";

/*Definition of unqualified: the abbreviation has more candidates than the limit AND the word is longer than the letters typed so far*/

%let wait_max=5;/*maximum number of candidates*/

%let infile="d:\word.txt";

%let outfile="d:\output.txt";

/*read the txt file*/

data aa.allword_temp;

infile &infile.;

attrib word length=$20.;

input word $;

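/*input word $20.;*/

/*is the wrong way to write it: $20. means 20 consecutive columns, and the effect is that only every other line gets read. Syntax mistakes like this lead to all kinds of strange errors*/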

run;

/*remove duplicates*/

proc sql noprint;

create table aa.allword as

select distinct word from aa.allword_temp;

quit;

;
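For comparison, here is a minimal Python sketch of the same read-and-deduplicate step. It is not the code from this post: `load_words` is a hypothetical helper name, and the UTF-8 encoding, the skipping of blank lines, and the sorted output are assumptions. It takes the first whitespace-delimited token on each line and caps it at 20 characters, mirroring the `$20` length above.

```python
def load_words(path, max_len=20):
    """Read one word per line and return the distinct words, sorted.

    Assumptions (not from the SAS program): UTF-8 input, blank lines
    are skipped, and the result is sorted for reproducibility.
    """
    words = set()
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if parts:
                # Keep only the first token, truncated to max_len characters.
                words.add(parts[0][:max_len])
    return sorted(words)
```

Calling `load_words` on the word list file would then play the role of `aa.allword`.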

The next part is where it gets troublesome.

/*words that are unqualified at the 0-3 letter abbreviation*/

proc sql noprint;

create table aa.temp2 as

select distinct word,substr(word,1,3) as suox from aa.allword

group by suox

having count(word)>&wait_max. and length(word)>3;

/*words that are qualified at the 0-3 letter abbreviation*/

create table aa.yes2 as

select distinct word,substr(word,1,3) as suox from aa.allword

group by suox

having count(word)<= &wait_max. or length(word)<=3;

quit;

;
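The two queries above split the word list by the same rule: group words by their first 3 letters, then call a word unqualified when its group has more than `wait_max` members and the word itself is longer than the prefix. A rough Python sketch of that rule follows; `split_by_prefix` and its parameter names are made up for illustration and are not part of the SAS program.

```python
from collections import defaultdict


def split_by_prefix(words, n, wait_max=5):
    """Split words into (qualified, unqualified) for an n-letter prefix.

    A word is unqualified when more than wait_max words share its
    n-letter prefix AND the word is longer than n letters.
    """
    groups = defaultdict(list)
    for w in words:
        groups[w[:n]].append(w)

    qualified, unqualified = [], []
    for prefix, members in groups.items():
        for w in members:
            if len(members) > wait_max and len(w) > n:
                unqualified.append(w)
            else:
                # Keep the word together with the abbreviation it is filed under.
                qualified.append((w, prefix))
    return qualified, unqualified
```

With `n=3`, the first list corresponds to `aa.yes2` and the second to `aa.temp2`.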

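As a test, abbreviate to 3 letters and check the result. Once that runs correctly, the macro program can be written (that is, wrap it in a loop).

I wanted to write the loop inside SQL, but found that SQL has no loop construct; at any rate, I could not figure one out.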

/*3-letter abbreviation, unqualified words*/

proc sql noprint;

create table aa.temp3 as

select word,substr(word,1,3) as suox from aa.temp2

group by suox

having count(word)>&wait_max. and length(word)>3

;

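/*3-letter abbreviation, qualified words*/

/*do not select from allword; select from the words left unqualified by the previous step (the 0-3 letter abbreviation)*/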

create table aa.yes3 as

select word,substr(word,1,3) as suox from aa.temp2

group by suox

having count(word)<=&wait_max. or length(word)<=3

;

quit;
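The macro loop itself is not shown in this post, so the following is only a Python sketch of how such a loop might look, under stated assumptions: the prefix grows by one letter each round, each round looks only at the words left unqualified by the previous round (as the comment above says), and `build_abbreviations` with its `start` parameter is a hypothetical name.

```python
from collections import Counter


def build_abbreviations(words, start=1, wait_max=5):
    """Map each word to the shortest prefix length at which it qualifies.

    Each round counts prefixes only among the words still unqualified
    after the previous round, then lengthens the prefix by one letter.
    """
    remaining = sorted(set(words))
    result = {}
    n = start
    while remaining:
        counts = Counter(w[:n] for w in remaining)
        next_round = []
        for w in remaining:
            if counts[w[:n]] > wait_max and len(w) > n:
                next_round.append(w)   # still too crowded, try a longer prefix
            else:
                result[w] = w[:n]      # qualified at this prefix length
        remaining = next_round
        n += 1
    return result
```

The loop always ends: once `n` reaches the length of the longest word, no word is longer than the prefix, so every remaining word qualifies.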

The link to the complete code is below:

Baidu Cloud link
