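Introduction: Duplicate records are a common problem in database work, and they can undermine the accuracy of any analysis built on the data. This post looks at how to handle duplicate records in a database.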
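In the sample data, a single name has two records. For some names only the English score differs (zhangsan), for some only the math score differs (wangwu), for some both scores differ (zhaoliu), and for some the two records are completely identical (lisi). Each kind of duplication needs its own handling. The rule is to keep the highest score in the differing subject as the final score, that is, to delete the row with the lower score. When the records are identical, keep the most recent one (the one with the larger number). When both scores differ, keep the record with the higher math score.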
First, find the names that have duplicate records:
SELECT name FROM repeat_nums GROUP BY name HAVING COUNT(1) > 1
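As a minimal sketch of this first step, the snippet below builds a small `repeat_nums` table in an in-memory SQLite database through Python's `sqlite3` module and runs the same `GROUP BY ... HAVING` query. The column types, all scores, and the single-record name `qianqi` are hypothetical values chosen for illustration; only the table name, column names, and the four duplicated names come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE repeat_nums (number INTEGER, name TEXT, math INTEGER, english INTEGER)"
)
# Hypothetical sample rows: the scores and 'qianqi' are made up for illustration.
conn.executemany(
    "INSERT INTO repeat_nums VALUES (?, ?, ?, ?)",
    [
        (1, "zhangsan", 90, 70),  # english differs
        (2, "zhangsan", 90, 85),
        (3, "lisi", 80, 80),      # fully identical
        (4, "lisi", 80, 80),
        (5, "wangwu", 75, 88),    # math differs
        (6, "wangwu", 95, 88),
        (7, "zhaoliu", 60, 92),   # both differ
        (8, "zhaoliu", 85, 66),
        (9, "qianqi", 70, 70),    # single record, not a duplicate
    ],
)

# Names that appear more than once in the table.
duplicated = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM repeat_nums GROUP BY name HAVING COUNT(1) > 1 ORDER BY name"
    )
]
print(duplicated)
```

The name with a single record is filtered out by `HAVING COUNT(1) > 1`, so only the four duplicated names are returned.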
1. Handling two completely identical records
① Find the rows to delete
SELECT * FROM repeat_nums
WHERE name IN (SELECT name FROM repeat_nums GROUP BY name, math, english HAVING COUNT(1) > 1)
AND number IN (SELECT MIN(number) FROM repeat_nums GROUP BY name HAVING COUNT(1) > 1)
② Delete: replace SELECT * with DELETE in the statement above.
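The following sketch runs this step against SQLite via Python's `sqlite3` module, under the assumption that each name has at most two records. The scores and the extra name `qianqi` are hypothetical. The same `WHERE` condition is used first to preview the rows and then to delete them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE repeat_nums (number INTEGER, name TEXT, math INTEGER, english INTEGER)"
)
# Hypothetical sample rows (scores invented for illustration).
conn.executemany(
    "INSERT INTO repeat_nums VALUES (?, ?, ?, ?)",
    [
        (1, "zhangsan", 90, 70), (2, "zhangsan", 90, 85),
        (3, "lisi", 80, 80), (4, "lisi", 80, 80),
        (5, "wangwu", 75, 88), (6, "wangwu", 95, 88),
        (7, "zhaoliu", 60, 92), (8, "zhaoliu", 85, 66),
        (9, "qianqi", 70, 70),
    ],
)

# Shared condition: the name has fully identical rows, and this row
# carries the smallest number among the rows for that name.
condition = """
    name IN (SELECT name FROM repeat_nums
             GROUP BY name, math, english HAVING COUNT(1) > 1)
    AND number IN (SELECT MIN(number) FROM repeat_nums
                   GROUP BY name HAVING COUNT(1) > 1)
"""

# Step 1: preview the rows that would be removed.
to_delete = conn.execute(f"SELECT * FROM repeat_nums WHERE {condition}").fetchall()
print(to_delete)

# Step 2: same condition, with SELECT * swapped for DELETE.
conn.execute(f"DELETE FROM repeat_nums WHERE {condition}")
remaining_lisi = conn.execute(
    "SELECT * FROM repeat_nums WHERE name = 'lisi'"
).fetchall()
print(remaining_lisi)
```

Previewing with `SELECT` before running `DELETE` makes it easy to confirm that only the older copy (the smaller number) is removed.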
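2. Handling records where one field is duplicated and the other differs
① Find the rows to delete
SELECT * FROM repeat_nums
WHERE CAST(name AS VARCHAR) + CAST(math AS VARCHAR) + CAST(english AS VARCHAR)
IN (SELECT CAST(name AS VARCHAR) + CAST(MIN(math) AS VARCHAR) + CAST(MIN(english) AS VARCHAR) FROM repeat_nums GROUP BY name HAVING COUNT(1) > 1)
The query matches the row whose math and english scores both equal the minimum for that name, which is the lower-scoring row when only one of the two scores differs. Run step 1 first, because fully identical rows would otherwise all match this condition.
② Delete: replace SELECT * with DELETE in the statement above.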
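Here is a hedged sketch of this step in SQLite through Python's `sqlite3` module. It assumes step 1 has already run, so `lisi` has a single row left. SQLite concatenates strings with `||` rather than `+`, and the `'|'` separator between fields is an addition of this sketch to keep concatenated keys unambiguous. All scores and the name `qianqi` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE repeat_nums (number INTEGER, name TEXT, math INTEGER, english INTEGER)"
)
# Hypothetical state after step 1: the older identical 'lisi' row is gone.
conn.executemany(
    "INSERT INTO repeat_nums VALUES (?, ?, ?, ?)",
    [
        (1, "zhangsan", 90, 70), (2, "zhangsan", 90, 85),
        (4, "lisi", 80, 80),
        (5, "wangwu", 75, 88), (6, "wangwu", 95, 88),
        (7, "zhaoliu", 60, 92), (8, "zhaoliu", 85, 66),
        (9, "qianqi", 70, 70),
    ],
)

# A row matches when its (name, math, english) key equals the key built
# from the per-name minimums; '|' is a separator added in this sketch.
condition = """
    name || '|' || math || '|' || english IN (
        SELECT name || '|' || MIN(math) || '|' || MIN(english)
        FROM repeat_nums GROUP BY name HAVING COUNT(1) > 1)
"""

to_delete = conn.execute(
    f"SELECT * FROM repeat_nums WHERE {condition} ORDER BY number"
).fetchall()
print(to_delete)

conn.execute(f"DELETE FROM repeat_nums WHERE {condition}")
remaining_numbers = [
    row[0] for row in conn.execute("SELECT number FROM repeat_nums ORDER BY number")
]
print(remaining_numbers)
```

With this data the `zhaoliu` rows are untouched, because neither of them holds both the minimum math score and the minimum English score.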
3. Handling records where both fields differ (keep the record with the higher math score)
① Find the rows to delete (by name)
SELECT name FROM repeat_nums
WHERE CAST(math AS VARCHAR) + CAST(english AS VARCHAR) NOT IN
(SELECT CAST(MIN(math) AS VARCHAR) + CAST(MIN(english) AS VARCHAR) FROM repeat_nums GROUP BY name HAVING COUNT(1) > 1)
AND
CAST(math AS VARCHAR) + CAST(english AS VARCHAR) NOT IN
(SELECT CAST(MAX(math) AS VARCHAR) + CAST(MAX(english) AS VARCHAR) FROM repeat_nums GROUP BY name HAVING COUNT(1) > 1)
AND
name IN (SELECT name FROM repeat_nums GROUP BY name HAVING COUNT(1) > 1)
② Delete
DELETE FROM repeat_nums WHERE name IN
(SELECT name FROM repeat_nums
WHERE CAST(math AS VARCHAR) + CAST(english AS VARCHAR) NOT IN
(SELECT CAST(MIN(math) AS VARCHAR) + CAST(MIN(english) AS VARCHAR) FROM repeat_nums GROUP BY name HAVING COUNT(1) > 1)
AND
CAST(math AS VARCHAR) + CAST(english AS VARCHAR) NOT IN
(SELECT CAST(MAX(math) AS VARCHAR) + CAST(MAX(english) AS VARCHAR) FROM repeat_nums GROUP BY name HAVING COUNT(1) > 1)
AND
name IN (SELECT name FROM repeat_nums GROUP BY name HAVING COUNT(1) > 1))
-- Compare only against the minimum math score of the same name
AND math = (SELECT MIN(b.math) FROM repeat_nums AS b WHERE b.name = repeat_nums.name)
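To illustrate this last step, the sketch below runs it in SQLite through Python's `sqlite3` module, assuming steps 1 and 2 have already removed the other duplicates. It uses `||` with a `'|'` separator (an addition of this sketch) in place of `+`. The final condition is written as a correlated subquery so that each row is compared only with the minimum math score of its own name. All scores and the name `qianqi` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE repeat_nums (number INTEGER, name TEXT, math INTEGER, english INTEGER)"
)
# Hypothetical state after steps 1 and 2: only 'zhaoliu' is still duplicated.
conn.executemany(
    "INSERT INTO repeat_nums VALUES (?, ?, ?, ?)",
    [
        (2, "zhangsan", 90, 85),
        (4, "lisi", 80, 80),
        (6, "wangwu", 95, 88),
        (7, "zhaoliu", 60, 92), (8, "zhaoliu", 85, 66),
        (9, "qianqi", 70, 70),
    ],
)

# Names whose rows hold neither both minimums nor both maximums,
# i.e. the two scores move in opposite directions between the rows.
names_query = """
    SELECT name FROM repeat_nums
    WHERE math || '|' || english NOT IN (
            SELECT MIN(math) || '|' || MIN(english)
            FROM repeat_nums GROUP BY name HAVING COUNT(1) > 1)
      AND math || '|' || english NOT IN (
            SELECT MAX(math) || '|' || MAX(english)
            FROM repeat_nums GROUP BY name HAVING COUNT(1) > 1)
      AND name IN (SELECT name FROM repeat_nums
                   GROUP BY name HAVING COUNT(1) > 1)
"""
names = sorted({row[0] for row in conn.execute(names_query)})
print(names)

# For those names, delete the row with the lowest math score for that name.
conn.execute(f"""
    DELETE FROM repeat_nums
    WHERE name IN ({names_query})
      AND math = (SELECT MIN(b.math) FROM repeat_nums AS b
                  WHERE b.name = repeat_nums.name)
""")
remaining = conn.execute("SELECT * FROM repeat_nums ORDER BY number").fetchall()
print(remaining)
```

After this step every name in the hypothetical table has exactly one record, and `zhaoliu` keeps the row with the higher math score.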