Problem: 187. Repeated DNA Sequences [Medium]
All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.
Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.
For example,
Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",
Return:
["AAAAACCCCC", "CCCCCAAAAA"].
In other words: find every substring of length exactly 10 that occurs more than once in the DNA sequence.
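Method 1: store every 10-letter substring in a HashMap. Result of the first submission: Time Limit Exceeded. The cause is the position of i++: it must run on every iteration, otherwise a substring seen for the third time (count already -1) never advances i and the loop does not terminate.
import java.util.*;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<String>();
        if (s.length() <= 10) return res;
        // substring -> 1 (seen once) or -1 (already added to res)
        Map<String, Integer> strmap = new HashMap<String, Integer>();
        int i = 0;
        while (i <= s.length() - 10) {
            String temp = s.substring(i, i + 10);
            if (!strmap.containsKey(temp)) {
                strmap.put(temp, 1);
            } else if (strmap.get(temp) == 1) {
                res.add(temp);
                strmap.put(temp, -1); // already added to res
            }
            i++; // advance on every iteration
        }
        return res;
    }
}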
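Method 2: bit manipulation
Runtime: 63 ms
The ASCII codes of the four characters A, C, G, T are shown below in binary. Their last three bits are all distinct, so those three bits are enough to tell the characters apart.
A: 0100 0001
C: 0100 0011
G: 0100 0111
T: 0101 0100
Each character is represented with 3 bits, so 10 characters need 10 x 3 = 30 bits. An int has 32 bits, so a single int can represent one 10-letter substring.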
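To make the encoding concrete, here is a minimal sketch of the 3-bit packing described above. The class name NucleotideBits and the helpers code and pack are made up for illustration and are not part of the submitted solution.

```java
// Illustrative only: NucleotideBits, code and pack are hypothetical names.
class NucleotideBits {
    // Keep only the low three bits of the character's ASCII code.
    static int code(char c) {
        return c & 7;
    }

    // Pack a sequence of up to 10 letters into an int, 3 bits per letter.
    static int pack(String seq) {
        int packed = 0;
        for (int i = 0; i < seq.length(); i++) {
            packed = (packed << 3) | code(seq.charAt(i));
        }
        return packed;
    }

    public static void main(String[] args) {
        // Prints the 3-bit code of each nucleotide: A -> 1, C -> 3, G -> 7, T -> 4
        for (char c : "ACGT".toCharArray()) {
            System.out.println(c + " -> " + code(c));
        }
        // Ten letters occupy at most 30 bits, so the value fits in a 32-bit int.
        System.out.println(Integer.toBinaryString(pack("AAAAACCCCC")));
    }
}
```

Because the four codes are distinct, two different 10-letter strings over A, C, G, T always pack to different ints, which is why the packed value can stand in for the substring as a map key.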
Notes
0x7ffffff is 111...1111 in binary, 3 + 6*4 = 27 one-bits in total.
The range of substring is half-open: substring(0, 10) covers [0, 10), i.e. indices 0 through 9.
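The two notes above can be checked with a small sketch: masking with 0x7ffffff keeps the 9 most recent letters (27 bits), and shifting in one more letter gives the next 10-letter window. The class name RollingWindow and the helpers slide and pack are assumptions made for this example only.

```java
// Illustrative only: RollingWindow, slide and pack are hypothetical names.
class RollingWindow {
    static final int MASK = 0x7ffffff; // 27 one-bits: the 9 most recent letters

    // Drop the oldest letter of a packed 10-letter window and append `next`.
    static int slide(int window, char next) {
        return ((window & MASK) << 3) | (next & 7);
    }

    // Pack a sequence from scratch, 3 bits per letter.
    static int pack(String seq) {
        int packed = 0;
        for (int i = 0; i < seq.length(); i++) {
            packed = (packed << 3) | (seq.charAt(i) & 7);
        }
        return packed;
    }

    public static void main(String[] args) {
        String s = "AAAAACCCCCG";
        int first = pack(s.substring(0, 10));    // indices 0..9
        int second = slide(first, s.charAt(10)); // indices 1..10
        System.out.println(second == pack(s.substring(1, 11))); // true
        System.out.println(Integer.bitCount(MASK));             // 27
    }
}
```

Sliding costs a constant number of bit operations per position, whereas calling substring at every position copies 10 characters each time.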
import java.util.*;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> res = new ArrayList<String>();
        if (s.length() <= 10) return res;
        // packed substring -> 1 (seen once) or -1 (already added to res)
        Map<Integer, Integer> strmap = new HashMap<Integer, Integer>();
        int i = 0;
        int mask = 0x7ffffff; // 111...1111, 3 + 6*4 = 27 one-bits in total
        int cur = 0;
        // Pack the first 9 letters, 3 bits each.
        while (i < 9) {
            cur = cur << 3 | (s.charAt(i) & 7);
            i++;
        }
        // i == 9 here
        while (i < s.length()) {
            // Keep the low 27 bits of cur (the last 9 letters), shift left by 3,
            // then OR in the 3 bits of s.charAt(i).
            cur = ((cur & mask) << 3) | ((int) s.charAt(i) & 7);
            if (!strmap.containsKey(cur)) {
                strmap.put(cur, 1);
            } else {
                if (strmap.get(cur) == 1) {
                    res.add(s.substring(i - 9, i + 1)); // [i-9, i+1)
                    strmap.put(cur, -1); // already added to res
                }
            }
            i++;
        }
        return res;
    }
}
Method 3: two HashSets
Runtime: 41 ms
I found this one in the Solutions section; it is shorter and runs faster.
Set.add(obj) returns false if obj is already in the set.
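A quick sketch of that return value, in case it is unfamiliar. The class name SetAddDemo and the helper addTwice are made up for this illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: SetAddDemo and addTwice are hypothetical names.
class SetAddDemo {
    // Adds the same value twice and reports what each add call returned.
    static boolean[] addTwice(String value) {
        Set<String> seen = new HashSet<>();
        boolean first = seen.add(value);  // value was not present yet
        boolean second = seen.add(value); // value is already present
        return new boolean[] { first, second };
    }

    public static void main(String[] args) {
        boolean[] results = addTwice("AAAAACCCCC");
        System.out.println(results[0]); // true
        System.out.println(results[1]); // false
    }
}
```

So a single add call both records the substring and tells us whether it was seen before, which removes the separate containsKey and get lookups of the HashMap versions.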
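import java.util.*;

class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        Set<String> set = new HashSet<String>();    // every substring seen so far
        Set<String> repeat = new HashSet<String>(); // substrings seen more than once
        for (int i = 0; i < s.length() - 9; i++) {
            String sub = s.substring(i, i + 10);
            // add returns false when sub is already in set
            if (!set.add(sub)) {
                repeat.add(sub);
            }
        }
        return new ArrayList<String>(repeat);
    }
}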