My code:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        ArrayList<String> ret = new ArrayList<String>();
        // A string of length <= 10 cannot contain a repeated 10-letter sequence.
        if (s == null || s.length() <= 10)
            return ret;
        // key: encoded 10-letter sequence, value: number of occurrences so far
        HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
        for (int i = 0; i <= s.length() - 10; i++) {
            String tmp = s.substring(i, i + 10);
            int key = toNum(tmp);
            if (!map.containsKey(key)) {
                map.put(key, 1);
            }
            else {
                int val = map.get(key);
                val++;
                map.put(key, val);
                // Add only on the second occurrence, so each sequence is reported once.
                if (val == 2)
                    ret.add(tmp);
            }
        }
        return ret;
    }

    // Packs a DNA string into an int, 2 bits per letter (A=00, C=01, G=10, T=11).
    private int toNum(String tmp) {
        int ret = 0;
        for (int i = 0; i < tmp.length(); i++) {
            char temp = tmp.charAt(i);
            int b = 0;
            switch (temp) {
                case 'A':
                    b = 0;
                    break;
                case 'C':
                    b = 1;
                    break;
                case 'G':
                    b = 2;
                    break;
                case 'T':
                    b = 3;
                    break;
                default:
                    break;
            }
            // Shift first, then OR, so 10 letters occupy exactly the low 20 bits.
            ret = (ret << 2) | b;
        }
        return ret;
    }
}
I didn't manage to solve this problem either; I wasn't in good form.
Then I looked at the solution, and it felt really brute-force...
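The approach is:
store every 10-letter substring of the string in a hashmap.
That is, keep moving the start forward one letter at a time, put the next 10-letter substring into the hashmap, and record how many times it has appeared.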
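For reference, a minimal sketch of this plain String-keyed approach. The class `NaiveDna` and method `findRepeated` are hypothetical names, and the input is assumed to be a DNA string. It uses `Map.merge` to bump the count and reports a window on its second sighting:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NaiveDna {
    // Counts every 10-letter window, using the substring itself as the key.
    static List<String> findRepeated(String s) {
        List<String> result = new ArrayList<>();
        if (s == null || s.length() <= 10) {
            return result;
        }
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + 10 <= s.length(); i++) {
            String window = s.substring(i, i + 10);
            // merge returns the updated count for this window
            int count = counts.merge(window, 1, Integer::sum);
            // Add only on the second sighting, so each repeat is reported once.
            if (count == 2) {
                result.add(window);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
        // prints [AAAAACCCCC, CCCCCAAAAA]
    }
}
```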
But if the key is a String, memory usage becomes too large.
So encode the string instead.
A: 0 (00)
C: 1 (01)
G: 2 (10)
T: 3 (11)
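Since each letter takes 2 bits, a 10-letter string can be represented by a 20-bit number, which fits in a single int (32 bits).
With that, the problem becomes straightforward.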
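Since there are 4^10 = 2^20 = 1,048,576 possible 10-letter sequences, this 20-bit key is collision-free. A minimal sketch of the packing step, assuming the input contains only A, C, G and T (the class `DnaEncoding` is a hypothetical name):

```java
class DnaEncoding {
    // Maps one nucleotide to a 2-bit code: A=00, C=01, G=10, T=11.
    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default: throw new IllegalArgumentException("not a nucleotide: " + c);
        }
    }

    // Packs a sequence into an int, first letter in the highest 2 bits.
    // For 10 letters the result uses exactly the low 20 bits.
    static int encode(String seq) {
        int key = 0;
        for (int i = 0; i < seq.length(); i++) {
            key = (key << 2) | code(seq.charAt(i));
        }
        return key;
    }

    public static void main(String[] args) {
        int k = encode("TTTTTTTTTT");
        System.out.println(k);             // 1048575 (twenty 1-bits)
        System.out.println(k < (1 << 20)); // true
    }
}
```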
I referred to this blog post:
http://betterpoetrythancode.blogspot.com/2015/02/repeated-dna-sequences-leetcode-bit.html
**
Summary:
hashmap, encoding <- bit manipulation
**
Anyway, Good luck, Richardo!
My code:
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;

public class Solution {
    public List<String> findRepeatedDnaSequences(String s) {
        List<String> ret = new ArrayList<String>();
        if (s == null || s.length() == 0) {
            return ret;
        }
        int end = 10; // exclusive end index of the current 10-letter window
        HashMap<Integer, Integer> map = new HashMap<Integer, Integer>();
        HashSet<String> repeat = new HashSet<String>(); // sequences already reported
        while (end <= s.length()) {
            String sub = s.substring(end - 10, end);
            int hashcode = hashCode(sub);
            if (!map.containsKey(hashcode)) {
                map.put(hashcode, 1);
            }
            else {
                map.put(hashcode, map.get(hashcode) + 1);
                if (!repeat.contains(sub)) {
                    repeat.add(sub);
                    ret.add(sub);
                }
            }
            end++;
        }
        return ret;
    }

    // Encodes the string with 2 bits per letter (A=00, C=01, G=10, T=11).
    // This overloads Object.hashCode() with a different signature; it does not override it.
    private int hashCode(String s) {
        int ret = 0;
        for (int i = 0; i < s.length(); i++) {
            char curr = s.charAt(i);
            int b = 0;
            switch (curr) {
                case 'A':
                    b = 0;
                    break;
                case 'C':
                    b = 1;
                    break;
                case 'G':
                    b = 2;
                    break;
                case 'T':
                    b = 3;
                    break;
                default:
                    break;
            }
            // Shift first, then OR, so 10 letters occupy exactly the low 20 bits.
            ret = (ret << 2) | b;
        }
        return ret;
    }
}
reference:
https://discuss.leetcode.com/topic/8894/clean-java-solution-hashmap-bits-manipulation
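At first I wrote a pure HashMap solution on my own. It was simple.
Then I assumed the bit manipulation was there for speed. It isn't; it is there to save memory, acting as a kind of hashcode.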
Because the HashMap key is a String, the memory cost is large:
each string has 10 characters,
each character in Java is 16 bits,
so one string holds 160 bits of character data (plus object overhead).
If we use an Integer rather than a String as the key, the key value costs only 32 bits.
So how can an Integer represent a 10-character string?
We design an encoding (a kind of hash function):
A = 0b00
C = 0b01
G = 0b10
T = 0b11
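For each character, keep combining its 2-bit code into the result with OR (XOR gives the same result here, because the bits being filled are always zero) and shifting left by two bits,
which yields the integer.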
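The shift-and-OR step can also be done incrementally, a variant the code in this post does not use: shift the previous key left by two bits, OR in the next letter, and mask with (1 << 20) - 1 so only the last 10 letters remain. A sketch under the same A/C/G/T-only assumption (`RollingDna` is a hypothetical name):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RollingDna {
    private static final int MASK = (1 << 20) - 1; // keeps the last 10 letters (20 bits)

    static int code(char c) {
        switch (c) {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            default:  return 3; // 'T'; input assumed to contain only A/C/G/T
        }
    }

    static List<String> findRepeated(String s) {
        List<String> result = new ArrayList<>();
        if (s == null) {
            return result;
        }
        Map<Integer, Integer> counts = new HashMap<>();
        int key = 0;
        for (int i = 0; i < s.length(); i++) {
            // Shift out the oldest letter and OR in the newest one.
            key = ((key << 2) | code(s.charAt(i))) & MASK;
            if (i >= 9) { // a full 10-letter window ends at index i
                int count = counts.merge(key, 1, Integer::sum);
                if (count == 2) {
                    result.add(s.substring(i - 9, i + 1));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(findRepeated("AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT"));
        // prints [AAAAACCCCC, CCCCCAAAAA]
    }
}
```

This avoids re-encoding ten characters at every position; `substring` is only called when a repeat is reported.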
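The rest is about the same. To avoid reporting duplicates, I added an extra set here.
My earlier version did not have one, but it did not need one: it adds a sequence only when its count reaches exactly 2, so each repeated sequence is reported once.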
Anyway, Good luck, Richardo! -- 09/22/2016