LeetCode-Repeated DNA Sequences (位图算法减少内存)

Repeated DNA Sequences

All DNA is composed of a series of nucleotides abbreviated as A, C, G, and T, for example: "ACGAATTCCG". When studying DNA, it is sometimes useful to identify repeated sequences within the DNA.

Write a function to find all the 10-letter-long sequences (substrings) that occur more than once in a DNA molecule.

For example,

Given s = "AAAAACCCCCAAAAACCCCCCAAAAAGGGTTT",

Return:
["AAAAACCCCC", "CCCCCAAAAA"].
 
用位图算法可以减少内存,代码如下:
int map_exist[ *  / ];
int map_pattern[ * / ]; #define set(map,x) \
(map[x >> ] |= ( << (x & 0x1F))) #define test(map,x) \
(map[x >> ] & ( << (x & 0x1F))) int dnamap[]; char** findRepeatedDnaSequences(char* s, int* returnSize) {
*returnSize = ;
if (s == NULL) return NULL;
int len = strlen(s);
if (len <= ) return NULL; memset(map_exist, , sizeof(int)* ( * / ));
memset(map_pattern, , sizeof(int)* ( * / )); dnamap['A' - 'A'] = ; dnamap['C' - 'A'] = ;
dnamap['G' - 'A'] = ; dnamap['T' - 'A'] = ; char ** ret = malloc(sizeof(char*));
int curr = ;
int size = ;
int key;
int i = ; while (i < )
key = (key << ) | dnamap[s[i++] - 'A'];
while (i < len){
key = ((key << ) & 0xFFFFF) | dnamap[s[i++] - 'A'];
if (test(map_pattern, key)){
if (!test(map_exist, key)){
set(map_exist, key);
if (curr == size){
size *= ;
ret = realloc(ret, sizeof(char*)* size);
}
ret[curr] = malloc(sizeof(char)* );
memcpy(ret[curr], &s[i-], );
ret[curr][] = '\0';
++curr;
} }
else{
set(map_pattern, key);
}
} ret = realloc(ret, sizeof(char*)* curr);
*returnSize = curr;
return ret;
}

该算法用时 6ms 左右, 非常快

 
上一篇:2021-10-30 相对定位和绝对定位


下一篇:servlet的含义和作用