Given a string S and a string T, count the number of distinct subsequences of T in S.
A subsequence of a string is a new string which is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (ie,
"ACE"
is a subsequence of "ABCDE"
while "AEC"
is not).
Here is an example:
S =
S =
"rabbbit"
, T = "rabbit"
Return
3
.First approach is to use dynamic programming. Basically 2 cases:
1. If current characters do not match, increment on S;
2. If match, need to process two different cases:
A: Using current match, increment on both S and T
B: keep doing case 1
/**Small test passed, but time limit exceeded for large set */ class Solution { private: int count(string const& s, size_t s_index, string const& w, size_t w_index){ if (s_index >= s.size() || w_index >= w.size()) return 0; if (s[s_index] != w[w_index]) return count(s,s_index+1,w,w_index); size_t result = 0; if (w_index +1 == w.size()) ++result; result += count(s, s_index+1, w, w_index+1);//search the rest part based on existing findings result += count(s, s_index+1, w, w_index);//new advanture regardless what found already return result; }//count() public: int numDistinct(string S, string T) { // Start typing your C/C++ solution below // DO NOT write int main() function return count(S,0,T,0); } };
However, due to duplicated work, this algorithm is not efficient enough to pass large data set. Similar to the hint from the longest common sequence, a bottom-up built table can be derived.
Notice that the current index pair (tt, ss) can depend on (ss, tt+1) and/or (ss+1,tt+1), provided that if the current character on (tt,ss) match. So the algorithm is simplified to
v[tt,ss]= {v[tt,ss] + v[tt,ss+1], if no match} or {v[tt,ss]+v[tt,ss+1]+v[tt+1,ss+1], otherwise}
And the boundary condition is filling with zeros, except for those tail match of the last char in T, which are assigned to +1.
For example, T={bag}, S={babgbag}, which should return 5.
And the dependencies are labeled by arrows:
The complete code:
/*passed both small and large tests**/ class Solution { public: int numDistinct(string S, string T) { // Start typing your C/C++ solution below // DO NOT write int main() function vector<int> w(S.size()+1, 0); vector<vector<int> > v(T.size()+1, w); int maxS = S.size()-1; int maxT = T.size()-1; for(int ii = 0; ii < S.size(); ++ii){ if (T[maxT] == S[ii]) v[maxT][ii] = 1; }//init for(int ss = maxS-1; ss>=0; --ss){ for(int tt = maxT; tt >= 0; --tt){ v[tt][ss] += v[tt][ss+1]; if (T[tt] == S[ss]) v[tt][ss] += v[tt+1][ss+1] ; } } return v[0][0]; } };
3 comments:
subsequences of bag are - 8 and all are present in S , so why is answer 5 ?
babgbag:
ba*g***
ba****g
b****ag
**b**ag
****bag
can you help me to understand problem.
S = "rabbbit", T = "rabbit"
T has many distinct subsequence Eg. rabbit, rab, rait , rabit, ait, bit, ans many more
ans all of them present in S.
Then how the answer is 3.
Thanks in advance
Post a Comment