No.28 实现strStr() - 《LeetCode刷题笔记》

题目：实现strStr()函数。给定一个haystack字符串和一个needle字符串，在haystack字符串中找出needle字符串出现的第一个位置（从0开始）。若不存在，则返回-1。
例：

输入: haystack = "hello", needle = "ll"
输出: 2

输入: haystack = "aaaaa", needle = "bba"
输出: -1

题解：
一、暴力法

class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        L, n = len(needle), len(haystack)

        for start in range(n - L + 1):
            if haystack[start: start + L] == needle:
                return start
        return -1

作者：LeetCode
链接：https://leetcode-cn.com/problems/implement-strstr/solution/shi-xian-strstr-by-leetcode/
来源：力扣（LeetCode）
著作权归作者所有。商业转载请联系作者获得授权，非商业转载请注明出处。

知识点：
滑动窗口
逐窗口扫描，进行字符串匹配，其复杂度通常与窗长和整个序列长度成正比。缺点是窗长定死，不够灵活。
复杂度 No.28 实现strStr() - 图1 。

二、双指针

class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        L, n = len(needle), len(haystack)
        if L == 0:
            return 0

        pn = 0
        while pn < n - L + 1:
            # find the position of the first needle character
            # in the haystack string
            while pn < n - L + 1 and haystack[pn] != needle[0]:
                pn += 1

            # compute the max match string
            curr_len = pL = 0
            while pL < L and pn < n and haystack[pn] == needle[pL]:
                pn += 1
                pL += 1
                curr_len += 1

            # if the whole needle string is found,
            # return its start position
            if curr_len == L:
                return pn - L

            # otherwise, backtrack
            pn = pn - curr_len + 1

        return -1

作者：LeetCode
链接：https://leetcode-cn.com/problems/implement-strstr/solution/shi-xian-strstr-by-leetcode/
来源：力扣（LeetCode）
著作权归作者所有。商业转载请联系作者获得授权，非商业转载请注明出处。

慢指针在haystack里，快指针在needle里。双指针逐位置匹配，一旦出现不匹配情况直接将慢指针移动一格。好处是及时跳过了不必要的匹配过程。

三、Rabin Karp法

class Solution:
    def strStr(self, haystack: str, needle: str) -> int:
        L, n = len(needle), len(haystack)
        if L > n:
            return -1

        # base value for the rolling hash function
        a = 26
        # modulus value for the rolling hash function to avoid overflow
        modulus = 2**31

        # lambda-function to convert character to integer
        h_to_int = lambda i : ord(haystack[i]) - ord('a')
        needle_to_int = lambda i : ord(needle[i]) - ord('a')

        # compute the hash of strings haystack[:L], needle[:L]
        h = ref_h = 0
        for i in range(L):
            h = (h * a + h_to_int(i)) % modulus
            ref_h = (ref_h * a + needle_to_int(i)) % modulus
        if h == ref_h:
            return 0

        # const value to be used often : a**L % modulus
        aL = pow(a, L, modulus) 
        for start in range(1, n - L + 1):
            # compute rolling hash in O(1) time
            h = (h * a - h_to_int(start - 1) * aL + h_to_int(start + L - 1)) % modulus
            if h == ref_h:
                return start

        return -1

作者：LeetCode
链接：https://leetcode-cn.com/problems/implement-strstr/solution/shi-xian-strstr-by-leetcode/
来源：力扣（LeetCode）
著作权归作者所有。商业转载请联系作者获得授权，非商业转载请注明出处。

知识点：
哈希码
简单地说，就是用不同整数来表达不同对象。最理想的情况下，这是一一映射。
例：

haystack='abcde'
L=4

设字符a-z分别对应整数1-26，则’abcd’映射为’1234’，但这种映射不是一一映射。比如’11234’可能是’kbcd’也可能是’aabcd’。因此我们定义一个常用的转换公式：
No.28 实现strStr() - 图2
写成通式：
No.28 实现strStr() - 图3 ,
这里指数倒过来是为了防止数值溢出。
显然，这种方法的复杂度是 No.28 实现strStr() - 图4 ，如果要进行 No.28 实现strStr() - 图5 。
但由于haystack中窗口逐位滚动的特点，我们可以通过计算滚动哈希码来节省计算量。

滚动哈希码
简单地说，就是窗口向右移动一位时，窗口内字符串的变化是去掉最左边字符同时最右边新增一个字符。
还是以’abcd’为例，现在它右移一位变成了’bcde’。用通式我们可以计算得到：
No.28 实现strStr() - 图6
由于我们之前已经得到了 No.28 实现strStr() - 图7 ，显然也可以以如下方式计算：
No.28 实现strStr() - 图8
这样我们就成功地将窗口内哈希码的计算量变为 No.28 实现strStr() - 图9 了，用哈希码来进行匹配，复杂度就是 No.28 实现strStr() - 图10 。

数值上溢
随着字符集或窗长的增大，哈希码值将会迅速增长，存在数值上溢的风险。对于int32来说，其取值不应该超过 No.28 实现strStr() - 图11 ，为了防止这种情况，可以对哈希值取模，例如: No.28 实现strStr() - 图12 。
尽管这样做会破坏哈希映射的唯一性，导致匹配存在出错的可能，但仍可以通过校验来纠正。