Data Structures and Algorithms Practice by Category: Dictionaries and Strings

The dictionary is the only built-in mapping type in Python. Its literal form is:

d = {key1 : value1, key2 : value2 }

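A mapping object associates hashed values (keys) with the objects they point to (values). Each key refers to exactly one value, although that value may itself be a container holding many objects. Dictionaries are commonly thought of as mutable hash tables. A dictionary object is mutable; it is a container type that can store any number of Python objects, including other container types.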

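Dictionaries differ from sequence types (lists, tuples) in how data is stored and accessed. 1. Sequence types use only numeric keys (indices counted in order from the start of the sequence); mapping types can use other object types as keys, most commonly strings. 2. Unlike sequence keys, mapping keys are directly or indirectly associated with the stored values. 3. Data in a mapping type is not ordered by position, whereas a sequence is ordered by numeric index (since Python 3.7 a dict does preserve insertion order, but it still cannot be indexed by position). Excerpted from Core Python Programming.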
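The contrast between key-based and position-based access can be seen in a few lines. This is only an illustrative sketch; the names ages, names, and groups and their contents are made-up sample data.

```python
# A dictionary is accessed by key, a list by integer position.
ages = {"alice": 30, "bob": 25}   # hypothetical sample data
names = ["alice", "bob"]

print(ages["alice"])   # 30
print(names[0])        # alice

# Dictionaries are mutable: entries can be added, updated, and deleted in place.
ages["carol"] = 41
ages["bob"] += 1
del ages["alice"]

# Values may be other containers; keys only need to be hashable.
groups = {"admins": ["carol"], ("x", 1): {"nested": True}}
groups["admins"].append("bob")

print(ages)  # {'bob': 26, 'carol': 41}
```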

Word Pattern

Given a pattern and a string str, find if str follows the same pattern.

Examples:

pattern = "abba", str = "dog cat cat dog" should return true.
pattern = "abba", str = "dog cat cat fish" should return false.

Notes:

Both pattern and str contain only lowercase alphabetical letters.
Both pattern and str do not have leading or trailing spaces.
Each word in str is separated by a single space.
Each letter in pattern must map to a word with length that is at least 1.

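Build a mapping between each character of the pattern string and each word of the word string. Starting from the first character, check whether it is already in the hash table. If it is, and the word it maps to differs from the current word, return false. If it is not yet in the hash table, also check whether the current word is already the target of another mapping; if so, return false.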

class Solution(object):
    def wordPattern1(self, pattern, str):
        words = str.split(' ')
        dic = {}
        used = set()
        
        # dic maps each pattern character (key) to its word (value);
        # used holds every word that is already mapped to some character.
        if len(words) != len(pattern):
            return False
        for i in range(len(pattern)):
            if pattern[i] not in dic:
                dic[pattern[i]] = words[i]
                if words[i] in used:
                    return False
                used.add(words[i])
            if dic[pattern[i]] != words[i]:
                return False           
        return True
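As a cross-check on the dictionary-plus-set approach, the same one-to-one requirement can be expressed by counting distinct (character, word) pairs. This is an alternative sketch, not the solution above; the function name word_pattern and the parameter name text are chosen here for illustration.

```python
def word_pattern(pattern, text):
    """Return True if the words of text follow pattern one-to-one."""
    words = text.split(' ')
    if len(words) != len(pattern):
        return False
    # The mapping is a bijection exactly when the number of distinct
    # (character, word) pairs equals both the number of distinct
    # characters and the number of distinct words.
    pairs = set(zip(pattern, words))
    return len(pairs) == len(set(pattern)) == len(set(words))


print(word_pattern("abba", "dog cat cat dog"))   # True
print(word_pattern("abba", "dog cat cat fish"))  # False
```

The pair-counting form is shorter, but it always scans the whole input, whereas the loop version can return as soon as it finds a conflict.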

Longest Substring Without Repeating Characters

Given a string, find the length of the longest substring without repeating characters.

Examples:

Given "abcabcbb", the answer is "abc", whose length is 3.

Given "bbbbb", the answer is "b", with the length of 1.

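Define longest, offset, n, and index as, respectively, the length of the longest substring found so far, the offset (left edge) of the current window, the count of characters scanned so far, and a dictionary mapping each character to the position where it was last seen. Scan from the start: if the current character is not in the dictionary, add it; if it is, update its position. If the previous position is not less than offset, the character repeats inside the current window, so update longest and move offset just past that previous position. After the scan finishes, update longest one more time.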

class Solution:
    # @return an integer
    def lengthOfLongestSubstring(self, s):
        str_len = len(s)
        if str_len <= 1:
            return str_len
        if str_len == 2:
            return 1 if s[0] == s[1] else 2
        longest, offset, n, index = 0, 0, 0, {}
        for ch in s:
            if ch in index and index[ch] >= offset:
                longest = max(n - offset, longest)
                offset = index[ch] + 1
            index[ch] = n
            n += 1   
        longest = max(n - offset, longest)
        return longest
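The same window bookkeeping can also report the substring itself rather than only its length. The following is a minimal standalone sketch under that assumption; the function name longest_unique_substring and the variable names are hypothetical, and on ties it keeps the earliest window.

```python
def longest_unique_substring(s):
    """Return the first longest substring of s with no repeated character."""
    last_seen = {}              # character -> index of its latest occurrence
    start = 0                   # left edge of the current window
    best_start, best_len = 0, 0
    for pos, ch in enumerate(s):
        # A repeat inside the window moves the left edge just past the
        # previous occurrence of ch.
        if ch in last_seen and last_seen[ch] >= start:
            start = last_seen[ch] + 1
        last_seen[ch] = pos
        if pos - start + 1 > best_len:
            best_start, best_len = start, pos - start + 1
    return s[best_start:best_start + best_len]


print(longest_unique_substring("abcabcbb"))  # abc
print(longest_unique_substring("bbbbb"))     # b
```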

Minimum Window Substring

Given a string S and a string T, find the minimum window in S which will contain all the characters in T in complexity O(n).

For example, S = "ADOBECODEBANC", T = "ABC"

Minimum window is "BANC".

Note:

If there is no such window in S that covers all characters in T, return the empty string "".

If there are multiple such windows, you are guaranteed that there will always be only one unique minimum window in S.

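Use the dictionary need to store each required character and how many copies are still needed, and missing to store how many required characters the current window still lacks. Scan from the start: need[c] > 0 means a required character has been found, so decrement missing; need[c] is decremented for every character, so a negative count marks a surplus. Once nothing is missing, move the left edge of the window to the right. A character that is not in T is skipped immediately, while a character that is in T can be skipped only after the right end of the window has met it again. Then compare and update the range to return. s[i:j] is the current window and s[I:J] is the window to return.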

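import collections

class Solution(object):
    def minWindow(self, s, t):
        # need[c] > 0: c is still required; need[c] < 0: the window holds surplus copies.
        need, missing = collections.Counter(t), len(t)
        i = I = J = 0
        for j, c in enumerate(s, 1):
            # The comparison is a bool, so missing drops by 1 only if c was still needed.
            missing -= need[c] > 0
            need[c] -= 1
            if not missing:
                # Shrink from the left while the leftmost character is surplus.
                while i < j and need[s[i]] < 0:
                    need[s[i]] += 1
                    i += 1
                if not J or j - i <= J - I:
                    I, J = i, j
        return s[I:J]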
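To sanity-check the sliding window on small inputs, a brute-force reference can enumerate every window and test coverage directly. This is a deliberately slow sketch (far from O(n)) and not the method described above; the name min_window_bruteforce is hypothetical, and it assumes an empty T should yield the empty string.

```python
from collections import Counter


def min_window_bruteforce(s, t):
    """Return a shortest substring of s covering every character of t."""
    if not t:
        return ""  # assumption: an empty t needs no window
    need = Counter(t)
    best = ""
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            window = s[i:j]
            # Counter subtraction keeps only positive counts, so an empty
            # result means the window covers t with multiplicity.
            if not (need - Counter(window)):
                if not best or len(window) < len(best):
                    best = window
                break  # extending further from i only makes the window longer
    return best


print(min_window_bruteforce("ADOBECODEBANC", "ABC"))  # BANC
```

Comparing this reference against the sliding-window result on short random strings is one simple way to gain confidence in the faster version.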

Repeated Substring Pattern

Given a non-empty string check if it can be constructed by taking a substring of it and appending multiple copies of the substring together. You may assume the given string consists of lowercase English letters only and its length will not exceed 10000.

Input: "abab" Output: True

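Reference: LeetCode Discuss.

Let S1 = S + S, and let S2 be S1 with its first and last characters removed. If S occurs in S2, then S is built from a repeated substring. If p is the first non-zero starting position of S in S1, then S[:p] is the repeated substring.

Sufficiency (a repeated string passes the test): let S = sy sy, where sy is the repeated unit. Then S1 = sy sy sy sy, and S2 = sx sy sy sx', where sx is sy without its first character and sx' is sy without its last character. The middle part sy sy is S, so S2 contains S.

Necessity (a string that passes the test is repeated): let p be the starting position of S in S1, s1 = S[:p], and S = s1 x1. Then S = s1 x1 = x1 s1    (1)

Here len(x1) >= len(s1). By (1), S also occurs in S1 at position len(x1), so if len(x1) < len(s1), the starting position of S in S1 would be len(x1) rather than len(s1).

When len(x1) = len(s1), (1) gives s1 = x1, so S is built from a repeated substring.

When len(x1) > len(s1), x1 begins with s1. Let x1 = s1 x2; then by (1), s1 s1 x2 = s1 x2 s1, hence s1 x2 = x2 s1    (2)

Similarly, len(x2) >= len(s1). By (1) and (2), S = s1 s1 x2 = x2 s1 s1, so if len(x2) < len(s1), the starting position of S in S1 would be len(x2).

Iterating in this way, we eventually reach len(xn) = len(s1) with s1 xn = xn s1, so xn = s1 and S = s1 x1 = s1 s1 x2 = ... = s1 s1 ... s1 xn. Hence S is built from a repeated substring.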

class Solution(object):
    def repeatedSubstringPattern(self, s):
        """
        :type s: str
        :rtype: bool
        """
        if not s:
            return False
        # s is periodic exactly when it occurs in s + s with the first and
        # last characters removed.
        return s in (2 * s)[1:-1]
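The argument above also says that the starting position p of S inside S + S yields the repeated unit S[:p]. A small sketch of that observation follows; the function name repeating_unit is hypothetical, and it returns None when the string is not a repetition.

```python
def repeating_unit(s):
    """Return the shortest unit whose repetition forms s, or None."""
    if not s:
        return None
    # Search s + s from index 1. The copy at index len(s) always matches,
    # so find() never returns -1 here.
    p = (s + s).find(s, 1)
    # An earlier match (p < len(s)) means s is s[:p] repeated.
    return s[:p] if p < len(s) else None


print(repeating_unit("abab"))       # ab
print(repeating_unit("aba"))        # None
print(repeating_unit("abcabcabc"))  # abc
```

Checking p < len(s) is equivalent to the membership test in the solution, since an occurrence of S inside (S + S)[1:-1] is an occurrence in S + S at a position strictly between 0 and len(S).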

Validate IP Address

Check, in turn, the separator, the number of fields, the field length, the character range, and the field format.

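import string

class Solution(object):
    def validIPAddress(self, IP):
        def isIPv4(s):
            # A valid field is a decimal number in 0..255 with no sign, spaces,
            # or leading zeros; the round trip through int() checks the format.
            try:
                return str(int(s)) == s and 0 <= int(s) <= 255
            except ValueError:
                return False

        def isIPv6(s):
            # A valid group is 1 to 4 hexadecimal digits. Testing the characters
            # directly also rejects forms that int(s, 16) would accept, such as
            # "+1f" or "0x1".
            return 1 <= len(s) <= 4 and all(c in string.hexdigits for c in s)

        if IP.count(".") == 3 and all(isIPv4(i) for i in IP.split(".")):
            return "IPv4"
        if IP.count(":") == 7 and all(isIPv6(i) for i in IP.split(":")):
            return "IPv6"
        return "Neither"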