A Summary of Regular Expression Usage in golang

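Regular expression syntax is largely independent of the programming language: the dialects differ only in small details, and what really changes is which methods you call.
This post summarizes golang's commonly used regexp methods through one running example.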

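Note:
Below, regObj refers to the value returned by regexp.Compile (a *regexp.Regexp), which can be thought of as a regex instance.
In JS terms, it corresponds to a regex instance created with new RegExp(...) or with a /regexp/ literal.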

regexp.MatchString (tests whether a string matches, similar to JS's regObj.test(str))

sourceStr := `my email is [email protected]`
matched, _ := regexp.MatchString(`[\w-]+@[\w]+(?:\.[\w]+)+`, sourceStr)
fmt.Printf("%v", matched) // true
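
The snippet above discards the error returned by regexp.MatchString. As a minimal sketch, a complete program might check that error and, when the same pattern is used repeatedly, compile it once with regexp.MustCompile. The helper name containsEmail and the address user@example.com are hypothetical and used only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// emailPattern is the pattern used in the example above.
const emailPattern = `[\w-]+@[\w]+(?:\.[\w]+)+`

// containsEmail (hypothetical helper) reports whether s contains a
// substring matching emailPattern. The error is non-nil only when the
// pattern itself fails to compile.
func containsEmail(s string) (bool, error) {
	return regexp.MatchString(emailPattern, s)
}

func main() {
	// user@example.com is a made-up address for illustration.
	matched, err := containsEmail("my email is user@example.com")
	if err != nil {
		fmt.Println("bad pattern:", err)
		return
	}
	fmt.Println(matched) // true

	// Compiling once and reusing the *regexp.Regexp avoids recompiling
	// the pattern on every call.
	re := regexp.MustCompile(emailPattern)
	fmt.Println(re.MatchString("no address here")) // false
}
```

regexp.MustCompile panics on an invalid pattern, so it suits patterns that are fixed at compile time; regexp.Compile returns the error instead.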

regObj.FindAllStringSubmatch (extracts capture groups; the closest JS counterparts are str.match(regObj) and regObj.exec(str), both compared below)

var sourceStr string = `
test text     lljflsdfjdskal
[email protected]
[email protected]
[email protected]`

re := regexp.MustCompile(`[\w-]+@([\w]+(?:\.[\w]+)+)`)
matched := re.FindAllStringSubmatch(sourceStr, -1)
for _, match := range matched {
    fmt.Printf("email is: %s, domain is: %s\n", match[0], match[1])
}

Output:

email is: [email protected], domain is: 163.com
email is: [email protected], domain is: gmail.com
email is: [email protected], domain is: sina.com.cn

Here match[0] is the full text of the match and match[1] is the domain, e.g. 163.com.
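
For reference, here is a self-contained sketch of the same idea that collects only the first capture group of each match. The helper name extractDomains and the sample addresses are hypothetical, not taken from the original example.

```go
package main

import (
	"fmt"
	"regexp"
)

// domainRe is the pattern from the example; group 1 captures the domain.
var domainRe = regexp.MustCompile(`[\w-]+@([\w]+(?:\.[\w]+)+)`)

// extractDomains (hypothetical helper) returns the captured domain of
// every address found in s.
func extractDomains(s string) []string {
	var out []string
	for _, m := range domainRe.FindAllStringSubmatch(s, -1) {
		// m[0] is the whole match, m[1] is the first capture group.
		out = append(out, m[1])
	}
	return out
}

func main() {
	// Made-up addresses for illustration.
	src := "noise text\nalice@example.com\nbob@mail.example.org"
	for _, d := range extractDomains(src) {
		fmt.Println(d)
	}
	// Prints:
	// example.com
	// mail.example.org
}
```

The (?:...) group is non-capturing, so it does not show up as an extra element in each match slice.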

The same result can be reproduced in JS:

Using str.match(regObj)

var sourceStr = `
    test text     lljflsdfjdskal
    gerrylon@163.com
    [email protected]
    [email protected]`;
var re = /[\w-]+@([\w]+(?:\.[\w]+)+)/g;
var result = sourceStr.match(re);

// ["[email protected]", "[email protected]", "[email protected]"]
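// Note that the capture groups (the domain parts) are not included here,
// because with the g flag match() returns only the full matches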
console.log(result);

Using regObj.exec(str)

// sourceStr and re are the same as above
var result = null;

// The output is the same as in the regObj.FindAllStringSubmatch example:
// email is: [email protected], domain is: 163.com
// email is: [email protected], domain is: gmail.com
// email is: [email protected], domain is: sina.com.cn
while (result = re.exec(sourceStr)) {
    console.log("email is: %s, domain is: %s\n", result[0], result[1])
}

regObj.FindAllStringIndex (returns the start and end indices of each match)

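This is similar to regObj.FindAllStringSubmatch, except that it returns the byte offsets of each whole match instead of the matched text (capture groups are not reported; regObj.FindAllStringSubmatchIndex covers those).
To make the output easier to read, sourceStr is simplified: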

var sourceStr string = `[email protected], [email protected], [email protected]`
re := regexp.MustCompile(`[\w-]+@([\w]+(?:\.[\w]+)+)`)
matched := re.FindAllStringIndex(sourceStr, -1)
for _, pos := range matched {
    fmt.Printf("start is: %v, end is: %v\n", pos[0], pos[1])
}

Output:

start is: 0, end is: 16
start is: 18, end is: 31
start is: 33, end is: 52

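Explanation:
Taking the first output line as an example, 0 is the start index of the first match ([email protected]), and 16 is the index of its last character plus 1, which is the position of the first comma. In other words, the end index is exclusive.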
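
Because the end index is exclusive, each pair can be used directly as a slice expression on the source string. The following is a minimal sketch under that reading; the helper name matchSpans and the short sample addresses are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// spanRe is the same pattern as in the example above.
var spanRe = regexp.MustCompile(`[\w-]+@([\w]+(?:\.[\w]+)+)`)

// matchSpans (hypothetical helper) returns the [start, end) byte
// offsets of every match in s.
func matchSpans(s string) [][]int {
	return spanRe.FindAllStringIndex(s, -1)
}

func main() {
	// Made-up, deliberately short addresses so the offsets are easy to check.
	src := "a@b.co, cd@ef.org"
	for _, pos := range matchSpans(src) {
		// src[start:end] recovers the matched text.
		fmt.Printf("[%d,%d) %q\n", pos[0], pos[1], src[pos[0]:pos[1]])
	}
	// Prints:
	// [0,6) "a@b.co"
	// [8,17) "cd@ef.org"
}
```

The offsets are byte offsets, so for non-ASCII input they do not necessarily equal character counts.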

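Summary:
From the godoc:
There are 16 methods of Regexp that match a regular expression and identify the matched text, and their names can themselves be described by the following regular expression (rather neat, isn't it?):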

Find(All)?(String)?(Submatch)?(Index)?
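If All is present, the method returns all successive non-overlapping matches, up to the limit given by its extra integer argument n (n < 0 means no limit).
If String is present, the argument is a string; otherwise it is a slice of bytes.
If Submatch is present, the result also includes the capture groups.
If Index is present, matches are reported as byte-index pairs rather than as text.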
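
As a rough illustration of how the name parts combine, the sketch below calls several of these methods on the same input. The pattern, the variable name kvRe, and the sample string are hypothetical and chosen only to keep the results short.

```go
package main

import (
	"fmt"
	"regexp"
)

// kvRe (hypothetical) matches key=number pairs with two capture groups.
var kvRe = regexp.MustCompile(`(\w+)=(\d+)`)

func main() {
	src := "a=1 bb=22 ccc=333"

	// No "String": takes and returns []byte.
	fmt.Printf("%q\n", kvRe.Find([]byte(src))) // "a=1"

	// "String": takes and returns string.
	fmt.Printf("%q\n", kvRe.FindString(src)) // "a=1"

	// "Submatch": whole match followed by the capture groups.
	fmt.Printf("%q\n", kvRe.FindStringSubmatch(src)) // ["a=1" "a" "1"]

	// "Index": byte-index pairs instead of text.
	fmt.Println(kvRe.FindStringIndex(src))         // [0 3]
	fmt.Println(kvRe.FindStringSubmatchIndex(src)) // [0 3 0 1 2 3]

	// "All": the extra argument n caps the number of matches.
	fmt.Printf("%q\n", kvRe.FindAllString(src, 2))  // ["a=1" "bb=22"]
	fmt.Printf("%q\n", kvRe.FindAllString(src, -1)) // ["a=1" "bb=22" "ccc=333"]
}
```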

With that, how to use these methods becomes clear at a glance.

Additions and corrections are welcome.