[Beginner Tutorial] Capturing Groups in JavaScript Regular Expressions: Usage and How They Work

1. Starting with the Simplest Case

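Suppose we have the string "1-apple" and need to extract 1 and apple from it. The corresponding regular expression is simple: ^(\d)-(.+)$
Here, ^ marks the start of the string, and each pair of parentheses defines a "group": (\d) captures a single digit, and (.+) captures one or more arbitrary characters (. matches any character except a line break). $ marks the end of the string.
The corresponding code: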

        const str = "1-apple"
        const regexp = /^(\d)-(.+)$/
        let match = regexp.exec(str)
        console.log(match[0])
        console.log(match[1])
        console.log(match[2])

Note that although the pattern contains only two pairs of parentheses, the array returned by exec has three elements:

match[0] = 1-apple
match[1] = 1
match[2] = apple

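match[0] holds the full match, i.e. the entire substring that the pattern matched. Because ^ and $ anchor this pattern to both ends of the string, here that is the whole of str. In this example match[0] is not very useful, since what we actually want to extract is match[1] and match[2].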
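
One detail worth checking: \d matches exactly one digit, so this pattern does not match a string such as "12-apple", and exec then returns null. A minimal sketch, using "12-apple" as a made-up input and \d+ (one or more digits) as the assumed fix:

```javascript
// \d matches exactly one digit; \d+ matches one or more digits
const single = /^(\d)-(.+)$/
const multi = /^(\d+)-(.+)$/

// "12-apple" starts with two digits, so the single-digit pattern cannot match
const noMatch = single.exec("12-apple")
console.log(noMatch) // null

// With \d+, group 1 captures both digits
const ok = multi.exec("12-apple")
console.log(ok[1]) // 12
console.log(ok[2]) // apple
```

Checking for null before reading match[1] is a good habit whenever the input format is not guaranteed.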

2. Repeated Patterns

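Now suppose the string contains the same pattern repeated several times, for example:
1-apple
2-orange
3-pear
The three lines differ in content but follow the same rule: each line is (digit)-(anything).
So a single regular expression can match all three lines. Note that the gm flags must be added after the pattern: g stands for global (keep searching after the first match), and m for multiline (^ and $ match at the start and end of each line, not only of the whole string). The code is as follows: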

        const str = "1-apple\n2-orange\n3-pear"
        const regexp = /^(\d)-(.+)$/gm   // Note: the only difference from the previous example is the gm flags at the end
        let match = regexp.exec(str)
        console.log(match[0])
        console.log(match[1])
        console.log(match[2])

Output:

match[0] = 1-apple
match[1] = 1
match[2] = apple

Only 1-apple was parsed. How do we get 2-orange and 3-pear as well?
Let's modify the code:

        const str = "1-apple\n2-orange\n3-pear"
        const regexp = /^(\d)-(.+)$/gm
        let match
        match = regexp.exec(str) // 1st call
        console.log(match[0])
        console.log(match[1])
        console.log(match[2])

        match = regexp.exec(str) // 2nd call
        console.log(match[0])
        console.log(match[1])
        console.log(match[2])

        match = regexp.exec(str) // 3rd call
        console.log(match[0])
        console.log(match[1])
        console.log(match[2])

Output:

1-apple
1
apple

2-orange
2
orange

3-pear
3
pear

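In this example we ran the statement match = regexp.exec(str) three times in a row.
Although the statement is identical each time, each call returns a different match. The reason is the g flag (the m flag plays no part in this): a regex with the g flag stores its search position in its own lastIndex property. After each successful match, exec sets lastIndex to the position just past the end of that match, and the next call starts searching from there instead of from 0. This repeats until no further match is found, at which point exec returns null and resets lastIndex to 0. Separately, every match result has an index property holding the position where that match starts; it is present whether or not the g flag is set.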
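
To watch this mechanism in action, the sketch below records match.index and regexp.lastIndex after each call. The array name trace is just an illustrative choice; the values follow from the string's layout (1-apple ends at position 7, 2-orange at 16, 3-pear at 23):

```javascript
const str = "1-apple\n2-orange\n3-pear"
const regexp = /^(\d)-(.+)$/gm

// Each exec call on a g regex reads and then updates regexp.lastIndex
const trace = []
let match
while ((match = regexp.exec(str)) !== null) {
  // match.index: where this match starts; regexp.lastIndex: where the next search starts
  trace.push([match.index, regexp.lastIndex])
}
// trace now holds [[0, 7], [8, 16], [17, 23]]
console.log(regexp.lastIndex) // 0, reset when exec returned null

// Without g, lastIndex is not updated and exec keeps returning the first match
const nonGlobal = /^(\d)-(.+)$/m
console.log(nonGlobal.exec(str)[0], nonGlobal.exec(str)[0]) // 1-apple 1-apple
console.log(nonGlobal.lastIndex) // 0
```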

Once you know this little secret, that .exec can be called repeatedly, turning the code into a loop is simple:

        const str = "1-apple\n2-orange\n3-pear"
        const regexp = /^(\d)-(.+)$/gm
        let match
        // exec returns null once there are no more matches, which ends the loop
        while ((match = regexp.exec(str)) !== null) {
            console.log('index = ', match.index)
            console.log('result = ', match[0], match[1], match[2])
        }

Output:

index = 0
result = 1-apple 1 apple
index = 8
result = 2-orange 2 orange
index = 17
result = 3-pear 3 pear
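
Because the search position lives on the regex object itself, reusing one global regex for several scans can skip matches if an earlier scan stopped before exec returned null. A minimal sketch of the pitfall and the usual fix (setting lastIndex back to 0); collectNames is a hypothetical helper introduced only for this illustration:

```javascript
const str = "1-apple\n2-orange\n3-pear"
const regexp = /^(\d)-(.+)$/gm

// Hypothetical helper: runs the exec loop and collects group 2 of every match
function collectNames(re, text) {
  const names = []
  let match
  while ((match = re.exec(text)) !== null) {
    names.push(match[2])
  }
  return names
}

// A partial scan reads only the first match and leaves regexp.lastIndex at 7
const first = regexp.exec(str)

// A later full scan on the same object resumes at position 7 and misses "apple"
const skipped = collectNames(regexp, str) // ["orange", "pear"]

// Same partial scan, but lastIndex is reset before the next full scan
regexp.exec(str)
regexp.lastIndex = 0
const complete = collectNames(regexp, str) // ["apple", "orange", "pear"]
```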

Alternatively, we can push the results into a two-dimensional array for convenient later use:

        const str = "1-apple\n2-orange\n3-pear"
        const regexp = /^(\d)-(.+)$/gm
        let match
        const matches = []
        while ((match = regexp.exec(str)) !== null) {
            matches.push(match) // each match is itself an array: [full match, group 1, group 2]
        }
        console.log(matches[0][1]) // prints: 1
        console.log(matches[0][2]) // prints: apple
        console.log(matches[1][1]) // prints: 2
        console.log(matches[1][2]) // prints: orange
        console.log(matches[2][1]) // prints: 3
        console.log(matches[2][2]) // prints: pear
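
Looking back at the flags used in this section, it may help to see what each one contributes on its own. A small comparison sketch on the same three-line string (the variable names are illustrative):

```javascript
const str = "1-apple\n2-orange\n3-pear"

// g only: ^ and $ refer to the whole string, and (.+) cannot cross "\n", so nothing matches
const gOnly = /^(\d)-(.+)$/g
console.log(gOnly.exec(str)) // null

// m only: ^ and $ match at every line, but exec never moves past the first line
const mOnly = /^(\d)-(.+)$/m
console.log(mOnly.exec(str)[0]) // 1-apple

// gm: per-line anchors plus a moving search position yield all three lines
const gm = /^(\d)-(.+)$/gm
const lines = []
let match
while ((match = gm.exec(str)) !== null) {
  lines.push(match[0])
}
console.log(lines.join(" | ")) // 1-apple | 2-orange | 3-pear
```

In short, m decides where ^ and $ may match, while g decides whether exec resumes from the previous position.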

3. Enter matchAll

If you don't like the while loop, you can use matchAll instead of combining while with exec. matchAll returns an iterator, which you can consume with for...of, array spread, or Array.from() to write the same logic more conveniently.

        const str = "1-apple\n2-orange\n3-pear"
        const regexp = /^(\d)-(.+)$/gm

        const matches = str.matchAll(regexp) // match with matchAll

        for (const match of matches) { // read the results with for...of
            console.log('index = ', match.index)
            console.log('result = ', match[0], match[1], match[2])
        }

This code produces exactly the same output as the previous example.
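
Two behaviors of matchAll are worth knowing, sketched below assuming an engine that supports ES2020: it throws a TypeError if the regex lacks the g flag, and it iterates over an internal copy of the regex, so unlike the exec loop it leaves the original regex's lastIndex untouched.

```javascript
const str = "1-apple\n2-orange\n3-pear"

// matchAll requires the g flag; a regex without it throws a TypeError immediately
let error = null
try {
  str.matchAll(/^(\d)-(.+)$/m)
} catch (e) {
  error = e
}
console.log(error instanceof TypeError) // true

// matchAll works on a copy of the regex, so regexp.lastIndex is not modified
const regexp = /^(\d)-(.+)$/gm
for (const match of str.matchAll(regexp)) {
  console.log(match.index, match[1], match[2]) // 0 1 apple, then 8 2 orange, then 17 3 pear
}
console.log(regexp.lastIndex) // 0
```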

We can also achieve the same effect with Array.from():

        const str = "1-apple\n2-orange\n3-pear"
        const regexp = /^(\d)-(.+)$/gm

        const matches = str.matchAll(regexp)
        Array.from(matches, (match) => { // read the results through Array.from's mapping function
            console.log('index = ', match.index)
            console.log('result = ', match[0], match[1], match[2])
        })
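
In the Array.from snippet above, the callback is used only for its side effects, so the array that Array.from builds (full of undefined values) is simply discarded. Since Array.from(iterable, mapFn) stores each value the callback returns, a possibly more useful variant returns the data to keep. A sketch; the { id, name } object shape is an illustrative assumption:

```javascript
const str = "1-apple\n2-orange\n3-pear"
const regexp = /^(\d)-(.+)$/gm

// Array.from(iterable, mapFn) puts each return value of mapFn into the new array
const items = Array.from(str.matchAll(regexp), (match) => ({
  id: Number(match[1]), // capture group 1, converted from string to number
  name: match[2],       // capture group 2
}))

console.log(items.length)  // 3
console.log(items[1].name) // orange
```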

Or, even simpler, use ES6's "three dots" syntax (array spread):

        const str = "1-apple\n2-orange\n3-pear"
        const regexp = /^(\d)-(.+)$/gm

        const matches = [...str.matchAll(regexp)] // expand str.matchAll(regexp) into an array with ...

        // iterate over the array
        for (let i = 0; i < matches.length; i++) {
            console.log('index = ', matches[i].index)
            console.log('result = ', matches[i][0], matches[i][1], matches[i][2])
        }
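
One practical reason to spread the result into an array: the iterator returned by matchAll can be consumed only once. A minimal sketch showing that a second pass over the same iterator yields nothing, while the array can be re-read freely:

```javascript
const str = "1-apple\n2-orange\n3-pear"
const regexp = /^(\d)-(.+)$/gm

const iterator = str.matchAll(regexp)
const firstPass = [...iterator]  // consumes the iterator completely
const secondPass = [...iterator] // nothing is left to read

console.log(firstPass.length)  // 3
console.log(secondPass.length) // 0

// The array, unlike the iterator, supports indexing and repeated reads
console.log(firstPass[2][2]) // pear
```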

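The three dots are called "array spread", or spread syntax, and they have many uses. Before spread syntax existed, making the elements of an existing array (or other iterable) part of a new array required combining methods such as push, splice, and concat. With spread syntax, building a new array from an array literal becomes simpler and more elegant, for example: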

        const str = "1-apple\n2-orange\n3-pear"
        const regexp = /^(\d)-(.+)$/gm

        const matches = [...str.matchAll(regexp)] // expand str.matchAll(regexp) into an array with ...

        console.log(matches[0][1]) // prints: 1
        console.log(matches[0][2]) // prints: apple
        console.log(matches[1][1]) // prints: 2
        console.log(matches[1][2]) // prints: orange
        console.log(matches[2][1]) // prints: 3
        console.log(matches[2][2]) // prints: pear

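Once the three dots "..." expand str.matchAll(regexp), we directly get a two-dimensional array (an array whose elements are the individual match arrays) without any extra push calls. Compared with the approaches shown earlier, this is simpler and more intuitive.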
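
Since each element of the spread array is itself an array of captures, array destructuring can give the groups readable names. A sketch; skipping match[0] with a leading comma and building a lookup object with Object.fromEntries (ES2019) are illustrative choices, not part of the original examples:

```javascript
const str = "1-apple\n2-orange\n3-pear"
const regexp = /^(\d)-(.+)$/gm

// The leading comma skips match[0] (the full match); num and fruit bind groups 1 and 2
const pairs = [...str.matchAll(regexp)].map(([, num, fruit]) => [num, fruit])

// Object.fromEntries turns [key, value] pairs into an object keyed by the digit
const byNumber = Object.fromEntries(pairs)
console.log(byNumber["2"]) // orange

// The same destructuring works directly in a for...of loop over matchAll
for (const [, num, fruit] of str.matchAll(regexp)) {
  console.log(`${num}: ${fruit}`) // 1: apple, then 2: orange, then 3: pear
}
```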

4. Afterword

This article covers only very basic examples of regex capturing groups. For further study, see:

[matchAll() in detail]:
https://developer.mozilla.org/zh-CN/docs/Web/JavaScript/Reference/Global_Objects/String/matchAll

[Array spread (spread syntax) in detail]:
https://developer.mozilla.org/zh-CN/docs/Web/JavaScript/Reference/Operators/Spread_syntax