Using regular expressions in Swift


Swift offers several ways to match text against a regular expression. Each suits a different scenario, so pick the one that fits your use case.

Tip: Defining the pattern as a raw string (a string with extended delimiters) reduces the number of escape characters (\) you need.

Ordinary string pattern: let pattern = "\\d{3,11}"

Extended-delimiter (raw string) pattern: let pattern = #"\d{3,11}"#
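
As a quick check, the two spellings above produce the same pattern string. This is a minimal sketch; the sample text is made up for illustration.

```swift
import Foundation

// The ordinary literal needs a doubled backslash; the raw literal does not.
let escaped = "\\d{3,11}"
let raw = #"\d{3,11}"#
print(escaped == raw) // true

// Hypothetical sample text, used only to show the pattern in action
let sample = "tel: 13800138000"
let found = sample.range(of: raw, options: .regularExpression).map { String(sample[$0]) }
print(found ?? "no match") // 13800138000
```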

Matching with NSPredicate (not recommended)

let email = "[email protected]"
let regex = "[A-Z0-9a-z._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,4}"
let predicate = NSPredicate(format: "SELF MATCHES %@", regex)
let isValid = predicate.evaluate(with: email)
print(isValid ? "Valid email address" : "Invalid email address")

Searching directly with String's range(of:options:range:locale:)

    let email = "[email protected]"
    // Unwrap safely: range(of:) returns nil when nothing matches
    if let matchRange = email.range(of: "[0-9]{4}", options: .regularExpression, range: email.startIndex..<email.endIndex, locale: Locale.current) {
        print(email[matchRange]) // output: 1483
    }
    Note: the options parameter must be .regularExpression, the range parameter is a half-open Range<String.Index>, and locale is the language environment, usually Locale.current.

NSRegularExpression

The NSRegularExpression class can be used for both regex matching and regex replacement.

Regex matching

    /**
     Checks whether a string contains a match for a regular expression.

     - parameter pattern: the regular expression, as a string
     - parameter str: the string to test
     - returns: true if at least one match is found, otherwise false
     */
    class func regex(pattern: String, str: String) -> Bool {
        // try! crashes on an invalid pattern; use it only with patterns known to be valid
        let regex = try! NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
        // NSRange is measured in UTF-16 code units, so use utf16.count
        let resultNum = regex.numberOfMatches(in: str, options: [], range: NSRange(location: 0, length: str.utf16.count))
        return resultNum >= 1
    }
    
    /**
     Returns the substrings that match a regular expression.

     - parameter pattern: the regular expression, as a string
     - parameter str: the string to search
     - important: the string is bridged to NSString, because extracting a substring from an NSRange is easier that way
     - returns: an array of the matched substrings; empty if nothing matches
     */
    class func regexGetSub(pattern: String, str: String) -> [String] {
        var subStr = [String]()
        let regex = try! NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
        // NSRange is measured in UTF-16 code units, so use utf16.count
        let results = regex.matches(in: str, options: [], range: NSRange(location: 0, length: str.utf16.count))
        let nsStr = str as NSString // NSString accepts an NSRange directly
        // Extract the substring for each match
        for rst in results {
            subStr.append(nsStr.substring(with: rst.range))
            // A Swift String needs a Range<String.Index> instead, e.g. Range(rst.range, in: str)
        }
        return subStr
    }

Regex replacement

    func replaceString() {
        let givenString = "hello,world"
        guard let regularExpression = try? NSRegularExpression(pattern: "hello") else { return }
        let replacedString = regularExpression.stringByReplacingMatches(in: givenString, range: NSRange(location: 0, length: givenString.utf16.count), withTemplate: "你好")
        print(replacedString)
    }

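Do not implement replacement by collecting the matches first and then looping over them from front to back, replacing each one in turn. As soon as one replacement changes the length of the string, the ranges of the remaining matches no longer line up with the text and the result is wrong.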
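
When each match needs its own computed replacement, one possible workaround (not from the original article) is to walk the matches from last to first, so that edits never shift a range that is still waiting to be processed. The helper name replaceMatches and the sample input are hypothetical.

```swift
import Foundation

// Hypothetical helper: replaces every match with the result of `transform`.
func replaceMatches(in text: String, pattern: String, transform: (String) -> String) -> String {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return text }
    var result = text
    let matches = regex.matches(in: text, range: NSRange(text.startIndex..., in: text))
    // Process the last match first: text before it is untouched,
    // so the ranges of earlier matches remain valid.
    for match in matches.reversed() {
        guard let range = Range(match.range, in: result) else { continue }
        result.replaceSubrange(range, with: transform(String(result[range])))
    }
    return result
}

// Double every number in the string
let doubled = replaceMatches(in: "a1b22c333", pattern: #"\d+"#) { digits in
    String((Int(digits) ?? 0) * 2)
}
print(doubled) // a2b44c666
```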

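Why is the length parameter of the NSRange initialized with the string's utf16.count?

NSRange, like NSString, counts UTF-16 code units, whereas a Swift String's count is the number of Characters. An emoji and similar characters are a single Character but occupy two or more UTF-16 code units, so using count would make the range too short.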
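
A small sketch of the difference, using a made-up string that starts with an emoji: the range built from count stops one code unit early, so the four-digit match is missed.

```swift
import Foundation

let text = "👍 2022"
print(text.count)       // 6 Characters
print(text.utf16.count) // 7 UTF-16 code units

let regex = try! NSRegularExpression(pattern: #"\d{4}"#)

// Too short: the final "2" falls outside the search range
let wrongRange = NSRange(location: 0, length: text.count)
// Covers the whole string
let rightRange = NSRange(location: 0, length: text.utf16.count)

let wrongCount = regex.numberOfMatches(in: text, range: wrongRange)
let rightCount = regex.numberOfMatches(in: text, range: rightRange)
print(wrongCount, rightCount) // 0 1
```

NSRange(text.startIndex..., in: text) builds the same full-string range without counting by hand.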

NSRegularExpression.Options enumeration

These options are passed when the regular expression is initialized.

Option | Description | Example
caseInsensitive | Matching is not case sensitive | "Aa" is equivalent to "aa"
allowCommentsAndWhitespace | Whitespace and # comments in the pattern are ignored | "AB#CC" is equivalent to "AB"
ignoreMetacharacters | The whole pattern is treated as a literal string | The \b in "AA\b" is matched literally instead of as a word boundary
dotMatchesLineSeparators | Allows . to match any character, including line separators | "a.b" can match "a\nb"
anchorsMatchLines | Allows ^ and $ to match the beginning and end of each line |
useUnixLineSeparators | Only \n is treated as a line separator; otherwise all standard line separators are used |
useUnicodeWordBoundaries | Uses Unicode TR#29 to determine word boundaries; otherwise traditional regular expression word boundaries are used |
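
The effect of a few of these options can be checked by counting matches with and without them. This is a minimal sketch; countMatches is a hypothetical helper, not part of Foundation.

```swift
import Foundation

// Hypothetical helper: number of matches of `pattern` in `text` under the given options.
func countMatches(_ pattern: String, in text: String, options: NSRegularExpression.Options = []) -> Int {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else { return 0 }
    return regex.numberOfMatches(in: text, range: NSRange(text.startIndex..., in: text))
}

let twoLines = "a\nb"

// By default . does not match a line separator
let dotDefault = countMatches("a.b", in: twoLines)
let dotAll = countMatches("a.b", in: twoLines, options: .dotMatchesLineSeparators)
print(dotDefault, dotAll) // 0 1

// By default ^ and $ only match at the ends of the whole string
let anchorDefault = countMatches("^b$", in: twoLines)
let anchorLines = countMatches("^b$", in: twoLines, options: .anchorsMatchLines)
print(anchorDefault, anchorLines) // 0 1

let caseDefault = countMatches("AA", in: "aa")
let caseFolded = countMatches("AA", in: "aa", options: .caseInsensitive)
print(caseDefault, caseFolded) // 0 1
```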

MatchingFlags enumeration

This enumeration is mainly used by the enumerating match method enumerateMatches(in:options:range:using:), which passes it to the closure callback as a parameter.

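typedef NS_OPTIONS(NSUInteger, NSMatchingFlags) {
   // A long-running match is still in progress
   NSMatchingProgress               = 1 << 0,
   // Matching has completed
   NSMatchingCompleted              = 1 << 1,
   // The current match operation reached the end of the search range
   NSMatchingHitEnd                 = 1 << 2,
   // The current match depended on the location of the end of the search range
   NSMatchingRequiredEnd            = 1 << 3,
   // Matching failed because of an internal error, without examining the entire search range
   NSMatchingInternalError          = 1 << 4
};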
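
In Swift these flags arrive as NSRegularExpression.MatchingFlags in the second closure parameter. The sketch below, with a made-up input string, collects the matches and records whether a callback carrying the completed flag was seen; it assumes the reportCompletion matching option is what triggers that extra callback, in which case the result parameter is nil.

```swift
import Foundation

let text = "id=1, id=22, id=333"
let regex = try! NSRegularExpression(pattern: #"\d+"#)

var found = [String]()
var sawCompletion = false

regex.enumerateMatches(in: text, options: [.reportCompletion], range: NSRange(text.startIndex..., in: text)) { result, flags, _ in
    // Record the completion callback, if one is delivered
    if flags.contains(.completed) {
        sawCompletion = true
    }
    // Progress and completion callbacks carry no result, so unwrap before use
    guard let result = result, let range = Range(result.range, in: text) else { return }
    found.append(String(text[range]))
}

print(found) // ["1", "22", "333"]
print(sawCompletion)
```

The third closure parameter (ignored here) is a pointer to a Boolean; setting its pointee to true stops the enumeration early.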

MatchingOptions enumeration

This enumeration is mainly used as the options parameter of the matching methods, such as the enumerating method enumerateMatches(in:options:range:using:).

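typedef NS_OPTIONS(NSUInteger, NSMatchingOptions) {
   // Call the block periodically during long-running match operations
   NSMatchingReportProgress         = 1 << 0,
   // Call the block once when matching has completed
   NSMatchingReportCompletion       = 1 << 1,
   // Match only at the start of the search range: "aa" matches "aabcd" but not "baabcd"
   NSMatchingAnchored               = 1 << 2,
   // Allow matching to look beyond the bounds of the search range, e.g. for word-boundary
   // detection and lookahead. Has no effect if the search range covers the entire string
   NSMatchingWithTransparentBounds  = 1 << 3,
   // Prevent ^ and $ from automatically matching the start and end of the search range.
   // Has no effect if the search range covers the entire string.
   // By default "^ab" matches "babcd" within the range NSMakeRange(1, 3);
   // with this option it does not match
   NSMatchingWithoutAnchoringBounds = 1 << 4
};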
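
The "aa" example from the comment on NSMatchingAnchored can be reproduced in Swift, where the option is spelled .anchored. The helper name hasMatch is hypothetical.

```swift
import Foundation

// Hypothetical helper: true if "aa" is found in `text` under the given matching options.
func hasMatch(in text: String, options: NSRegularExpression.MatchingOptions) -> Bool {
    guard let regex = try? NSRegularExpression(pattern: "aa") else { return false }
    let range = NSRange(text.startIndex..., in: text)
    return regex.firstMatch(in: text, options: options, range: range) != nil
}

let unanchored = hasMatch(in: "baabcd", options: [])
let anchoredMiss = hasMatch(in: "baabcd", options: .anchored)
let anchoredHit = hasMatch(in: "aabcd", options: .anchored)
print(unanchored, anchoredMiss, anchoredHit) // true false true
```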

These three approaches cover the most common ways of using a regular expression string. The first and second (NSPredicate and range(of:options:)) are convenient for checking whether a string contains a match, and match only once; the third (NSRegularExpression) can match repeatedly and return multiple results.

Regex syntax quick reference

NSRegularExpression regex syntax – Apple official documentation

Regular expression syntax – a tutorial for beginners

Online regex tester – highlights the groups and matches of a regular expression and can also check its syntax.

JS Regular Expression Complete Tutorial (Juejin) – covers the basic and advanced use of regular expressions in detail


Match any character, including line breaks: [\s\S]

Groups

Use () to group and mark part of a pattern. After a successful match, you can get the value of each group by its index.

Example: get the matched year, month, and day

func regularExpressionGroup() {
        let givenString = "2022-04-28"
        guard let regularExpression = try? NSRegularExpression(pattern: #"(\d{4,})-(\d{1,2})-(?<day>\d{1,2})"#) else { return }
        let results = regularExpression.matches(in: givenString, range: NSRange(location: 0, length: givenString.utf16.count))
        for result in results {
            //let all = result.range(at: 0)  // the entire matched string
            let yearRange = result.range(at: 1) // group 1
            print("Year")
            print(givenString[Range<String.Index>.init(yearRange, in: givenString)!])
            let monthRange = result.range(at: 2) // group 2
            print("Month")
            print(givenString[Range<String.Index>.init(monthRange, in: givenString)!])
            let dayRange = result.range(withName: "day") // the group named "day"
            print("Day")
            print(givenString[Range<String.Index>.init(dayRange, in: givenString)!])
        }
    }

It is recommended to refer to groups by name rather than by index.

A named group is written as (?<name>subexpression).

How are the indexes of nested groups determined?

By the order of their opening (left) parentheses, counted from left to right.
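
A short sketch of that numbering, using a made-up pattern with one group nested inside another. Index 0 is always the whole match.

```swift
import Foundation

let text = "2022-04-28"
// Opening parentheses, left to right:
// 1: ((\d{4})-(\d{2}))   2: (\d{4})   3: (\d{2})   4: the final (\d{2})
let regex = try! NSRegularExpression(pattern: #"((\d{4})-(\d{2}))-(\d{2})"#)

var groups = [String]()
if let match = regex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text)) {
    for index in 0..<match.numberOfRanges {
        if let range = Range(match.range(at: index), in: text) {
            groups.append(String(text[range]))
        }
    }
}
print(groups) // ["2022-04-28", "2022-04", "2022", "04", "28"]
```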

What if you do not want a group to be captured?

Add ?: right after the opening parenthesis.

For example, in (?:\d)-(\d) the first pair of parentheses is not captured.
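
The pattern above can be checked directly: only one capture group is reported, and index 1 refers to the second pair of parentheses. The input "1-2" is made up for illustration.

```swift
import Foundation

let text = "1-2"
let regex = try! NSRegularExpression(pattern: #"(?:\d)-(\d)"#)

// The non-capturing group is not counted
print(regex.numberOfCaptureGroups) // 1

var firstGroup: String? = nil
if let match = regex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text)),
   let range = Range(match.range(at: 1), in: text) {
    firstGroup = String(text[range])
}
print(firstGroup ?? "no match") // 2
```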

Zero-width assertions

A zero-width assertion matches a position rather than text, so the content it checks is not included in the extracted match.

The main application scenarios are as follows:

  • Exclusion search: find lines that do not contain a certain string
  • Inclusion search: find lines that contain a certain string

The article "Regular Expressions - Zero-Width Assertions" covers usage scenarios, usage, and examples, and tells the various zero-width assertions apart simply as lookahead and lookbehind.
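
A minimal sketch of lookahead, lookbehind, and the exclusion search from the list above. The helper allMatches and all sample strings are hypothetical.

```swift
import Foundation

// Hypothetical helper: returns every substring matched by `pattern`.
func allMatches(of pattern: String, in text: String, options: NSRegularExpression.Options = []) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern, options: options) else { return [] }
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: range).compactMap { match in
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

// Lookahead: digits followed by "px"; the "px" itself is not part of the match
let pixels = allMatches(of: #"\d+(?=px)"#, in: "12px 30em 7px")
print(pixels) // ["12", "7"]

// Lookbehind: digits preceded by "$"
let prices = allMatches(of: #"(?<=\$)\d+"#, in: "cost $15, qty 3")
print(prices) // ["15"]

// Negative lookahead: lines that do not contain "error"
let log = "ok\nerror here\nfine"
let cleanLines = allMatches(of: #"^(?!.*error).*$"#, in: log, options: .anchorsMatchLines)
print(cleanLines) // ["ok", "fine"]
```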

Greedy/non-greedy matching

Quantifiers are greedy by default, that is, they match as much text as possible. Adding ? after a quantifier makes it non-greedy, so it matches as little as possible.

For example, the text to be matched: a,b,c,d,

Greedy matching: .*,

The result is a,b,c,d, as a whole, so only one item is matched.


Non-greedy matching: .*?,

The results are "a,", "b,", "c,", and "d,", so four items are matched.
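
The two results can be reproduced with NSRegularExpression. This is a minimal sketch; allMatches is a hypothetical helper.

```swift
import Foundation

// Hypothetical helper: returns every substring matched by `pattern`.
func allMatches(of pattern: String, in text: String) -> [String] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let range = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, range: range).compactMap { match in
        Range(match.range, in: text).map { String(text[$0]) }
    }
}

let text = "a,b,c,d,"

// Greedy: .* grabs as much as it can, then backs off to the last comma
let greedy = allMatches(of: ".*,", in: text)
print(greedy) // ["a,b,c,d,"]

// Non-greedy: .*? stops at the first comma it can reach
let lazy = allMatches(of: ".*?,", in: text)
print(lazy) // ["a,", "b,", "c,", "d,"]
```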



Origin blog.csdn.net/qq_14920635/article/details/124479492