Getting started with RegexBuilder in Swift

There's an old adage about regular expressions: "Some people, when faced with a problem, think: 'I know, I'll use regular expressions.' Now they have two problems." This shows how messy and complex regular expressions can be.

This is where RegexBuilder, introduced in Swift 5.7, shines. RegexBuilder simplifies writing regular expressions and makes them more readable. In this article, we'll cover how to get started with RegexBuilder, including how to use RegexBuilder components such as CharacterClass, currency, and date.

Jump ahead:

  • Setting up a Swift playground in Xcode

  • Using the regular expression API

  • The RegexBuilder API

  • RegexBuilder quantifiers

  • Matching with RegexBuilder components

  • Capturing matched text

Setting up a Swift playground in Xcode

You can use Swift on many platforms, including Linux. RegexBuilder is supported on Linux, but in this tutorial we'll use Swift on a Mac, because we're using the UIKit library, which is available only on Apple platforms.

First, open Xcode and create a new playground: choose File > New > Playground from the menu bar. Name it RegexBuilderPlayground. You'll see default code that imports UIKit and declares a variable named greeting.

Using the regular expression API

Before learning the new RegexBuilder API, you should be familiar with the original regex syntax.

Replace the default code you get when creating a new playground with the following code:

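import UIKit

// A regex literal: one or more digits, "@", then one or more word characters.
let regex = /\d+@\w+/
// firstMatch(of:) returns nil if nothing in the string matches.
let match = "12345@hello".firstMatch(of: regex)
print(match!.0)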

Compile and run the code and you will get the following results:

12345@hello

As you can see, the regular expression is written in this cryptic syntax: /\d+@\w+/.

\d matches a digit, so \d+ matches one or more digits; @ matches a literal @; \w matches a word character, so \w+ matches one or more word characters. The slashes (/) mark the beginning and end of the regex literal.

The next line matches the string against the regex using the firstMatch(of:) method. The result is an optional match; if a match is found, .0 gives you the whole matched text.
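
A quick way to see what firstMatch(of:) does is to compare it with wholeMatch(of:), which succeeds only when the entire string matches. The sketch below uses an invented input with extra text around the match. It writes the same pattern with the #/.../# extended-delimiter syntax, assuming your toolchain may not enable bare /.../ literals outside a playground. Swift 5 language mode typically needs the -enable-bare-slash-regex flag for the bare form.

```swift
// Same pattern as /\d+@\w+/, written with extended delimiters so it
// does not depend on the bare-slash regex compiler flag.
let regex = #/\d+@\w+/#

// Invented input with text before and after the part that matches.
let input = "id: 12345@hello!"

// firstMatch(of:) scans the string and returns the first matching substring.
if let match = input.firstMatch(of: regex) {
    print(match.output) // 12345@hello
}

// wholeMatch(of:) succeeds only if the entire string matches the pattern.
let whole = input.wholeMatch(of: regex)
print(whole == nil) // true
```

Because firstMatch(of:) scans the string, it skips over the "id: " prefix. wholeMatch(of:) returns nil because the prefix and the trailing "!" are not part of the pattern.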

The RegexBuilder API

Now it's time to look at the equivalent code in the RegexBuilder API. There's a shortcut to convert the old regex syntax to RegexBuilder syntax. Highlight the regex literal and right-click it (or Control-click it), and you should see an option to refactor it into the new RegexBuilder syntax.

The converted code looks like this:

let regex = Regex {
    OneOrMore(.digit)
    "@"
    OneOrMore(.word)
}

With this new syntax, you no longer need to remember what \d means. In the RegexBuilder API, the cryptic \d+ is replaced by the friendlier OneOrMore(.digit), whose meaning is obvious. The same goes for \w+: its replacement, OneOrMore(.word), is much clearer.

Also note that the refactoring adds an import of RegexBuilder:

import RegexBuilder
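
To check that the refactored builder describes the same pattern as the original literal, you can run both against a handful of inputs. This is a minimal sketch. The sample strings are invented, and the literal uses the #/.../# form so it compiles without the bare-slash flag.

```swift
import RegexBuilder

// Builder version produced by the refactoring.
let builderRegex = Regex {
    OneOrMore(.digit)
    "@"
    OneOrMore(.word)
}

// The equivalent traditional pattern.
let literalRegex = #/\d+@\w+/#

// Invented samples: a match, a non-digit prefix, a missing word, an underscore.
let samples = ["12345@hello", "x@hello", "42@", "7@a_b"]
for sample in samples {
    // wholeMatch(of:) requires the entire sample to match.
    let viaBuilder = sample.wholeMatch(of: builderRegex) != nil
    let viaLiteral = sample.wholeMatch(of: literalRegex) != nil
    print(sample, viaBuilder, viaLiteral)
}
```

Both columns should agree for every sample. This is the point of the refactoring: only the notation changes, not the pattern.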

RegexBuilder quantifiers

OneOrMore is a quantifier. In the legacy syntax, the quantifiers are * (zero or more), + (one or more), ? (zero or one), and {n,m} (at least n and at most m repetitions). Their RegexBuilder counterparts are ZeroOrMore, OneOrMore, Optionally, and Repeat.
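
Each legacy quantifier has a RegexBuilder counterpart. The sketch below pairs them up on a simple digit pattern. It uses wholeMatch(of:) so the bounds are easy to see. The regex names are made up for illustration.

```swift
import RegexBuilder

// ZeroOrMore     ~ *      (zero or more)
// OneOrMore      ~ +      (one or more)
// Optionally     ~ ?      (zero or one)
// Repeat(n...m)  ~ {n,m}  (at least n, at most m)
let zeroOrMoreDigits = Regex { ZeroOrMore(.digit) }
let oneOrMoreDigits = Regex { OneOrMore(.digit) }
let optionalDigit = Regex { Optionally(.digit) }
let twoToThreeDigits = Regex {
    Repeat(2...3) {
        One(.digit)
    }
}

// wholeMatch(of:) checks the entire string, so each bound is visible.
print("".wholeMatch(of: zeroOrMoreDigits) != nil)     // true
print("".wholeMatch(of: oneOrMoreDigits) != nil)      // false
print("12".wholeMatch(of: optionalDigit) != nil)      // false
print("1234".wholeMatch(of: twoToThreeDigits) != nil) // false
```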

If you want the digit to the left of @ to be optional (zero or one digit), you can use the Optionally quantifier:

let regex2 = Regex {
    Optionally(.digit)
    "@"
    OneOrMore(.word)
}

The code above is equivalent to /\d?@\w+/.

What if you want at least four and at most six digits to the left of @? You can use Repeat:

let regex3 = Regex {
    Repeat(4...6) {
        .digit
    }
    "@"
    OneOrMore(.word)
}
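
There is a subtlety with Repeat. The 4...6 bound limits the digits the regex consumes, not the digits present in the input. Because firstMatch(of:) can start matching partway through a string, a longer run of digits can still produce a match. The sketch below reuses regex3, writing One(.digit) inside the Repeat block, and runs it on invented inputs.

```swift
import RegexBuilder

// Same shape as regex3: 4 to 6 digits, "@", then one or more word characters.
let regex3 = Regex {
    Repeat(4...6) {
        One(.digit)
    }
    "@"
    OneOrMore(.word)
}

// Three digits is too few, so there is no match anywhere in the string.
print("123@hello".firstMatch(of: regex3) == nil) // true

// Seven digits: wholeMatch fails, but firstMatch finds a match that
// starts at the second digit.
print("1234567@hello".wholeMatch(of: regex3) == nil) // true
if let match = "1234567@hello".firstMatch(of: regex3) {
    print(match.output) // 234567@hello
}
```

If the whole string must respect the bound, use wholeMatch(of:) rather than firstMatch(of:).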

Matching with RegexBuilder components

Now let's write a regex with RegexBuilder from scratch. Add the following code:

let text = "Writer/Arjuna Sky Kok/$1,000/December 4, 2022"
let text2 = "Illustrator/Karen O'Reilly/$350/November 30, 2022"

In this example, imagine that you work for LogRocket and need to parse freelancers' payment records. The text variable says that LogRocket should pay Arjuna Sky Kok $1,000 for writing services no later than December 4, 2022. The text2 variable says that LogRocket should pay Karen O'Reilly $350 for illustration services by November 30, 2022.

You want to parse each text into four parts: the job, the name, the payment amount, and the payment due date.

Using ChoiceOf to express a choice

Let's start with the job component. In the texts above, the job is either "Writer" or "Illustrator", so you can create a regex that expresses this choice.

Add the following code:

let job = Regex {
    ChoiceOf {
        "Writer"
        "Illustrator"
    }
}

As you can see in the code, you use ChoiceOf to represent a choice: you put the alternatives inside the ChoiceOf block. You're not limited to two options. You can add more, each on its own line. In the legacy syntax, you would use | instead.

Add the following code to test it against the text variable:

if let jobMatch = text.firstMatch(of: job) {
    let (wholeMatch) = jobMatch.output
    print(wholeMatch)
}

If you compile and run this program, you will get the following output:

Writer

This means that your regex matches the job component. If you wish, you can run the same check on the text2 variable.
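
Here is the job regex checked against both sample texts. The third alternative, "Editor", is a hypothetical job title. It is added only to show a ChoiceOf block with more than two options.

```swift
import RegexBuilder

let text = "Writer/Arjuna Sky Kok/$1,000/December 4, 2022"
let text2 = "Illustrator/Karen O'Reilly/$350/November 30, 2022"

// "Editor" is a hypothetical extra alternative; each option goes on its own line.
let job = Regex {
    ChoiceOf {
        "Writer"
        "Illustrator"
        "Editor"
    }
}

for record in [text, text2] {
    if let jobMatch = record.firstMatch(of: job) {
        print(jobMatch.output) // "Writer", then "Illustrator"
    }
}
```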

CharacterClass

Now, let's move on to the next component: the name. Here, a name is one or more characters, each of which is a word character, a whitespace character, or a single quote. In general, names can be more complex than this, but this definition is enough for our example.

Here is the regex for the name component:

let name = Regex {
    OneOrMore(
        ChoiceOf {
            CharacterClass(.word)
            CharacterClass(.whitespace)
            "'"
        }
    )
}

You've already seen OneOrMore and ChoiceOf, but there's a new component here: CharacterClass. It represents a class of characters, like \d, \s, and \w in the legacy syntax.

CharacterClass(.word) represents word characters, such as a, b, c, and d. CharacterClass(.whitespace) represents whitespace, such as spaces and tabs. Besides .word and .whitespace, several other character classes are available. For example, CharacterClass(.digit) represents digits such as 0, 1, and 2.

So the name consists of one or more characters, each of which is a word character, whitespace, or a single quote.

You can try this regex on a name string:

if let nameMatch = "Karen O'Reilly".firstMatch(of: name) {
    let (wholeMatch) = nameMatch.output
    print(wholeMatch)
}

The output is what you would expect:

Karen O'Reilly
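
The name regex only describes which characters may appear. On a full record, it simply matches the first run of allowed characters. The sketch below shows this with the text variable. The match is "Writer", which is why the full pattern later surrounds name with the job and separator components.

```swift
import RegexBuilder

let text = "Writer/Arjuna Sky Kok/$1,000/December 4, 2022"

// One or more characters, each a word character, whitespace, or an apostrophe.
let name = Regex {
    OneOrMore(
        ChoiceOf {
            CharacterClass(.word)
            CharacterClass(.whitespace)
            "'"
        }
    )
}

// On a full record, the first run of allowed characters is "Writer",
// because "/" is not in the allowed set.
print(text.firstMatch(of: name)?.output ?? "no match") // Writer

// On an isolated name, the pattern matches the whole string.
print("Karen O'Reilly".wholeMatch(of: name) != nil) // true
```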

Currency

Now let's move on to the next component: the payment. The text you want to match is "$1,000" or "$350". You could write a complex regex that matches both amounts by checking for the $ sign and an optional comma, but there's an easier way:

let USlocale = Locale(identifier: "en_US")
let payment = Regex {
    One(.localizedCurrency(code: "USD", locale: USlocale))
}

You use .localizedCurrency with the USD currency code and the US locale. If you want to match payments in a different currency, such as "¥1,000", you can change the code and locale.

The One component wraps a single regex component and matches exactly one occurrence of it. By contrast, OneOrMore matches one or more occurrences.

Add the following code to the file, then compile and run the program to see the results:

if let paymentMatch = text.firstMatch(of: payment) {
    let (wholeMatch) = paymentMatch.output
    print(wholeMatch)
}

The result is a little different from the previous ones. You will get:

1000

The result is not $1,000 but the plain number 1000. Behind the scenes, the currency component converts the matched text into a Decimal value.
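
For comparison, here is roughly what the hand-written "complex regular expression" mentioned above might look like in RegexBuilder. It's a sketch under assumptions: it handles US-style formatting only (a $ sign and commas between groups of three digits), it ignores decimals, and the name manualPayment is made up. Unlike .localizedCurrency, it returns the matched text as a Substring rather than a number.

```swift
import RegexBuilder

let text = "Writer/Arjuna Sky Kok/$1,000/December 4, 2022"

// Hypothetical hand-written US dollar amount: "$", one to three digits,
// then any number of ",ddd" groups. No locale awareness, no conversion.
let manualPayment = Regex {
    "$"
    Repeat(1...3) {
        One(.digit)
    }
    ZeroOrMore {
        ","
        Repeat(count: 3) {
            One(.digit)
        }
    }
}

if let paymentMatch = text.firstMatch(of: manualPayment) {
    print(paymentMatch.output) // $1,000
}
print("$350".wholeMatch(of: manualPayment) != nil)  // true
print("$1,00".wholeMatch(of: manualPayment) != nil) // false
```

Even this version hard-codes the currency symbol and grouping rules, which is what .localizedCurrency handles for you.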

Date

There is a similar built-in regex component for dates. You want to parse the date component, December 4, 2022, and you can take the same approach. Instead of writing a custom regex to parse dates, add the following code, which uses a date regex component:

let date = Regex {
    One(.date(.long, locale: USlocale, timeZone: .gmt))
}

This time you use .date with the .long date style, the same locale, and the GMT time zone. The date you want to parse, "December 4, 2022", is in the long format. If your dates use a different format, you would pass different parameters.

Now you should test it by adding the following code and running the program:

if let dateMatch = text.firstMatch(of: date) {
    let (wholeMatch) = dateMatch.output
    print(wholeMatch)
}

The result is a Date value, not the original string:

2022-12-04 00:00:00 +0000

Just as in the payment case, the matched text is converted into a value, this time a Date.
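
If you did write a custom date regex, a rough RegexBuilder version for the long US format might look like the sketch below. It is an illustration only. The name manualDate is made up, and the pattern checks only the shape (a word, a day, a comma, a four-digit year) without validating month names or day ranges. It also yields a Substring rather than a Date.

```swift
import RegexBuilder

let text = "Writer/Arjuna Sky Kok/$1,000/December 4, 2022"
let text2 = "Illustrator/Karen O'Reilly/$350/November 30, 2022"

// Hypothetical "Month D, YYYY" shape: a word, a space, one or two digits,
// ", ", then exactly four digits.
let manualDate = Regex {
    OneOrMore(.word)
    " "
    Repeat(1...2) {
        One(.digit)
    }
    ", "
    Repeat(count: 4) {
        One(.digit)
    }
}

for record in [text, text2] {
    if let dateMatch = record.firstMatch(of: manualDate) {
        print(dateMatch.output) // "December 4, 2022", then "November 30, 2022"
    }
}
```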

Capturing matched text

Now let's combine all the RegexBuilder components to match the full text. You can stack the Regex blocks:

let separator = Regex { "/" }
let regexCode = Regex {
    job
    separator
    name
    separator
    payment
    separator
    date
}

In other words, you can assign a sub-regex to a variable and use it inside a larger Regex block.
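
Because a Regex value is itself a regex component, you can compose pieces at any granularity. The sketch below combines only the job and name regexes from this article and uses wholeMatch(of:) to check the combination. It leaves out the Foundation-based payment and date components, and jobAndName is a made-up name.

```swift
import RegexBuilder

// Pieces copied from the article's job, name, and separator regexes.
let job = Regex {
    ChoiceOf {
        "Writer"
        "Illustrator"
    }
}
let name = Regex {
    OneOrMore(
        ChoiceOf {
            CharacterClass(.word)
            CharacterClass(.whitespace)
            "'"
        }
    )
}
let separator = Regex { "/" }

// A Regex value can be dropped into another Regex block like any component.
let jobAndName = Regex {
    job
    separator
    name
}

print("Writer/Arjuna Sky Kok".wholeMatch(of: jobAndName) != nil) // true
print("Editor/Arjuna Sky Kok".wholeMatch(of: jobAndName) != nil) // false
```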

Then test it with both texts:

if let match = text.firstMatch(of: regexCode) {
    let (wholeMatch) = match.output
    print(wholeMatch)
}
if let match2 = text2.firstMatch(of: regexCode) {
    let (wholeMatch) = match2.output
    print(wholeMatch)
}

The output is perfect:

Writer/Arjuna Sky Kok/$1,000/December 4, 2022
Illustrator/Karen O'Reilly/$350/November 30, 2022

But we're not done yet, because we want to capture each component individually rather than only the whole match. Add the following code:

let regexCodeWithCapture = Regex {
    Capture {
        job
    }
    separator
    Capture {
        name
    }
    separator
    Capture {
        payment
    }
    separator
    Capture {
        date
    }
}

We put each component we want to capture inside a Capture block. In this example, there are four Capture blocks, one for each component.

This way, when the text matches the regex, you can access the captured components. In the legacy syntax, these are called capture groups. Add the following code to get the captured components:

if let matchWithCapture = text.firstMatch(of: regexCodeWithCapture) {
    // The output is a tuple: the whole match followed by one element per Capture.
    let output = matchWithCapture.output
    print(output.0)
    print(output.1)
    print(output.2)
    print(output.3)
    print(output.4)
}

Compile and run the program and you will get the following output:

Writer/Arjuna Sky Kok/$1,000/December 4, 2022
Writer
Arjuna Sky Kok
1000
2022-12-04 00:00:00 +0000

Here, .0 refers to the whole match, and .1 refers to the first captured component, the job. Then .2 is the name, .3 is the payment, and .4 is the date. There is no .5, because you captured only four components.
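
Instead of indexing the output with .0 through .4, you can destructure the tuple into named constants. The sketch below does this for a smaller regex that captures only the job and the name, so it runs without the currency and date components. The constant names are illustrative.

```swift
import RegexBuilder

let text2 = "Illustrator/Karen O'Reilly/$350/November 30, 2022"

let job = Regex {
    ChoiceOf {
        "Writer"
        "Illustrator"
    }
}
let name = Regex {
    OneOrMore(
        ChoiceOf {
            CharacterClass(.word)
            CharacterClass(.whitespace)
            "'"
        }
    )
}

// Two captures, so the output type is (Substring, Substring, Substring).
let jobAndName = Regex {
    Capture { job }
    "/"
    Capture { name }
}

if let match = text2.firstMatch(of: jobAndName) {
    // Destructure the output tuple instead of using .0, .1, and .2.
    let (whole, jobTitle, person) = match.output
    print(whole)    // Illustrator/Karen O'Reilly
    print(jobTitle) // Illustrator
    print(person)   // Karen O'Reilly
}
```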

Conclusion

In this article, you learned how to write regular expressions using RegexBuilder. You first wrote a regex using the legacy syntax and then converted it to the new syntax, which showed how much more readable regexes can become. You reviewed concepts such as quantifiers, choices, character classes, currencies, and dates. Finally, you captured the components of a regex.

This article only scratches the surface of RegexBuilder. There are things you haven't learned yet, such as repetition behaviors and the TryCapture component. You can also read about its evolution in the RegexBuilder API documentation here. The code for this article can be found in this GitHub repository.


Origin blog.csdn.net/weixin_47967031/article/details/132836483