There's an old adage about regular expressions: "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems." This illustrates how messy and complex regular expressions can be.
This is where RegexBuilder in version 5.7 of the Swift language shines. RegexBuilder simplifies writing regular expressions and makes them more readable. In this article, we'll cover getting started with RegexBuilder, including using the various RegexBuilder components such as CharacterClass, Currency, and date.
Jump forward:
- Set up Swift Playground on Xcode
- Using the regular expression API
- Regular Expression Generator API
- RegexBuilder quantifier
- Match RegexBuilder component
- Capture matching text
Set up Swift Playground on Xcode
You can use the Swift language on many platforms, including Linux. RegexBuilder is supported on Linux, but this tutorial uses Swift on a Mac because the playground imports the UIKit library, which ships only with Apple's SDKs.
First, open Xcode. Then navigate to File in the menu and click New > Playground to create a Swift playground. Name it RegexBuilderPlayground. You will see default code that imports UIKit and declares a variable named greeting.
Using the regular expression API
Before learning how to use the new RegexBuilder API, you should be familiar with the original Regex API.
Replace the default code you get when creating a new playground with the following code:
import UIKit

let regex = /\d+@\w+/
let match = "12345@hello".firstMatch(of: regex)
print(match!.0)
Compile and run the code and you will get the following results:
12345@hello
As you can see, the regular expression is written in a cryptic syntax: /\d+@\w+/.
\d represents a digit, \d+ represents one or more digits, @ represents the literal @, \w represents a word character, and \w+ represents one or more word characters. The slashes (/) mark the start and end of the regular expression literal.
The next line shows how to match a string against a regular expression using the firstMatch method. The result is an optional match object. If there is a match, you get the full matched text through its 0 element.
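Force-unwrapping with match! crashes when the input does not match. As a minimal sketch, the same lookup can be written with optional binding. The helper name firstTag is hypothetical, and the extended #/.../# delimiters are used here only so that the literal compiles without extra compiler settings.

```swift
// Hypothetical helper: returns the first "digits@word" run in the input, or nil.
func firstTag(in input: String) -> String? {
    // #/.../# is the extended-delimiter form of the /.../ regex literal.
    let regex = #/\d+@\w+/#
    // firstMatch(of:) returns nil when nothing matches, so no force unwrap is needed.
    guard let match = input.firstMatch(of: regex) else { return nil }
    // For a regex without captures, output is the matched Substring.
    return String(match.output)
}

print(firstTag(in: "12345@hello") ?? "no match")  // 12345@hello
print(firstTag(in: "hello") ?? "no match")        // no match
```

Returning an optional leaves the decision about a missing match to the caller instead of crashing inside the helper.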
Regular Expression Generator API
Now it's time to look at the equivalent code in the RegexBuilder API. There is a shortcut for converting the old regular expression syntax to RegexBuilder syntax. Highlight the old regular expression and right-click it (or Control-click it), and you should see an option to refactor it into the new RegexBuilder syntax.
The new regular expression syntax will look like this:
let regex = Regex {
    OneOrMore(.digit)
    "@"
    OneOrMore(.word)
}
With this new syntax, you no longer need to wonder what \d means. In the RegexBuilder API, the cryptic \d+ has been replaced with a friendlier syntax, OneOrMore(.digit), and it is clear what OneOrMore(.digit) does. The same goes for \w+: its replacement, OneOrMore(.word), is much clearer.
Also, note that an import line for RegexBuilder has been added:
import RegexBuilder
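To check that the refactored form behaves like the literal, the two can be run against the same input. This is a small sketch, not part of the original playground: the function name compareForms and the sample string are made up for illustration, and the literal uses the #/.../# delimiters so that it compiles without extra settings.

```swift
import RegexBuilder

// Hypothetical helper: returns the first match of each form so they can be compared.
func compareForms(on input: String) -> (literal: Substring?, builder: Substring?) {
    // Literal form of the pattern.
    let literalRegex = #/\d+@\w+/#
    // RegexBuilder form of the same pattern.
    let builderRegex = Regex {
        OneOrMore(.digit)
        "@"
        OneOrMore(.word)
    }
    return (literal: input.firstMatch(of: literalRegex)?.output,
            builder: input.firstMatch(of: builderRegex)?.output)
}

let result = compareForms(on: "id: 12345@hello!")
print(result.literal == result.builder)  // true
```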
RegexBuilder quantifier
OneOrMore is a quantifier. In the legacy API, the quantifiers are * (zero or more), + (one or more), ? (zero or one), and {n,m} (at least n and at most m repetitions).
If you want the digit to the left of the @ to be optional, you can use the Optionally quantifier:
let regex2 = Regex {
    Optionally(.digit)
    "@"
    OneOrMore(.word)
}
The above code means /\d?@\w+/.
What if you want at least four and at most six digits to the left of the @? You can use Repeat:
let regex3 = Regex {
    Repeat(4...6) { .digit }
    "@"
    OneOrMore(.word)
}
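One detail worth illustrating is that a Repeat range limits only the matched portion, not the whole input. The sketch below shows this under the assumption that you sometimes want the entire string to conform. The helper name check is hypothetical, and the digit is written as One(.digit) inside the Repeat block.

```swift
import RegexBuilder

// Hypothetical helper: reports whether the whole input conforms,
// and which substring (if any) firstMatch finds.
func check(_ input: String) -> (whole: Bool, first: String?) {
    let regex = Regex {
        Repeat(4...6) { One(.digit) }  // four to six digits
        "@"
        OneOrMore(.word)
    }
    // wholeMatch(of:) succeeds only if the entire string matches.
    let whole = input.wholeMatch(of: regex) != nil
    // firstMatch(of:) may start anywhere in the string.
    let first = input.firstMatch(of: regex).map { String($0.output) }
    return (whole: whole, first: first)
}

print(check("1234@abc"))     // whole: true,  first: "1234@abc"
print(check("123@abc"))      // whole: false, first: nil
print(check("1234567@abc"))  // whole: false, first: "234567@abc"
```

With seven digits, firstMatch still succeeds by skipping the first digit, so wholeMatch is the safer choice when the upper bound must hold for the full input.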
Match RegexBuilder component
Let's start learning RegexBuilder from scratch. Add the following code:
let text = "Writer/Arjuna Sky Kok/$1,000/December 4, 2022"
let text2 = "Illustrator/Karen O'Reilly/$350/November 30, 2022"
In this example, imagine that you work for LogRocket and need to parse text describing freelancer payments. The text variable indicates that LogRocket should pay Arjuna Sky Kok $1,000 for writing services by December 4, 2022. The text2 variable indicates that LogRocket should pay Karen O'Reilly $350 for illustration services by November 30, 2022.
You want to parse the text into four parts, the job part, the name part, the payment amount, and the payment due date.
Use ChoiceOf to indicate choice
Let's start with the job component. Based on the code above, the job is either "Writer" or "Illustrator". You can create a regular expression that expresses your selection.
Add the following code:
let job = Regex {
    ChoiceOf {
        "Writer"
        "Illustrator"
    }
}
As you can see in the code, you use ChoiceOf to represent a choice. You put the alternatives into the ChoiceOf block. You're not limited to two options. You can add more, but each one goes on its own line. In the legacy API, you would use | instead.
You can test the regex against the text variable by adding the following code:
if let jobMatch = text.firstMatch(of: job) {
    let wholeMatch = jobMatch.output
    print(wholeMatch)
}
If you compile and run this program, you will get the following output:
Writer
This means that your regular expression matches the job component. You can test the text2 variable in the same way if you wish.
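As a sketch of how the job regex behaves on both sample lines and on a line with an unknown job, the check can be wrapped in a small function. The function name jobTitle and the third alternative, "Editor", are hypothetical additions for illustration.

```swift
import RegexBuilder

// Hypothetical helper: returns the first job title found in the line, or nil.
func jobTitle(in line: String) -> String? {
    let job = Regex {
        ChoiceOf {
            "Writer"
            "Illustrator"
            "Editor"  // hypothetical third option, one alternative per line
        }
    }
    return line.firstMatch(of: job).map { String($0.output) }
}

print(jobTitle(in: "Writer/Arjuna Sky Kok/$1,000/December 4, 2022") ?? "none")         // Writer
print(jobTitle(in: "Illustrator/Karen O'Reilly/$350/November 30, 2022") ?? "none")  // Illustrator
print(jobTitle(in: "Designer/Someone/$10/January 1, 2023") ?? "none")               // none
```

Because firstMatch searches anywhere in the string, prefixMatch(of:) may be a better fit if the job must appear at the very start of the line.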
CharacterClass
Now, let's move on to the next component: the name. Here, a name is defined as one or more characters, each of which is a word character, a whitespace character, or a single quote. Generally speaking, names can be more complex than this, but for our example this definition is sufficient.
Here is the regex for your name component:
let name = Regex {
    OneOrMore(
        ChoiceOf {
            CharacterClass(.word)
            CharacterClass(.whitespace)
            "'"
        }
    )
}
You've seen OneOrMore and ChoiceOf. But there's a new component: CharacterClass. In the old API, this corresponds to \d, \s, \w, and so on. It represents a category of characters.
CharacterClass(.word) represents word characters, such as a, b, c, d, etc. CharacterClass(.whitespace) represents whitespace, such as spaces, tabs, etc. In addition to .word and .whitespace, there are several other character classes. If you want a numeric CharacterClass, you can write CharacterClass(.digit) to represent 1, 2, 3, etc.
Therefore, the name consists of one or more characters drawn from word characters, whitespace, and the single quote.
You can try this regex on a name string:
if let nameMatch = "Karen O'Reilly".firstMatch(of: name) {
    let wholeMatch = nameMatch.output
    print(wholeMatch)
}
The output is what you would expect:
Karen O'Reilly
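The same set of allowed characters can also be expressed as a single combined CharacterClass instead of a ChoiceOf. The following is a sketch of that alternative, not the approach used above. The function name isPlausibleName is hypothetical, and wholeMatch is used so that the entire string must consist of allowed characters.

```swift
import RegexBuilder

// Hypothetical helper: true if every character is a word character,
// whitespace, or a single quote, and the string is not empty.
func isPlausibleName(_ candidate: String) -> Bool {
    // A union of two built-in classes plus a custom one-character set.
    let nameCharacter = CharacterClass(.word, .whitespace, .anyOf("'"))
    let name = Regex {
        OneOrMore(nameCharacter)
    }
    return candidate.wholeMatch(of: name) != nil
}

print(isPlausibleName("Karen O'Reilly"))  // true
print(isPlausibleName("Karen/Reilly"))    // false
```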
Currency
Now, let’s move onto the next component: payments. The text you want to match is "$1,000" or "$350". You can create a complex regular expression to match these two payments by checking for the $ sign and optional comma. However, there is an easier way:
let USlocale = Locale(identifier: "en_US")
let payment = Regex {
    One(.localizedCurrency(code: "USD", locale: USlocale))
}
You can use .localizedCurrency with USD code and US locale. This way you can change the code and locale in case you want to match payments in a different currency, such as "¥1,000".
The regular expression component One is similar to OneOrMore. It represents exactly one occurrence of the expression.
Add the following code to the file, then compile and run the program to see the results:
if let paymentMatch = text.firstMatch(of: payment) {
    let wholeMatch = paymentMatch.output
    print(wholeMatch)
}
The results are a little different than the previous ones. You will get:
1000
The result is not the string $1,000 but the plain number 1000. Behind the scenes, the currency component converts the matched text into a numeric value (a Decimal) rather than returning a string.
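Because the match output is a number rather than text, it can be used in arithmetic directly. The sketch below sums the payments from both sample lines. The function name paymentAmount is hypothetical, and the example assumes that Foundation's currency regex component is available on your platform.

```swift
import Foundation
import RegexBuilder

// Hypothetical helper: returns the first USD amount found in the line, or nil.
func paymentAmount(in line: String) -> Decimal? {
    let usLocale = Locale(identifier: "en_US")
    let payment = Regex {
        One(.localizedCurrency(code: "USD", locale: usLocale))
    }
    // The output of this regex is the parsed number, not the matched text.
    return line.firstMatch(of: payment)?.output
}

let lines = [
    "Writer/Arjuna Sky Kok/$1,000/December 4, 2022",
    "Illustrator/Karen O'Reilly/$350/November 30, 2022"
]
// Lines without a recognizable amount are skipped by compactMap.
let total = lines.compactMap(paymentAmount(in:)).reduce(Decimal(0), +)
print(total)
```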
Date
There is an equivalent regex component for dates. You want to parse the date component, December 4, 2022, and you can take the same approach. You don't need to write a custom regular expression to parse dates. Add the following code, which uses the date regex component:
let date = Regex {
    One(.date(.long, locale: USlocale, timeZone: .gmt))
}
This time you are using the .date component with the .long style, the same locale, and the GMT time zone. The date you want to parse, "December 4, 2022", is in the long format. If your dates use a different format, you will need different parameters.
Now you should test it by adding the following code and running the program:
if let dateMatch = text.firstMatch(of: date) {
    let wholeMatch = dateMatch.output
    print(wholeMatch)
}
The result is a Date value, not the matched string:
2022-12-04 00:00:00 +0000
Just as in the payment case, the matched text is converted into a typed value, this time a Date.
Capture matching text
Now you want to combine all the RegexBuilder components to match the full text. You can stack all the Regex blocks:
let separator = Regex { "/" }
let regexCode = Regex {
    job
    separator
    name
    separator
    payment
    separator
    date
}
So you can assign a smaller regex to a variable and reuse it inside a larger Regex block.
Then you should test it with these two texts:
if let match = text.firstMatch(of: regexCode) {
    let wholeMatch = match.output
    print(wholeMatch)
}
if let match2 = text2.firstMatch(of: regexCode) {
    let wholeMatch = match2.output
    print(wholeMatch)
}
The output is perfect:
Writer/Arjuna Sky Kok/$1,000/December 4, 2022
Illustrator/Karen O'Reilly/$350/November 30, 2022
But we are not satisfied yet, because we want to capture each component rather than only the entire text. Add the following code:
let regexCodeWithCapture = Regex {
    Capture { job }
    separator
    Capture { name }
    separator
    Capture { payment }
    separator
    Capture { date }
}
We put each component to be captured into a Capture block. In this example, each of the four components gets its own block.
This way, when matching text with the regular expression, the captured components can be accessed individually. In the legacy Regex API, this corresponds to a capture group. Add the following code to get the captured components:
if let matchWithCapture = text.firstMatch(of: regexCodeWithCapture) {
    let wholeMatch = matchWithCapture.output
    print(wholeMatch.0)
    print(wholeMatch.1)
    print(wholeMatch.2)
    print(wholeMatch.3)
    print(wholeMatch.4)
}
Compile and run the program and you will get the following output:
Writer/Arjuna Sky Kok/$1,000/December 4, 2022
Writer
Arjuna Sky Kok
1000
2022-12-04 00:00:00 +0000
The 0 element refers to the whole match. The 1 element points to the first captured component, the job. Then 2 is the name, 3 is the payment, and 4 is the date. There is no 5 element because you only captured four components.
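Since the match output is a tuple, it can also be destructured into named constants instead of being indexed by number. The following is a simplified, self-contained sketch of that idea. The function name parseLine is hypothetical, the date is left out, and the amount is captured as plain text and converted by hand, which differs from the currency component used in this article.

```swift
import RegexBuilder

// Hypothetical helper: parses "Job/Name/$Amount/..." into named parts.
func parseLine(_ line: String) -> (job: String, name: String, amount: Int)? {
    let regex = Regex {
        Capture {
            ChoiceOf {
                "Writer"
                "Illustrator"
            }
        }
        "/"
        Capture { OneOrMore(CharacterClass(.word, .whitespace, .anyOf("'"))) }
        "/$"
        // Digits and thousands separators, captured as text.
        Capture { OneOrMore(CharacterClass(.digit, .anyOf(","))) }
    }
    guard let match = line.firstMatch(of: regex) else { return nil }
    // Element 0 is the whole match; the rest are the captures, in order.
    let (_, job, name, amountText) = match.output
    // Strip the commas before converting the text to an integer.
    guard let amount = Int(String(amountText.filter { $0 != "," })) else { return nil }
    return (job: String(job), name: String(name), amount: amount)
}

if let parsed = parseLine("Writer/Arjuna Sky Kok/$1,000/December 4, 2022") {
    print(parsed.job, parsed.name, parsed.amount)  // Writer Arjuna Sky Kok 1000
}
```

Named bindings make it harder to mix up capture positions when the regex later gains or loses a component.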
Conclusion
In this article, you learned how to write regular expressions using RegexBuilder. You first wrote a regular expression using the old API and then converted it to the new syntax, which showed how regular expressions can become easier to read. You reviewed concepts such as quantifiers, choices, character classes, currencies, and dates. Finally, you captured the components of a regular expression.
This article only scratches the surface of RegexBuilder. There are some things you haven't learned yet, such as repetition behaviors and the TryCapture component. You can also read about its evolution in the RegexBuilder API documentation. The code for this article can be found in this GitHub repository.