Regular Expressions in Swift
Like everyone else in the Pacific Northwest, we got snowed-in over the weekend. To pass the time, we decided to break out our stash of board games: Carcassonne, Machi Koro, Power Grid, Pandemic; we had plenty of excellent choices available. But cooped up in our home for the afternoon, we decided on a classic: Cluedo.
Me, I’m an avid fan of Cluedo — and yes, that’s what I’m going to call it. Because despite being born and raised in the United States, where the game is sold and marketed exclusively under the name “Clue”, I insist on referring to it by its proper name. (Otherwise, how would we refer to the 1985 film adaptation?)
Alas, my relentless pedantry often causes me to miss out on invitations to play. If someone were to ask:
var invitation = "Hey, would you like to play Clue?"
invitation.contains("Cluedo") // false
…I’d have no idea what they were talking about. If only they’d bothered to phrase it properly, there’d be no question about their intention:
invitation = "Fancy a game of Cluedo™?"
invitation.contains("Cluedo") // true
Of course,
a regular expression
would allow me to relax my exacting standards.
I could listen for /Clue(do)?™?/
and never miss another invitation.
But who can be bothered to figure out regexes in Swift, anyway?
Well,
sharpen your pencils,
gather your detective notes,
and warm up your 6-sided dice,
because this week on NSHipster,
we’re cracking the case of the cumbersome class known as NSRegularExpression.
Who killed regular expressions in Swift?
I have a suggestion:
It was NSRegularExpression, in the API, with the cumbersome usability.
In any other language, regular expressions are something you can sling around in one-liners.
- Need to substitute one word for another? Boom: regular expression.
- Need to extract a value from a templated string? Boom: regular expression.
- Need to parse XML? Boom: regular expression... actually, you should really use an XML parser in this case.
But in Swift,
you have to go through the trouble of
initializing an NSRegularExpression object
and converting back and forth from String ranges to NSRange values.
It’s a total drag.
Here’s the good news:
- You don’t need NSRegularExpression to use regular expressions in Swift.
- Recent additions in Swift 4 and 5 make it much, much nicer to use NSRegularExpression when you need to.
Let’s interrogate each of these points, in order:
Regular Expressions without NSRegularExpression
You may be surprised to learn that you can,
in fact,
use regular expressions in a Swift one-liner —
you just have to bypass NSRegularExpression entirely.
Matching Strings Against Patterns
When you import the Foundation framework,
the Swift String type automatically gets access to
NSString instance methods and initializers.
Among these is range(of:options:range:locale:),
which finds and returns the range of the first occurrence of a given string.
Normally, this performs a by-the-books substring search operation. Meh.
But if you pass the .regularExpression option,
the string argument is matched as a pattern.
Eureka!
Let’s take advantage of this lesser-known feature to dial our Cluedo sense to the “American” setting.
import Foundation
let invitation = "Fancy a game of Cluedo™?"
invitation.range(of: #"\bClue(do)?™?\b"#,
                 options: .regularExpression) != nil // true
If the pattern matches the specified string,
the method returns a Range<String.Index> value.
Therefore, checking for a non-nil value
tells us whether or not a match occurred.
The method itself provides default arguments to the
options, range, and locale parameters;
by default, it performs a localized, unqualified search
over the entire string in the current locale.
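As a rough sketch of what overriding those defaults might look like (the sample sentence and variable names are made up for illustration), the options parameter takes a set, so .regularExpression can be combined with .caseInsensitive, and the range parameter can limit the search to part of the string:

```swift
import Foundation

// Made-up sample text with two differently-cased mentions.
let chatter = "Anyone up for cluedo tonight? Cluedo is the best."
let pattern = #"\bclue(do)?\b"#

// Treat the argument as a pattern and ignore case while matching.
let first = chatter.range(of: pattern,
                          options: [.regularExpression, .caseInsensitive])

// Search again, but only in the text that follows the first match.
var second: Range<String.Index>? = nil
if let first = first {
    second = chatter.range(of: pattern,
                           options: [.regularExpression, .caseInsensitive],
                           range: first.upperBound..<chatter.endIndex)
}

if let first = first, let second = second {
    print(chatter[first])  // cluedo
    print(chatter[second]) // Cluedo
}
```

Passing an explicit range like this is one way to step through successive matches without reaching for NSRegularExpression, although it gets unwieldy quickly.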
Within a regular expression,
the ? operator matches the preceding character or group zero or one times.
We use it in our pattern
to make the “-do” in “Cluedo” optional
(accommodating both the American and correct spelling),
and allow a trademark symbol (™)
for anyone wishing to be prim and proper about it.
The \b metacharacters match if the current position is a word boundary,
which occurs between word (\w) and non-word (\W) characters.
Anchoring our pattern to match on word boundaries
prevents false positives like “Clueless”, where “Clue” is only the start of a longer word.
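To make the effect of the optional group and the word boundaries concrete, here is a minimal sketch. The helper name mentionsCluedo and the sample sentences are hypothetical, introduced only for this illustration:

```swift
import Foundation

// Hypothetical helper (not a Foundation API): true if the text mentions
// the game under either of its names.
func mentionsCluedo(_ text: String) -> Bool {
    return text.range(of: #"\bClue(do)?™?\b"#,
                      options: .regularExpression) != nil
}

print(mentionsCluedo("Hey, would you like to play Clue?")) // true
print(mentionsCluedo("Fancy a game of Cluedo™?"))          // true
// "Clue" is followed by another word character here, so \b fails.
print(mentionsCluedo("Clueless is a great film."))         // false
```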
That solves our problem of missing out on invitations. The next question is how to respond in kind.
Searching and Retrieving Matches
Rather than merely checking for a non-nil value,
we can actually use the return value
to see the string that got matched.
import Foundation
func respond(to invitation: String) {
    if let range = invitation.range(of: #"\bClue(do)?™?\b"#,
                                    options: .regularExpression) {
        // Subscripting with the matched range yields the matched text.
        switch invitation[range] {
        case "Cluedo":
            print("I'd be delighted to play!")
        case "Clue":
            print("Did you mean Cluedo? If so, then yes!")
        default:
            fatalError("(Wait... did I mess up my regular expression?)")
        }
    } else {
        print("Still waiting for an invitation to play Cluedo.")
    }
}
Conveniently,
the range returned by the range(of:...) method
can be plugged into a subscript to get a Substring
for the matching range.
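Because a Substring shares storage with the string it came from, it is usually converted to a String before being kept around. One way to package that up is sketched below; the firstMatch(of:in:) helper is hypothetical, not something Foundation provides:

```swift
import Foundation

// Hypothetical helper (not a Foundation API): returns the text of the
// first match as a String, or nil if the pattern doesn't match.
func firstMatch(of pattern: String, in text: String) -> String? {
    guard let range = text.range(of: pattern,
                                 options: .regularExpression) else {
        return nil
    }
    // text[range] is a Substring; copy it into a standalone String.
    return String(text[range])
}

let game = firstMatch(of: #"\bClue(do)?\b"#,
                      in: "Hey, would you like to play Clue?")
print(game ?? "no match") // Clue
```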
Finding and Replacing Matches
Once we’ve established that the game is on, the next step is to read the instructions. (Despite its relative simplicity, players often forget important rules in Cluedo, such as needing to be in a room in order to suggest it.)
Naturally, we play the original, British edition. But as a favor to the American players, I’ll go to the trouble of localizing the rules on-the-fly. For example, the victim’s name in the original version is “Dr. Black”, but in America, it’s “Mr. Boddy”.
We automate this process
using the replacingOccurrences(of:with:options:) method,
again passing the .regularExpression option.
import Foundation
let instructions = """
The object is to solve by means of elimination and deduction
the problem of the mysterious murder of Dr. Black.
"""
instructions.replacingOccurrences(
    of: #"(Dr\.|Doctor) Black"#,
    with: "Mr. Boddy",
    options: .regularExpression
)
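Two details of this method are easy to miss: every occurrence is replaced, and when .regularExpression is passed the replacement string is treated as a template in which $1, $2, and so on refer to capture groups. A small sketch, using made-up sample strings:

```swift
import Foundation

let rules = "Dr. Black was found in the Hall. Doctor Black had many enemies."

// Every match is replaced, not just the first one.
let localized = rules.replacingOccurrences(
    of: #"(Dr\.|Doctor) Black"#,
    with: "Mr. Boddy",
    options: .regularExpression
)
print(localized)
// Mr. Boddy was found in the Hall. Mr. Boddy had many enemies.

// With .regularExpression, the replacement is a template:
// $1 and $2 refer to the first and second capture groups.
let roster = "Plum, Professor"
let reordered = roster.replacingOccurrences(
    of: #"(\w+), (\w+)"#,
    with: "$2 $1",
    options: .regularExpression
)
print(reordered) // Professor Plum
```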
Regular Expressions with NSRegularExpression
There are limits to what we can accomplish with
the range(of:options:range:locale:)
and replacingOccurrences(of:with:options:) methods.
Specifically,
you’ll need to use NSRegularExpression
if you want to match a pattern more than once in a string
or extract values from capture groups.
Enumerating Matches with Positional Capture Groups
A regular expression can match its pattern one or more times in a string. Within each match, there may be one or more capture groups, which are designated by enclosing part of the regex pattern in parentheses.
For example, let’s say you wanted to use regular expressions to determine how many players you need to play Cluedo:
import Foundation
let description = """
Cluedo is a game of skill for 2-6 players.
"""
let pattern = #"(\d+)[ \p{Pd}](\d+) players"#
let regex = try NSRegularExpression(pattern: pattern, options: [])
This pattern includes two capture groups for
one or more digits,
as denoted by the + operator and \d metacharacter, respectively.
Between them, we match on a set containing a space
and any character in the
Unicode General Category Pd (Punctuation, dash).
This allows us to match on
hyphen / minus (-), en dash (–), em dash (—),
or whatever other exotic typographical marks we might encounter.
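A quick way to check which separators the character set accepts is to run the pattern over a few variations. This is only a sketch, and the sample strings are made up for the purpose:

```swift
import Foundation

let pattern = #"(\d+)[ \p{Pd}](\d+) players"#

// Made-up samples: hyphen, en dash, em dash, space, and a slash.
let samples = ["2-6 players", "2–6 players", "2—6 players",
               "2 6 players", "2/6 players"]

let results = samples.map { sample in
    sample.range(of: pattern, options: .regularExpression) != nil
}
print(results) // [true, true, true, true, false]
```

The slash is neither a space nor a member of the Pd category, so the last sample is rejected.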
We can use the enumerateMatches(in:options:range:using:) method
to try each match until we find one that
has three ranges (the entire match and the two capture groups),
whose captured values can be used to initialize a valid range.
In the midst of all of this,
we use the new(-ish)
NSRange(_:in:) and Range(_:in:) initializers
to convert between String and NSString index ranges.
Once we find such a match,
we set the third closure parameter (a pointer to a Boolean value)
to true
as a way to tell the enumeration to stop.
var playerRange: ClosedRange<Int>?
let nsrange = NSRange(description.startIndex..<description.endIndex,
                      in: description)
regex.enumerateMatches(in: description,
                       options: [],
                       range: nsrange) { (match, _, stop) in
    guard let match = match else { return }
    // Three ranges: the entire match plus the two capture groups.
    if match.numberOfRanges == 3,
       let firstCaptureRange = Range(match.range(at: 1),
                                     in: description),
       let secondCaptureRange = Range(match.range(at: 2),
                                      in: description),
       let lowerBound = Int(description[firstCaptureRange]),
       let upperBound = Int(description[secondCaptureRange]),
       lowerBound > 0 && lowerBound < upperBound
    {
        playerRange = lowerBound...upperBound
        // Tell the enumeration to stop after the first valid match.
        stop.pointee = true
    }
}
print(playerRange!)
// Prints "2...6"
Each capture group can be accessed by position
by calling the range(at:) method on the match object.
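When every match is wanted rather than just the first valid one, NSRegularExpression also offers matches(in:options:range:), which returns all results as an array. The following is a sketch under that assumption; the sample text and variable names are invented for the example, and try! is used only because the pattern is a fixed literal:

```swift
import Foundation

// Made-up sample text with two player counts (hyphen and en dash).
let text = "Cluedo is for 2-6 players. Pandemic is for 2–4 players."

// try! is tolerable here only because the pattern is a known-good literal.
let regex = try! NSRegularExpression(pattern: #"(\d+)[ \p{Pd}](\d+) players"#)
let whole = NSRange(text.startIndex..<text.endIndex, in: text)

// Convert each match's capture groups into a ClosedRange<Int>,
// skipping any match whose numbers don't form a valid range.
let playerRanges: [ClosedRange<Int>] = regex
    .matches(in: text, options: [], range: whole)
    .compactMap { match in
        guard let lowerRange = Range(match.range(at: 1), in: text),
              let upperRange = Range(match.range(at: 2), in: text),
              let lower = Int(text[lowerRange]),
              let upper = Int(text[upperRange]),
              lower > 0, lower < upper
        else { return nil }
        return lower...upper
    }

for range in playerRanges {
    print("\(range.lowerBound) to \(range.upperBound) players")
}
// 2 to 6 players
// 2 to 4 players
```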
*Sigh*. What? No, we like the solution we came up with — longwinded as it may be. It’s just… gosh, wouldn’t it be nice if we could play Cluedo solo?
Matching Multi-Line Patterns with Named Capture Groups
The only thing making Cluedo a strictly multiplayer affair is that you need some way to test a theory without revealing the answer to yourself.
If we wanted to write a program to check that without spoiling anything for us, one of the first steps would be to parse a suggestion into its component parts: suspect, location, and weapon.
let suggestion = """
I suspect it was Professor Plum, \
in the Dining Room, \
with the Candlestick.
"""
When writing a complex regular expression,
it helps to know exactly which features your platform supports.
In the case of Swift,
NSRegularExpression is a wrapper around the
ICU regular expression engine,
which lets us do some really nice things:
let pattern = #"""
(?xi)
(?<suspect>
    ((Miss|Ms\.) \h Scarlett?) |
    ((Colonel | Col\.) \h Mustard) |
    ((Reverend | Mr\.) \h Green) |
    (Mrs\. \h Peacock) |
    ((Professor | Prof\.) \h Plum) |
    ((Mrs\. \h White) | ((Doctor | Dr\.) \h Orchid))
),?(?-x: in the )
(?<location>
    Kitchen | Ballroom | Conservatory |
    Dining \h Room | Library |
    Lounge | Hall | Study
),?(?-x: with the )
(?<weapon>
      Candlestick
    | Knife
    | (Lead(en)?\h)? Pipe
    | Revolver
    | Rope
    | Wrench
)
"""#
let regex = try NSRegularExpression(pattern: pattern, options: [])
First off,
declaring the pattern with a multi-line raw string literal
is a huge win in terms of readability.
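That, in combination with the x (free-spacing) flag
set by (?xi) at the top of the pattern,
allows us to use whitespace to organize our expression
into something more understandable.
The i flag makes the match case-insensitive,
and the (?-x: ...) groups switch free-spacing back off where the spaces are meant literally.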
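A cut-down version of the same idea shows how these flags interact. This is only an illustrative sketch: the shortened pattern and the isAccusation helper are made up for the example.

```swift
import Foundation

// (?xi) turns on free-spacing (x) and case-insensitivity (i).
// Inside (?-x: ... ) free-spacing is off, so those spaces are literal.
// \h matches a single horizontal whitespace character.
let pattern = #"""
(?xi)
(Dining \h Room | Library)
(?-x: with the )
(Rope | Wrench)
"""#

// Hypothetical helper: true if the text matches the pattern above.
func isAccusation(_ text: String) -> Bool {
    return text.range(of: pattern, options: .regularExpression) != nil
}

print(isAccusation("library with the ROPE"))       // true
print(isAccusation("Dining Room with the Wrench")) // true
print(isAccusation("Library withthe Rope"))        // false
```

The last line fails because the literal spaces inside the (?-x: ...) group must all be present, whereas the line breaks and padding elsewhere in the pattern are ignored.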
Another nicety is how
this pattern uses named capture groups
(designated by (?<name>...)) instead of the
standard, positional capture groups from the previous example.
Doing so allows us to access groups by name
by calling the range(withName:) method on the match object.
Beyond the more outlandish maneuvers, we have affordances for regional variations, including the spelling of “Miss Scarlet(t)”, the title of “Mr. / Rev. Green”, and the replacement of Mrs. White with Dr. Orchid in standard editions after 2016.
let nsrange = NSRange(suggestion.startIndex..<suggestion.endIndex,
                      in: suggestion)
if let match = regex.firstMatch(in: suggestion,
                                options: [],
                                range: nsrange)
{
    for component in ["suspect", "location", "weapon"] {
        let nsrange = match.range(withName: component)
        if nsrange.location != NSNotFound,
           let range = Range(nsrange, in: suggestion)
        {
            print("\(component): \(suggestion[range])")
        }
    }
}
// Prints:
// "suspect: Professor Plum"
// "location: Dining Room"
// "weapon: Candlestick"
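If this lookup is needed in more than one place, it could be wrapped up along the following lines. This is a sketch, not part of Foundation: the namedCaptures helper and the deliberately simplified pattern are assumptions made for the example.

```swift
import Foundation

// Hypothetical helper (not part of Foundation): collects the named groups
// that took part in the first match into a dictionary.
func namedCaptures(_ names: [String],
                   matching regex: NSRegularExpression,
                   in text: String) -> [String: String] {
    let whole = NSRange(text.startIndex..<text.endIndex, in: text)
    guard let match = regex.firstMatch(in: text, options: [], range: whole) else {
        return [:]
    }
    var captures: [String: String] = [:]
    for name in names {
        let nsrange = match.range(withName: name)
        // Groups that didn't participate in the match report NSNotFound.
        if nsrange.location != NSNotFound,
           let range = Range(nsrange, in: text) {
            captures[name] = String(text[range])
        }
    }
    return captures
}

// A deliberately simplified pattern with an optional weapon clause.
// try! is tolerable here only because the pattern is a known-good literal.
let regex = try! NSRegularExpression(
    pattern: #"(?<suspect>\w+ \w+), in the (?<location>\w+)(, with the (?<weapon>\w+))?"#
)
let parts = namedCaptures(["suspect", "location", "weapon"],
                          matching: regex,
                          in: "Professor Plum, in the Library")
print(parts["suspect"] ?? "?")   // Professor Plum
print(parts["location"] ?? "?")  // Library
print(parts["weapon"] ?? "none") // none
```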
Regular expressions are a powerful tool for working with text, but it’s often a mystery how to use them in Swift. We hope this article has helped clue you into finding a solution.