NSScanner
Strings are a ubiquitous and diverse part of our computing lives. They comprise emails and essays, poems and novels—and indeed, every article on nshipster.com, the configuration files that shape the site, and the code that builds it.
Being able to pull apart strings and extract particular bits of data is therefore a powerful skill, one that we use over and over when building apps and shaping our tools. Cocoa offers several tools for string processing, each with its own strengths. In particular:
- string.componentsSeparatedByCharactersInSet / string.componentsSeparatedByString: Great for splitting a string into constituent pieces. Not so great at anything else.
- NSRegularExpression: Powerful for validating and extracting string data from an expected format. Cumbersome when dealing with complex serial input and finicky for parsing numeric values.
- NSDataDetector: Perfect for detecting and extracting dates, addresses, links, and more. Limited to its predefined types.
- NSScanner: Highly configurable and designed for scanning string and numeric values from loosely demarcated strings.
This week’s article focuses on the last of these, NSScanner. Read on to learn about its flexibility and power.
Among Cocoa’s tools, NSScanner serves as a wrapper around a string, scanning through its contents to efficiently retrieve substrings and numeric values. It offers several properties that modify an NSScanner instance’s behavior:
- caseSensitive (Bool): Whether to pay attention to upper- and lowercase while scanning; false by default. Note that this property only applies to the string-matching methods scanString:intoString: and scanUpToString:intoString:. Character-set scanning is always case-sensitive.
- charactersToBeSkipped (NSCharacterSet): The characters to skip over on the way to finding a match for the requested value type. By default, this is the set of whitespace and newline characters.
- scanLocation (Int): The current position of the scanner in its string. Scanning can be rewound or restarted by setting this property.
- locale (NSLocale): The locale that the scanner should use when parsing numeric values (see below).
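An NSScanner instance has two additional read-only properties. The first is string, which gives you back the string the scanner is scanning. The second is atEnd (isAtEnd in Swift), which is true when the scanner has reached the end of its string or when only characters from charactersToBeSkipped remain.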
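The following is a minimal sketch of these properties at work on a made-up string. It assumes a recent Foundation (macOS 10.15 / iOS 13 or later), where the Swift-native scanString(_:) and scanUpToString(_:) return an optional String instead of filling a pointer:

```swift
import Foundation

let scanner = Scanner(string: "Hello, World")
// Skip commas and spaces before each scan (the default is whitespace and newlines).
scanner.charactersToBeSkipped = CharacterSet(charactersIn: ", ")

// caseSensitive defaults to false, so "hello" matches "Hello".
scanner.caseSensitive = false
let looseMatch = scanner.scanString("hello") != nil

// Rewind by resetting scanLocation, then retry with case sensitivity on.
scanner.scanLocation = 0
scanner.caseSensitive = true
let strictMatch = scanner.scanString("hello") != nil   // fails, so the location stays at 0

// There is no "!" in the string, so scanning up to it consumes the rest of the string.
let rest = scanner.scanUpToString("!")
let finished = scanner.isAtEnd
```

Note that the failed strict scan leaves the location unchanged. This is why the final scanUpToString(_:) still starts from the beginning of the string.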
Note: NSScanner is actually the abstract superclass of a private cluster of scanner implementation classes. Even though you’re calling alloc and init on NSScanner, you’ll actually receive one of these subclasses, such as NSConcreteScanner. No need to fret over this.
Extracting Substrings and Numeric Values
The raison d’être of NSScanner is to pull substrings and numeric values from a larger string. It has fifteen methods to do this, all of which follow the same basic pattern. Each method takes a reference to an output variable as a parameter and returns a Boolean value indicating whether the scan succeeded:
import Foundation
var whitespaceAndPunctuationSet = CharacterSet.whitespacesAndNewlines
whitespaceAndPunctuationSet.formUnion(.punctuationCharacters)
let stringScanner = Scanner(string: "John & Paul & Ringo & George.")
stringScanner.charactersToBeSkipped = whitespaceAndPunctuationSet
var name: NSString?
while stringScanner.scanUpToCharacters(from: whitespaceAndPunctuationSet, into: &name) {
    print(name ?? "")
}
// John
// Paul
// Ringo
// George
NSMutableCharacterSet *whitespaceAndPunctuationSet = [NSMutableCharacterSet punctuationCharacterSet];
[whitespaceAndPunctuationSet formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
NSScanner *stringScanner = [[NSScanner alloc] initWithString:@"John & Paul & Ringo & George."];
stringScanner.charactersToBeSkipped = whitespaceAndPunctuationSet;
NSString *name;
while ([stringScanner scanUpToCharactersFromSet:whitespaceAndPunctuationSet intoString:&name]) {
    NSLog(@"%@", name);
}
// John
// Paul
// Ringo
// George
The NSScanner API has methods for two use-cases: scanning for strings generally, or for numeric types specifically.
1) String Scanners
- scanString:intoString: / scanCharactersFromSet:intoString:
  Scans to match the string parameter or characters in the NSCharacterSet parameter, respectively. The intoString parameter will contain the scanned string, if found. These methods are often used to advance the scanner’s location; pass nil for the intoString parameter to ignore the output.
- scanUpToString:intoString: / scanUpToCharactersFromSet:intoString:
  Scans characters into a string until finding the string parameter or characters in the NSCharacterSet parameter, respectively. The intoString parameter will contain the scanned string, if any was found. If the given string or character set is not found, the result will be the entire rest of the scanner’s string.
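To see how the two string-scanning families combine, here is a sketch that parses a made-up “key=value;” format. The parsePairs(_:) helper is hypothetical. It uses the optional-returning Swift counterparts of the methods above (scanUpToString(_:) and scanString(_:), available in newer Foundation releases). Scanning up to a delimiter collects each key or value, and scanning the delimiter itself simply advances the location:

```swift
import Foundation

// Hypothetical helper for a made-up "key=value;" format.
func parsePairs(_ input: String) -> [String: String] {
    let scanner = Scanner(string: input)
    // Skip spaces between pairs.
    scanner.charactersToBeSkipped = CharacterSet.whitespaces
    var result: [String: String] = [:]
    while !scanner.isAtEnd {
        // Scan the key up to "=", then step over the "=" itself.
        guard let key = scanner.scanUpToString("="),
              scanner.scanString("=") != nil else { break }
        // With no ";" left, scanUpToString returns the rest of the string.
        let value = scanner.scanUpToString(";") ?? ""
        _ = scanner.scanString(";")   // step over the delimiter, if present
        result[key] = value
    }
    return result
}

let pairs = parsePairs("width=10; height=20; unit=px")
```

Because scanUpToString(_:) falls back to the rest of the string when the delimiter is missing, the final pair needs no trailing semicolon.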
2) Numeric Scanners
- scanDouble: / scanFloat: / scanDecimal:
  Scans a floating-point value from the scanner’s string and stores the value in the referenced Double, Float, or NSDecimal variable, if found.
- scanInteger: / scanInt: / scanLongLong: / scanUnsignedLongLong:
  Scans an integer value from the scanner’s string and stores the value in the referenced Int, Int32, Int64, or UInt64 variable, if found.
- scanHexDouble: / scanHexFloat:
  Scans a hexadecimal floating-point value from the scanner’s string and stores the value in the referenced Double or Float variable, if found. To scan properly, the floating-point value must have a 0x or 0X prefix.
- scanHexInt: / scanHexLongLong:
  Scans a hexadecimal integer value from the scanner’s string and stores the value in the referenced UInt32 or UInt64 variable, if found. The value may have a 0x or 0X prefix, but it is not required.
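Here is a sketch of the numeric scanners on a made-up input. It again uses the optional-returning Swift API from newer Foundation releases (scanInt(), scanDouble(), and scanUInt64(representation:)). Each of these calls returns nil instead of false when nothing matches:

```swift
import Foundation

let scanner = Scanner(string: "count: 42 ratio: 0.75 mask: 0xFF")
// The default charactersToBeSkipped (whitespace and newlines) lets each scan start at the next token.
_ = scanner.scanString("count:")
let count = scanner.scanInt()
_ = scanner.scanString("ratio:")
let ratio = scanner.scanDouble()
_ = scanner.scanString("mask:")
// Hexadecimal integers may carry a 0x prefix, as described above.
let mask = scanner.scanUInt64(representation: .hexadecimal)
// Nothing is left to scan, so this returns nil.
let missing = scanner.scanInt()
```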
localizedScannerWithString / locale
Because it is a part of Cocoa, NSScanner has built-in localization support (of course). An NSScanner instance can work with either the user’s locale, when created via +localizedScannerWithString:, or a specific locale after setting its locale property. In particular, the decimal separator in floating-point values will be interpreted correctly for the given locale:
import Foundation
var price = 0.0
let gasPriceScanner = Scanner(string: "2.09 per gallon")
gasPriceScanner.scanDouble(&price)
// 2.09
// use a German locale instead of the default
let benzinPriceScanner = Scanner(string: "1,38 pro Liter")
benzinPriceScanner.locale = Locale(identifier: "de-DE")
benzinPriceScanner.scanDouble(&price)
// 1.38
double price;
NSScanner *gasPriceScanner = [[NSScanner alloc] initWithString:@"2.09 per gallon"];
[gasPriceScanner scanDouble:&price];
// 2.09
// use a German locale instead of the default
NSScanner *benzinPriceScanner = [[NSScanner alloc] initWithString:@"1,38 pro Liter"];
[benzinPriceScanner setLocale:[NSLocale localeWithLocaleIdentifier:@"de-DE"]];
[benzinPriceScanner scanDouble:&price];
// 1.38
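The comma in “1,38” is why the locale matters. With no locale set, the scanner uses “.” as the decimal separator, so it stops at the comma and returns only the integer part. The following is a minimal sketch of both cases. leadingPrice(_:localeIdentifier:) is a hypothetical helper built on the optional-returning scanDouble() from newer Foundation releases:

```swift
import Foundation

// Hypothetical helper: scan the leading number, optionally under a given locale.
func leadingPrice(_ text: String, localeIdentifier: String? = nil) -> Double? {
    let scanner = Scanner(string: text)
    if let identifier = localeIdentifier {
        scanner.locale = Locale(identifier: identifier)
    }
    return scanner.scanDouble()
}

// No locale: scanning stops at the comma, leaving just the integer part.
let withoutLocale = leadingPrice("1,38 pro Liter")
// German locale: the comma is read as the decimal separator.
let withGermanLocale = leadingPrice("1,38 pro Liter", localeIdentifier: "de_DE")
```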
Example: Parsing SVG Path Data
To take NSScanner out for a spin, we’ll look at parsing the path data from an SVG path. SVG path data are stored as a string of instructions for drawing the path, where “M” indicates a “move-to” step, “L” stands for “line-to”, and “C” stands for a cubic Bézier curve. Uppercase instructions are followed by points in absolute coordinates; lowercase instructions are followed by coordinates relative to the last point in the path.
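Converting relative coordinates to absolute ones is plain addition. Take the first lowercase c in the path below, which starts at the point set by M28.2,971.4. Each of its three points is an offset from that starting point. Point and absoluteCurve(from:relative:) are hypothetical stand-ins that let the sketch run without CoreGraphics:

```swift
// Hypothetical minimal point type, standing in for CGPoint.
struct Point {
    var x: Double
    var y: Double
}

// In a lowercase "c", both control points and the end point are offsets
// from the point where the segment starts (the current point).
func absoluteCurve(from start: Point, relative: [Point]) -> [Point] {
    return relative.map { Point(x: start.x + $0.x, y: start.y + $0.y) }
}

let start = Point(x: 28.2, y: 971.4)  // set by "M28.2,971.4"
let curve = absoluteCurve(from: start, relative: [
    Point(x: -10, y: 0.5), Point(x: -19.1, y: 13.3), Point(x: -28.2, y: 2.1),
])
// curve[2], roughly (0.0, 973.5), becomes the new current point.
```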
Here’s an SVG path I happen to have lying around (and a point-offsetting helper we’ll use later):
let svgPathData = "M28.2,971.4c-10,0.5-19.1,13.3-28.2,2.1c0,15.1,23.7,30.5,39.8,16.3c16,14.1,39.8-1.3,39.8-16.3c-12.5,15.4-25-14.4-39.8,4.5C35.8,972.7,31.9,971.2,28.2,971.4z"
extension CGPoint {
    func offset(_ p: CGPoint) -> CGPoint {
        return CGPoint(x: x + p.x, y: y + p.y)
    }
}
static NSString *const svgPathData = @"M28.2,971.4c-10,0.5-19.1,13.3-28.2,2.1c0,15.1,23.7,30.5,39.8,16.3c16,14.1,39.8-1.3,39.8-16.3c-12.5,15.4-25-14.4-39.8,4.5C35.8,972.7,31.9,971.2,28.2,971.4z";
CGPoint offsetPoint(CGPoint p1, CGPoint p2) {
    return CGPointMake(p1.x + p2.x, p1.y + p2.y);
}
Note that the point data are fairly irregular. Sometimes the x and y values of a point are separated by a comma, sometimes not, and likewise with points themselves. Parsing these data with regular expressions could turn into a mess pretty quickly, but with NSScanner the code is clear and straightforward.
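Here is a small sketch of why the irregular separators are not a problem. With the comma in charactersToBeSkipped, scanDouble() finds every value whether or not a separator is present. A minus sign cannot continue the previous number, so it starts the next one. scanNumbers(_:) is a hypothetical helper, fed the coordinates of the first c instruction above:

```swift
import Foundation

// Hypothetical helper: pull every number out of loosely separated coordinate text.
func scanNumbers(_ text: String) -> [Double] {
    let scanner = Scanner(string: text)
    var skip = CharacterSet(charactersIn: ",")
    skip.formUnion(.whitespacesAndNewlines)
    scanner.charactersToBeSkipped = skip

    var numbers: [Double] = []
    // scanDouble() stops at the "-" that begins the next value,
    // so "0.5-19.1" yields 0.5 and then -19.1.
    while let value = scanner.scanDouble() {
        numbers.append(value)
    }
    return numbers
}

let numbers = scanNumbers("-10,0.5-19.1,13.3-28.2,2.1")
// Pair the values up as (x, y) points; an odd trailing value would be dropped.
let points = stride(from: 0, to: numbers.count - 1, by: 2).map { (x: numbers[$0], y: numbers[$0 + 1]) }
```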
We’ll define a bezierPathFromSVGPath function that will convert a string of path data into a UIBezierPath. Our scanner is set up to skip commas and whitespace while scanning for values:
func bezierPathFromSVGPath(str: String) -> UIBezierPath {
    let scanner = Scanner(string: str)
    // skip commas and whitespace
    var skipChars = CharacterSet(charactersIn: ",")
    skipChars.formUnion(.whitespacesAndNewlines)
    scanner.charactersToBeSkipped = skipChars
    // the resulting bezier path
    let path = UIBezierPath()
- (UIBezierPath *)bezierPathFromSVGPath:(NSString *)str {
    NSScanner *scanner = [NSScanner scannerWithString:str];
    // skip commas and whitespace
    NSMutableCharacterSet *skipChars = [NSMutableCharacterSet characterSetWithCharactersInString:@","];
    [skipChars formUnionWithCharacterSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
    scanner.charactersToBeSkipped = skipChars;
    // the resulting bezier path
    UIBezierPath *path = [UIBezierPath bezierPath];
With the setup out of the way, it’s time to start scanning. We start by scanning for a string made up of characters in the allowed set of instructions:
    // instruction codes can be upper- or lowercase
    let instructionSet = CharacterSet(charactersIn: "MCSQTAmcsqta")
    var instruction: NSString?
    // scan for an instruction code
    while scanner.scanCharacters(from: instructionSet, into: &instruction) {
    // instruction codes can be upper- or lowercase
    NSCharacterSet *instructionSet = [NSCharacterSet characterSetWithCharactersInString:@"MCSQTAmcsqta"];
    NSString *instruction;
    // scan for an instruction code
    while ([scanner scanCharactersFromSet:instructionSet intoString:&instruction]) {
The next section scans pairs of Double values, converts each pair to a CGPoint, and then adds the corresponding step to the bezier path:
        var x = 0.0, y = 0.0
        var points: [CGPoint] = []
        // scan for pairs of Double, adding them as CGPoints to the points array
        while scanner.scanDouble(&x) && scanner.scanDouble(&y) {
            points.append(CGPoint(x: x, y: y))
        }
        // new point for bezier path
        switch instruction {
        case "M":
            path.move(to: points[0])
        case "C":
            path.addCurve(to: points[2], controlPoint1: points[0], controlPoint2: points[1])
        case "c":
            path.addCurve(to: path.currentPoint.offset(points[2]),
                          controlPoint1: path.currentPoint.offset(points[0]),
                          controlPoint2: path.currentPoint.offset(points[1]))
        default:
            break
        }
    }
    return path
}
        double x, y;
        NSMutableArray *points = [NSMutableArray array];
        // scan for pairs of Double, adding them as CGPoints to the points array
        while ([scanner scanDouble:&x] && [scanner scanDouble:&y]) {
            [points addObject:[NSValue valueWithCGPoint:CGPointMake(x, y)]];
        }
        // new point in path
        if ([instruction isEqualToString:@"M"]) {
            [path moveToPoint:[points[0] CGPointValue]];
        } else if ([instruction isEqualToString:@"C"]) {
            [path addCurveToPoint:[points[2] CGPointValue]
                    controlPoint1:[points[0] CGPointValue]
                    controlPoint2:[points[1] CGPointValue]];
        } else if ([instruction isEqualToString:@"c"]) {
            CGPoint newPoint = offsetPoint(path.currentPoint, [points[2] CGPointValue]);
            CGPoint control1 = offsetPoint(path.currentPoint, [points[0] CGPointValue]);
            CGPoint control2 = offsetPoint(path.currentPoint, [points[1] CGPointValue]);
            [path addCurveToPoint:newPoint
                    controlPoint1:control1
                    controlPoint2:control2];
        }
    }
    [path applyTransform:CGAffineTransformMakeScale(1, -1)];
    return path;
}
Lo and behold, the result:
The required flipping, resizing, waxing, and twirling are left as an exercise for the reader.
Swift-Friendly Scanning
As a last note, working with NSScanner in Swift can feel almost silly. Really, NSScanner, I need to pass in a pointer just so you can return a Bool? I can’t use optionals, which are pretty much designed for this exact purpose? Really?
With a simple extension converting the built-in methods to ones returning optional values, scanning becomes far more in sync with Swift’s idioms. Our path data scanning example can now use optional binding instead of inout variables for a cleaner, easier-to-read implementation:
// look for an instruction code
while let instruction = scanner.scanCharactersFromSet(instructionSet) {
    var points: [CGPoint] = []
    // scan for pairs of Double, adding them as CGPoints to the points array
    while let x = scanner.scanDouble(), let y = scanner.scanDouble() {
        points.append(CGPoint(x: x, y: y))
    }
    // new point for bezier path
    switch instruction {
    …
    }
}
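Apple’s Foundation later added optional-returning methods of exactly this kind (macOS 10.15 / iOS 13 and later), including scanCharacters(from:) and scanDouble(). On current SDKs, then, no custom extension is needed. Below is a sketch of the same loop written against them. To keep it runnable without UIKit, the hypothetical parseSVGPath(_:) collects each instruction with its coordinate pairs rather than drawing a UIBezierPath:

```swift
import Foundation

// Hypothetical record type: an instruction plus its (x, y) pairs, in place of drawing.
typealias SVGCommand = (instruction: String, points: [(x: Double, y: Double)])

func parseSVGPath(_ pathData: String) -> [SVGCommand] {
    let scanner = Scanner(string: pathData)
    // skip commas and whitespace, as before
    var skipChars = CharacterSet(charactersIn: ",")
    skipChars.formUnion(.whitespacesAndNewlines)
    scanner.charactersToBeSkipped = skipChars

    // Same instruction set as above; it has no "z", so scanning stops at a trailing close-path.
    let instructionSet = CharacterSet(charactersIn: "MCSQTAmcsqta")
    var commands: [SVGCommand] = []

    // Optional binding replaces the pointer-plus-Bool pattern.
    while let instruction = scanner.scanCharacters(from: instructionSet) {
        var points: [(x: Double, y: Double)] = []
        while let x = scanner.scanDouble(), let y = scanner.scanDouble() {
            points.append((x: x, y: y))
        }
        commands.append((instruction: instruction, points: points))
    }
    return commands
}

let commands = parseSVGPath("M28.2,971.4c-10,0.5-19.1,13.3-28.2,2.1C35.8,972.7,31.9,971.2,28.2,971.4z")
```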
You’ve gotta have the right tools for every job. NSScanner can be the shiny tool to reach for when it’s time to parse a user’s input or a web service’s data. Being able to distinguish which tools are right for which tasks helps us on our way to creating clear and accurate code.