NSDataDetector
Text means nothing without context.
What gives weight to our words is their relation to one another, to ourselves, and to our location in space-time.
Consider endophoric expressions, whose meaning depends on the surrounding text, or deictic expressions, whose meaning depends on who the speaker is, where they are, and when they said it. Now consider how difficult it would be for a computer to make sense of an utterance like “I’ll be home in 5 minutes.” (And that’s to say nothing of the challenges of ambiguity and variation in representations of dates, addresses, and other information.)
For better or worse, that’s how we communicate. And until humanity embraces RDF for our daily interactions, computers will have to work overtime to figure out what the heck we’re all talking about.
There’s immense value in transforming natural language into structured data that’s compatible with our calendars, address books, maps, and reminders. Manual data entry, however, amounts to drudgery, and is the last thing you want to force on users.
On other platforms, you might delegate this task to a web service or hack something together that works well enough. Fortunately for us Cocoa developers, Foundation has us covered with `NSDataDetector`.

You can use `NSDataDetector` to extract dates, links, phone numbers, addresses, and transit information from natural language text.
First, create a detector by specifying the result types you’re interested in. Then call the `enumerateMatches(in:options:range:using:)` method, passing the text to be processed. The provided closure is executed once for each result.
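```swift
import Foundation

let string = "123 Main St. / (555) 555-1234"
let types: NSTextCheckingResult.CheckingType = [.phoneNumber, .address]
let detector = try NSDataDetector(types: types.rawValue)

// NSRange counts UTF-16 code units, so convert from the Swift string's range.
let range = NSRange(string.startIndex..<string.endIndex, in: string)

detector.enumerateMatches(in: string,
                          options: [],
                          range: range) { (result, _, _) in
    guard let result = result else { return }
    print(result)
}
```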
```objc
NSString *string = @"123 Main St. / (555) 555-1234";
NSError *error = nil;
NSDataDetector *detector =
    [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeAddress |
                                          NSTextCheckingTypePhoneNumber
                                    error:&error];

[detector enumerateMatchesInString:string
                           options:kNilOptions
                             range:NSMakeRange(0, [string length])
                        usingBlock:
    ^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
        NSLog(@"%@", result);
    }];
```
As you might expect, running this code produces two results: the address “123 Main St.” and the phone number “(555) 555-1234”.
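One detail worth spelling out: the range passed to `enumerateMatches(in:options:range:using:)` is an `NSRange`, which counts UTF-16 code units rather than Swift `Character`s. The sketch below uses a made-up sample string and only plain Foundation (no detector) to show why building the range with `NSRange(_:in:)` is safer than using `string.count`.

```swift
import Foundation

// Hypothetical sample text; U+1F4DE (a telephone receiver emoji) needs two UTF-16 code units.
let text = "\u{1F4DE} (555) 555-1234"

// Swift counts extended grapheme clusters...
let characterCount = text.count
// ...but NSString and NSRange count UTF-16 code units.
let utf16Count = text.utf16.count

// NSRange(_:in:) performs the conversion, so the range covers the whole string.
let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)

// A range built from `text.count` ends one code unit short for this string.
let naiveRange = NSRange(location: 0, length: characterCount)

print(characterCount, utf16Count, fullRange.length, naiveRange.length)
```

Passing `naiveRange` to a detector would leave the final digit of the phone number outside the searched range.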
When initializing `NSDataDetector`, specify only the types you’re interested in, because any unused types will only slow you down.
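For instance, if all you need are URLs, a detector configured for `.link` alone will do. Here’s a minimal sketch; the sample text is illustrative, and `NSDataDetector` itself is only available on Apple platforms.

```swift
import Foundation

// Ask only for links; phone numbers, dates, and addresses aren't considered.
let linkDetector = try NSDataDetector(types: NSTextCheckingResult.CheckingType.link.rawValue)

let text = "Read https://nshipster.com or call (555) 555-1234."
let range = NSRange(text.startIndex..<text.endIndex, in: text)

// matches(in:options:range:) is the array-returning counterpart of enumerateMatches.
let urls = linkDetector
    .matches(in: text, options: [], range: range)
    .compactMap { $0.url }

print(urls)
```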
Discerning Information from Results
`NSDataDetector` produces `NSTextCheckingResult` objects. On the one hand, this makes sense, because `NSDataDetector` is actually a subclass of `NSRegularExpression`. On the other hand, there’s not much overlap between a pattern match and detected data other than the range and type. So what you get is an API that’s polluted and offers no strong guarantees about which information is present under which circumstances.
To make matters worse, `NSTextCheckingResult` is also used by `NSSpellServer`. Gross.
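Because `NSDataDetector` inherits from `NSRegularExpression`, the two share the same matching methods and both hand back `NSTextCheckingResult`s. A plain regular expression makes the limited overlap easy to see: a pattern match is essentially just a range. The pattern and sample text below are made up for illustration, and this sketch uses only `NSRegularExpression`.

```swift
import Foundation

// A deliberately simple pattern for numbers shaped like "555-1234".
let regex = try NSRegularExpression(pattern: "\\d{3}-\\d{4}")
let text = "Call 555-1234 or 555-9876"
let range = NSRange(text.startIndex..<text.endIndex, in: text)

var found: [String] = []
regex.enumerateMatches(in: text, options: [], range: range) { result, _, _ in
    // For a regex match, detector-specific data (phone number, components) isn't
    // populated; the range is the useful part, so map it back to a Swift range.
    guard let result = result,
          let swiftRange = Range(result.range, in: text) else { return }
    found.append(String(text[swiftRange]))
}

print(found)
```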
To get information about a data detector result, you first need to check its `resultType`; depending on that, you might access information directly through properties (in the case of links, phone numbers, and dates) or indirectly through keyed values on the `components` property (for addresses and transit information).
Here’s a rundown of the various `NSDataDetector` result types and their associated properties:
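Type | Properties |
---|---|
`.link` | `.url` |
`.phoneNumber` | `.phoneNumber` |
`.date` | `.date`, `.duration`, `.timeZone` |
`.address` | `.components`: `.name`, `.jobTitle`, `.organization`, `.street`, `.city`, `.state`, `.zip`, `.country`, `.phone` |
`.transitInformation` | `.components`: `.airline`, `.flight` |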
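In practice, that means branching on `resultType` before reading anything. Here’s a minimal sketch using the sample string from earlier; `describe(_:)` is a hypothetical helper, and exactly which address components the detector fills in may vary.

```swift
import Foundation

// Hypothetical helper: turns a detector result into a short, readable summary.
func describe(_ result: NSTextCheckingResult) -> String? {
    switch result.resultType {
    case .phoneNumber:
        // Links, phone numbers, and dates expose dedicated properties.
        return result.phoneNumber.map { "phone: \($0)" }
    case .address:
        // Addresses and transit information go through `components`.
        return result.components?[.street].map { "street: \($0)" }
    default:
        return nil
    }
}

let string = "123 Main St. / (555) 555-1234"
let types: NSTextCheckingResult.CheckingType = [.phoneNumber, .address]
let detector = try NSDataDetector(types: types.rawValue)
let range = NSRange(string.startIndex..<string.endIndex, in: string)

let summaries = detector
    .matches(in: string, options: [], range: range)
    .compactMap(describe)

summaries.forEach { print($0) }
```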
Data Detector Data Points
Let’s put `NSDataDetector` through its paces. That way, we’ll not only have a complete example of how to use it to its full capacity, but also see what it’s actually capable of. The following text contains one of each type of data that `NSDataDetector` should be able to detect:
```swift
import Foundation

let string = """
My flight (AA10) is scheduled for tomorrow night from 9 PM PST to 5 AM EST.
I'll be staying at The Plaza Hotel, 768 5th Ave, New York, NY 10019.
You can reach me at 555-555-1234 or [email protected]
"""
```
We can have `NSDataDetector` check for everything by passing `NSTextCheckingAllTypes` to its initializer. The rest is a matter of switching over each result and extracting its respective details:
```swift
let detector = try NSDataDetector(types: NSTextCheckingAllTypes)
let range = NSRange(string.startIndex..<string.endIndex, in: string)
detector.enumerateMatches(in: string,
                          options: [],
                          range: range) { (match, _, _) in
    guard let match = match else {
        return
    }

    switch match.resultType {
    case .date:
        let date = match.date
        let timeZone = match.timeZone
        let duration = match.duration
        print(date, timeZone, duration)
    case .address:
        if let components = match.components {
            let name = components[.name]
            let jobTitle = components[.jobTitle]
            let organization = components[.organization]
            let street = components[.street]
            let locality = components[.city]
            let region = components[.state]
            let postalCode = components[.zip]
            let country = components[.country]
            let phoneNumber = components[.phone]
            print(name, jobTitle, organization, street, locality, region, postalCode, country, phoneNumber)
        }
    case .link:
        let url = match.url
        print(url)
    case .phoneNumber:
        let phoneNumber = match.phoneNumber
        print(phoneNumber)
    case .transitInformation:
        if let components = match.components {
            let airline = components[.airline]
            let flight = components[.flight]
            print(airline, flight)
        }
    default:
        return
    }
}
```
When we run this code, we see that `NSDataDetector` is able to identify each of the types.
Type | Output |
---|---|
Date | “2018-08-31 04:00:00 +0000”, “America/Los_Angeles”, 18000.0 |
Address | nil, nil, nil, “768 5th Ave”, “New York”, “NY”, “10019”, nil, nil |
Link | “mailto:[email protected]” |
Phone Number | “555-555-1234” |
Transit Information | nil, “10” |
Impressively, the date result correctly calculates the 5-hour duration of the flight, accounting for the time zone change. However, some information is missing, such as the name of The Plaza Hotel in the address and the airline in the transit information.
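The arithmetic behind that 18,000-second figure can be checked independently of the detector. This sketch assumes the flight departs on the evening of August 30, 2018 (consistent with the `2018-08-31 04:00:00 +0000` start date above) and uses Foundation’s `Calendar` and `TimeZone` to pin each local time to its zone; the `date(_:_:_:hour:in:)` helper is hypothetical.

```swift
import Foundation

// Hypothetical helper: builds a Date from a wall-clock time in a named time zone.
func date(_ year: Int, _ month: Int, _ day: Int, hour: Int, in zoneID: String) -> Date? {
    guard let zone = TimeZone(identifier: zoneID) else { return nil }
    var calendar = Calendar(identifier: .gregorian)
    calendar.timeZone = zone
    return calendar.date(from: DateComponents(year: year, month: month, day: day, hour: hour))
}

// Assumed itinerary: 9 PM Pacific on Aug 30, arriving 5 AM Eastern on Aug 31.
// In late August both zones observe daylight saving time (UTC-7 and UTC-4).
let departure = date(2018, 8, 30, hour: 21, in: "America/Los_Angeles")!
let arrival = date(2018, 8, 31, hour: 5, in: "America/New_York")!

let duration = arrival.timeIntervalSince(departure)
print(duration, duration / 3600)
```

Using fixed standard-time offsets (UTC-8 and UTC-5) instead gives the same 5 hours, since the two zones are always 3 hours apart.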
Even after trying a handful of different representations (“American Airlines 10”, “AA 10”, “AA #10”, “American Airlines (AA) #10”) and airlines (“Delta 1226”, “DL 1226”), I still wasn’t able to find an example that populated the `airline` component. If anyone knows what’s up, @ us.
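Given that the airline can’t be relied on, code that consumes transit results should treat every component as optional. One possible approach, sketched below with a hypothetical `flightLabel(for:)` helper, is to fall back to the flight number alone.

```swift
import Foundation

// Hypothetical helper: formats transit components, tolerating a missing airline.
func flightLabel(for components: [NSTextCheckingKey: String]?) -> String? {
    // Without a flight number there's nothing useful to show.
    guard let flight = components?[.flight] else { return nil }
    if let airline = components?[.airline] {
        return "\(airline) \(flight)"
    }
    return "Flight \(flight)"
}

print(flightLabel(for: [.flight: "10"]) ?? "none")
print(flightLabel(for: [.airline: "AA", .flight: "10"]) ?? "none")
```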
Detect (Rough) Edges
Useful as `NSDataDetector` is, it’s not a particularly nice API to use. With all of the charms of its parent class, `NSRegularExpression`, the same cumbersome initialization pattern as `NSLinguisticTagger`, and an incomplete Swift interface, `NSDataDetector` has an interface that only a mother could love.
But that’s only the API itself. In a broader context, you might be surprised to learn that a nearly identical API can be found in the `dataDetectorTypes` properties of `UITextView` and `WKWebView` (the latter set through `WKWebViewConfiguration`).
Nearly identical. `UIDataDetectorTypes` and `WKDataDetectorTypes` are distinct from, and incompatible with, `NSTextCheckingTypes`, which is inconvenient but not super conspicuous.
But what’s utterly inexplicable is that these APIs can detect shipment tracking numbers and lookup suggestions, neither of which is supported by `NSDataDetector`. It’s hard to imagine why shipment tracking numbers wouldn’t be supported, which leads one to believe that it’s an oversight.
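For comparison, here’s roughly what opting into those extra types looks like on a UIKit text view. This is a sketch for iOS 10 or later; note that `UIDataDetectorTypes` is its own option set, separate from `NSTextCheckingResult.CheckingType`, and that text views only run data detection when they aren’t editable.

```swift
import UIKit

let textView = UITextView()
// Data detection applies only to non-editable text views.
textView.isEditable = false
// The last two types have no NSTextCheckingResult.CheckingType equivalent.
textView.dataDetectorTypes = [.phoneNumber, .link, .shipmentTrackingNumber, .lookupSuggestion]
```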
Humans have an innate ability to derive meaning from language. We can stitch together linguistic, situational and cultural information into a coherent interpretation at a subconscious level. Ironically, it’s difficult to put this process into words — or code as the case may be. Computers aren’t hard-wired for understanding like we are.
Despite its shortcomings, `NSDataDetector` can prove invaluable for certain use cases. Until something better comes along, take advantage of it in your app to unlock the structured information hiding in plain sight.