Text means nothing without context.
What gives weight to our words is their relation to one another, to ourselves, and to our location in space-time.
Consider endophoric expressions, whose meaning depends on the surrounding text, or deictic expressions, whose meaning depends on who the speaker is, where they are, and when they said it. Now consider how difficult it would be for a computer to make sense of an utterance like “I’ll be home in 5 minutes”. (And that’s to say nothing of the challenges of ambiguity and variation in representations of dates, addresses, and other information.)
For better or worse, that’s how we communicate. And until humanity embraces RDF for our daily interactions, computers will have to work overtime to figure out what the heck we’re all talking about.
There’s immense value in transforming natural language into structured data that’s compatible with our calendars, address books, maps, and reminders. Manual data entry, however, amounts to drudgery, and is the last thing you want to force on users.
On other platforms,
you might delegate this task to a web service
or hack something together that works well enough.
Fortunately for us Cocoa developers,
Foundation has us covered with NSDataDetector.
You can use NSDataDetector
to extract
dates, links, phone numbers, addresses, and transit information
from natural language text.
First, create a detector,
by specifying the result types that you’re interested in.
Then call the enumerateMatches(in:options:range:using:) method,
passing the text to be processed.
The provided closure is executed once for each result.
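import Foundation

let string = "123 Main St. / (555) 555-1234"
// NSRange counts UTF-16 code units, so convert from String indices.
let range = NSRange(string.startIndex..<string.endIndex, in: string)
let types: NSTextCheckingResult.CheckingType = [.phoneNumber, .address]
let detector = try NSDataDetector(types: types.rawValue)
detector.enumerateMatches(in: string,
                          options: [],
                          range: range) { (result, _, _) in
    print(result as Any)
}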
NSString *string = @"123 Main St. / (555) 555-1234";

NSError *error = nil;
NSDataDetector *detector =
    [NSDataDetector dataDetectorWithTypes:NSTextCheckingTypeAddress |
                                          NSTextCheckingTypePhoneNumber
                                    error:&error];
[detector enumerateMatchesInString:string
                           options:kNilOptions
                             range:NSMakeRange(0, [string length])
                        usingBlock:
^(NSTextCheckingResult *result, NSMatchingFlags flags, BOOL *stop) {
    NSLog(@"%@", result);
}];
As you might expect, running this code produces two results: the address “123 Main St.” and the phone number “(555) 555-1234”.
When initializing NSDataDetector, specify only the types you’re interested in because any unused types will only slow you down.
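As a minimal sketch of that advice, the helper below builds a detector for a single result type and gathers the matches into an array. The name phoneNumbers(in:) is hypothetical and not part of Foundation; matches(in:options:range:) is a method NSDataDetector inherits from NSRegularExpression.

```swift
import Foundation

/// Hypothetical helper (not a Foundation API):
/// returns the phone numbers detected in `text`.
func phoneNumbers(in text: String) throws -> [String] {
    // Request phone numbers only, so the detector isn't asked for anything else.
    let types: NSTextCheckingResult.CheckingType = [.phoneNumber]
    let detector = try NSDataDetector(types: types.rawValue)

    // NSRange counts UTF-16 code units, so convert from String indices.
    let range = NSRange(text.startIndex..<text.endIndex, in: text)

    // Collect every match, keeping only those that carry a phone number.
    return detector
        .matches(in: text, options: [], range: range)
        .compactMap { $0.phoneNumber }
}
```

Calling it with the sample string from earlier should yield a single entry for the phone number and skip the address entirely.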
Discerning Information from Results
NSDataDetector produces NSTextCheckingResult objects.
On the one hand,
this makes sense
because NSDataDetector
is actually a subclass of NSRegularExpression.
On the other hand,
there’s not much overlap between a pattern match and detected data
other than the range and type.
So what you get is an API that’s polluted
and offers no strong guarantees about what information is present
under which circumstances.
To make matters worse,
NSTextCheckingResult is also used by NSSpellServer.
Gross.
To get information about data detector results,
you need to first check each result’s resultType;
depending on that,
you might access information directly through properties,
(in the case of links, phone numbers, and dates),
or indirectly by keyed values on the components
(for addresses and transit information).
Here’s a rundown of the various
result types
and their associated properties:
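Type | Properties |
.link | .url |
.phoneNumber | .phoneNumber |
.date | .date, .duration, .timeZone |
.address | .components: .name, .jobTitle, .organization, .street, .city, .state, .zip, .country, .phone |
.transitInformation | .components: .airline, .flight |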
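One way to tame an API that offers no strong guarantees is to convert each result into a value whose cases carry exactly the data that applies. The following is only a sketch: DetectedData is a hypothetical type of our own, not something Foundation provides, and it assumes you care only about the five result types listed above.

```swift
import Foundation

/// Hypothetical wrapper (not a Foundation type) that pairs each
/// result type with the properties that are meaningful for it.
enum DetectedData {
    case link(URL)
    case phoneNumber(String)
    case date(Date, timeZone: TimeZone?, duration: TimeInterval)
    case address([NSTextCheckingKey: String])
    case transitInformation([NSTextCheckingKey: String])

    /// Returns nil for unsupported result types or missing payloads.
    init?(_ result: NSTextCheckingResult) {
        switch result.resultType {
        case .link:
            guard let url = result.url else { return nil }
            self = .link(url)
        case .phoneNumber:
            guard let number = result.phoneNumber else { return nil }
            self = .phoneNumber(number)
        case .date:
            guard let date = result.date else { return nil }
            self = .date(date, timeZone: result.timeZone, duration: result.duration)
        case .address:
            guard let components = result.components else { return nil }
            self = .address(components)
        case .transitInformation:
            guard let components = result.components else { return nil }
            self = .transitInformation(components)
        default:
            // Spelling, grammar, regular expression results, and so on.
            return nil
        }
    }
}
```

With something like this in place, code downstream can switch over DetectedData and never has to ask which properties are populated for which type.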
Data Detector Data Points
Let’s put NSDataDetector
through its paces.
That way, we’ll not only have a complete example of how to use it
to its full capacity
but see what it’s actually capable of.
The following text contains one of each type of data
that NSDataDetector
should be able to detect:
let string = """
My flight (AA10) is scheduled for tomorrow night from 9 PM PST to 5 AM EST.
I'll be staying at The Plaza Hotel, 768 5th Ave, New York, NY 10019.
You can reach me at 555-555-1234 or [email protected]
"""
We can have NSDataDetector
check for everything
by passing NSTextCheckingAllTypes
to its initializer.
The rest is a matter of switching over each result
and extracting their respective details:
let detector = try NSDataDetector(types: NSTextCheckingAllTypes)
let range = NSRange(string.startIndex..<string.endIndex, in: string)
detector.enumerateMatches(in: string,
                          options: [],
                          range: range) { (match, flags, _) in
    // The closure can be called without a result; skip those calls.
    guard let match = match else {
        return
    }

    switch match.resultType {
    case .date:
        let date = match.date
        let timeZone = match.timeZone
        let duration = match.duration
        print(date, timeZone, duration)
    case .address:
        if let components = match.components {
            let name = components[.name]
            let jobTitle = components[.jobTitle]
            let organization = components[.organization]
            let street = components[.street]
            let locality = components[.city]
            let region = components[.state]
            let postalCode = components[.zip]
            let country = components[.country]
            let phoneNumber = components[.phone]
            print(name, jobTitle, organization, street, locality, region, postalCode, country, phoneNumber)
        }
    case .link:
        let url = match.url
        print(url)
    case .phoneNumber:
        let phoneNumber = match.phoneNumber
        print(phoneNumber)
    case .transitInformation:
        if let components = match.components {
            let airline = components[.airline]
            let flight = components[.flight]
            print(airline, flight)
        }
    default:
        // Ignore result types that aren't produced by data detection.
        return
    }
}
When we run this code,
we see that NSDataDetector
is able to identify each of the types.
Type | Output |
Date | “2018-08-31 04:00:00 +0000”, “America/Los_Angeles”, 18000.0 |
Address | nil, nil, nil, “768 5th Ave”, “New York”, “NY”, “10019”, nil, nil |
Link | “mailto:[email protected]” |
Phone Number | “555-555-1234” |
Transit Information | nil, “10” |
Impressively, the date result correctly calculates the 5-hour duration of the flight, accommodating for the time zone change. However, some information is missing, like the name of The Plaza Hotel in the address, and the airline in the transit information.
Even after trying a handful of different representations (“American Airlines 10”, “AA 10”, “AA #10”, “American Airlines (AA) #10”) and airlines (“Delta 1226”, “DL 1226”), I still wasn’t able to find an example that populated the
airline property. If anyone knows what’s up, @ us.
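Since a date result carries both a start date and a duration, one natural follow-up is to fold the two into a single DateInterval, which is handy when creating a calendar event. This is a rough sketch; interval(for:) is a hypothetical helper name, and it assumes that a negative duration should be treated as unusable.

```swift
import Foundation

/// Hypothetical helper (not a Foundation API):
/// converts a date result into a DateInterval.
func interval(for match: NSTextCheckingResult) -> DateInterval? {
    // Only date results have a meaningful date and duration.
    guard match.resultType == .date,
          let start = match.date,
          match.duration >= 0 // DateInterval requires a non-negative duration.
    else {
        return nil
    }

    return DateInterval(start: start, duration: match.duration)
}
```

For the flight in the example text, the resulting interval would start at the detected departure time and end 18000 seconds (5 hours) later.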
Detect (Rough) Edges
Useful as NSDataDetector is,
it’s not a particularly nice API to use.
With all of the charms of its parent class, NSRegularExpression,
a cumbersome initialization pattern built on raw option values,
and an
incomplete Swift interface,
NSDataDetector has an interface that only a mother could love.
But that’s only the API itself.
In a broader context,
you might be surprised to learn that a nearly identical API can be found
in the dataDetectorTypes
properties of UITextView
and WKWebViewConfiguration.
Nearly identical.
UIDataDetectorTypes and WKDataDetectorTypes
are distinct from
and incompatible with NSTextCheckingTypes,
which is inconvenient but not super conspicuous.
But what’s utterly inexplicable is that these APIs
can detect shipment tracking numbers
and lookup suggestions,
neither of which is supported by NSDataDetector.
It’s hard to imagine why shipment tracking numbers wouldn’t be supported,
which leads one to believe that it’s an oversight.
Humans have an innate ability to derive meaning from language. We can stitch together linguistic, situational and cultural information into a coherent interpretation at a subconscious level. Ironically, it’s difficult to put this process into words — or code as the case may be. Computers aren’t hard-wired for understanding like we are.
Despite its shortcomings,
NSDataDetector can prove invaluable for certain use cases.
Until something better comes along,
take advantage of it in your app
to unlock the structured information hiding in plain sight.