NSLinguisticTagger
NSLinguisticTagger is a veritable Swiss Army Knife of linguistic functionality, with the ability to tokenize natural language strings into words, determine their part-of-speech & stem, extract names of people, places, & organizations, and tell you the languages & respective writing systems used in the string.
For most of us, this is far more power than we know what to do with. But perhaps this is just for lack of sufficient opportunity to try. After all, almost every application deals with natural language in one way or another. Perhaps NSLinguisticTagger could add a new level of polish, or enable brand new features entirely.
Introduced with iOS 5, NSLinguisticTagger is a contemporary of Siri, raising speculation that it was a byproduct of the personal assistant’s development.
Consider a typical question we might ask Siri:
What is the weather in San Francisco?
Computers are a long way off from “understanding” this question literally, but with a few simple tricks, we can do a reasonable job of understanding the intention behind the question:
import Foundation

let question = "What is the weather in San Francisco?"
let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation, .joinNames]
let schemes = NSLinguisticTagger.availableTagSchemes(forLanguage: "en")
let tagger = NSLinguisticTagger(tagSchemes: schemes, options: Int(options.rawValue))
tagger.string = question
let range = NSRange(location: 0, length: (question as NSString).length)
tagger.enumerateTags(in: range, scheme: .nameTypeOrLexicalClass, options: options) { tag, tokenRange, _, _ in
    let token = (question as NSString).substring(with: tokenRange)
    print("\(token): \(tag?.rawValue ?? "")")
}
NSString *question = @"What is the weather in San Francisco?";
NSLinguisticTaggerOptions options = NSLinguisticTaggerOmitWhitespace | NSLinguisticTaggerOmitPunctuation | NSLinguisticTaggerJoinNames;
NSLinguisticTagger *tagger = [[NSLinguisticTagger alloc] initWithTagSchemes:[NSLinguisticTagger availableTagSchemesForLanguage:@"en"] options:options];
tagger.string = question;
[tagger enumerateTagsInRange:NSMakeRange(0, [question length]) scheme:NSLinguisticTagSchemeNameTypeOrLexicalClass options:options usingBlock:^(NSString *tag, NSRange tokenRange, NSRange sentenceRange, BOOL *stop) {
    NSString *token = [question substringWithRange:tokenRange];
    NSLog(@"%@: %@", token, tag);
}];
This code would print the following:
What: Pronoun
is: Verb
the: Determiner
weather: Noun
in: Preposition
San Francisco: PlaceName
If we filter on nouns, verbs, and place names, we get [is, weather, San Francisco].
Based on this alone, or perhaps in conjunction with something like the Latent Semantic Mapping framework, we can conclude that a reasonable course of action would be to make an API request to determine the current weather conditions in San Francisco.
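Once the tags are in hand, the filtering step is plain collection logic. The sketch below assumes the enumeration's output has already been collected into token/tag pairs; `TaggedToken` and `intentTerms(from:relevantTags:)` are hypothetical names for illustration, and the tag strings are the raw values printed above.

```swift
// Hypothetical helper type: a token paired with the raw value of its tag,
// as printed by the enumeration above. Not part of Foundation.
struct TaggedToken {
    let token: String
    let tag: String
}

// Hypothetical helper: keeps only the tokens whose tag is in `relevantTags`,
// preserving their original order.
func intentTerms(from tokens: [TaggedToken],
                 relevantTags: Set<String> = ["Noun", "Verb", "PlaceName"]) -> [String] {
    return tokens.filter { relevantTags.contains($0.tag) }.map { $0.token }
}

// The tagger output shown above, transcribed by hand.
let tagged = [
    TaggedToken(token: "What", tag: "Pronoun"),
    TaggedToken(token: "is", tag: "Verb"),
    TaggedToken(token: "the", tag: "Determiner"),
    TaggedToken(token: "weather", tag: "Noun"),
    TaggedToken(token: "in", tag: "Preposition"),
    TaggedToken(token: "San Francisco", tag: "PlaceName"),
]

print(intentTerms(from: tagged)) // ["is", "weather", "San Francisco"]
```

Passing a different `relevantTags` set (for example, only `"PlaceName"`) narrows the result to the entities an app might hand off to a lookup.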
Tagging Schemes
NSLinguisticTagger can be configured to tag different kinds of information by specifying any of the following tagging schemes:
- NSLinguisticTagSchemeTokenType: Classifies tokens according to their broad type: word, punctuation, whitespace, etc.
- NSLinguisticTagSchemeLexicalClass: Classifies tokens according to class: part of speech for words, type of punctuation or whitespace, etc.
- NSLinguisticTagSchemeNameType: Classifies tokens as to whether they are part of named entities of various types or not.
- NSLinguisticTagSchemeNameTypeOrLexicalClass: Follows NSLinguisticTagSchemeNameType for names, and NSLinguisticTagSchemeLexicalClass for all other tokens.
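Here’s a list of the various token types associated with each scheme (NSLinguisticTagSchemeNameTypeOrLexicalClass, as the name implies, is the union between NSLinguisticTagSchemeNameType & NSLinguisticTagSchemeLexicalClass):

| NSLinguisticTagSchemeTokenType | NSLinguisticTagSchemeLexicalClass | NSLinguisticTagSchemeNameType |
|---|---|---|
| NSLinguisticTagWord | NSLinguisticTagNoun | NSLinguisticTagPersonalName |
| NSLinguisticTagPunctuation | NSLinguisticTagVerb | NSLinguisticTagPlaceName |
| NSLinguisticTagWhitespace | NSLinguisticTagAdjective | NSLinguisticTagOrganizationName |
| NSLinguisticTagOther | NSLinguisticTagAdverb | |
| | NSLinguisticTagPronoun | |
| | NSLinguisticTagDeterminer | |
| | NSLinguisticTagParticle | |
| | NSLinguisticTagPreposition | |
| | NSLinguisticTagNumber | |
| | NSLinguisticTagConjunction | |
| | NSLinguisticTagInterjection | |
| | NSLinguisticTagClassifier | |
| | NSLinguisticTagIdiom | |
| | NSLinguisticTagOtherWord | |
| | NSLinguisticTagSentenceTerminator | |
| | NSLinguisticTagOpenQuote | |
| | NSLinguisticTagCloseQuote | |
| | NSLinguisticTagOpenParenthesis | |
| | NSLinguisticTagCloseParenthesis | |
| | NSLinguisticTagWordJoiner | |
| | NSLinguisticTagDash | |
| | NSLinguisticTagOtherPunctuation | |
| | NSLinguisticTagParagraphBreak | |
| | NSLinguisticTagOtherWhitespace | |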
So for basic tokenization, use NSLinguisticTagSchemeTokenType, which will allow you to distinguish between words and whitespace or punctuation. For part-of-speech information, or to tell different kinds of punctuation and whitespace apart, NSLinguisticTagSchemeLexicalClass is your new bicycle.
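A minimal sketch of basic tokenization with the token type scheme follows. It assumes an Apple platform and the Swift 4+ spellings of the API (`NSLinguisticTagScheme.tokenType`, `NSLinguisticTag.word`); `wordTokens(in:)` is a hypothetical helper name. The omit options drop whitespace and punctuation, and the tag check keeps only tokens whose broad type is a word.

```swift
import Foundation

// Hypothetical helper: splits `text` into word tokens using the token type scheme.
// Assumes an Apple platform and Swift 4+ API names.
func wordTokens(in text: String) -> [String] {
    let tagger = NSLinguisticTagger(tagSchemes: [.tokenType], options: 0)
    tagger.string = text
    let nsText = text as NSString
    var words: [String] = []
    // Let the tagger skip whitespace and punctuation instead of filtering them by hand.
    let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation]
    tagger.enumerateTags(in: NSRange(location: 0, length: nsText.length),
                         scheme: .tokenType,
                         options: options) { tag, tokenRange, _, _ in
        // Keep only tokens whose broad type is "Word".
        if tag == NSLinguisticTag.word {
            words.append(nsText.substring(with: tokenRange))
        }
    }
    return words
}

print(wordTokens(in: "Hello, world! How are you?"))
```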
Continuing with the tagging schemes:
- NSLinguisticTagSchemeLemma: Supplies the stem forms of the words, if known.
- NSLinguisticTagSchemeLanguage: Tags tokens according to their language. The tag values will be standard language abbreviations such as "en", "fr", "de", etc., as used with the NSOrthography class. Note that the tagger generally attempts to determine the language of text at the level of an entire sentence or paragraph, rather than word by word.
- NSLinguisticTagSchemeScript: Tags tokens according to their script. The tag values will be standard script abbreviations such as "Latn", "Cyrl", "Jpan", "Hans", "Hant", etc.
As demonstrated in the example above, first you initialize an NSLinguisticTagger with an array of all of the schemes you wish to use, then set the tagger’s input string, and finally query individual tags or enumerate all of them.
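The lemma and language schemes follow the same pattern. This sketch, again assuming an Apple platform and Swift 4+ API names, enumerates a lemma for each word and reads a single language tag with `tag(at:scheme:tokenRange:sentenceRange:)`, which avoids enumeration when one tag is enough. The helper names `lemmas(in:)` and `language(of:)` are hypothetical. Because the lemma for a given word depends on the tagger's model, the sketch falls back to the token itself when no lemma is known.

```swift
import Foundation

// Hypothetical helper: returns (token, lemma) pairs for the words in `text`.
// Falls back to the token itself when the tagger reports no lemma.
func lemmas(in text: String) -> [(token: String, lemma: String)] {
    let tagger = NSLinguisticTagger(tagSchemes: [.lemma], options: 0)
    tagger.string = text
    let nsText = text as NSString
    var result: [(token: String, lemma: String)] = []
    tagger.enumerateTags(in: NSRange(location: 0, length: nsText.length),
                         scheme: .lemma,
                         options: [.omitWhitespace, .omitPunctuation]) { tag, tokenRange, _, _ in
        let token = nsText.substring(with: tokenRange)
        result.append((token: token, lemma: tag?.rawValue ?? token))
    }
    return result
}

// Hypothetical helper: reads the language tag (e.g. "en") of the first token.
// The tagger decides language at the sentence or paragraph level, so one query suffices.
func language(of text: String) -> String? {
    guard !text.isEmpty else { return nil } // tag(at:) needs a valid character index
    let tagger = NSLinguisticTagger(tagSchemes: [.language], options: 0)
    tagger.string = text
    return tagger.tag(at: 0, scheme: .language, tokenRange: nil, sentenceRange: nil)?.rawValue
}

for pair in lemmas(in: "The cats ran home") {
    print("\(pair.token) -> \(pair.lemma)")
}
print(language(of: "The quick brown fox jumps over the lazy dog.") ?? "unknown")
```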
Tagging Options
In addition to the available tagging schemes, there are several options you can pass to NSLinguisticTagger (combined with bitwise OR, |) to slightly change its behavior:
- NSLinguisticTaggerOmitWords
- NSLinguisticTaggerOmitPunctuation
- NSLinguisticTaggerOmitWhitespace
- NSLinguisticTaggerOmitOther
Each of these options omits the broad category of tokens it names. For example, NSLinguisticTagSchemeLexicalClass distinguishes between many different kinds of punctuation, and all of them would be omitted with NSLinguisticTaggerOmitPunctuation. This is preferable to manually filtering these tag types in enumeration blocks or with predicates.
The last option applies to the name type schemes (NSLinguisticTagSchemeNameType and NSLinguisticTagSchemeNameTypeOrLexicalClass):
- NSLinguisticTaggerJoinNames

By default, each word in a name is treated as a separate token. In many circumstances, it makes sense to treat names like “San Francisco” as a single token rather than two separate tokens. Passing this option does exactly that.
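To see the effect of the options, this sketch tags the same question with and without `.joinNames`, using `NSLinguisticTagger.Options`, the Swift 4+ counterpart of the constants above, on an Apple platform. `nameTokens(in:joinNames:)` is a hypothetical helper. Without `.joinNames`, "San" and "Francisco" come back as separate tokens; with it, the tagger can return them as one token when it recognizes the name.

```swift
import Foundation

// Hypothetical helper: returns the tokens produced by the name-type-or-lexical-class
// scheme, optionally joining multi-word names into single tokens.
func nameTokens(in text: String, joinNames: Bool) -> [String] {
    // Omit whitespace and punctuation in both cases so only joinNames differs.
    var options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation]
    if joinNames {
        options.insert(.joinNames)
    }
    let tagger = NSLinguisticTagger(tagSchemes: [.nameTypeOrLexicalClass],
                                    options: Int(options.rawValue))
    tagger.string = text
    let nsText = text as NSString
    var tokens: [String] = []
    tagger.enumerateTags(in: NSRange(location: 0, length: nsText.length),
                         scheme: .nameTypeOrLexicalClass,
                         options: options) { _, tokenRange, _, _ in
        tokens.append(nsText.substring(with: tokenRange))
    }
    return tokens
}

let question = "What is the weather in San Francisco?"
print(nameTokens(in: question, joinNames: false))
print(nameTokens(in: question, joinNames: true))
```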
Finally, NSString provides convenience methods that handle the setup and configuration of NSLinguisticTagger on your behalf. For one-off tokenizing, you can save a lot of boilerplate:
let sentence = "Where in the world is Carmen San Diego?" as NSString
var tokenRanges: NSArray?
let tags = sentence.linguisticTags(
    in: NSRange(location: 0, length: sentence.length),
    scheme: .nameTypeOrLexicalClass,
    options: [.omitWhitespace, .omitPunctuation, .joinNames],
    orthography: nil,
    tokenRanges: &tokenRanges
)
// tags: ["Pronoun", "Preposition", "Determiner", "Noun", "Verb", "PersonalName"]
Natural language is woefully under-utilized in user interface design on mobile devices. When implemented effectively, a single utterance from the user can achieve the equivalent of a handful of touch interactions, in a fraction of the time.
Sure, it’s not easy, but if we spent even a fraction of the time we devote to making our visual interfaces pixel-perfect, we could completely re-imagine how users best interact with apps and devices. And with NSLinguisticTagger, it’s never been easier to get started.