Before we begin, let us look at how Mike (a fictional character) spends a typical morning. Mike begins his day by searching for breakfast recipes on Google Now (https://en.wikipedia.org/wiki/Google_Now). After a filling breakfast, he starts getting ready for work. He asks Siri (http://www.apple.com/in/ios/siri/) about the weather and traffic conditions for his drive to work. Finally, as Mike gets ready to leave the house, he asks Alexa (https://en.wikipedia.org/wiki/Amazon_Alexa) to dim the lights and turn down the thermostat. It is not even 10 a.m., but Mike, like many of us, has already used three intelligent personal assistants that rely on Natural Language Processing (NLP). In this post, we will demystify how such assistants work, using a simple example that shows how to build one quite easily with NLP.
I. Software Architecture – The Big Picture
Before we dive into code, here is a very simplified view of how NLP is applied within these personal assistants. When you speak to an assistant, your speech is converted into text. This text is passed as input to an NLP module, and then, based on the context of the statement, the assistant calls the API of the right service, such as a banking or weather service. In this post we will focus on how to build the NLP module.
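The pipeline can be summarized in a few lines of Python. This is a toy sketch under stated assumptions, not how any of the products above are built: speech_to_text, nlp_module, SERVICES and assistant are hypothetical names, the speech stage is a stub, and the NLP module is a crude keyword rule that the rest of this post replaces with real NLTK processing.

```python
from typing import Callable, Dict, Tuple

def speech_to_text(audio: bytes) -> str:
    """Stub for the speech-recognition stage.

    Assumption: the "audio" is already UTF-8 text, so no real recognizer is needed.
    """
    return audio.decode("utf-8")

def nlp_module(text: str) -> Tuple[str, Dict[str, str]]:
    """Stub NLP module returning (intent, parameters).

    A naive keyword rule stands in for the NLTK steps in Section III.
    """
    words = text.rstrip("?").split()
    if "weather" in (w.lower() for w in words):
        # Assumption: the last word names the location, as in "... in Chicago?"
        return "weather", {"location": words[-1]}
    return "unknown", {}

# Hypothetical service handlers standing in for real cloud APIs.
SERVICES: Dict[str, Callable[[Dict[str, str]], str]] = {
    "weather": lambda params: f"Calling weather service for {params['location']}",
    "unknown": lambda params: "Sorry, I did not understand that.",
}

def assistant(audio: bytes) -> str:
    """Speech -> text -> NLP module -> service call."""
    text = speech_to_text(audio)
    intent, params = nlp_module(text)
    return SERVICES[intent](params)

print(assistant(b"What is the weather in Chicago?"))
# Calling weather service for Chicago
```

Adding a banking service would mean teaching nlp_module a new intent and registering another handler in SERVICES; the dispatch in assistant stays the same.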
II. Try NLP Online
You can use Stanford’s online NLP site http://corenlp.run/ to try out English sentences. Once you are familiar with what NLP produces, come back here to learn how to program the NLP module.
III. Code using NLTK
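Python’s NLTK library comes with many built-in functions and collections of texts to help you get started with NLP. Before reading this tutorial, you may want to install NLTK so that you can practice with the examples as you go. Installation instructions are at http://www.nltk.org/install.html. Note that the tokenizer, stop word list, tagger and named-entity chunker used below depend on data packages that are installed separately with nltk.download(); if one is missing, NLTK raises a LookupError that names the resource to download.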
Let’s go through some of the steps involved in NLP. Throughout the tutorial, let us assume we have the sample English sentence – “What is the weather in Chicago?”
1. Tokenization
To begin with, we need to tokenize the sentence. This lets us handle the individual words and punctuation marks in the sentence. Below we see how to tokenize our sample sentence in Python with NLTK.
>>> from nltk import word_tokenize
>>> sentence = "What is the weather in Chicago?"
>>> tokens = word_tokenize(sentence)
Now “tokens” is a Python list of the words and punctuation marks as seen below.
['What', 'is', 'the', 'weather', 'in', 'Chicago', '?']
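To see why a dedicated tokenizer is useful, compare it with Python’s plain str.split(), which leaves the question mark glued to “Chicago”. The regular-expression tokenizer below (simple_tokenize is a hypothetical helper) is a simplified stand-in, not NLTK’s algorithm: word_tokenize also handles contractions, quotes and other cases this pattern ignores. For our sample sentence, though, it produces the same tokens.

```python
import re

sentence = "What is the weather in Chicago?"

# Naive whitespace splitting keeps punctuation attached to words.
naive_tokens = sentence.split()

def simple_tokenize(text):
    """Illustrative tokenizer (assumption, not NLTK's algorithm):
    match runs of word characters, or any single character that is
    neither a word character nor whitespace (i.e. punctuation)."""
    return re.findall(r"\w+|[^\w\s]", text)

regex_tokens = simple_tokenize(sentence)

print(naive_tokens)  # ['What', 'is', 'the', 'weather', 'in', 'Chicago?']
print(regex_tokens)  # ['What', 'is', 'the', 'weather', 'in', 'Chicago', '?']
```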
2. Stop Word Removal
Now that the tokens are ready for processing, we can move on to stop word removal. This involves removing common words that add little to the semantic meaning of the sentence. Some examples of stop words are “the”, “and”, “a”, “an” and “then”. NLTK provides built-in stop word lists for a number of languages; the exact set depends on the version of the NLTK data you have installed.
Let’s go ahead and remove the inbuilt NLTK stop words from our list of tokens that we created previously.
>>> from nltk.corpus import stopwords
>>> stop_words = set(stopwords.words('english'))
>>> clean_tokens = [w for w in tokens if w not in stop_words]
>>> clean_tokens
['What', 'weather', 'Chicago', '?']
As you can see, the words “is”, “the” and “in” have been removed, leaving a much more concise bag of tokens.
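Notice that “What” survived even though NLTK’s English list contains “what”: the list is lowercase and the membership test is case-sensitive. The sketch below compares lowercased tokens instead. It uses a small hand-picked stop-word set (an assumption made so the example runs without downloading NLTK data); in practice you would pass set(stopwords.words('english')). remove_stop_words is a hypothetical helper name.

```python
# Small illustrative stop-word set (assumption); in practice use
# set(stopwords.words('english')) from nltk.corpus.
STOP_WORDS = {"what", "is", "the", "in", "a", "an", "and", "then"}

tokens = ['What', 'is', 'the', 'weather', 'in', 'Chicago', '?']

def remove_stop_words(tokens, stop_words):
    """Drop tokens whose lowercase form is a stop word, keeping original casing."""
    return [w for w in tokens if w.lower() not in stop_words]

clean_tokens = remove_stop_words(tokens, STOP_WORDS)
print(clean_tokens)  # ['weather', 'Chicago', '?']
```

Whether to drop question words like “what” depends on the application, since they can signal the kind of request. The original casing is kept because the tagger and chunker in the next steps rely partly on capitalization, for example to spot proper nouns such as “Chicago”.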
3. Parts of Speech Tagging
This is an important part of NLP, in which we tag each word in a sentence as a noun, verb, adjective, etc. The function nltk.pos_tag performs Parts of Speech tagging, as shown below.
>>> import nltk
>>> tagged = nltk.pos_tag(clean_tokens)
>>> tagged
[('What', 'WP'), ('weather', 'NN'), ('Chicago', 'NNP'), ('?', '.')]
Now let us understand what this means. The list “tagged” contains tuples of the form (word, tag). Below, I have listed the tags that have appeared in our “tagged” list.
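| Tag | Meaning |
|-----|---------|
| WP | Wh-pronoun |
| NN | Noun, singular or mass |
| NNP | Proper noun, singular |
| . | Sentence-final punctuation |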
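Because the Penn Treebank tags that NLTK uses are grouped by prefix (NN, NNS, NNP and NNPS are all nouns), a simple prefix filter over the (word, tag) tuples already pulls out the words an assistant is likely to care about. The helper words_with_tags is a hypothetical illustration, not part of NLTK; it works on the tagged list shown above, so it runs without any NLTK data.

```python
# Output of nltk.pos_tag(clean_tokens) as shown above.
tagged = [('What', 'WP'), ('weather', 'NN'), ('Chicago', 'NNP'), ('?', '.')]

def words_with_tags(tagged, prefixes):
    """Return the words whose tag starts with any of the given prefixes."""
    # str.startswith accepts a tuple of prefixes.
    return [word for word, tag in tagged if tag.startswith(prefixes)]

# All nouns (NN, NNS, NNP, NNPS) share the "NN" prefix.
nouns = words_with_tags(tagged, ("NN",))
# Proper nouns only (NNP, NNPS).
proper_nouns = words_with_tags(tagged, ("NNP",))

print(nouns)         # ['weather', 'Chicago']
print(proper_nouns)  # ['Chicago']
```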
4. Named Entity Recognition (NER)
What do we mean by Named Entity Recognition (NER)? Also known as entity identification or entity extraction, NER involves identifying the named entities in a text and putting them into categories such as persons, organizations and locations.
In NLTK, the function nltk.ne_chunk() classifies named entities.
Its input is our list “tagged” from the previous stage of Parts of Speech tagging.
>>> print(nltk.ne_chunk(tagged))
(S What/WP weather/NN (GPE Chicago/NNP) ?/.)
As you can see, “Chicago” has been correctly identified as a location: GPE stands for geo-political entity, such as a city, state or country.
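The value returned by nltk.ne_chunk() is an nltk.Tree whose leaves are (word, tag) tuples and whose named entities are labelled subtrees. A sketch of pulling the location out of it follows; to avoid downloading the chunker model, it rebuilds the tree printed above by hand (in a live session you would use chunked = nltk.ne_chunk(tagged)). extract_entities is a hypothetical helper, not an NLTK function.

```python
from nltk import Tree

# The tree printed above, rebuilt by hand so no model download is needed.
# Leaves are (word, tag) tuples; named entities are labelled subtrees.
chunked = Tree('S', [
    ('What', 'WP'),
    ('weather', 'NN'),
    Tree('GPE', [('Chicago', 'NNP')]),
    ('?', '.'),
])

def extract_entities(tree, label):
    """Return the text of every top-level subtree carrying the given entity label."""
    entities = []
    for node in tree:
        if isinstance(node, Tree) and node.label() == label:
            # Join multi-word entities such as "New York" into one string.
            entities.append(" ".join(word for word, tag in node.leaves()))
    return entities

print(extract_entities(chunked, "GPE"))  # ['Chicago']
```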
IV. Call the APIs
After named entity recognition, the meaning of the sentence is analyzed (http://www.nltk.org/book/ch10.html) and the appropriate API call can be made. For instance, after recognizing the location as “Chicago” and the context as “weather” in our sample sentence, the assistant can call a cloud-based weather service such as https://openweathermap.org/current. The current weather is then displayed or spoken back to the user who asked, “What is the weather in Chicago?”
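A minimal sketch of this last step, assuming a deliberately naive intent rule (the keyword “weather” plus at least one recognized location). The base URL and the q and appid query parameters follow the OpenWeatherMap current-weather documentation linked above; check that page before relying on them, since the service may change. API_KEY is a placeholder, and build_weather_request is a hypothetical helper. The code only builds the request URL and makes no network call.

```python
from urllib.parse import urlencode

# OpenWeatherMap current-weather endpoint (per its public docs).
BASE_URL = "https://api.openweathermap.org/data/2.5/weather"
API_KEY = "YOUR_API_KEY"  # placeholder: obtain a real key from openweathermap.org

def build_weather_request(tokens, locations):
    """Return a request URL if the query looks like a weather question, else None.

    Intent detection is a naive keyword check (an assumption for illustration).
    """
    words = {t.lower() for t in tokens}
    if "weather" not in words or not locations:
        return None
    query = urlencode({"q": locations[0], "appid": API_KEY})
    return f"{BASE_URL}?{query}"

url = build_weather_request(['What', 'weather', 'Chicago', '?'], ['Chicago'])
print(url)
# https://api.openweathermap.org/data/2.5/weather?q=Chicago&appid=YOUR_API_KEY
```

With a real key, the URL could be fetched with urllib.request.urlopen and the JSON response parsed with json.load before the result is shown or spoken to the user.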
This tutorial showed how to get started with NLP using Python’s NLTK library, and how this technology is used in intelligent personal assistants such as Google Now, Siri and Amazon Alexa.