In this post, I am going to talk about precision and recall and their importance in information retrieval. First of all, let’s talk about what we mean by information retrieval. Suppose you wake up one morning and decide you want to make muffins for breakfast. You take out your laptop and search for “healthy muffin recipe” on Google. Then, you go through the search results, decide on a recipe and get started on it. This is an example of information retrieval where the search engine (Google in this case) retrieved the results for your search query “healthy muffin recipe”.
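To make the two terms concrete before going further, here is a minimal sketch using the standard definitions: precision is the fraction of retrieved results that are relevant, and recall is the fraction of relevant results that were retrieved. The function name `precision_recall` and the recipe titles are made up for illustration; they are not real search results.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a collection of retrieved results."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # results that are both retrieved and relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical results for the query "healthy muffin recipe".
retrieved = ["banana oat muffins", "chocolate cake", "bran muffins", "muffin tin review"]
# Hypothetical set of pages that would actually satisfy the query.
relevant = ["banana oat muffins", "bran muffins", "zucchini muffins"]

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

In this toy case two of the four retrieved pages are relevant, and two of the three relevant pages were found.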
In this post, I am going to talk about N-grams, a concept found in Natural Language Processing (aka NLP). First of all, let’s see what the term ‘N-gram’ means. It turns out that this is the simplest bit: an N-gram is simply a sequence of N words. For instance, let us take a look at the following examples.
- San Francisco (is a 2-gram)
- The Three Musketeers (is a 3-gram)
- She stood up slowly (is a 4-gram)
Now which of these three N-grams have you seen quite frequently? Probably, “San Francisco” and “The Three Musketeers”. On the other hand, you might not have seen “She stood up slowly” that frequently. Basically, “She stood up slowly” is an example of an N-gram that does not occur as often in sentences as Examples 1 and 2.
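How often an N-gram occurs can be checked by counting. The snippet below is a minimal sketch that assumes whitespace tokenization and a made-up one-sentence corpus; the helper name `ngrams` is chosen here for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Made-up toy corpus; a real one would be far larger.
text = ("she flew to san francisco and then she drove "
        "from san francisco to san jose")
tokens = text.split()

bigram_counts = Counter(ngrams(tokens, 2))
print(bigram_counts[("san", "francisco")])  # how often this 2-gram occurs
print(bigram_counts[("she", "drove")])
```

Even in this tiny corpus, "san francisco" shows up more often than "she drove", which is the kind of difference the examples above describe.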
Assigning a probability to the occurrence of an N-gram, or to a word occurring next in a sequence of words, can be very useful. Why?
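One simple way to estimate the probability of a word coming next is to divide how often it followed the previous word by how often the previous word was followed by anything. The sketch below assumes this count-based (maximum likelihood) estimate over bigrams and a made-up toy corpus; the function names are illustrative, and the full post may take a different approach.

```python
from collections import Counter, defaultdict


def train_bigram_model(tokens):
    """Map each word to a Counter of the words that follow it."""
    following = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        following[prev][nxt] += 1
    return following


def next_word_probability(model, prev, nxt):
    """Estimate P(nxt | prev) from raw counts; 0.0 if prev was never seen."""
    counts = model.get(prev, Counter())
    total = sum(counts.values())
    return counts[nxt] / total if total else 0.0


# Made-up toy corpus, tokenized on whitespace.
tokens = ("she flew to san francisco and then she drove "
          "from san francisco to san jose").split()
model = train_bigram_model(tokens)

p_francisco = next_word_probability(model, "san", "francisco")
p_jose = next_word_probability(model, "san", "jose")
print(p_francisco, p_jose)
```

Here "san" is followed by "francisco" twice and by "jose" once, so the estimate favors "francisco" as the next word.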
In this post, I am going to talk about the relations in WordNet (https://wordnet.princeton.edu) and how you can use these in a Python project. WordNet is a database of English words with different relations between the words.
Take a look at the next four sentences.
- “She went home and had pasta.”
- “Then she cleaned the kitchen and sat on the sofa.”
- “A little while later, she got up from the couch.”
- “She walked to her bed and in a few minutes she was snoring loudly.”
In Natural Language Processing, we try to use computer programs to find the meaning of sentences. In the above four sentences, with the help of WordNet, a computer program will be able to identify the following:
- “pasta” is a type of dish.
- “kitchen” is a part of “home”.
- “sofa” is the same thing as “couch”.
- “snoring” implies “sleeping”.
Let’s get started with using WordNet in Python. It is included as a part of the NLTK (http://www.nltk.org/) corpus. To use it, we need to import it first.
>>> from nltk.corpus import wordnet as wn
Before we begin, let us talk about how Mike (a fictional character) spends a typical morning. Mike begins his day by searching for breakfast recipes on Google Now (https://en.wikipedia.org/wiki/Google_Now). After a filling breakfast, Mike starts getting ready for work. He asks Siri (http://www.apple.com/in/ios/siri/) to tell him the weather and traffic conditions for his drive to work. Finally, as Mike gets ready to leave the house, he asks Alexa (https://en.wikipedia.org/wiki/Amazon_Alexa) to dim the lights and turn down the thermostat. It is not even 10 a.m. yet, but Mike, like many of us, has already used three intelligent personal assistant applications that rely on Natural Language Processing (NLP). We will unravel some of the mystery behind such assistants by working through a simple example of building one with NLP.