In this post I am going to talk about N-grams, a concept from Natural Language Processing (NLP). First of all, let's see what the term 'N-gram' means. It turns out to be the simplest part: an N-gram is simply a sequence of N words. For instance, take a look at the following examples.
- San Francisco (is a 2-gram)
- The Three Musketeers (is a 3-gram)
- She stood up slowly (is a 4-gram)
Now, which of these three N-grams have you seen frequently? Probably "San Francisco" and "The Three Musketeers". On the other hand, you might not have seen "She stood up slowly" nearly as often. In other words, "She stood up slowly" is an example of an N-gram that occurs much less frequently in text than the first two examples.
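To make the definition concrete, here is a minimal sketch of extracting N-grams from a sentence. The function name `ngrams` and the simple whitespace tokenization are my own choices for illustration; real NLP pipelines usually use a proper tokenizer.

```python
def ngrams(text, n):
    """Return the list of N-grams (as tuples of words) in the text."""
    words = text.split()  # naive whitespace tokenization, for illustration only
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

print(ngrams("She stood up slowly", 2))
# → [('She', 'stood'), ('stood', 'up'), ('up', 'slowly')]
```

With `n=4`, the whole sentence "She stood up slowly" comes back as a single 4-gram, matching the examples above.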
Now, if we assign a probability to the occurrence of an N-gram, or to a word occurring next in a sequence of words, it can be very useful. Why? Let's find out.