Evolution of NLP Techniques based on the Google Books Corpus

Great Ideas in current Computer Science Research

Computer Science (CS) Research is an emergent and exciting area. Classical parts of CS are being reshaped to fit a more modern concept of computing. One domain that is experiencing a renaissance is Natural Language Processing (NLP). Classical NLP tasks are being expanded to include time-series information allowing us to capture evolutionary dynamics, and not just static information.  For example, the word “bitch” was historically synonymous with a female dog, and more recently became (pejoratively) synonymous with the word “feminist.”

ACM BLOG IMG1Fig1:  The Trend of “Feminist” Over Time and Its Close Relatives

Traditional thesauruses do not contain information on when this synonymy was generated, nor the surrounding events that gave rise to this. This additional information about the historicity of the linguistic change is so innovative that it blurs the boundary between disparate disciplines: NLP and Computational Linguistics. This added dimension also allows us to challenge the foundations of traditional NLP research.

Language is the foundation of civilization. The story of the Tower of Babel in the Bible describes language as the uniting force among humanity, the key to its technological advancement and ability to become like G-d. Speaking one same language, Babel’s inhabitants were able to work together to develop a city and build a tower high enough to reach heaven. Seeing this, G-d mixes up their language, taking away the source of the inhabitants’ power by breaking down their mutual understanding. This story illustrates the power and cultural significance of universal language. Continue reading

Web-based voice command recognition

Last time we converted audio buffers into images. This time we’ll take these images and train a neural network using deeplearn.js. The result is a browser-based demo that lets you speak a command (“yes” or “no”), and see the output of the classifier in real-time, like this:

Curious to play with it, see whether or not it recognizes yay or nay in addition to yes and noTry it out live. You will quickly see that the performance is far from perfect. But that’s ok with me: this example is intended to be a reasonable starting point for doing all sorts of audio recognition on the web. Now, let’s dive into how this works. Continue reading

Audio features for web-based ML

One of the first problems presented to students of deep learning is to classify handwritten digits in the MNIST dataset. This was recently ported to the web thanks to deeplearn.js. The web version has distinct educational advantages over the relatively dry TensorFlow tutorial. You can immediately get a feeling for the model, and start building intuition for what works and what doesn’t. Let’s preserve this interactivity, but change domains to audio. This post sets the scene for the auditory equivalent of MNIST. Rather than recognize handwritten digits, we will focus on recognizing spoken commands. We’ll do this by converting sounds like this:

Into images like this, called log-mel spectrograms, and in the next post, feed these images into the same types of models that do handwriting recognition so well:

final-log-mel-spectrogram

The audio feature extraction technique I discuss here is generic enough to work for all sorts of audio, not just human speech. The rest of the post explains how. If you don’t care and just want to see the code, or play with some live demos, be my guest! Continue reading

The Power of Bisection Logic

Bisection or Binary logic is an example of a simple yet powerful idea in computer science that has today become an integral part of every computer scientist’s arsenal. It stands synonymous to logarithmic time complexity that is no less than music to a programmer’s ears. Yet, the technique never fails to surprise us with all the creative ways it has been put to use to solve some tricky programming problems. This blog-post will endeavour to acquaint you with a few such problems and their solutions to delight you and make you appreciate it’s ingenuity and efficacy. Continue reading

Traceroute to the Front Door: Trimming Public Net Hops

Hop_0:

Traceroute is a wonderful computer networking diagnostic tool. This article will attempt a traceroute script written in Python, including a customization that can identify just the routers within your private network (from the host up to and including the public/internet gateway). Continue reading