Web-based voice command recognition

Last time we converted audio buffers into images. This time we’ll take these images and train a neural network using deeplearn.js. The result is a browser-based demo that lets you speak a command (“yes” or “no”), and see the output of the classifier in real-time, like this:

Curious to play with it, see whether or not it recognizes yay or nay in addition to yes and noTry it out live. You will quickly see that the performance is far from perfect. But that’s ok with me: this example is intended to be a reasonable starting point for doing all sorts of audio recognition on the web. Now, let’s dive into how this works. Continue reading

Audio features for web-based ML

One of the first problems presented to students of deep learning is to classify handwritten digits in the MNIST dataset. This was recently ported to the web thanks to deeplearn.js. The web version has distinct educational advantages over the relatively dry TensorFlow tutorial. You can immediately get a feeling for the model, and start building intuition for what works and what doesn’t. Let’s preserve this interactivity, but change domains to audio. This post sets the scene for the auditory equivalent of MNIST. Rather than recognize handwritten digits, we will focus on recognizing spoken commands. We’ll do this by converting sounds like this:

Into images like this, called log-mel spectrograms, and in the next post, feed these images into the same types of models that do handwriting recognition so well:


The audio feature extraction technique I discuss here is generic enough to work for all sorts of audio, not just human speech. The rest of the post explains how. If you don’t care and just want to see the code, or play with some live demos, be my guest! Continue reading

Natural Language Understanding : Let’s Play Dumb

What is the meaning of the word understanding? This was a question posed during  a particularly enlightening lecture given by Dr. Anupam Basu, a professor with the  Department of Computer Science Engineering at IIT Kharagpur, India.

Understanding something probably relates to being able to answer questions based on it, maybe form an image or a flow chart in your head. If you can make another human being comprehend the concept with the least amount of effort, well that means you do truly understand what you are talking about. But what about a computer? How does it understand? Continue reading

Notes from IIT Kharagpur ACM Summer School on ML and NLP


Entrance to library and academic area.

[This entry has been edited for clarity. An example given discussing the similarity of words in French and English was incorrect. The following sentence has been removed: “The next question addressed by Bhattacharya was the ambiguity that may arise in languages with similar origins, for example in French ‘magazine’ actually means shop while in English, well it is a magazine.”]

Today is June 14th, so I am 14 days into summer school; 7 more days left, and we are all already feeling saddened by the idea of leaving Kharagpur soon. In India, an IIT is a dream for 90% of the 12th graders who join IIT coaching classes. The competition is high so not everyone gets in. I’m one of those who didn’t get in. So when I saw there was an ACM Summer School opportunity at the largest and oldest IIT in India, obviously I grabbed it. By sheer luck, I was selected to actually attend the school. Over the course of 21 days, we have been tasked to learn about machine learning and natural language processing. Continue reading

Convolutional Neural Networks (CNNs): An Illustrated Explanation

Artificial Neural Networks (ANNs) are used everyday for tackling a broad spectrum of prediction and classification problems, and for scaling up applications which would otherwise require intractable amounts of data. ML has been witnessing a “Neural Revolution”1 since the mid 2000s, as ANNs found application in tools and technologies such as search engines, automatic translation, or video classification. Though structurally diverse, Convolutional Neural Networks (CNNs) stand out for their ubiquity of use, expanding the ANN domain of applicability from feature vectors to variable-length inputs.

Continue reading