Great Ideas in Current Computer Science Research
Computer science (CS) research is an emerging and exciting area. Classical parts of CS are being reshaped to fit a more modern conception of computing. One domain experiencing a renaissance is Natural Language Processing (NLP). Classical NLP tasks are being expanded to include time-series information, allowing us to capture evolutionary dynamics rather than just static snapshots. For example, the word “bitch” was historically synonymous with a female dog, and more recently became (pejoratively) associated with the word “feminist.”
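To make the idea of tracking semantic drift concrete, here is a minimal sketch that compares a word’s nearest neighbor across two time periods using cosine similarity between word vectors. All vectors below are made-up illustrative values; in practice they would come from embeddings trained on era-specific corpora.

```python
import numpy as np

# Hypothetical word vectors for two eras (illustrative values only; real
# diachronic embeddings are trained separately on corpora from each period).
vectors_1900 = {
    "bitch":    np.array([0.9, 0.1, 0.0]),
    "dog":      np.array([0.8, 0.2, 0.1]),
    "feminist": np.array([0.1, 0.9, 0.2]),
}
vectors_2000 = {
    "bitch":    np.array([0.2, 0.8, 0.1]),
    "dog":      np.array([0.9, 0.1, 0.0]),
    "feminist": np.array([0.1, 0.9, 0.2]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def nearest(word, vecs):
    """Return the other word in vecs most similar to `word`."""
    sims = {w: cosine(vecs[word], v) for w, v in vecs.items() if w != word}
    return max(sims, key=sims.get)

print(nearest("bitch", vectors_1900))  # closest to "dog" in the early era
print(nearest("bitch", vectors_2000))  # drifts toward "feminist" later
```

Running the same nearest-neighbor query against each era’s vectors is what lets a diachronic model surface exactly the kind of shift described above.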
Fig. 1: The trend of “feminist” over time and its close relatives
Traditional thesauruses do not record when this synonymy arose, nor the surrounding events that gave rise to it. This additional information about the historicity of linguistic change is so innovative that it blurs the boundary between two disparate disciplines: NLP and computational linguistics. The added temporal dimension also lets us challenge the foundations of traditional NLP research.
Language is the foundation of civilization. The story of the Tower of Babel in the Bible describes language as the uniting force among humanity, the key to its technological advancement and its ability to become like G-d. Speaking one common language, Babel’s inhabitants were able to work together to develop a city and build a tower high enough to reach heaven. Seeing this, G-d mixes up their language, taking away the source of the inhabitants’ power by breaking down their mutual understanding. This story illustrates the power and cultural significance of a universal language. Continue reading
In this post, I am going to talk about automated spelling correction. Let’s say you are writing a document on your computer and, instead of typing “morning”, you accidentally type “mornig”. If automated spelling correction is enabled, you will probably see “mornig” transformed into “morning” on its own. How does this work? How does your computer know that when you typed “mornig”, you actually meant “morning”? Let’s find out.
Spelling mistakes could turn out to be real words!
Before we go through how spelling correction works, let’s think about the complexity of this problem. In the previous example, “mornig” was not a real word, so we knew it had to be a spelling mistake. But what if you misspelled “college” as “collage”, or “three” as “tree”? In these cases, the word you typed incorrectly happens to be an actual word itself! Correcting these errors is called real word spelling correction. On the other hand, if the error is not a real word (like “mornig” instead of “morning”), correcting it is called non-word spelling correction. Real word spelling correction is harder than non-word spelling correction because every word you type could be an error, even if it is spelled correctly. For example, the sentence “The tree threes were tail” makes no sense because every word except “the” and “were” is an error, even though all of them are actual words. The intended sentence is “The three trees were tall”. In this post, I will walk through a basic approach to non-word spelling correction.
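As a concrete sketch of one basic approach to non-word correction, the snippet below generates every string one edit away from the typo (a delete, an adjacent swap, a replacement, or an insertion), keeps only candidates that appear in a dictionary, and picks the most frequent one. The tiny `WORD_COUNTS` table is an illustrative assumption; a real corrector would derive counts from a large corpus.

```python
import string

# Toy word-frequency dictionary (illustrative values, not real corpus counts).
WORD_COUNTS = {"morning": 500, "three": 300, "tree": 200, "tall": 150, "the": 1000}

def edits1(word):
    """All strings one edit (delete, swap, replace, insert) away from word."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes  = [l + r[1:] for l, r in splits if r]
    swaps    = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts  = [l + c + r for l, r in splits for c in letters]
    return set(deletes + swaps + replaces + inserts)

def correct(word):
    """Return word if it is known, else the most frequent word one edit away."""
    if word in WORD_COUNTS:
        return word
    candidates = edits1(word) & WORD_COUNTS.keys()
    return max(candidates, key=WORD_COUNTS.get) if candidates else word

print(correct("mornig"))  # -> "morning" (one missing-letter insertion away)
```

Note how this scheme only fires when the typed word is *not* in the dictionary, which is exactly why it handles “mornig” but is blind to real-word errors like “collage” for “college”.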
Over the past decade, we have seen a shift that caught many long-time computer users and developers off guard: the advent of apps. Until about ten years ago, there was a clear trend of distributed applications becoming webapps, and the browser was seen as the new universal program-delivery interface. And, as I will explain, we now see yet another quite popular webapp, Slack, closing its interoperable, standards-adhering interface to further lock users into its controlled ecosystem. Continue reading
Last time we converted audio buffers into images. This time we’ll take these images and train a neural network using deeplearn.js. The result is a browser-based demo that lets you speak a command (“yes” or “no”), and see the output of the classifier in real-time, like this:
Curious to play with it, see whether or not it recognizes yay or nay in addition to yes and no? Try it out live. You will quickly see that the performance is far from perfect. But that’s ok with me: this example is intended to be a reasonable starting point for doing all sorts of audio recognition on the web. Now, let’s dive into how this works. Continue reading
One of the first problems presented to students of deep learning is to classify handwritten digits in the MNIST dataset. This was recently ported to the web thanks to deeplearn.js. The web version has distinct educational advantages over the relatively dry TensorFlow tutorial. You can immediately get a feeling for the model, and start building intuition for what works and what doesn’t. Let’s preserve this interactivity, but change domains to audio. This post sets the scene for the auditory equivalent of MNIST. Rather than recognize handwritten digits, we will focus on recognizing spoken commands. We’ll do this by converting sounds like this:
Into images like this, called log-mel spectrograms, and in the next post, feed these images into the same types of models that do handwriting recognition so well:
The audio feature extraction technique I discuss here is generic enough to work for all sorts of audio, not just human speech. The rest of the post explains how. If you don’t care and just want to see the code, or play with some live demos, be my guest! Continue reading