Natural Language Understanding : Let’s Play Dumb

What is the meaning of the word understanding? This was a question posed during  a particularly enlightening lecture given by Dr. Anupam Basu, a professor with the  Department of Computer Science Engineering at IIT Kharagpur, India.

Understanding something probably relates to being able to answer questions based on it, maybe form an image or a flow chart in your head. If you can make another human being comprehend the concept with the least amount of effort, well that means you do truly understand what you are talking about. But what about a computer? How does it understand? Continue reading

Notes from IIT Kharagpur ACM Summer School on ML and NLP

IIT KGP CAMPUS

Entrance to library and academic area.

[This entry has been edited for clarity. An example given discussing the similarity of words in French and English was incorrect. The following sentence has been removed: “The next question addressed by Bhattacharya was the ambiguity that may arise in languages with similar origins, for example in French ‘magazine’ actually means shop while in English, well it is a magazine.”]

Today is June 14th, so I am 14 days into summer school; 7 more days left, and we are all already feeling saddened by the idea of leaving Kharagpur soon. In India, an IIT is a dream for 90% of the 12th graders who join IIT coaching classes. The competition is high so not everyone gets in. I’m one of those who didn’t get in. So when I saw there was an ACM Summer School opportunity at the largest and oldest IIT in India, obviously I grabbed it. By sheer luck, I was selected to actually attend the school. Over the course of 21 days, we have been tasked to learn about machine learning and natural language processing. Continue reading

How 1 Million App Calls can Tell you a Bit About Malware – Part 2

In my previous blog post, I described some of my findings regarding malicious mobile apps. In summary, I observed that there are POSIX abstractions, which are popular only for malicious apps. The findings were derived from a study that I did with some colleagues on POSIX (Portable Operating System Interface) abstractions. Recall that, a part of our study involved the examination of the POSIX calls that are used by both benign Android applications (~1 million) coming from the Google Play Store, and malicious Android applications (about 1260 of them) taken from a well-known dataset, which you can download from here.

Figure 1: Potentially Malicious Apps. The identification was based on an SVM Model.

Figure 1: Potentially Malicious Apps. The identification was based on an SVM Model.

Table 2: Indicative potentially malicious apps classified by the SVM model. These apps were identified as malicious by more than 15 antiviruses.

Table 2: Indicative potentially malicious apps classified by the SVM model. These apps were identified as malicious by more than 15 antiviruses.

We performed a further analysis on these results to check if we can create a more robust filter to detect malicious apps, than the simple filter described in my previous post (recall that this filter was based on the three most unpopular abstractions among benign applications and at the same time popular among malicious ones). Our attempt involved the following: we fed a set of benign apps (the 500 most popular apps of the Google Store) and the aforementioned dataset of the malicious apps, to an SVM (Support Vector Machine), a binary classifier that builds a model based on given features (abstractions in our case) to separate the two cases. In this way the classifier can classify a new app as malicious or not. By using the model on the same set of apps that we examined in the previous case, 1283 apps were identified as suspicious. Based on the antiviruses provided by the VirusTotal website again, we found that from these apps, 232 (18%) are potentially malicious. Even if the approach seems less robust than the previous one, Figure 1, illustrates that there are more cases of apps that were indicated as malicious by more than one antivirus. Table 1, presents applications that were filtered out by the SVM model, and were identified as malicious by more than 15 antiviruses.

Figure 2: Potentially Malicious Apps. The identification was based on the obfuscated libraries.

Figure 2: Potentially Malicious Apps. The identification was based on the obfuscated libraries.

Table 2: Indicative potentially malicious apps containing obfuscated libraries. These apps were identified as malicious by more than 22 antiviruses.

Table 2: Indicative potentially malicious apps containing obfuscated libraries. These apps were identified as malicious by more than 22 antiviruses.

Through our experiments, we came across a number of Android apps that included obfuscated libraries (991 apps in total). Given the fact that obfuscation techniques have been extensively encountered while analyzing Android malware, we decided to examine all the apps that contained such libraries by using the 54 antiviruses of the VirusTotal website. Surprisingly, almost half of the apps (481 in total — 48.53%) were classified as suspicious. An interesting observation is that the majority of these apps were indicated as potentially malicious by a large number of antiviruses — see Figure 2. Table 2, presents indicative apps that were identified as malicious by more than 22 antiviruses.

As it is clear, a malware detector cannot be based solely on observations like the aforementioned ones. However, such findings could be useful for the development of complex filters that can help find malicious software.

Top 3 Winning Articles of the “The Time is Write 2.0” Competition

Here are the three winners of our The Time is Write 2.0 competition! You can read the three articles below, but first: congrats to Dipika Rajesh, Aditi Balaji and Pratyush Singh.

The Time is Write is an article writing competition that encourages all the aspiring writers to lay out their thoughts in writin and to share them on a global platform. This year, participants had to write a short article on the topic “Your Dream Software: Revolutionize the future”, about what their idea of a perfect software might be in order to revolutionize a particular field.

Continue reading

The reign and modern challenges of the Message Passing Interface (MPI): A discussion with Dr. Torsten Hoefler.

A few years ago, while I was a graduate student in Greece, I was preparing  slides for my talk at the SIAM Parallel Processing 2012 conference. While showing my slides to one of my colleagues, one of his comments was: “All good, but why do you guys doing numerical linear algebra and parallel computing always use the Message Passing Interface to communicate between the processors?”. Having read* the book review of Beresford Parlett in [1], I did have the wit to imitate Marvin Minsky and reply “Is there any other way?”. Nowadays, this question is even more interesting, and my answer would certainly be longer (perhaps too long!). Execution of programs in distributed computing environments requires communication between the processors. It is then natural to consider by what protocols and guidelines should the processors communicate with each other? This is the question to which the Message Passing Interface (MPI) has been the answer for more than 25 years.

Continue reading