Many years ago (I will not reveal my age), I began working on my PhD thesis in the area of Domain-Specific Languages (DSLs). Research was booming at the time, and many research articles stated in their introductions that DSLs are very useful and increase productivity by reducing lines of code, and so on. All these claims seemed logical to me, but I always considered them something like urban legends: we all know they are correct, but we cannot easily prove it. Keeping that in the back of my mind, I searched for a way to bring the “legend” down to measurable facts that would provide solid motivation for the importance of DSLs in everyday programming. I decided to do a simple experiment that measures DSL usage in open source programs. Continue reading
In the last thirty years, computing has made transformative strides and, with it, so has society. Advances in engineering, social networking, and household technology have all been driven by advances in computing.
Safe to say, computing is no longer an abstruse field meant for a select few. In an age where technology dominates society, computing skills are more important than ever. Computing relates to so many areas – engineering, scientific research, economics, and more – and has tremendous potential. Continue reading
For a long time in the area of design and analysis of algorithms, saying that an algorithm is efficient has meant that it runs in time polynomial in the input size n, and a linear-time algorithm has been considered the most efficient way to solve a problem. This rests on the assumption that we need to at least read all of the input to solve the problem, so it seems we cannot do much better! But nowadays data sets in many areas and applications are growing so fast that they hardly fit in storage, and in that setting even linear time is prohibitive. To work with such massive amounts of data, the traditional notion of an efficient algorithm is no longer sufficient, and we need to design more efficient algorithms and data structures. This encourages researchers to ask whether it is possible to solve problems using only a sublinear amount of resources. What exactly do we mean by ‘sublinear resources’?
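As a toy illustration of the idea (a hedged sketch of my own, not taken from any particular paper), consider estimating the fraction of ones in a huge bit array: reading every entry takes linear time, but sampling a fixed number of random positions gives an accurate estimate with high probability, no matter how large the input is.

```python
import random

def estimate_fraction_of_ones(bits, num_samples=1000):
    """Estimate the fraction of 1s by inspecting only num_samples random
    positions. The running time is independent of len(bits), i.e. sublinear;
    by a Hoeffding bound the estimate is within +/- eps of the truth with
    high probability once num_samples is on the order of 1/eps**2."""
    hits = sum(bits[random.randrange(len(bits))] for _ in range(num_samples))
    return hits / num_samples

# A million entries, 30% of which are ones; we only ever read about 1000 of them.
huge = [1] * 300_000 + [0] * 700_000
print(estimate_fraction_of_ones(huge))  # prints a value close to 0.3
```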
We can think of sublinear algorithms in the area of big data as falling into three different categories:
This week, the Simons Institute hosted a workshop entitled Unifying Theory and Experiment for Large-Scale Networks. The goal of the workshop is to bring together researchers working on various large-network problems to discuss both the theoretical models and the empirical process for testing and validating them. Even further, the “unifying” in the title suggests a forum where the two ends of the spectrum may meet.
Deduplication is a critical technology for modern production and research systems. In many domains, such as cloud computing, it is often taken for granted. Deduplication increases the amount of data you can effectively store in memory, on disk, and transmit across the network. It comes at the cost of more CPU cycles, and potentially more IO operations at the origin and destination storage backends. Microsoft, IBM, EMC, Riverbed, Oracle, NetApp, and other companies tout deduplication as a major feature and differentiator across the computing industry. So what exactly is deduplication?
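At its core, most deduplication systems split data into chunks, fingerprint each chunk with a cryptographic hash, and store each unique chunk only once. The sketch below is a simplified, fixed-size-chunking illustration of my own, not any particular vendor's implementation; the 4096-byte chunk size is just an assumed example value.

```python
import hashlib

def dedup_store(data: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks, keep each unique chunk once (keyed
    by its SHA-256 digest), and return a 'recipe' of digests describing how
    to reassemble the original byte stream."""
    store = {}    # digest -> chunk bytes (each unique chunk stored once)
    recipe = []   # ordered list of digests referencing chunks in store
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # a duplicate chunk costs no extra space
        recipe.append(digest)
    return store, recipe

def reassemble(store, recipe):
    return b"".join(store[d] for d in recipe)

# 100 identical 4096-byte blocks: the recipe references 100 chunks,
# but only one chunk is actually stored.
block = bytes(range(256)) * 16
data = block * 100
store, recipe = dedup_store(data)
print(len(recipe), "chunks referenced,", len(store), "stored")
assert reassemble(store, recipe) == data
```

Real systems typically go further, using content-defined (variable-size) chunking so that a small insertion does not shift every subsequent chunk boundary.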
One of the most basic philosophical questions stems from attempting to identify oneself, with the first step of proving you actually exist. René Descartes provides a proof with
Cogito ergo sum
meaning, “I think, therefore I am.” The intuition is that the mere fact of thinking forms a proof that you exist. But who or what are you exactly? What identifies you? How can we definitively prove you are what you claim to be? Who you claim to be? The problem of identity is an incredibly hard one—how do you know a letter in the mail is from the person that signed it? How do you know a text was written by the owner of a certain phone? How do you know an email comes from the person that owns an email address? This is a fundamental problem that faces the fields of computer science and cryptography, and it is incredibly hard to solve.
In a previous blog post, I discussed the occurrence of security bugs through software evolution. In this post we will examine their existence in a large software ecosystem. To achieve this, together with four colleagues (Vasilios Karakoidas, Georgios Gousios, Panos Louridas, and Diomidis Spinellis), we used the FindBugs static analysis tool to analyze all the projects in the Maven Central repository (approximately 260GB of interdependent project versions).
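To give a flavour of the setup, here is a minimal sketch of how one might drive FindBugs over a local mirror of repository artifacts. The mirror path, report layout, and loop are assumptions for illustration only, not our actual pipeline; the FindBugs text-UI options shown are the standard ones, but adjust them to your installation.

```python
import subprocess
from pathlib import Path

REPO_ROOT = Path("/data/maven-mirror")   # hypothetical local mirror of the repository
REPORT_DIR = Path("reports")

def analyze_jar(jar_path: Path) -> None:
    """Run FindBugs' command-line text UI on one jar and keep the XML report."""
    report = REPORT_DIR / (jar_path.stem + ".xml")
    subprocess.run(
        ["findbugs", "-textui", "-xml", "-output", str(report), str(jar_path)],
        check=True,
    )

if __name__ == "__main__":
    REPORT_DIR.mkdir(exist_ok=True)
    for jar in REPO_ROOT.rglob("*.jar"):
        analyze_jar(jar)
```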
Researchers differ in their workflows. For researchers in the algorithms world (or at least those I know), the work is in the design. Our hours are spent at the blueprint stage. Algorithms are designed, improved, reformulated, or reapplied to different problems, mostly on paper. But this is unarguably only the first stage in successfully developing a new algorithm. There are still the matters of proving and testing the algorithm, and presenting the result to the public. When are we done drafting our blueprint? How do we package and ship the blueprint to the engineers and construction team?
Let’s address the more straightforward question first. What is the best way to present an algorithm? How descriptive and specific should it be? Should it be entirely self-contained or, for instance, could we have a pointer to a “… subroutine of choice”? Is implementability more important than readability?
This week, TCS+ hosted a talk by Greg Valiant via a Google+ hangout. Valiant spoke about his work with his brother, Paul, on an efficient estimator for the entropy and support size of an unknown probability distribution that requires only O(n/log n) samples, where n is a bound on the support size of the distribution. This work diverges from the existing literature by demonstrating that the estimate can be obtained with a concrete linear program: an algorithm that outputs a distribution very similar to the unknown distribution with respect to certain statistical properties.
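For context, the sketch below shows the naive “plug-in” estimators that such work improves upon: compute the empirical entropy of the observed frequencies and count the distinct symbols seen. This is an illustration of the baseline only, not the Valiants' linear-program-based estimator; with far fewer samples than the support size, the plug-in approach badly underestimates both quantities.

```python
import math
import random
from collections import Counter

def plugin_entropy(samples):
    """Empirical ('plug-in') entropy, in bits, of the observed sample frequencies."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def observed_support(samples):
    """Number of distinct symbols seen -- only a lower bound on the true support size."""
    return len(set(samples))

# Uniform distribution over 1000 symbols (entropy ~9.97 bits, support 1000),
# but only 200 samples: both plug-in estimates fall well short of the truth.
data = [random.randrange(1000) for _ in range(200)]
print(plugin_entropy(data), observed_support(data))
```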
I like my shopping routine at the grocery store around the corner, where my cart seems to navigate itself easily through the aisles. Once in a while I make adventurous purchases (the Halloween-edition beer with pumpkin aroma still awaits in my fridge), but I usually stick to the products that have already made me happy before. Whenever I am in a new town, I try to shop at the same chain, where I know the products and their location on the shelves.