All PhD candidates around the world know about the thesis. You always knew about the thesis. It marks the beginning of the end for your career as a PhD and if you actually do it, you can have that cool “Dr.” title that you always wanted in your business card. What is the problem then? Why it seems so frustrating when you are sitting down to do it? The following is based on a true story, actually my story. How I managed to write it down and track my progress. Continue reading
As a follow-up to my previous post on the discussion of where theory and experimentation meet in the study of large-scale networks, I would like to discuss in more detail one of the empirically best-performing algorithms which also has a sound theoretical background: spectral partitioning. In this post I will examine the history of the problem, outline some key results, and present some future ideas for the problem. Continue reading
As a PhD student who does research on theory and algorithms for massive data analysis, I am interested in exploring current and future challenges in this area, which I’d like to share it here. There are two major points of view when we talk about big data problems:
One is more focused on industry and business aspects of big data, and includes many IT companies who work on analytics. These companies believe that the potential of big data lies in its ability to solve business problems and provide new business opportunities. To get the most from big data investments, they focus on questions which companies would like to answer. They view big data not as a technological problem but as a business solution, and their main goals are to visualize, explore, discover and predict.
For a long time in the area of design and analysis of algorithms, when we have said that an algorithm is efficient we meant that it runs in time polynomial in the input size n and finding a linear time algorithm have been considered as the most efficient way to solve a problem. It’s been because of this assumption that we need at least to consider and read all the input to solve the problem. This way it seems that we cannot do much better! But nowadays the data sets are growing fast in various areas and applications in a way that it hardly fits in storage and in this case even linear time is prohibitive. To work with this massive amount of data, the traditional notions of an efficient algorithm is not sufficient anymore and we need to design more efficient algorithms and data structures. This encourages researchers to ask whether it is possible to solve the problems using just sublinear amount of resources? what does that mean exactly when we say ‘sublinear resources’?
We can think of sublinear algorithms in the area of big data in three different categories:
This week, the Simons Institute hosted a workshop entitled Unifying Theory and Experiment for Large-Scale Networks. The goal of the workshop is to bring together researchers involved in various large networks problems to discuss both the theoretical models and the empirical process for testing and validating them. Even further, the “unifying” in the title suggests a forum on where the ends of the spectrum may meet.