Processing large datasets is a routine stage in almost any kind of research. The problem arises when that processing takes too long on a personal laptop. To solve this, we can turn to any major cloud provider to speed up the processing stage while wasting minimal time and resources. With a few clicks, cloud computing lets us build a compute cluster, process our data, collect the results, and destroy the cluster, all for only a few dollars.
To introduce you to Exascale computing, as well as its challenges, we interviewed the distinguished Professor Jack Dongarra (University of Tennessee), an internationally renowned expert in high-performance computing and the leading scientist behind the TOP500 (http://www.top500.org/), a list which ranks supercomputers according to their performance.
In my previous post, I briefly mentioned that if the execution order of iterations in a loop can be altered without affecting the result, it is possible to parallelize the loop. In this post, we will look at why this is the case, i.e., how execution order is related to parallelism. Moreover, we will see how this idea can be further exploited to optimize code for data locality, i.e., how reordering loop iterations can make the code reuse the same data (temporally or spatially) as much as possible, in order to utilize the memory hierarchy efficiently.
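The link between execution order and parallelism can be illustrated with a small sketch. It is not taken from the post itself: the function names `scale_independent` and `running_sum_dependent` and the sample data are assumptions made for illustration. The first loop touches only element `i` in iteration `i`, so any visiting order gives the same result. The second loop reads a value written by the previous iteration, so changing the order changes the answer.

```python
def scale_independent(a, order):
    # Iteration i reads a[i] and writes out[i] only,
    # so no iteration depends on any other iteration.
    out = [0] * len(a)
    for i in order:
        out[i] = 2 * a[i]
    return out


def running_sum_dependent(a, order):
    # Iteration i reads out[i - 1], which iteration i - 1 is expected
    # to have updated already: a loop-carried dependence.
    out = list(a)
    for i in order:
        if i > 0:
            out[i] = out[i - 1] + a[i]
    return out


data = [1, 2, 3, 4, 5]
forward = list(range(len(data)))
backward = forward[::-1]

# Same result in either order: the iterations could run in parallel.
print(scale_independent(data, forward) == scale_independent(data, backward))

# Different results: the original order must be preserved.
print(running_sum_dependent(data, forward) == running_sum_dependent(data, backward))
```

If reversing (or shuffling) the iteration order leaves the result unchanged, the iterations do not depend on one another, and that is the property a parallelizing tool needs. A test like this can only expose a dependence; it does not prove that none exists.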
“I really wish I had a dedicated Linux computer to run computer vision algorithms on,” said my fiancée a couple of weeks ago. If you were there you would have been blinded by the metaphorical light bulb that lit over my head. You see, just the week before, my friend and co-worker had ordered an old, decommissioned (complete with “non-classified” stickers!) Apple Xserve off of eBay for merely $40. Like my fiancée, he wanted to have a machine for a special purpose: test compilations of open source software on a big-endian architecture. I was quite envious that he was able to hack on such cool hardware for such a cheap price. But, I wasn’t yet ready to bring out my wallet. I couldn’t justify indulging a new hobby without good reason—I was stuck waiting for just the right impetus. I didn’t wait long. My fiancée’s wish became my command!
“As soon as an Analytical Engine exists, it will necessarily guide the future course of the science. Whenever any result is sought by its aid, the question will then arise — by what course of calculation can these results be arrived at by the machine in the shortest time?”
Charles Babbage (1864)
Points to Ponder
Would it not be wonderful if we could write all our simulations as serial programs and have the compiler automatically generate parallelized code, highly optimized for any given supercomputer? Why is this not the case today? Why do supercomputing centers require teams of highly trained developers to write simulations?
Scientists around the world develop mathematical models and write simulations to understand systems in nature. In many cases, simulation performance becomes an issue as datasets (problem size) get larger, when higher accuracy is required, or both. Parallel processing resources can be used to resolve these performance issues. Since a large number of these simulations are developed using high-level tools such as MATLAB, Mathematica, and Octave, the obvious choice for the scientist is to use the parallel processing functions provided within the tool. A case in point is the parfor construct in MATLAB, which executes iterations of a for-loop in parallel. However, when an automation tool fails to parallelize a for-loop, it can be hard to understand why parallelization failed and how one might change the code to help the tool.
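One common reason a tool may refuse to parallelize a loop is a variable carried from one iteration to the next. The sketch below shows that situation in Python rather than MATLAB, so it is an analogy and not a description of how parfor works. The names `sample_serial` and `sample_parallel` are hypothetical. In the serial version, `x` is updated incrementally, so each iteration needs the one before it. In the rewritten version, `x` is computed from the loop index, which makes every iteration self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def sample_serial(n, x0, step):
    # x is carried across iterations: iteration i needs the value
    # left behind by iteration i - 1, so the loop order is fixed.
    out = []
    x = x0
    for _ in range(n):
        out.append(x * x)
        x += step
    return out


def sample_parallel(n, x0, step):
    # x is recomputed from the index i, so each call to body()
    # is independent and the iterations may run in any order.
    def body(i):
        x = x0 + i * step
        return x * x

    with ThreadPoolExecutor() as pool:
        # map() returns results in index order regardless of
        # the order in which the worker threads finish.
        return list(pool.map(body, range(n)))


print(sample_serial(5, 1, 2))
print(sample_parallel(5, 1, 2))
```

The thread pool is used here only to show that the rewritten iterations can be handed to independent workers. In CPython, threads do not speed up CPU-bound work like this. With floating-point inputs, the two versions may also differ in the last digits, because repeated addition and a single multiplication round differently.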