About Fahad Khalid

Fahad Khalid is a Ph.D. student with the HPI Research School on Service Oriented Systems Engineering, and member of the Operating Systems and Middleware Group. His research is in the area of programming models for Hybrid Parallel Computing. He focuses on devising strategies and developing frameworks that make it possible to balance the productivity vs. performance tradeoff inherent in programming massively parallel accelerator architectures.

Parallel Programming through Dependence Analysis – Part II

In my previous post, I briefly mentioned that if the execution order of iterations in a loop can be altered without affecting the result, it is possible to parallelize the loop. In this post, we will take a look at why this is the case, i.e., how is execution order related to parallelism. Moreover, we will see how this idea can be further exploited to optimize code for data locality, i.e., how can reordering of loop iterations result in using the same data (temporally or spatially) as much as possible, in order to efficiently utilize the memory hierarchy. Continue reading

Parallel Programming through Dependence Analysis – Part I

 “As soon as an Analytical Engine exists, it will necessarily guide the future course of the science. Whenever any result is sought by its aid, the question will then arise — by what course of calculation can these results be arrived at by the machine in the shortest time?”

Charles Babbage (1864)

Points to Ponder

Would it not be wonderful, if we could write all our simulations as serial programs, and parallelized code (highly optimized for any given supercomputer) would be generated automatically by the compiler? Why is this not the case today? How come supercomputing centers require teams of highly trained developers to write simulations?


Scientists around the world develop mathematical models and write simulations to understand systems in nature. In many cases, simulation performance becomes an issue either as datasets (problem size) get larger, and/or when higher accuracy is required. In order to resolve the performance issues, parallel processing resources can be utilized. Since a large number of these simulations are developed using high level tools such as Matlab, Mathematica, Octave, etc., the obvious choice for the scientist is to use the parallel processing functions provided within the tool. A case in point is the parfor function in Matlab, which executes iterations of a for-loop in parallel. However, when an automation tool fails to parallelize a for-loop, it can be hard to understand why parallelization failed, and how one might change the code to help the tool with parallelization. Continue reading