Run loops concurrently using parfor loops
What is a for loop?
For loops repeat the same set of statements inside a script. The loop body can contain any commands, and it is executed a fixed number of times: once for each element of the matrix given in the loop’s first line. Let’s look at an example:
for i = 1:10
disp(i)
end
If we run this very simple for loop in Compose, the command window echoes:
1
2
3
4
5
6
7
8
9
10
That is, i takes the values 1 to 10 (the elements of the matrix 1:10), and the loop executes its statements 10 times (disp prints its input in the command window).
The more statements a loop has (and the more computationally demanding they are), the longer each iteration takes.
How to run for loops concurrently
Compose supports parallel loop execution through parfor loops. They work like conventional for loops, but their iterations are executed at the same time on different cores of your computer, so your script can finish sooner. Let’s look at an (extremely simplified) example that highlights the difference in execution time between conventional and parallel loops:
First, let’s define the number of chunks into which the parallel loop will be divided, which is done using parcluster. How many chunks run at the same time depends on your computer configuration (e.g., if your machine only has two cores, parcluster(64) will split the loop into 64 chunks, but only two of them will run simultaneously).
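To make the chunk idea concrete, here is a minimal Python sketch (not Compose code) that splits the iterations 1..n into a given number of contiguous chunks and counts how many rounds are needed when only a limited number of cores work at once. The helper names chunk_ranges and waves are hypothetical, and the even split is an assumption for illustration; Compose’s actual scheduling of parfor chunks may differ.

```python
import math


def chunk_ranges(n_iters, n_chunks):
    """Split iterations 1..n_iters into n_chunks contiguous, near-equal ranges.

    Hypothetical helper for illustration only; it is not how Compose
    schedules parfor chunks internally.
    """
    base, extra = divmod(n_iters, n_chunks)
    ranges = []
    start = 1
    for k in range(n_chunks):
        # The first `extra` chunks take one additional iteration each.
        size = base + (1 if k < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def waves(n_chunks, n_cores):
    """Number of rounds needed if at most n_cores chunks run at once."""
    return math.ceil(n_chunks / n_cores)


print([list(r) for r in chunk_ranges(10, 8)])
# [[1, 2], [3, 4], [5], [6], [7], [8], [9], [10]]
print(waves(64, 2))
# 32
```

With 64 chunks on a two-core machine, waves(64, 2) returns 32: every chunk still runs, just two at a time.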
close all, clear, clc
% Number of chunks in which loop space will be divided
parcluster(8);
Next, let’s declare the relevant variables. N holds the different numbers of iterations we will compare, from 10 up to 50 million, and A sets the length of the random vectors used in the mock computations. t1 and t2 will store the seconds elapsed in each comparison; since pre-allocation is always good practice, they are initialized with zeros.
% Matrix of iterations for each comparison
N = [10 0.5e2 1e2 0.5e3 1e3 0.5e4 1e4 0.5e5 1e5 0.5e6 1e6 0.5e7 1e7 0.5e8];
A = 500;
% Matrices for storing time elapsed
t1 = zeros(1, length(N));
t2 = zeros(1, length(N));
Now it’s time to perform some computations, both with for and parfor loops, and measure the time it takes for each to finish. For that, we use tic and toc (calls to toc return the elapsed time since tic was called).
% Compare for and parfor loops for each number of iterations in N
for j = 1:length(N)
    % Current number of iterations to perform
    n = N(j)
    % Pre-allocate outputs so the parfor loop can slice them
    a = zeros(1, n); b = zeros(1, n); c = zeros(1, n);
    % Use tic & toc for recording time elapsed
    tic;
    % Perform some mock computations serially
    for i = 1:n
        a(i) = rand(1,A) * rand(A,1);
        b(i) = rank(a(i));
        c(i) = max(a(i) * b(i));
    end
    t1(j) = toc
    % Reset the outputs so both loops start from the same state
    a = zeros(1, n); b = zeros(1, n); c = zeros(1, n);
    tic;
    % Perform the same mock computations in parallel
    parfor i = 1:n
        a(i) = rand(1,A) * rand(A,1);
        b(i) = rank(a(i));
        c(i) = max(a(i) * b(i));
    end
    t2(j) = toc
end
Finally, we plot both t1 and t2 for comparison and format the plot appropriately.
% Create plot and format
semilogx(N, t1, 's--b');
hold on
semilogx(N, t2, 's--g');
title('Conventional vs. parallel for loops');
xlabel('Number of iterations');
ylabel('Time elapsed (seconds)');
legend('Conventional for loop', 'Parallel for loop', 'location', 'northwest');
ax = gca;
% Place an x tick at each tested number of iterations
set(ax, 'xtick', N);
grid on;
Let’s have a look at the results:
We can see that the larger the number of iterations, the larger the difference in elapsed time. In the last run, parallel processing results in a 30% time reduction. Keep in mind that each iteration only performs three arbitrary operations; with more demanding work per iteration, this difference could grow further.
However, zooming in on the smallest iteration counts shows that conventional for loops were initially faster. This is an important insight: distributing iterations across cores has a fixed start-up and communication cost, and when a loop does little work that cost outweighs the gain, so using parfor loops in every situation may bring undesired results. Let’s talk about some situations in which you should avoid parfor loops.
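The crossover can be reasoned about with a simple cost model. The Python sketch below (not Compose code) assumes that each iteration costs a fixed number of seconds, that a parallel loop pays a fixed overhead, and that work divides perfectly across workers; all input numbers are hypothetical, not measured. Under these assumptions the parallel loop only wins once n exceeds overhead / (cost * (1 - 1/workers)). The last helper converts a measured time reduction into a speedup factor, so a 30% reduction corresponds to roughly a 1.43x speedup.

```python
def serial_time(n, cost):
    """Modeled serial time: n iterations of `cost` seconds each."""
    return n * cost


def parallel_time(n, cost, workers, overhead):
    """Modeled parallel time: a fixed overhead plus the iterations
    split perfectly evenly across `workers` cores (an idealization)."""
    return overhead + n * cost / workers


def break_even(cost, workers, overhead):
    """Iteration count at which both models take the same time.

    Solves n * cost == overhead + n * cost / workers for n.
    """
    return overhead / (cost * (1 - 1 / workers))


def speedup_from_reduction(reduction):
    """Turn a fractional time reduction (e.g. 0.30) into a speedup factor."""
    return 1 / (1 - reduction)


# Hypothetical inputs, not measurements: 1 ms per iteration,
# 4 workers and 0.5 s of parallel start-up/communication overhead.
cost, workers, overhead = 1e-3, 4, 0.5
for n in (10, 100, 1000, 10000):
    print(n, serial_time(n, cost), parallel_time(n, cost, workers, overhead))
print("break-even n:", break_even(cost, workers, overhead))  # about 667
print("speedup for a 30% reduction:", speedup_from_reduction(0.30))  # about 1.43
```

Real overheads depend on the machine, the number of workers and how much data each iteration moves, so the model only explains the shape of the curves, not the measured values.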
Limitations of parfor loops
Some key differences must be considered before deciding whether to use parfor loops or not, as sometimes it’s better (or required) to execute our loops serially. These are some examples:
- Nothing inside a parfor loop can be echoed or displayed in the command window.
- If you will assign values to a matrix, pre-allocate first, as it must be sliced accordingly before executing in parallel.
- Variables declared inside a parfor loop only exist within it. Pre-allocation fixes this as well.
- Global variables in a parfor loop are not well-defined, since multiple iterations may be modifying the variable concurrently.
- Any function that accesses a shared resource (like many file reading/writing functions) shouldn’t be used.
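Several of these points come down to the same rule: each iteration should write only its own pre-allocated slot and should not rely on what other iterations do, because (as is typical of parfor implementations, and assumed here for Compose) the order in which iterations run is not guaranteed. The Python sketch below (not Compose code) emulates that by running the same loop body in a scrambled order. The independent body gives identical results, while a body that reads the previous element does not. The scrambled order is an arbitrary example, and a real parallel run also overlaps iterations in time, which this sequential emulation does not model.

```python
def run_loop(order, body, n):
    """Run body(i, out) for each index in `order` over a pre-allocated list."""
    out = [0] * n  # pre-allocated output, one slot per iteration
    for i in order:
        body(i, out)
    return out


def independent(i, out):
    # Writes only its own slot and reads nothing else: order does not matter.
    out[i] = i * i


def dependent(i, out):
    # Reads the previous slot, so the result depends on execution order.
    out[i] = out[i - 1] + 1 if i > 0 else 1


n = 8
in_order = list(range(n))
scrambled = [7, 3, 0, 5, 1, 6, 2, 4]  # one possible parallel ordering

print(run_loop(in_order, independent, n) == run_loop(scrambled, independent, n))  # True
print(run_loop(in_order, dependent, n))   # [1, 2, 3, 4, 5, 6, 7, 8]
print(run_loop(scrambled, dependent, n))  # [1, 2, 3, 1, 2, 1, 2, 1]
```

Only the first kind of loop body is a safe candidate for parfor; the second one has to stay serial.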
Conclusions
In short, don’t use parfor loops by default. In the best case, they will be slower if you are handling a simple loop with few iterations; in the worst case, you will get unexpected behavior and errors. Nevertheless, they are an extremely valuable tool for running scripts more efficiently. Try them for yourself!