Is it possible to speed up a Python script that calculates a mixing index?
Hi community,
At the moment I am running an adjusted version of the Python script uploaded by Stefan Pantaleev, "Script for computing the temporal evolution of the Lacey mixing index for a binary mixture of particles in EDEM". My goal with the adjusted script is to calculate the Relative Standard Deviation (RSD) for a double shaft paddle mixer, with a grid size yet to be determined. The system contains approximately 3 million spherical particles with diameters of 5 mm and 15 mm. The simulation covers around 30 seconds of physical time with a target save interval of 0.2 s.
As running the script takes quite a long time (on the order of hours), I am wondering whether it is possible to speed up the Python script, and if so, how. I am aware of one possible solution: increasing the target save interval to reduce the number of data points considered.
Thanks in advance!
Answers
-
Hi Jeroen,
Python is not parallelised by default, so all Python scripts run on a single CPU thread unless you arrange otherwise. One thing you can do to significantly speed up any Python script is to parallelise it. The simplest option is to convert a loop into a pipeline of parallel jobs using multiprocessing (https://docs.python.org/3/library/multiprocessing.html) or joblib (https://joblib.readthedocs.io/en/latest/), as in the sketch below.
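For instance, a minimal sketch of parallelising over save points (the compute_rsd worker here is a hypothetical placeholder for the existing per-timestep calculation):

```python
from multiprocessing import Pool

def compute_rsd(timestep):
    # Hypothetical worker: load the particle data for one save point,
    # bin it and return the RSD at that time. It must be a top-level
    # function so it can be pickled and shipped to worker processes.
    ...  # the existing per-timestep calculation goes here
    return timestep, 0.0  # placeholder result

if __name__ == "__main__":
    # e.g. 30 s of simulated time at a 0.2 s save interval -> 150 save points
    timesteps = [round(0.2 * i, 1) for i in range(1, 151)]
    with Pool() as pool:  # one worker process per CPU core by default
        results = pool.map(compute_rsd, timesteps)

# Equivalent one-liner with joblib:
# from joblib import Parallel, delayed
# results = Parallel(n_jobs=-1)(delayed(compute_rsd)(t) for t in timesteps)
```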
You can also try exporting the binned data using a grid bin group in EDEM and doing the calculation on the exported data - it might be faster than binning in Python because the internal EDEM data querying and export is parallelised and C++ based.
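If you go that route, the remaining Python work is light. A rough sketch, assuming the export produces one CSV per save point with per-bin counts for each particle type (the file and column names here are invented, so adjust them to whatever EDEM actually writes):

```python
import pandas as pd

# Assumed export format: one row per grid bin with a count per particle type.
df = pd.read_csv("grid_bins_t0.2s.csv")     # hypothetical file name

total = df["count_5mm"] + df["count_15mm"]  # hypothetical column names
occupied = total > 0                        # ignore empty bins
conc = df.loc[occupied, "count_5mm"] / total[occupied]

# RSD of the small-particle concentration across occupied bins.
rsd = conc.std(ddof=1) / conc.mean()
print(f"RSD = {rsd:.4f}")
```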
Best regards,
Stefan
-
Hi Jeroen,
It should be possible to speed up the process a bit.
The main culprit will be the for-loop in the binning function - if you can remove it you should see a noticeable improvement. A possible solution is to use a multi-dimensional histogram such as SciPy's binned_statistic_dd function (https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.binned_statistic_dd.html). This may allow you to run things quite a bit faster; I've found good performance gains with it on other tasks before.
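As a concrete illustration (the particle data and grid resolution below are dummies), the per-particle loop collapses into two vectorised calls:

```python
import numpy as np
from scipy.stats import binned_statistic_dd

# Dummy stand-ins for the real data: particle centres (N, 3) and a flag
# that is 1.0 for the small particle type and 0.0 for the large one.
rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 1.0, size=(100_000, 3))
is_small = (rng.random(100_000) < 0.5).astype(float)

bins = (20, 10, 10)  # grid resolution, still to be tuned for the mixer

# Count all particles per cell ('count' ignores the values argument)...
total, edges, _ = binned_statistic_dd(positions, is_small,
                                      statistic='count', bins=bins)
# ...and sum the flags per cell to get the small-particle count.
small, _, _ = binned_statistic_dd(positions, is_small,
                                  statistic='sum', bins=bins)

occupied = total > 0
conc = small[occupied] / total[occupied]  # per-cell concentration
```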
Another package I really like is bottleneck, which is a drop-in replacement for some NumPy functions such as sum or mean. On large arrays it can often do these calculations several times faster, so there may be a small gain to be made here; it may speed up both the binning and RSD functions.
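For example (bottleneck exposes NaN-aware versions of the usual reductions, which match the plain NumPy results when no NaNs are present):

```python
import numpy as np
import bottleneck as bn

x = np.random.rand(10_000_000)

# Same results as np.sum / np.mean / np.std on NaN-free data,
# but typically faster on large arrays.
total = bn.nansum(x)
mean = bn.nanmean(x)
std = bn.nanstd(x, ddof=1)
```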
Also, the binning function is likely to be a good candidate for Numba optimisation, which may make things substantially quicker. Numba will both JIT-compile your code and allow some parallelisation to happen, so it is worth exploring too. If you are brave, try Dask.
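A minimal Numba sketch of the binning step (the function and its arguments are made up; the real script's grid definition would slot in here):

```python
import numpy as np
from numba import njit

@njit(cache=True)
def bin_counts(pos, origin, cell, nx, ny, nz):
    # Compiled per-particle loop: assign each particle centre to a grid
    # cell and count it. After the first (compilation) call this runs
    # orders of magnitude faster than the same loop in pure Python.
    counts = np.zeros((nx, ny, nz), dtype=np.int64)
    for i in range(pos.shape[0]):
        ix = int((pos[i, 0] - origin[0]) // cell)
        iy = int((pos[i, 1] - origin[1]) // cell)
        iz = int((pos[i, 2] - origin[2]) // cell)
        if 0 <= ix < nx and 0 <= iy < ny and 0 <= iz < nz:
            counts[ix, iy, iz] += 1
    return counts
```

One caveat: naively adding parallel=True to a histogram loop like this can introduce race conditions on the shared counts array, so start with the serial compiled version before experimenting with Numba's parallel features.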
The best option is to try to remove the loop first and then look for the other, smaller gains. Numba could make a huge difference, but it could also easily turn out to be a very small gain. Hopefully you can get that run time down to minutes rather than hours - I think it should be possible.
Good luck!
JP