DBSCAN taking a very long time

moritz_moeller New Altair Community Member
edited November 5 in Community Q&A
Hello there,

I am currently trying to do a cluster analysis with DBSCAN. Since this is my first time doing a cluster analysis or using DBSCAN, my knowledge comes only from papers and online documents. Maybe one of you can help me out:

I am analyzing a fairly large amount of data (I know that's relative): 10 columns and around 6 million rows. I select attributes, filter them, normalize them, and then feed them into the DBSCAN clustering. My parameters are epsilon=0.5 and minpts=4. I want to look at 2 attributes at a time, since I'll compare the results to k-means.

The problem is that it already takes over an hour to preprocess the data (the loading circle is shown on the clustering part) before the progress even starts counting from 1 to 100. Is there anything I can change in my process that would make it faster? Perhaps I have made some beginner mistakes, which is quite likely.

Thanks for your answers and have a nice day.

EDIT: I have 64GB of RAM and the process uses around 32GB at the moment. I set the maximum to 50GB. In addition, I only have numeric attributes.

Best Answer

  • MartinLiebig
    Altair Employee
    Answer ✓
    Hi Moritz,
    I guess 6M rows are just a lot for this. If I remember correctly, the runtime is in O(n²).

    BR,
    martin
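
To illustrate where the O(n²) comes from, here is a minimal pure-Python sketch of DBSCAN without any spatial index. It is not the implementation used by the clustering operator in the question; the function name `naive_dbscan`, the counter `distance_checks`, and the convention that a point counts toward its own minpts are assumptions made for this example. Every neighborhood query scans all points, and every point is queried once, so the number of distance checks grows with the square of the row count.

```python
import math


def naive_dbscan(points, eps, min_pts):
    """Return (labels, distance_checks) for a list of numeric tuples.

    labels[i] is a cluster id (0, 1, ...) or -1 for noise.
    Assumption: a point counts as its own neighbor for min_pts.
    """
    NOISE = -1
    labels = [None] * len(points)  # None means "not visited yet"
    distance_checks = 0

    def region_query(i):
        # Brute force: compare point i against every point in the data set.
        nonlocal distance_checks
        distance_checks += len(points)
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not core
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is a core point, keep expanding
    return labels, distance_checks


# Two tight groups of four points each, plus one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (5.1, 5.1),
          (10.0, 10.0)]
labels, checks = naive_dbscan(points, eps=0.5, min_pts=4)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
print(checks)  # 81, i.e. 9 * 9
```

With 6 million rows the same pattern would need about 3.6 × 10^13 distance checks per run, while a tenth of the rows would need roughly a hundredth of that.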

Answers

  • moritz_moeller
    moritz_moeller New Altair Community Member
    Well, it seems you're correct. I am now working with only a subset of my rows, and the runtime is considerably lower.

    Thanks for the answer; I assume that this is the correct one.
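
Working with a subset of the rows, as described above, could be sketched as a reproducible random sample. This is only an illustration in plain Python: the helper name `sample_rows`, the fixed seed, and the 10% fraction are assumptions for the example, not settings from the original process.

```python
import random


def sample_rows(rows, fraction, seed=42):
    """Return a random subset containing roughly `fraction` of the rows."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so repeated runs pick the same rows
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)  # sampling without replacement


# Hypothetical stand-in data: 10,000 rows with 2 numeric attributes.
rows = [(float(i), float(i % 7)) for i in range(10_000)]
subset = sample_rows(rows, fraction=0.1)
print(len(subset))  # 1000
```

Note that a sample is less dense than the full data, so epsilon and minpts values chosen for the full data set may need to be adjusted for the subset.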