Tackle large files

choose_username New Altair Community Member
edited November 5 in Community Q&A
Hello all,

I have a large data set (15 attributes and almost 50,000 records). The problem is that when I use an operator such as Detect Outlier, RapidMiner needs a very long time to run it. Is there a solution to this (I mean without using a different computer)? Or do I need to look for a new data set?


Thanks in advance

User

Answers

  • IngoRM
    IngoRM New Altair Community Member
    Hello,

    well, there is no general answer to this. Some algorithms simply have long runtimes (such as neural networks, the relevance vector machine and, as far as it seems, also the outlier detection operator). In contrast to other data mining solutions, RapidMiner does not remove such algorithms, since they work quite well on smaller data sets (or faster machines  ;) ). Actually, there is not much you can do besides
    • using only a sample of the data
    • trying different schemes or different approaches for your problem, in this case for outlier detection
    • checking whether the algorithm is available in a parallel working mode and using more than one CPU core
    • inspecting the source code and checking whether it can be optimized / parallelized, which we are then happy to include in RapidMiner if you allow this
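
To illustrate the first option outside of RapidMiner, here is a minimal Python sketch. It is not the Detect Outlier operator's implementation: it assumes a simple distance-based score (distance to the k-th nearest neighbour) whose cost grows roughly quadratically with the number of records, which is why scoring a sample can be so much cheaper than scoring every record. The function names, the sample size and the synthetic data are all hypothetical and only there for illustration.

```python
import math
import random


def sample_rows(rows, n, seed=0):
    """Draw n rows uniformly without replacement (all rows if n >= len(rows))."""
    if n >= len(rows):
        return list(rows)
    return random.Random(seed).sample(rows, n)


def knn_outlier_scores(rows, k=5):
    """Score each row by the Euclidean distance to its k-th nearest neighbour.

    Assumed stand-in for a distance-based outlier detector. Every row is
    compared with every other row, so the work is O(n^2) in the row count.
    """
    scores = []
    for i, a in enumerate(rows):
        dists = sorted(math.dist(a, b) for j, b in enumerate(rows) if j != i)
        scores.append(dists[k - 1])
    return scores


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic stand-in for a data set with 15 attributes and 50,000 records.
    data = [tuple(rng.gauss(0.0, 1.0) for _ in range(15)) for _ in range(50_000)]

    # Scoring all 50,000 rows would need about 2.5 billion distance
    # computations; a 500-row sample needs about 250,000.
    subset = sample_rows(data, 500, seed=1)
    scores = knn_outlier_scores(subset, k=5)

    # Report the five rows of the sample with the largest scores.
    top = sorted(range(len(subset)), key=lambda i: scores[i], reverse=True)[:5]
    for i in top:
        print(f"sample row {i}: score {scores[i]:.3f}")
```

Keep in mind that scores computed on a sample only describe the sampled records, and records that are rare in the full data set may not be drawn at all, so it is worth repeating the run with a few different seeds.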
    Cheers,
    Ingo
  • choose_username
    choose_username New Altair Community Member
    Thank you for your fast answer  :).  I think I will look for another data set.

    greetings

    user