🎉Community Raffle - Win $25

An exclusive raffle opportunity for active members like you! Complete your profile, answer questions and get your first accepted badge to enter the raffle.
Join and Win

Count examples in overlapping time frame

User: "DocMusher"
New Altair Community Member
Updated by Jocelyn
Hi RM friends,
My data set consists of ID, a time of admission, a time of discharge. I would like  to calculate the number of examples (patients in casu) that are simultaneously admitted. As such a new attribute should provide me the number of patients admitted for each patient (example) during his/her admission time frame. 
Thanks a lot!!!
Sven

Find more posts tagged with

Sort by:
1 - 4 of 41
    User: "lionelderkrikor"
    New Altair Community Member
    Accepted Answer
    Hi Sven,

    Here a solution using a Python script : 
    First, at the beginning of the python script, you have to set : 
    - the names of your attributes
     - the date format

    As a result, you obtain an exampleset like that : 


    The process is in attached file.

    Hope this helps,

    Regards,

    Lionel




     


    User: "lionelderkrikor"
    New Altair Community Member
    Accepted Answer
    You're welcome, Sven.

    Keep us informed of the results of your work ! (it it's not confidential of course)

    Regards,

    Lionel
    User: "lionelderkrikor"
    New Altair Community Member
    Accepted Answer
    Hi Sven,

    Sven, to be honest, I am pessimist.... Here the results of my investigations and experimentations : 
    I timed the process for 100 examples : the duration is 10 seconds.(my PC = quad-core / 16 Go RAM)
    Your whole dataset has around 69000 examples, so I would say in first approximation
    that the duration for the whole dataset is (69000*10)/100 = 6900 seconds = around 2 hours. ==> This is obviously not the case.
    So I suspect the complexity of this algorithm to be proportional to NxN (and not proportional to N) where N is the number of your examples. In this case, the time duration will be (for the whole dataset)  :
    10s x 690 x 690 = 4 761 000 seconds = 1322, 5 hours = 55 days !!!! 
    That 's why the process duration is so long...
    Moreover I remember of a thread where an user observed that the execution of a  Python script inside RapidMiner is significantly slower than the same script in a Python Notebook. ==> So I will try to execute the script directly in a Python notebook to see if there is an acceleration of the execution.
    And finally to answer to your question, the algorithm needs  to go through the entire dataset to find the overlaps, thus, from my point of view, it is impossible to run the process by steps...

    Regards,

    Lionel





    User: "BalazsBaranyRM"
    New Altair Community Member
    Accepted Answer
    Hi,

    this is a problem that you can reduce by executing in batches if you can sort and separate your data.

    For example, it doesn't make sense to compare records from different years (unless some patients are in care for years). 

    Instead of making n^2 comparisons (for a large n), you could make 10 joins of (n / 10)^2, which can be some orders of magnitude faster.

    Regards,
    Balázs