"Decision Tree on a huge sparse dataset"

aryan_hosseinza New Altair Community Member
edited November 2024 in Community Q&A
Hi,

I have a very sparse dataset with a huge number of attributes (~12K features and 700K records), and I cannot fit it in memory. The attribute values are binomial, i.e. True/False.

As it is sparse, I keep the dataset in (ID, Feature) format, so for example I would have the following records:
(ID , Feature)
(110 , d_0022)
(110 , d_2393)
(110 , i_2293)
(822 , d_933)
(822 , p_2003)
....

So the record with ID 110 has three attributes with a true value (d_0022, d_2393, i_2293), and all of its other attributes are false. The attributes are the distinct values of the column "Feature".
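
To make the (ID, Feature) convention concrete, here is a minimal Python sketch of how the pairs can be read back as True/False attributes. The helper names `group_features` and `is_true` are hypothetical and only for illustration; the sample pairs are the ones listed above.

```python
from collections import defaultdict

# Sample (ID, Feature) pairs, copied from the records listed above.
pairs = [
    (110, "d_0022"),
    (110, "d_2393"),
    (110, "i_2293"),
    (822, "d_933"),
    (822, "p_2003"),
]


def group_features(pairs):
    """Map each ID to the set of features that are True for it."""
    true_features = defaultdict(set)
    for record_id, feature in pairs:
        true_features[record_id].add(feature)
    return dict(true_features)


def is_true(true_features, record_id, feature):
    """A feature is True only if it was listed for that ID, otherwise False."""
    return feature in true_features.get(record_id, set())


rows = group_features(pairs)
print(sorted(rows[110]))            # ['d_0022', 'd_2393', 'i_2293']
print(is_true(rows, 110, "d_933"))  # False
```

Only the true values are stored, so memory use grows with the number of pairs rather than with records times attributes.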

Is it possible to train a decision tree without first building the whole (pivoted) dataset?

Thanks

Answers

  • MariusHelf New Altair Community Member
    No, it's not possible to train directly on the de-pivoted data. You'll have to use the Pivot operator to create a row-based format. If your data is sparse, however, you can try to set the datamanagement to double_sparse_array to save memory.

    Best regards,
    Marius
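
As a rough illustration of what pivoting into a row-based but sparse layout means, the following Python sketch turns (ID, Feature) pairs into a compressed-row structure. It is not the Pivot operator's implementation and not how double_sparse_array works internally; the function names `pivot_to_sparse_rows` and `dense_row` are hypothetical, and the layout is simply the common "row pointers plus column indices" scheme.

```python
def pivot_to_sparse_rows(pairs):
    """Pivot (ID, Feature) pairs into a row-based sparse layout.

    Returns (row_ids, columns, indptr, indices). Row i has True values in
    the columns indices[indptr[i]:indptr[i + 1]]; all other cells are False.
    """
    row_ids = sorted({record_id for record_id, _ in pairs})
    columns = sorted({feature for _, feature in pairs})
    col_index = {feature: j for j, feature in enumerate(columns)}

    # Collect the True column positions for every row.
    per_row = {record_id: set() for record_id in row_ids}
    for record_id, feature in pairs:
        per_row[record_id].add(col_index[feature])

    # Flatten into one index list plus row start/end pointers.
    indptr, indices = [0], []
    for record_id in row_ids:
        indices.extend(sorted(per_row[record_id]))
        indptr.append(len(indices))
    return row_ids, columns, indptr, indices


def dense_row(i, n_columns, indptr, indices):
    """Expand a single row to a dense True/False list (for inspection only)."""
    true_cols = set(indices[indptr[i]:indptr[i + 1]])
    return [j in true_cols for j in range(n_columns)]


pairs = [(110, "d_0022"), (110, "d_2393"), (110, "i_2293"),
         (822, "d_933"), (822, "p_2003")]
row_ids, columns, indptr, indices = pivot_to_sparse_rows(pairs)
print(columns)  # ['d_0022', 'd_2393', 'd_933', 'i_2293', 'p_2003']
print(indptr)   # [0, 3, 5]
print(dense_row(0, len(columns), indptr, indices))
# [True, True, False, True, False]

# Cell count of a fully dense table at the sizes quoted in the question.
dense_cells = 700_000 * 12_000  # 8_400_000_000
```

The sparse layout stores one entry per true value, whereas a dense table at the sizes mentioned in the question would need roughly 8.4 billion cells.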
