"Decision Tree on a huge sparse dataset"

New Altair Community Member

Jan 5, 2013

Updated Nov 5, 2024 by Jocelyn

Hi,

I have a very sparse dataset with a huge number of attributes (~12 K features and 700 K records) that I cannot fit in memory. The attribute values are binominal, i.e. True/False.
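A rough estimate shows why the dense table does not fit. This sketch assumes every cell is stored densely at a fixed size: 8 bytes if each value is held as a double, or, optimistically, 1 byte per True/False value. GB here means 10^9 bytes, and the record and feature counts are the approximate figures quoted above.

```python
# Rough dense-storage estimate for the table described above.
# Assumption: every cell is stored, at a fixed size per cell.
N_RECORDS = 700_000
N_FEATURES = 12_000


def dense_bytes(n_rows: int, n_cols: int, bytes_per_cell: int) -> int:
    """Bytes needed to store every cell of an n_rows x n_cols table."""
    return n_rows * n_cols * bytes_per_cell


cells = N_RECORDS * N_FEATURES
as_double = dense_bytes(N_RECORDS, N_FEATURES, 8)  # 8-byte double per cell
as_byte = dense_bytes(N_RECORDS, N_FEATURES, 1)    # 1 byte per True/False cell

print(f"cells:           {cells:,}")                   # 8,400,000,000
print(f"8-byte doubles:  {as_double / 1e9:.1f} GB")    # 67.2 GB
print(f"1-byte booleans: {as_byte / 1e9:.1f} GB")      # 8.4 GB
```

Even the optimistic one-byte-per-cell layout needs about 8.4 GB, almost all of it spent on False values.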

Because the data is sparse, I store it in (ID, Feature) format. For example, I would have the following records:
(ID, Feature)
(110, d_0022)
(110, d_2393)
(110, i_2293)
(822, d_933)
(822, p_2003)
....

So the record with ID 110 has three attributes with a true value (d_0022, d_2393, i_2293), and all the rest are false. (The attributes are all the distinct values of the attribute "Feature".)
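To make the format concrete, here is a minimal Python sketch of what "making the whole dataset" means: pivoting the (ID, Feature) pairs into one dense row per ID, with one boolean column per distinct feature. `pivot_boolean` is a hypothetical helper for illustration, not a RapidMiner operator, and only the example pairs listed above are used.

```python
from collections import defaultdict

# The (ID, Feature) pairs from the example above (the "...." rows omitted).
pairs = [
    (110, "d_0022"),
    (110, "d_2393"),
    (110, "i_2293"),
    (822, "d_933"),
    (822, "p_2003"),
]


def pivot_boolean(pairs):
    """Pivot long (ID, Feature) pairs into one dense boolean row per ID.

    Every distinct Feature value becomes a column; a cell is True only
    if that (ID, Feature) pair occurs in the input, otherwise False.
    """
    features = sorted({f for _, f in pairs})  # all distinct column names
    present = defaultdict(set)                # ID -> set of True features
    for record_id, feature in pairs:
        present[record_id].add(feature)
    rows = {
        record_id: {f: (f in true_set) for f in features}
        for record_id, true_set in present.items()
    }
    return features, rows


features, rows = pivot_boolean(pairs)
print(features)
print([f for f, v in rows[110].items() if v])  # ['d_0022', 'd_2393', 'i_2293']
```

In the real data each row would carry ~12 K columns, almost all False, which is exactly the blow-up being avoided. Note also that a record with no True features produces no pairs at all, so it cannot be recovered from this format alone; such IDs would have to come from a separate list.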

Is it possible to train a decision tree without first building the whole dataset?

Thanks

Find more posts tagged with

AI Studio

Decision Tree


MariusHelf

New Altair Community Member

Jan 7, 2013

No, it's not possible to train directly on the de-pivoted data. You'll have to use the Pivot operator to transform it into a format with one row per ID. Since your data is sparse, however, you can try setting the datamanagement parameter to double_sparse_array to save memory.

Best regards,
Marius
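The memory saving from a sparse data management setting comes from storing only the non-default cells. The sketch below illustrates that idea in plain Python with a compressed-sparse-row (CSR) layout; it is a conceptual stand-in, not RapidMiner's double_sparse_array implementation, and `SparseBooleanTable` is a hypothetical name. Each row keeps only the sorted column indices of its True features, and every other lookup returns the default False.

```python
from bisect import bisect_left


class SparseBooleanTable:
    """Hypothetical CSR-style table in which only True cells are stored."""

    def __init__(self, pairs):
        # One column per distinct feature name, in sorted order.
        self.columns = sorted({feature for _, feature in pairs})
        self._col = {name: i for i, name in enumerate(self.columns)}
        by_id = {}
        for record_id, feature in pairs:
            by_id.setdefault(record_id, set()).add(self._col[feature])
        self.ids = sorted(by_id)
        self._row = {record_id: r for r, record_id in enumerate(self.ids)}
        # Row r's True columns are indices[indptr[r]:indptr[r + 1]].
        self.indptr = [0]
        self.indices = []
        for record_id in self.ids:
            self.indices.extend(sorted(by_id[record_id]))
            self.indptr.append(len(self.indices))

    def get(self, record_id, feature):
        """Return True if the cell is stored, otherwise the default False."""
        r, c = self._row[record_id], self._col[feature]
        lo, hi = self.indptr[r], self.indptr[r + 1]
        pos = bisect_left(self.indices, c, lo, hi)  # binary search within row r
        return pos < hi and self.indices[pos] == c

    def stored_cells(self):
        return len(self.indices)

    def dense_cells(self):
        return len(self.ids) * len(self.columns)


table = SparseBooleanTable([
    (110, "d_0022"), (110, "d_2393"), (110, "i_2293"),
    (822, "d_933"), (822, "p_2003"),
])
print(table.get(110, "i_2293"), table.get(822, "d_0022"))  # True False
print(table.stored_cells(), "of", table.dense_cells(), "cells stored")  # 5 of 10
```

With ~12 K columns and only a few True features per record, the number of stored cells stays close to the number of (ID, Feature) pairs instead of growing with records × features.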