
Working with large datasets

User: "MuehliMan"
New Altair Community Member
Dear all,

I am working with a really large dataset (>2.6 million examples, ~25 attributes, 1 polynominal ID).
After renaming some attributes and generating a basic mathematical calculation from another attribute, I wanted to apply a model to predict on this large set. Unfortunately, the process always crashes with an exceeded memory limit. This happens even when I split the data into subsets of 1 million examples.
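For illustration, the batch approach looks conceptually like the following sketch (plain Python/pandas rather than the actual RapidMiner process; the file name, the "id" and "att1" columns, the chunk size, and the stand-in predict function are all hypothetical):

```python
import pandas as pd

CHUNK_SIZE = 1_000_000  # roughly the subset size I tried

def predict(batch: pd.DataFrame) -> pd.Series:
    # Stand-in for the real model; here just a threshold on one attribute.
    return (batch["att1"] > 0).astype(int)

results = []
# chunksize makes read_csv yield DataFrames lazily, so only one
# subset of rows is held in memory at a time.
for chunk in pd.read_csv("examples.csv", chunksize=CHUNK_SIZE):
    scored = chunk[["id"]].copy()
    scored["prediction"] = predict(chunk)
    results.append(scored)

pd.concat(results).to_csv("predictions.csv", index=False)
```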

So my questions:

- Is there a smarter way to store this data (short arrays or some other option)?
- Would it be better to convert the ID into integer values? (See the sketch after this list.)
- Interestingly, the workflow also crashes when using Materialize Data and/or Free Memory.
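On the second question, a minimal Python/pandas sketch (the ID format is hypothetical) comparing the memory footprint of string IDs against integer codes:

```python
import pandas as pd

n = 2_600_000  # roughly the size of the dataset in question

# Hypothetical polynominal IDs stored as strings vs. as integer codes.
string_ids = pd.Series([f"ID_{i}" for i in range(n)])
int_ids = string_ids.astype("category").cat.codes  # int32 codes

print("string IDs: ", string_ids.memory_usage(deep=True) // 2**20, "MiB")
print("integer IDs:", int_ids.memory_usage(deep=True) // 2**20, "MiB")
```

If the savings are similar inside RapidMiner, mapping the polynominal ID to integers before scoring might make a real difference, but that is an assumption to verify.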

Could you give me some tips for working with larger datasets?

Cheers,
Markus
