Hi, I am a newbie so apologies in advance if I'm missing something obvious.
I am working on a binary classifier for a large synthetic credit card fraud dataset, which I have split and sampled into a training and a testing dataset, both with balanced classes (1,000 examples of each class). However, something seems to have gone wrong somewhere along the line.

The full dataset, with 6.3M examples, occupies 538MB on disk. My training and test datasets should be only a tiny fraction of that size, yet they are taking up 95.3MB. They also behave like 100MB files, taking ages to open, and the training dataset caused AM to crash. Can somebody tell me where I am going wrong, please? TIA, Ray.
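In case it helps, here is a simplified sketch of the kind of split-and-sample step I mean. It is not my exact code; I am assuming a plain pandas workflow here, and the file names and the "Class" column name below are just placeholders:

```python
import pandas as pd

# Load the full synthetic fraud dataset (~6.3M rows, ~538MB on disk).
df = pd.read_csv("fraud_full.csv")

# Draw 2,000 examples of each class: 1,000 per class for training,
# 1,000 per class for testing.
fraud = df[df["Class"] == 1].sample(n=2000, random_state=42)
legit = df[df["Class"] == 0].sample(n=2000, random_state=42)

train = pd.concat([fraud.iloc[:1000], legit.iloc[:1000]]).sample(frac=1, random_state=42)
test = pd.concat([fraud.iloc[1000:], legit.iloc[1000:]]).sample(frac=1, random_state=42)

# Each of these is only ~2,000 rows, so on disk they should be far
# smaller than the 95.3MB files I am actually seeing.
train.to_csv("fraud_train.csv", index=False)
test.to_csv("fraud_test.csv", index=False)
```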

