Reading data using field name

New Altair Community Member

Nov 14, 2018

Updated Nov 5, 2024 by Jocelyn

I am reading a file into RM that has no header row; instead, each field has its name included in the field value.

So where a typical CSV file would be:
ice_cream,chocolate,candy
1,4,5
6,4,2

My files look like:
"ice_cream"="1","chocolate"="4","candy"="5"
"ice_cream"="6","chocolate"="4","candy"="2"

Various other data mining programs offer a "retain name" function for this. How does one deal with it inside RapidMiner?

The problem I face is that these files are large: reading them in with the field names still attached and stripping the names afterwards with an operator uses more than the available system memory.

MartinLiebig

Altair Employee

Nov 14, 2018

Hi @robin ,

this format looks very weird. Why is it being used? It produces a ton of overhead when stored.

Anyway, is the ordering always the same? If yes, you can just read the columns in as polynominals and strip the names with Replace.

BR,
Martin
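
Whether the ordering is always the same can be checked without loading the whole file. Below is a minimal sketch in plain Python (outside RapidMiner), assuming every line has the form `"name"="value",...` shown in the question and that names and values contain no embedded double quotes. The helper name `field_order_is_constant` is made up for illustration.

```python
import re
from typing import Iterable

# Matches one "name"="value" pair.
# Assumption: neither names nor values contain embedded double quotes.
PAIR = re.compile(r'"([^"]*)"="([^"]*)"')


def field_order_is_constant(lines: Iterable[str]) -> bool:
    """Return True if every non-empty line lists the same field names in the same order."""
    expected = None
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        names = [name for name, _ in PAIR.findall(line)]
        if expected is None:
            expected = names  # the first data line defines the reference order
        elif names != expected:
            return False
    return True
```

A file object can be passed directly, for example `with open("data.txt", encoding="utf-8") as f: print(field_order_is_constant(f))` (the file name is hypothetical), so only one line is held in memory at a time.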

robin

New Altair Community Member

Nov 14, 2018

Yes, it is very heavy. It makes the file enormous, so large that I am unable to read the entire file into RM for processing, so I never get to the point of using the Replace operator.

In other programs there is the ability to read this in as a field name. Can one do this in RM?

robin

New Altair Community Member

Nov 14, 2018

In Linux I would use the stream editor and do:

sed 's/"ice_cream"="/"/g'

But this is a Windows machine I am working on.
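
For a platform-independent equivalent of the sed idea, here is a rough sketch in Python that streams the file and strips every `"name"=` prefix, not only `ice_cream`. It assumes names and values contain no embedded double quotes. The function name `strip_field_names` is hypothetical, and this is a preprocessing step outside RapidMiner rather than a RapidMiner feature.

```python
import re
from typing import IO

# Matches a quoted field name plus '=' when a quoted value follows,
# e.g. the "ice_cream"= part of "ice_cream"="1".
# Assumption: names and values contain no embedded double quotes.
NAME_PREFIX = re.compile(r'"[^"]*"=(?=")')


def strip_field_names(src: IO[str], dst: IO[str]) -> None:
    """Copy src to dst line by line, removing the "name"= prefix of each field."""
    for line in src:
        dst.write(NAME_PREFIX.sub("", line))
```

Because it handles one line at a time, memory use stays flat regardless of file size. It could be called with two open file handles, one for reading the original file and one for writing the cleaned copy, and the result read with Read CSV.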

MartinLiebig

Altair Employee

Nov 14, 2018

Hi,

you would need to read this in completely using Read CSV and then parse it with Replace. There is currently no operator for processing a file line by line. It's not too hard to write one, though.

BR,
Martin

jczogalla

New Altair Community Member

Accepted Answer

Nov 14, 2018

Hi @robin!

If the data is too big to fit in one go, you could try a more "manual" approach. As I described in this thread, you can use the Text extension to split the CSV files into lines and the lines into separate values. It should also be possible to then modify each cell value before it is put into an example set.

Cheers

Jan
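
The same split-into-lines, split-into-values idea can be sketched outside RapidMiner. This is not the Text extension process from the linked thread, only an illustration in plain Python: it reads one line at a time, separates each `"name"="value"` pair, and writes an ordinary CSV with a header row taken from the first data line. The function name `convert_to_csv` is made up, and the sketch assumes no embedded double quotes in names or values.

```python
import csv
import re
from typing import IO

# Matches one "name"="value" pair.
# Assumption: neither names nor values contain embedded double quotes.
PAIR = re.compile(r'"([^"]*)"="([^"]*)"')


def convert_to_csv(src: IO[str], dst: IO[str]) -> int:
    """Stream name=value lines from src into a headered CSV on dst; return rows written."""
    writer = None
    rows = 0
    for line in src:
        pairs = PAIR.findall(line)
        if not pairs:
            continue  # skip blank or unparseable lines
        if writer is None:
            # Take the header from the first data line.
            names = [name for name, _ in pairs]
            writer = csv.DictWriter(dst, fieldnames=names, lineterminator="\n")
            writer.writeheader()
        # DictWriter raises ValueError on a field name not seen in the header
        # and leaves missing fields empty.
        writer.writerow(dict(pairs))
        rows += 1
    return rows
```

The output is a conventional CSV that Read CSV can load with the first row as attribute names, and the conversion itself never holds more than one line in memory.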
