importing data with null values
Legacy User
Is there a way to replace null values, or at least reject lines with nulls, during import?
I am trying to import a file with scattered missing values and I can only import up to the first omission.
The tutorial example I found for dealing with missing data uses '?' in the data file to mark missing values. My data has nothing at all in those positions. Here is an example of my data: the 1st and 3rd lines are complete; the 2nd line is missing the 1st and last columns.
N282WN,WN,978,91,91,1525,1630,65,308,2,-1
,WN,1114,91,91,1850,1955,65,308,2,
N207WN,WN,1182,91,91,1405,1510,65,308,2,-1
This is the error I get:
[Error] Data format error in line 393: the line does not provide the expected number of columns (was: 10, expected: 11)! Stop reading...
Thanks much!!
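For readers who want to do this outside RapidMiner, here is a minimal Python sketch of what is being asked for: an empty field (or a '?', as in the tutorial) is read as a missing value instead of shortening the row. The function name `read_with_missing` and the fixed count of 11 columns are assumptions for illustration; this is not how RapidMiner's operators are implemented.

```python
import csv
import io

# Sample lines from the post; the second line lacks its first and last values.
SAMPLE = """N282WN,WN,978,91,91,1525,1630,65,308,2,-1
,WN,1114,91,91,1850,1955,65,308,2,
N207WN,WN,1182,91,91,1405,1510,65,308,2,-1
"""

# An empty (or blank) field and the tutorial's '?' both mean "missing".
MISSING_MARKERS = {"", "?"}

def read_with_missing(text, expected_columns=11):
    """Parse CSV text, mapping missing-value markers to None.

    Empty fields still count as columns, so ',a,' is three columns.
    A row with the wrong number of columns raises ValueError.
    """
    rows = []
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != expected_columns:
            raise ValueError(
                f"line {line_no}: expected {expected_columns} columns, got {len(row)}"
            )
        rows.append([None if field.strip() in MISSING_MARKERS else field for field in row])
    return rows

rows = read_with_missing(SAMPLE)
print(rows[1][0], rows[1][-1])  # None None
```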
Answers
Hello b2
I copied your data into a simple text file and loaded it with the "SimpleExampleSource" operator (default settings) in RapidMiner 4.2. I had no problems; the operator recognized all missing values.
Idea: maybe line 393 of your data is corrupted, e.g. a comma is missing.
Hope this was helpful.
Steffen
Steffen,
Thank you for your help.
There are no missing commas. Could it have to do with the fact that one of the missing fields is at the end or beginning of the line? Is there an option I need to set?
I am using the community version 4.1.
I tried duplicating what you did. I switched from ExampleSource to SimpleExampleSource and copied the input data back off this post into a new file. I got a similar error. This is the error:
Error in: SimpleExampleSource (SimpleExampleSource) Could not read file ...\twig.txt': Number of columns in line 1 was unexpected, was: 10, expected: 11
Hello b2
Maybe it depends on the version. I remember something like this, but I am not sure...
Is there any specific reason you cannot switch to 4.2?
greetings
Steffen
Hi Steffen,
There is the whole family of value replenishment: replacing "unknown" values in the metadata with a constant (typically zero) or with the attribute's mean. There are also more sophisticated approaches in which a learner trained on the complete values is used to guess the missing ones, but I have never been able to understand how that operator works and is organized. You can also use the "Sparse array management" option in your (file/database) ExampleSource if needed.
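A minimal Python sketch of the two simple replenishment strategies described here, a constant (typically zero) or the attribute's mean, applied to one column after it has been read. The `replenish` helper is hypothetical, not a RapidMiner operator, and the learner-based approach is not shown.

```python
from statistics import mean

def replenish(values, strategy="mean", constant=0):
    """Return a copy of a numeric column with None entries filled in.

    strategy="constant": use `constant` (typically zero).
    strategy="mean":     use the mean of the values that are present.
    """
    present = [v for v in values if v is not None]
    if strategy == "mean":
        if not present:
            raise ValueError("cannot take the mean of an all-missing column")
        fill = mean(present)
    elif strategy == "constant":
        fill = constant
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if v is None else v for v in values]

# Sixth column of the sample data, with the middle value blanked out for illustration.
column = [1525, None, 1405]
print(replenish(column, "constant"))  # [1525, 0, 1405]
print(replenish(column, "mean"))      # the gap becomes 1465, the mean of 1525 and 1405
```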
This item could be a good wiki article in "data formats" ;D
Cheers,
Jean-Charles.
Hello Jean-Charles
jean-charles wrote:
There is the whole family of value replenishment: replacing "unknown" values in the metadata with a constant (typically zero) or with the attribute's mean. There are also more sophisticated approaches in which a learner trained on the complete values is used to guess the missing ones, but I have never been able to understand how that operator works and is organized.
Yes, but not during import.
jean-charles wrote:
You can use the "Sparse array management" option in your (file/database) ExampleSource if needed.
Why? As far as I can see, the Sparse Data Format is for data with a lot of missing values or a small number of different values (for efficient storage).
jean-charles wrote:
This item could be a good wiki article in "data formats" ;D
True, true... :-[
greetings
Steffen
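Steffen's point about sparse storage can be illustrated with a Python sketch of the general idea (not RapidMiner's actual sparse file syntax, which this thread does not show): only entries that differ from a default are stored, keyed by column index, and the rest are reconstructed on read. It is a storage optimization, so it does not by itself change how a dense CSV line is split into columns. `to_sparse` and `to_dense` are hypothetical helpers.

```python
def to_sparse(row, default=None):
    """Store only the entries that differ from `default`, keyed by column index."""
    return {i: v for i, v in enumerate(row) if v != default}

def to_dense(sparse, n_columns, default=None):
    """Rebuild the full row; columns absent from the mapping get `default`."""
    return [sparse.get(i, default) for i in range(n_columns)]

# The incomplete sample line, with its two missing fields already read as None.
row = [None, "WN", "1114", "91", "91", "1850", "1955", "65", "308", "2", None]
sparse = to_sparse(row)
print(len(sparse), "of", len(row), "entries stored")  # 9 of 11 entries stored
```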
Hi all,
Actually, there was a bug in versions before 4.2 when reading CSV-like data with missing values at the end of lines. The new version 4.2, which is now available on our web site, no longer contains this bug, and everything should work fine, as Steffen pointed out. So I would suggest upgrading to RM 4.2.
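The symptom of such a bug can be reproduced in a few lines of Python. A CSV parser that keeps empty fields finds 11 columns on every sample line, while a tokenizer that discards trailing empty fields finds only 10 on the second line, which matches the reported "was: 10, expected: 11". Java's `String.split(regex)` discards trailing empty strings in exactly this way, but whether that was the cause in RapidMiner 4.1 is an assumption; the thread does not say. Both function names are hypothetical.

```python
import csv
import io

# Sample lines from the post; the second one ends with an empty field.
SAMPLE = """N282WN,WN,978,91,91,1525,1630,65,308,2,-1
,WN,1114,91,91,1850,1955,65,308,2,
N207WN,WN,1182,91,91,1405,1510,65,308,2,-1
"""

def count_fields_csv(text):
    """Fields per line with Python's csv module, which keeps empty fields."""
    return [len(row) for row in csv.reader(io.StringIO(text))]

def count_fields_dropping_trailing(text):
    """Fields per line for a hypothetical tokenizer that, like Java's
    String.split(regex), discards trailing empty fields."""
    counts = []
    for line in text.splitlines():
        fields = line.split(",")
        while fields and fields[-1] == "":  # drop empty fields at the end
            fields.pop()
        counts.append(len(fields))
    return counts

print(count_fields_csv(SAMPLE))                # [11, 11, 11]
print(count_fields_dropping_trailing(SAMPLE))  # [11, 10, 11]
```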
Cheers,
Ingo
Thank you all very much for your help.
I have upgraded to 4.2 and the same error occurs. I have found that it happens when I have missing integer-type data, but not when I have missing nominal-type data. I am beginning to think this may be a follow-on to the bug in version 4.1.
Is there a way to have the import skip incomplete lines?
Thank you.
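This thread does not say whether RapidMiner 4.2 has a setting for skipping incomplete lines. As a stopgap, the file can be pre-filtered before import; this Python sketch drops every line that has the wrong number of columns or an empty field, and reports each skipped line instead of stopping. The `complete_rows` name and the column count of 11 are assumptions.

```python
import csv
import io
import sys

# Sample lines from the post; the second one is incomplete.
SAMPLE = """N282WN,WN,978,91,91,1525,1630,65,308,2,-1
,WN,1114,91,91,1850,1955,65,308,2,
N207WN,WN,1182,91,91,1405,1510,65,308,2,-1
"""

def complete_rows(text, expected_columns=11):
    """Yield rows that have exactly `expected_columns` non-empty fields.

    Incomplete lines are reported on stderr and skipped rather than
    aborting the whole read.
    """
    for line_no, row in enumerate(csv.reader(io.StringIO(text)), start=1):
        if len(row) != expected_columns or any(not field.strip() for field in row):
            print(f"skipping line {line_no}: {row}", file=sys.stderr)
            continue
        yield row

kept = list(complete_rows(SAMPLE))
print(len(kept))  # 2
```

The kept rows can be written back out with `csv.writer` to produce a file without incomplete lines before importing it.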
ExampleSource was giving me trouble.
CSVExampleSource works fine.
Hi again,
Maybe it would have worked with the ExampleSource operator, too (both operators are basically the same, just with different parameter settings), so the problem might have had something to do with quoting, line trimming, or the column separator parameter. Anyway: good to hear it works now ;D
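To make the point about parameter settings concrete: the number of columns a parser reports depends on how it treats quotes and which separator it uses. In this Python sketch the same line yields 11, 12 or 1 columns. The line with a quoted comma is invented for illustration and does not come from the poster's data, and Python's `csv` options only stand in for RapidMiner's parameters.

```python
import csv
import io

# Hypothetical line (not from the thread) whose first field contains a quoted comma.
LINE = '"N282WN, leased",WN,978,91,91,1525,1630,65,308,2,-1\n'

# Quote-aware parsing: the quoted comma stays inside the first field.
with_quotes = next(csv.reader(io.StringIO(LINE)))
# Quote handling switched off: every comma splits, giving one column too many.
without_quotes = next(csv.reader(io.StringIO(LINE), quoting=csv.QUOTE_NONE))
# Wrong separator: ';' never occurs as a delimiter, so the line is one column.
wrong_separator = next(csv.reader(io.StringIO(LINE), delimiter=";"))

print(len(with_quotes), len(without_quotes), len(wrong_separator))  # 11 12 1
```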
Cheers,
Ingo