Reading non-standard data file structures - please help
I am evaluating RapidMiner as a solution for research and application prototyping. It's important to have an easy way to import data and manipulate it into the structure I need before storing the result in a DB, and this has to work repeatedly across many files.
However, I have hit an early block. I can read in data from a file containing a standard table, but I run into issues when the file has a slightly different structure. Is there a straightforward way to read CSV and Excel data when the header structure is non-standard or even repeats (e.g. multiple data sets appended one after another in one file)?
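For the "repeating header" case, one option outside RapidMiner's wizards is to split the file into separate tables wherever a header row appears. The Python sketch below assumes a comma-separated file and a caller-supplied, hypothetical `is_header` rule; rows before the first header (such as metadata) are simply ignored here.

```python
import csv
import io


def split_blocks(text, is_header, delimiter=","):
    """Split a file holding several tables appended one after another.

    is_header is a caller-supplied (hypothetical) rule that decides
    whether a row starts a new table.
    """
    blocks = []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        cells = [c.strip() for c in row]
        if not any(cells):
            continue  # skip blank lines between tables
        if is_header(cells):
            blocks.append({"header": cells, "rows": []})
        elif blocks:
            blocks[-1]["rows"].append(cells)
        # rows before the first header (e.g. metadata) are ignored


    return blocks


if __name__ == "__main__":
    text = "t,val\n1,a\n2,b\n\nt,val\n3,c\n"
    for block in split_blocks(text, lambda cells: cells[0] == "t"):
        print(block["header"], block["rows"])
```

Each block can then be handled (or written to the DB) as a standard table on its own.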
Below is one example of these data files. One of the columns is time, but there is no date column; the date is instead stored as metadata at the top of the file. I need to combine the date with the time to create a date-time column, but I can't find a straightforward way to read the different parts of the file (metadata and column data) separately and then transform them into a new table to store in the DB.
Any advice would be welcome.
Thanks
Roger
ABC | Aircraft Registration | ||||
XYZ | Nose Number | ||||
123 | Flight Number | ||||
CDE | Departure Station | ||||
FGH | Destination Station | ||||
31.10.2014 | Date | ||||
Offset | AIR/GROUND | GMT (HH:MM:SS) | PRESENT POSN LATITUDE (DEG) | PRESENT POSN LONGITUDE (DEG) | ALTITUDE (FEET) |
785 | GROUND | 18:02:44 | 11.97018 | -8.92304 | 874 |
845 | GROUND | 18:12:44 | 21.9698 | -7.9315 | 881 |
905 | GROUND | 18:22:43 | 31.96881 | -6.93081 | 892 |
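As a rough illustration of the transformation described above (not a RapidMiner process), this Python sketch reads the metadata rows into a dictionary, reads the column data below the header, and combines the `Date` value with the `GMT (HH:MM:SS)` column. It assumes the cells are "|"-delimited as pasted, that the header row is the first row starting with `Offset` (a hypothetical rule), and that the date is written day.month.year.

```python
import csv
import io
from datetime import datetime

# The sample from the post, assuming cells are "|"-delimited as pasted.
# For a real comma-separated export, pass delimiter="," instead.
SAMPLE = """ABC | Aircraft Registration | ||||
XYZ | Nose Number | ||||
123 | Flight Number | ||||
CDE | Departure Station | ||||
FGH | Destination Station | ||||
31.10.2014 | Date | ||||
Offset | AIR/GROUND | GMT (HH:MM:SS) | PRESENT POSN LATITUDE (DEG) | PRESENT POSN LONGITUDE (DEG) | ALTITUDE (FEET) |
785 | GROUND | 18:02:44 | 11.97018 | -8.92304 | 874 |
845 | GROUND | 18:12:44 | 21.9698 | -7.9315 | 881 |
905 | GROUND | 18:22:43 | 31.96881 | -6.93081 | 892 |
"""

# Hypothetical rule: the column-header row is the first row whose first cell is "Offset".
HEADER_START = "Offset"


def parse_flight_file(text, delimiter="|"):
    """Return (metadata dict, list of row dicts with an added DATETIME field)."""
    meta, header, records = {}, None, []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        cells = [c.strip() for c in row]
        while cells and cells[-1] == "":  # drop empty cells from trailing delimiters
            cells.pop()
        if not cells:
            continue
        if header is None:
            if cells[0] == HEADER_START:
                header = cells
            elif len(cells) >= 2:
                meta[cells[1]] = cells[0]  # metadata rows look like "value | label"
        else:
            records.append(dict(zip(header, cells)))

    # Assumes the Date value is day.month.year, e.g. 31.10.2014.
    day = datetime.strptime(meta["Date"], "%d.%m.%Y").date()
    for rec in records:
        t = datetime.strptime(rec["GMT (HH:MM:SS)"], "%H:%M:%S").time()
        rec["DATETIME"] = datetime.combine(day, t)
    return meta, records


if __name__ == "__main__":
    meta, records = parse_flight_file(SAMPLE)
    print(meta["Flight Number"], records[0]["DATETIME"])  # prints: 123 2014-10-31 18:02:44
```

Note that this sketch does not handle flights that cross midnight GMT, and the other columns stay as strings; both would need extra handling before storing to a DB.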
Answers
Yes, you can comment out the repeating header lines in the Read CSV or XLS wizard. I do this all the time with NOAA weather data. The Read CSV operator is like the Swiss Army knife of data loaders in RapidMiner: it can handle many different file formats and encodings. I have used it to read in txt files too.
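The same idea (keep the first header, treat later copies of it as comments, merge everything into one table) can be sketched in Python as a preprocessing step. This is only an analogy to what the wizard does; the comma delimiter and the `#` comment character are assumptions.

```python
import csv
import io


def read_skipping_repeated_headers(text, delimiter=",", comment_char="#"):
    """Keep the first header row and drop later repeats of it, merging all data rows."""
    header, rows = None, []
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        cells = [c.strip() for c in row]
        if not any(cells) or cells[0].startswith(comment_char):
            continue  # blank line or an explicit comment line
        if header is None:
            header = cells
        elif cells == header:
            continue  # a repeated header line: treat it like a comment
        else:
            rows.append(cells)
    return header, rows


if __name__ == "__main__":
    text = "# station A\ntime,temp\n00:00,5\ntime,temp\n01:00,6\n"
    print(read_skipping_repeated_headers(text))
```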