🎉Community Raffle - Win $25

An exclusive raffle opportunity for active members like you! Complete your profile, answer questions and get your first accepted badge to enter the raffle.
Join and Win

data ETL for multiple text line file

User: "Telcontar120"
New Altair Community Member
Updated by Jocelyn

I have a data extract from a system that is definitely not data-mining friendly.  Essentially the data is coming in a csv containing text blocks (paragraphs) where each block is an example and each line is an attribute, and the same relative line in each block is the same attribute, like this:

line 1

line 2

line 3

 

line 4

line 5

line 6

...

where line 1 and line 4 are the same attribute, line 2 & line 5, line 3 & line 6, etc.  A sample file is attached--the format is accurate but the data have been anonymized.  Each line may contain spaces or other special characters.  A blank line separates each block from the others.  The blocks all have the same number of lines so at least there are no missings to worry about.

 

I'd like to turn this into a typical exampleset where each block is an example with the relevant attributes, but I'm struggling to do the necessary conversions.  Any ideas or help from the community?  Thanks in advance!

 

 

Find more posts tagged with

Sort by:
1 - 1 of 11
    User: "BalazsBaranyRM"
    New Altair Community Member
    Accepted Answer

    Hi,

     

    I don't have a complete solution but I've done similar stuff before.

     

    You might want to read the file with Read CSV making sure that the entire lines end up in the result (e. g. separator = $, no header line).

    Then add an ID with Generate ID.

    Calculate the modulo so you have the same number for the same attribute.

    Pivot by the calculated modulo attribute.

     

    Regards,

     

    Balázs