data ETL for multiple text line file

New Altair Community Member

Jun 6, 2017

Updated Nov 5, 2024 by Jocelyn

I have a data extract from a system that is definitely not data-mining friendly. Essentially the data is coming in a csv containing text blocks (paragraphs) where each block is an example and each line is an attribute, and the same relative line in each block is the same attribute, like this:

line 1

line 2

line 3

line 4

line 5

line 6

...

where line 1 and line 4 are the same attribute, line 2 & line 5, line 3 & line 6, etc. A sample file is attached--the format is accurate but the data have been anonymized. Each line may contain spaces or other special characters. A blank line separates each block from the others. The blocks all have the same number of lines so at least there are no missings to worry about.

I'd like to turn this into a typical exampleset where each block is an example with the relevant attributes, but I'm struggling to do the necessary conversions. Any ideas or help from the community? Thanks in advance!