[SOLVED] Pivoting several lines into one single field
MacPhotoBiker
New Altair Community Member
Hi guys,
this is my first post in this forum, I hope it´s on the right spot.
I´m using currently RapidMiner mainly for ETL processes, and it´s truly impressive what RM is capable of. I´m now looking for a particluar feature which I could not find, and I´d appreciate somebody could put me on the right track. Let me make a very simple example.
Let´s say there are currently two columns, Column 1 shows several cheque numbers, column two shows which invoices were paid with each cheque. A cheque can pay one or several invoices (i. e. a typical payment allocation table).
CHEQUE INVOICE_PAID
C1 Inv1
C1 Inv2
C2 Inv3
C3 Inv4
C3 Inv5
C3 Inv6
What I try to achieve is a table with only two columns, where the first column is grouped by Cheque number, and the second column contains ALL invoices paid by a particular cheque:
CHEQUE INVOICES_PAID
C1 Inv1, Inv2
C2 Inv3
C3 Inv4, Inv5, Inv6
It´s very important for me that all paid invoices are contained in one single field per record because I want to use it on a report (a statement, to be precise). Unfortunately, I wasn´t able to figure it out myself.
Any help would be greatly appreciated.
this is my first post in this forum, I hope it´s on the right spot.
I´m using currently RapidMiner mainly for ETL processes, and it´s truly impressive what RM is capable of. I´m now looking for a particluar feature which I could not find, and I´d appreciate somebody could put me on the right track. Let me make a very simple example.
Let´s say there are currently two columns, Column 1 shows several cheque numbers, column two shows which invoices were paid with each cheque. A cheque can pay one or several invoices (i. e. a typical payment allocation table).
CHEQUE INVOICE_PAID
C1 Inv1
C1 Inv2
C2 Inv3
C3 Inv4
C3 Inv5
C3 Inv6
What I try to achieve is a table with only two columns, where the first column is grouped by Cheque number, and the second column contains ALL invoices paid by a particular cheque:
CHEQUE INVOICES_PAID
C1 Inv1, Inv2
C2 Inv3
C3 Inv4, Inv5, Inv6
It´s very important for me that all paid invoices are contained in one single field per record because I want to use it on a report (a statement, to be precise). Unfortunately, I wasn´t able to figure it out myself.
Any help would be greatly appreciated.
Tagged:
0
Answers
-
Anyone?0
-
Hello
Something like this would work...
Loop values for the cheque attribute
Inside the loop, filter examples where cheque = the current value
Create a macro containing null
Start another loop for all filtered examples
Inside this loop update the macro with the value of the invoice attribute for the current example
Regards,
Andrew0 -
Hi Andrew,
thank you very much for your answer.
Frankly, something really hilarious just happened! Based on your answer I wanted to comment that I believe this functionality should be built into the "aggregate" functionality. I wanted to use the correct terminology, so I looked up the operator in RapidMiner, and - shame on me - I actually figured that it is already there, it´s the aggregation function "concatenation".
That´s precisely what I was looking for, and SO simple!
Again Andrew, I do thank you for your answer, after all you pointed me in the right direction, and you also gave me a glimpse of a more flexible approach. (Not that I could do that (yet), but down the road... who knows? )0 -
Hello MacPhotoBiker
How cool - I've learnt something new too.
regards
Andrew0 -
Thanks for that solution MacPhotoBiker.
Did you get the " |" between your values? For some reason I've got those characters and I can't get rid of them. Any ideas?
Cheers,0 -
Hi rowan.g,
"|" is simply a field delimiter, and RM decided to choose one that usually does not occur in any other context, which is a very good thing in order to keep the different values (Inv1, Inv2 etc. in my example) clearly separated.
It's very easy to change those, simply do the following:
- After the aggregate function, add a "Generate Attribute" operator
- "Attribute name" is the field you just generated, e.g. "MyListInOneField"
- "function expression" is "replace(MyListInOneField","|",",")
This replaces all occurrences of "|" with "," in the field "MyListInOneField".
In my example, I want one field that reads "Inv. # 4, 5, 6", here's how I do that:
- My initial "ListOfInvoices" is "4|5|6" -> replace(ListOfInvoices,"|",", ") -> "4, 5, 6"
- To add "Inv. # " I use the "concat" function: concat("Inv. # ", replace(ListOfInvoices,"|",", ")) -> "Inv. # 4, 5, 6"
That did the trick for me, I hope it helps.0 -
Thanks MacPhotoBiker. Worked perfectly.0
-
Glad to hear that it worked.0