Concatenate Values

kevin_m New Altair Community Member
edited November 2024 in Community Q&A
G'day,

I am missing the concatenate function in Radoop that RapidMiner Studio has. In Studio I can concatenate grouped values with the Aggregate operator, but this function is not available in Radoop. How can I achieve this in Radoop?

Thanks in advance!
Kevin

Best Answer

  • phellinger
    phellinger New Altair Community Member
    Answer ✓

    Hi Kevin,

     

    True, the Radoop Aggregate operator currently does not have a built-in concatenate option.

     

    In this case you can use the Hive Script operator. The following script demonstrates group concatenation on the Golf sample dataset (from the Sample repository); adapt it to your own table and columns. An explanation follows below.

     

    CREATE VIEW ##outputtable## AS 
    SELECT play, concat_ws(', ', sort_array(collect_set(outlook))) outlook_values
    FROM ##inputtable1##
    GROUP BY play

    The result will look like this:

    [Screenshot: Screen Shot 2017-08-03 at 11.49.10.png]

     

    The aggregation function here is collect_set, which collects the distinct values from the outlook column. If you need all values rather than just the distinct ones, use collect_list instead. The sort_array function is only required if you want the value list to be sorted and therefore deterministic; otherwise it can be omitted. The concat_ws function joins the values of an array using the specified separator.
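
    As a rough sketch of the collect_list variant described above, the script could look like the following. It assumes your Hive version provides collect_list and that outlook is a string column. The separator ' | ' is only an illustrative choice. For a non-string column you would probably need to cast the values to string before collecting them, since concat_ws expects strings.

    ```sql
    -- Sketch: keep every value per group, duplicates included (collect_list),
    -- and skip sort_array, so the order of values within a group is not guaranteed.
    CREATE VIEW ##outputtable## AS
    SELECT play, concat_ws(' | ', collect_list(outlook)) AS outlook_values
    FROM ##inputtable1##
    GROUP BY play
    ```

    If you want a stable order here as well, wrapping collect_list(outlook) in sort_array works the same way as in the original script.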

     

    I hope this helps,

    Peter

     
