Superset Operator Tips

JEdward (New Altair Community Member)
Hi all,

I have a process that uses the Superset operator to add attributes to my dataset (around 25,000 empty attributes each time).
However, the operator is quite a bottleneck in the process.

Does anyone have any best practices for adding large numbers of attributes to a dataset efficiently? 

Thanks,
JEdward

    Hi,
    Have you tried using the Radoop extension with Hive for that? The process is not exactly the same, but with the speed Radoop provides I at least felt I was not waiting longer than I could tolerate. I don't think there is any need to build your entire process externally. Ideally, the time-consuming processes would run externally and the rest on your local machine.
    Cheers
    Sven
    Hi,

    Have you tried adding a Materialize Data operator? It might be a bit faster afterwards.

    Cheers,
    Martin
    JEdward (OP)
    Thanks guys,

    Martin, good point on materialise data. I'll give that a try.

    Sven, I will be moving this project onto Hadoop eventually, but for now I'm stuck in RDBMS land with this one.