Superset Operator Tips

JEdward
New Altair Community Member
Hi all,
I have a process where I'm using the Superset operator to add attributes to my dataset (around 25,000 empty attributes each time).
However, the operator is quite a bottleneck in the process.
Does anyone have any best practices for adding large numbers of attributes to a dataset efficiently?
Thanks,
JEdward
Answers
Hi,
Have you tried using the Radoop extension with Hive for that? The process is not exactly the same, but with the speed Radoop provides I at least felt I was not waiting longer than I could tolerate. I don't think you need to build your entire process externally. Ideally, the time-consuming parts would run externally and the rest on your local machine.
Cheers
Sven
Hi,
Have you tried adding a Materialize Data operator? It might be a bit faster afterwards.
Cheers,
Martin
Thanks guys,
Martin, good point on materialise data. I'll give that a try.
Sven, I will be moving this project onto Hadoop eventually, but for now I'm stuck in RDBMS land with this one.