Radoop on Amazon EMR fails to initialize

User: "Andrew"
New Altair Community Member
Updated by Jocelyn

I'm very close to being able to get Radoop working with an Amazon EMR cluster. My set up involves RapidMiner Studio and Radoop on a Windows laptop which has full unfettered firewall access to the EMR machines. I am not using SOCKS (although I started with this). I am using the absolute latest Spark, Hive and Hadoop components that Amazon makes available.

 

The full connection test fails at the point where components are being uploaded to the /tmp/radoop/_shared/db_default/ HDFS location. I can see that the data nodes are being contacted on port 50010 and it looks like this fails from my laptop because the ip addresses are not known. I have tried the dfs.client.use.datanode.hostname true/false workaround and I see this changes the name that it attempts to use - in one setting the node is <name>/<ipaddress>:50010 (which is odd) while in the other it is <ipaddress>:50010 (which is believable but doesn't resolve).

 

I don't have the luxury of installing RapidMiner components on the EMR cluster so my question is what is the best way to get the name nodes exposed to the PC running RapidMiner Studio and Radoop?

Find more posts tagged with