Radoop and Hortonworks Sandbox Connection Problem
Dear friends,
I have a problem connecting from RapidMiner 7.3 Radoop to the Hortonworks sandbox.
I have installed the following Hortonworks sandbox on VMware Workstation: HDP_2.5_docker_vmware_25_10_2016_08_59_25_hdp_2_5_0_0_1245_ambari_2_4_0_0_1225.ovf
and have also applied the Distribution-Specific Notes from the Radoop documentation on it:
http://docs.rapidminer.com/radoop/installation/distribution-notes.html#hdp-sandbox
But when I make a connection from Radoop to the sandbox and run Quick Test, I get the following errors (screenshots are included):
[Dec 21, 2016 6:25:57 PM] SEVERE: com.rapidminer.operator.UserError: Could not upload the necessary component to the directory on the HDFS: '/tmp/radoop/_shared/db_default/'
[Dec 21, 2016 6:25:57 PM] SEVERE: Hive jar (with additional functions) upload failed. Please check that the NameNode and DataNodes run and are accessible on the address and port you specified.
[Dec 21, 2016 6:25:57 PM] SEVERE: Test failed: UDF jar upload
[Dec 21, 2016 6:25:57 PM] SEVERE: Connection test for 'Hortonworks_Hadoop' failed.
Regards
When Quick Test is pressed, a file is created in db_default.
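For what it's worth, one way to see whether anything actually landed in that directory is to list it with the plain HDFS client API. This is only a sketch: it assumes the standard Hadoop FileSystem classes and the sandbox's usual NameNode address on port 8020, so adjust the host, port and path to match your connection.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListRadoopSharedDir {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // NameNode address is a placeholder; use the value from your Radoop connection.
        FileSystem fs = FileSystem.get(URI.create("hdfs://sandbox.hortonworks.com:8020"), conf);

        // The shared directory named in the error message above.
        Path shared = new Path("/tmp/radoop/_shared/db_default/");
        if (!fs.exists(shared)) {
            System.out.println("Directory does not exist: " + shared);
            return;
        }
        for (FileStatus status : fs.listStatus(shared)) {
            System.out.println(status.getPath() + "  (" + status.getLen() + " bytes)");
        }
    }
}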
Best Answer
-
Hi All,
We have updated the guide for connecting to the latest Hortonworks Sandbox virtual machine. Following the steps thoroughly should solve the above issues.
Please follow the guide at http://docs.rapidminer.com/radoop/installation/distribution-notes.html.
For those interested in technical details, here is some explanation. The Hortonworks Sandbox connection problems appeared when Hortonworks updated their Sandbox environment, so that Hadoop now runs in Docker inside the virtual machine. After this change in the networking, a hostname must be used to access the DataNodes, because it can be resolved to either the external or the internal IP depending on where it is resolved. Moreover, not all ports are exposed properly, which is why we need to add the permanent iptables rules as a workaround.
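To make the failure mode concrete: the NameNode hands the client a DataNode address, and if that address is the Docker-internal IP, or a port that the sandbox does not forward, the block write fails with the "could only be replicated to 0 nodes" error seen in this thread. A rough client-side check is a hostname lookup plus a TCP connect attempt. This is just an illustrative sketch; 50010 is only the usual default DataNode transfer port on HDP 2.x, so substitute your own values if they differ.

import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

public class CheckDataNodeReachability {
    public static void main(String[] args) throws Exception {
        // The hostname must be mapped to the VM's IP in your hosts file.
        String host = "sandbox.hortonworks.com";
        // Usual default for dfs.datanode.address on HDP 2.x; change if yours differs.
        int dataNodePort = 50010;

        // 1) Does the hostname resolve, and to which IP?
        InetAddress address = InetAddress.getByName(host);
        System.out.println(host + " resolves to " + address.getHostAddress());

        // 2) Is the DataNode transfer port reachable from this machine?
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, dataNodePort), 5000);
            System.out.println("Port " + dataNodePort + " is reachable.");
        } catch (Exception e) {
            System.out.println("Port " + dataNodePort + " is NOT reachable: " + e.getMessage());
        }
    }
}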
Best,
Peter
Answers
-
Hi,
Please try the following Advanced Hadoop Parameter:
Key = dfs.client.use.datanode.hostname
Value = true

Best,
Zoltan
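For anyone curious what this setting changes on the client side, here is a minimal sketch outside of Radoop: it sets the same property on a Hadoop Configuration and writes a tiny file under the same directory the UDF jar upload uses, so it exercises the same NameNode-to-DataNode path. The NameNode URI and target path are placeholders taken from the logs in this thread; adjust them to your setup.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteWithHostnames {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Same as the Advanced Hadoop Parameter above: connect to DataNodes
        // by hostname instead of the (Docker-internal) IP returned by the NameNode.
        conf.setBoolean("dfs.client.use.datanode.hostname", true);

        // Placeholder NameNode URI; use the address from your Radoop connection.
        FileSystem fs = FileSystem.get(URI.create("hdfs://sandbox.hortonworks.com:8020"), conf);

        // Write a small test file along the same path the UDF jar upload uses.
        Path testFile = new Path("/tmp/radoop/_shared/db_default/connection_test.txt");
        try (FSDataOutputStream out = fs.create(testFile, true)) {
            out.writeUTF("hello from the client");
        }
        System.out.println("Write succeeded: " + testFile);
    }
}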
-
I also have the same issue.
I tried importing the Hadoop configuration files and also importing from the cluster manager, and added the extra Advanced Hadoop Parameters as @zprekopcsak instructed.
But I still get the error:
[Jan 19, 2017 10:25:46 AM]: Connection test for 'Sandbox (192.168.8.128)' started.
[Jan 19, 2017 10:25:46 AM]: Using Radoop version 7.4.0-ALPHA.
[Jan 19, 2017 10:25:46 AM]: Running tests: [Hive connection, Fetch dynamic settings, Java version, HDFS, MapReduce, Radoop temporary directory, MapReduce staging directory, Spark staging directory, Spark assembly jar existence, UDF jar upload, Create permanent UDFs]
[Jan 19, 2017 10:25:46 AM]: Running test 1/11: Hive connection
[Jan 19, 2017 10:25:46 AM]: Hive server 2 connection (sandbox.hortonworks.com:10000) test started.
[Jan 19, 2017 10:25:46 AM]: Test succeeded: Hive connection (0.141s)
[Jan 19, 2017 10:25:46 AM]: Running test 2/11: Fetch dynamic settings
[Jan 19, 2017 10:25:46 AM]: Retrieving required configuration properties...
[Jan 19, 2017 10:25:46 AM]: Successfully fetched property: hive.execution.engine
[Jan 19, 2017 10:25:46 AM]: Successfully fetched property: dfs.user.home.dir.prefix
[Jan 19, 2017 10:25:46 AM]: Successfully fetched property: system:hdp.version
[Jan 19, 2017 10:25:46 AM]: The specified local value of mapreduce.reduce.speculative (false) differs from remote value (true).
[Jan 19, 2017 10:25:46 AM]: The specified local value of dfs.client.use.datanode.hostname (true) differs from remote value (false).
[Jan 19, 2017 10:25:46 AM]: The specified local value of yarn.nodemanager.admin-env (MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX) differs from remote value (MALLOC_ARENA_MAX).
[Jan 19, 2017 10:25:46 AM]: The specified local value of yarn.app.mapreduce.am.command-opts (-Xmx409m -Dhdp.version=${hdp.version}) differs from remote value (-Xmx200m).
[Jan 19, 2017 10:25:46 AM]: The specified local value of mapreduce.admin.map.child.java.opts (-server -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true -Dhdp.version=${hdp.version}) differs from remote value (-server -XX:NewRatio).
[Jan 19, 2017 10:25:46 AM]: The specified local value of mapreduce.admin.reduce.child.java.opts (-server -XX:NewRatio=8 -Djava.net.preferIPv4Stack=true -Dhdp.version=${hdp.version}) differs from remote value (-server -XX:NewRatio).
[Jan 19, 2017 10:25:46 AM]: The specified local value of yarn.nodemanager.recovery.dir ({{yarn_log_dir_prefix}}/nodemanager/recovery-state) differs from remote value (/var/log/hadoop-yarn/nodemanager/recovery-state).
[Jan 19, 2017 10:25:46 AM]: The specified local value of yarn.app.mapreduce.am.admin-command-opts (-Dhdp.version=${hdp.version}) differs from remote value (-Dhdp.version).
[Jan 19, 2017 10:25:46 AM]: The specified local value of mapreduce.admin.user.env (LD_LIBRARY_PATH=/usr/hdp/${hdp.version}/hadoop/lib/native:/usr/hdp/${hdp.version}/hadoop/lib/native/Linux-amd64-64) differs from remote value (LD_LIBRARY_PATH).
[Jan 19, 2017 10:25:46 AM]: Test succeeded: Fetch dynamic settings (0.209s)
[Jan 19, 2017 10:25:46 AM]: Running test 3/11: Java version
[Jan 19, 2017 10:25:46 AM]: Cluster Java version: 1.8.0_111-b15
[Jan 19, 2017 10:25:46 AM]: Test succeeded: Java version (0.000s)
[Jan 19, 2017 10:25:46 AM]: Running test 4/11: HDFS
[Jan 19, 2017 10:25:46 AM]: Test succeeded: HDFS (0.151s)
[Jan 19, 2017 10:25:46 AM]: Running test 5/11: MapReduce
[Jan 19, 2017 10:25:46 AM]: Test succeeded: MapReduce (0.088s)
[Jan 19, 2017 10:25:46 AM]: Running test 6/11: Radoop temporary directory
[Jan 19, 2017 10:25:46 AM]: Test succeeded: Radoop temporary directory (0.011s)
[Jan 19, 2017 10:25:46 AM]: Running test 7/11: MapReduce staging directory
[Jan 19, 2017 10:25:46 AM]: Test succeeded: MapReduce staging directory (0.087s)
[Jan 19, 2017 10:25:46 AM]: Running test 8/11: Spark staging directory
[Jan 19, 2017 10:25:46 AM]: Test succeeded: Spark staging directory (0.017s)
[Jan 19, 2017 10:25:46 AM]: Running test 9/11: Spark assembly jar existence
[Jan 19, 2017 10:25:46 AM]: Spark assembly jar existence in the local:// file system cannot be checked. Test skipped.
[Jan 19, 2017 10:25:46 AM]: Test succeeded: Spark assembly jar existence (0.000s)
[Jan 19, 2017 10:25:46 AM]: Running test 10/11: UDF jar upload
[Jan 19, 2017 10:25:46 AM]: File uploaded: 97.01 KB written in 0 seconds (67.72 MB/sec)
[Jan 19, 2017 10:25:48 AM] SEVERE: File /tmp/radoop/_shared/db_default/radoop_hive-v4_UPLOADING_1484839546975_xdi3i5w.jar could only be replicated to 0 nodes instead of minReplication (=1). There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1641)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3198)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3122)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:843)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:500)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2313)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2309)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2307)
[Jan 19, 2017 10:25:48 AM] SEVERE: Test failed: UDF jar upload
[Jan 19, 2017 10:25:48 AM]: Cleaning after test: UDF jar upload
[Jan 19, 2017 10:25:48 AM]: Cleaning after test: Spark assembly jar existence
[Jan 19, 2017 10:25:48 AM]: Cleaning after test: Spark staging directory
[Jan 19, 2017 10:25:48 AM]: Cleaning after test: MapReduce staging directory
[Jan 19, 2017 10:25:48 AM]: Cleaning after test: Radoop temporary directory
[Jan 19, 2017 10:25:48 AM]: Cleaning after test: MapReduce
[Jan 19, 2017 10:25:48 AM]: Cleaning after test: HDFS
[Jan 19, 2017 10:25:48 AM]: Cleaning after test: Java version
[Jan 19, 2017 10:25:48 AM]: Cleaning after test: Fetch dynamic settings
[Jan 19, 2017 10:25:48 AM]: Cleaning after test: Hive connection
[Jan 19, 2017 10:25:48 AM]: Total time: 1.761s
[Jan 19, 2017 10:25:48 AM] SEVERE: com.rapidminer.operator.UserError: Could not upload the necessary component to the directory on the HDFS: '/tmp/radoop/_shared/db_default/'
[Jan 19, 2017 10:25:48 AM] SEVERE: Hive jar (with additional functions) upload failed. Please check that the NameNode and DataNodes run and are accessible on the address and port you specified.
[Jan 19, 2017 10:25:48 AM] SEVERE: Test failed: UDF jar upload
[Jan 19, 2017 10:25:48 AM] SEVERE: Connection test for 'Sandbox (192.168.8.128)' failed.