
Connecting to CDH5 in an EC2 instance

User: "pau_fernandez_q"

Dear all,

 

I have recently launched an EC2 instance with CDH 5.11 installed on it. All services seem to be up and running, and the installation has passed several validation tests.

 

I have also installed RapidMiner Studio on my desktop, along with the Radoop extension, and I am now trying to connect to my Hadoop cluster. The EC2 instance is not configured to use Elastic IPs, so I am tunnelling the required ports through an SSH session.
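
For reference, the tunnels look roughly like this; the key path, user and public DNS name below are placeholders, and the exact port list depends on how Cloudera Manager configured the services (these are the CDH5 defaults for the NameNode, ResourceManager, HiveServer2 and DataNode):

    # Forward the Hadoop ports from the EC2 host to my desktop in one SSH session:
    #   8020  = HDFS NameNode RPC
    #   8032  = YARN ResourceManager
    #   10000 = HiveServer2
    #   50010 = DataNode data transfer
    ssh -i ~/.ssh/my-key.pem -N \
        -L 8020:localhost:8020 \
        -L 8032:localhost:8032 \
        -L 10000:localhost:10000 \
        -L 50010:localhost:50010 \
        ec2-user@ec2-xx-xx-xx-xx.compute-1.amazonaws.com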

 

I am currently trying to pass the full test that validates the connection. The configuration was initially imported from Cloudera Manager, and I then modified several properties to match my environment. The Hive, Java version, MapReduce and NameNode networking tests all pass successfully, but I am stuck at the upload of a jar file to HDFS. I suspect the problem is related to an earlier warning from the DataNode networking test:

 

 WARNING: Reverse DNS lookup failed! Expected hostname for ip <public-ip>: <fqdn>, but received <public DNS>.

 WARNING: DataNode port 50010 on the ip/hostname <fqdn> cannot be reached. Please check that you can access the DataNodes of your cluster.

 

I believe the SSH tunnel on port 50010 is working, but there is something I am missing. The output of the netstat command on the server shows this port listening on all interfaces (0.0.0.0).
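
This is the check I ran on the server (only the relevant line of output is shown, with the process id elided):

    # On the EC2 instance: the DataNode transfer port is bound to all interfaces
    sudo netstat -tlnp | grep 50010
    tcp   0   0 0.0.0.0:50010   0.0.0.0:*   LISTEN   <pid>/java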

 

Things I have tried:

 

- Editing my local hosts file to resolve the public IP to the internal server hostname; Radoop then complains that the server is unreachable.

- Formatting the NameNode after first deleting all data in the HDFS data directory.

- Setting dfs.client.use.datanode.hostname and dfs.datanode.use.datanode.hostname to true in the client configuration (see the snippet after this list).

- Trying to upload a file with another client, such as Toad; it fails with the same error.

- Setting dfs.datanode.address on the server to the hostname:port form; Cloudera Manager does not allow this and only accepts a plain port number.

- Setting dfs.datanode.address in the client configuration; this does not change Radoop's behaviour.
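
For reference, these are the relevant entries in my client-side hdfs-site.xml after the edits above (standard Hadoop property syntax; the values are exactly the ones I set):

    <!-- Client-side hdfs-site.xml: make the HDFS client address DataNodes by
         hostname instead of the internal IP the NameNode reports -->
    <property>
      <name>dfs.client.use.datanode.hostname</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.datanode.use.datanode.hostname</name>
      <value>true</value>
    </property>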

 

The error when trying to upload the jar file is the following:

[----] SEVERE: File /tmp/radoop/_shared/db_default/radoop_hive-v4_UPLOADING_1498636293395_dy8gaul.jar could only be replicated to 0 nodes instead of minReplication (=1).  There are 1 datanode(s) running and 1 node(s) are excluded in this operation.

 

Somehow the client knows the number of DataNodes in the HDFS service, so can I conclude that the SSH tunnel on port 50010 is working correctly? Can someone point me in the right direction?
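
In case it is useful, here is the Toad-style check from the list above, sketched with the stock hdfs CLI instead (this assumes the NameNode RPC port is also tunnelled to localhost:8020, and test.txt is just a dummy local file):

    # Check that the tunnelled DataNode port at least accepts TCP connections
    nc -vz localhost 50010
    # Attempt a small write through the tunnelled NameNode; in my setup this
    # fails with the same minReplication error as Radoop's jar upload
    hdfs dfs -fs hdfs://localhost:8020 -put test.txt /tmp/test.txt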

 

Thank you!!
