I am new to Radoop and am trying to set up a development environment. My setup is:
- Virtual machine (Ubuntu) running in VirtualBox (I am not using the HDP image)
- 5 GB of RAM assigned to the VM
- Spark 2.0.0
- Hadoop 2.8.5
- Hive 2.3.3
My quick tests all pass. When I run the full tests, I get the following error:
[Nov 4, 2018 7:50:46 PM]: Running test 17/25: Hive load data
[Nov 4, 2018 7:50:52 PM]: Test succeeded: Hive load data (6.356s)
[Nov 4, 2018 7:50:52 PM]: Running test 18/25: Import job
[Nov 4, 2018 7:51:07 PM] SEVERE: Test failed: Import job
[Nov 4, 2018 7:51:07 PM]: Cleaning after test: Import job
[Nov 4, 2018 7:51:07 PM]: Cleaning after test: Hive load data
[Nov 4, 2018 7:51:07 PM]: Cleaning after test: Radoop jar upload
[Nov 4, 2018 7:51:07 PM]: Cleaning after test: HDFS upload
[Nov 4, 2018 7:51:07 PM]: Cleaning after test: Create permanent UDFs
[Nov 4, 2018 7:51:07 PM]: Cleaning after test: UDF jar upload
[Nov 4, 2018 7:51:07 PM]: Cleaning after test: Spark assembly jar existence
[Nov 4, 2018 7:51:07 PM]: Cleaning after test: Spark staging directory
[Nov 4, 2018 7:51:07 PM]: Cleaning after test: MapReduce staging directory
[Nov 4, 2018 7:51:07 PM]: Cleaning after test: Radoop temporary directory
[Nov 4, 2018 7:51:07 PM]: Cleaning after test: MapReduce
[Nov 4, 2018 7:51:07 PM]: Cleaning after test: HDFS
[Nov 4, 2018 7:51:07 PM]: Cleaning after test: YARN services networking
[Nov 4, 2018 7:51:07 PM]: Cleaning after test: DataNode networking
[Nov 4, 2018 7:51:07 PM]: Cleaning after test: NameNode networking
[Nov 4, 2018 7:51:07 PM]: Cleaning after test: Java version
[Nov 4, 2018 7:51:07 PM]: Cleaning after test: Fetch dynamic settings
[Nov 4, 2018 7:51:07 PM]: Cleaning after test: Hive connection
[Nov 4, 2018 7:51:07 PM]: Total time: 22.634s
[Nov 4, 2018 7:51:07 PM]: java.lang.Exception: Import job failed, see the job logs on the cluster for details.
at eu.radoop.connections.service.test.integration.TestHdfsImport.call(TestHdfsImport.java:95)
at eu.radoop.connections.service.test.integration.TestHdfsImport.call(TestHdfsImport.java:40)
at eu.radoop.connections.service.test.RadoopTestContext.lambda$runTest$1(RadoopTestContext.java:279)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[Nov 4, 2018 7:51:07 PM] SEVERE: java.lang.Exception: Import job failed, see the job logs on the cluster for details.
[Nov 4, 2018 7:51:07 PM] SEVERE: Test data import from the distributed file system to Hive server 2 failed. Please check the logs of the MapReduce job on the ResourceManager web interface at http://${yarn.resourcemanager.hostname}:8088.
[Nov 4, 2018 7:51:07 PM] SEVERE: Test failed: Import job
[Nov 4, 2018 7:51:07 PM] SEVERE: Integration test for 'VirtualBoxVM' failed.
In the YARN container logs, I see the following error:
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
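From what I have read, this MRAppMaster error usually means the MapReduce framework jars are not on the container classpath. For reference, this is the kind of entry I understand mapred-site.xml needs; the paths are my assumption based on a default tarball install and may not match every setup:

```xml
<!-- mapred-site.xml: make the MapReduce framework jars visible to YARN
     containers. $HADOOP_MAPRED_HOME is assumed to point at the Hadoop
     install directory on every node (a single node in my VM's case). -->
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*,$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
```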
Furthermore, if I run just the Spark tests, I get the failure below.
My Spark settings in Radoop:
- Spark version: Spark 2.0
- Assembly path: hdfs:///spark/jars/*
- Resource allocation policy: Static, Default Configuration
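To rule out a typo or a permission problem with that assembly path, this is the sanity check I run from inside the VM (the path is just my setting from above):

```shell
# List the Spark jars directory in HDFS to confirm it exists and is
# readable. The Radoop setting uses a trailing /* glob, but the directory
# itself is what must exist and be readable by the submitting user.
hdfs dfs -ls hdfs:///spark/jars/ | head -n 5
```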
Logs:
[Nov 4, 2018 7:55:44 PM]: Running test 3/4: HDFS upload
[Nov 4, 2018 7:55:44 PM]: Uploaded test data file size: 5642
[Nov 4, 2018 7:55:44 PM]: Test succeeded: HDFS upload (0.075s)
[Nov 4, 2018 7:55:44 PM]: Running test 4/4: Spark job
[Nov 4, 2018 7:55:44 PM]: Assuming Spark version Spark 2.0.
[Nov 4, 2018 7:56:38 PM]: Assuming Spark version Spark 1.4 or below.
[Nov 4, 2018 7:56:38 PM] SEVERE: Test failed: Spark job
[Nov 4, 2018 7:56:38 PM]: Cleaning after test: Spark job
[Nov 4, 2018 7:56:38 PM]: Cleaning after test: HDFS upload
[Nov 4, 2018 7:56:38 PM]: Cleaning after test: Spark staging directory
[Nov 4, 2018 7:56:38 PM]: Cleaning after test: Fetch dynamic settings
[Nov 4, 2018 7:56:38 PM]: Total time: 53.783s
[Nov 4, 2018 7:56:38 PM] SEVERE: com.rapidminer.operator.UserError: The specified Spark assembly jar, archive or lib directory does not exist or cannot be read.
[Nov 4, 2018 7:56:38 PM] SEVERE: The Spark test failed. Please verify your Hadoop and Spark version and check if your assembly jar location is correct. If the job failed, check the logs on the ResourceManager web interface at http://${yarn.resourcemanager.hostname}:8088.
[Nov 4, 2018 7:56:38 PM] SEVERE: Test failed: Spark job
[Nov 4, 2018 7:56:38 PM] SEVERE: Integration test for 'VirtualBoxVM' failed.
ResourceManager logs (full logs attached to the post):
User class threw exception: org.apache.spark.SparkException: Spark test failed: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/tmp/radoop/training-vm/tmp_1541357744748_x0migqc
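One thing I notice is the file:/ scheme in that temporary path, which makes me wonder whether the job is resolving paths against the local filesystem instead of HDFS. My understanding is that core-site.xml should pin the default filesystem with an entry like the following; the hostname and port here are placeholders for my single-node VM, not verified values:

```xml
<!-- core-site.xml: default filesystem for all relative/unqualified paths.
     Hostname and port are placeholders; they must match the NameNode. -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
```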
Apart from this, I have also attached my yarn-site.xml and mapred-site.xml.
Any help would be much appreciated.