Spark Radoop connection
Hi everyone,
I am using Cloudera and have upgraded to Spark 2.2. I am having trouble when performing a Full Test. In the configuration, what should go in "Spark Archive (or libs) path"?
I tried getting the jar from (http://spark.apache.org/downloads.html), but I wasn't able to find the "..assembly.jar" file. So I tried (local:///opt/cloudera/parcels/CDR/lib/spark/lib/spark-2.2.0-bin-hadoop2.6/jars/*), but that didn't work. I have also tried the jars from (https://www.cloudera.com/documentation/spark2/latest/topics/spark2_packaging.html#packaging), with no luck.
[Dec 18, 2017 10:01:31 PM]: --------------------------------------------------
[Dec 18, 2017 10:01:31 PM]: Integration test for 'cluster (master)' started.
[Dec 18, 2017 10:01:31 PM]: Using Radoop version 8.0.0.
[Dec 18, 2017 10:01:31 PM]: Running tests: [Hive connection, Fetch dynamic settings, Java version, HDFS, MapReduce, Radoop temporary directory, MapReduce staging directory, Spark staging directory, Spark assembly jar existence, UDF jar upload, Create permanent UDFs, HDFS upload, Spark job]
[Dec 18, 2017 10:01:31 PM]: Running test 1/13: Hive connection
[Dec 18, 2017 10:01:31 PM]: Hive server 2 connection (master.c.strange-mason-188717.internal:10000) test started.
[Dec 18, 2017 10:01:31 PM]: Test succeeded: Hive connection (0.042s)
[Dec 18, 2017 10:01:31 PM]: Running test 2/13: Fetch dynamic settings
[Dec 18, 2017 10:01:31 PM]: Retrieving required configuration properties...
[Dec 18, 2017 10:01:31 PM]: Successfully fetched property: hive.execution.engine
[Dec 18, 2017 10:01:31 PM]: Successfully fetched property: mapreduce.jobhistory.done-dir
[Dec 18, 2017 10:01:31 PM]: Successfully fetched property: mapreduce.jobhistory.intermediate-done-dir
[Dec 18, 2017 10:01:31 PM]: Successfully fetched property: dfs.user.home.dir.prefix
[Dec 18, 2017 10:01:31 PM]: Could not fetch property dfs.encryption.key.provider.uri
[Dec 18, 2017 10:01:31 PM]: Successfully fetched property: spark.executor.memory
[Dec 18, 2017 10:01:31 PM]: Successfully fetched property: spark.executor.cores
[Dec 18, 2017 10:01:31 PM]: Successfully fetched property: spark.driver.memory
[Dec 18, 2017 10:01:31 PM]: Could not fetch property spark.driver.cores
[Dec 18, 2017 10:01:31 PM]: Successfully fetched property: spark.yarn.executor.memoryOverhead
[Dec 18, 2017 10:01:31 PM]: Successfully fetched property: spark.yarn.driver.memoryOverhead
[Dec 18, 2017 10:01:31 PM]: Successfully fetched property: spark.dynamicAllocation.enabled
[Dec 18, 2017 10:01:31 PM]: Successfully fetched property: spark.dynamicAllocation.initialExecutors
[Dec 18, 2017 10:01:31 PM]: Successfully fetched property: spark.dynamicAllocation.minExecutors
[Dec 18, 2017 10:01:31 PM]: Successfully fetched property: spark.dynamicAllocation.maxExecutors
[Dec 18, 2017 10:01:31 PM]: Could not fetch property spark.executor.instances
[Dec 18, 2017 10:01:31 PM]: The specified local value of mapreduce.job.reduces (1) differs from remote value (-1).
[Dec 18, 2017 10:01:31 PM]: The specified local value of mapreduce.reduce.speculative (false) differs from remote value (true).
[Dec 18, 2017 10:01:31 PM]: The specified local value of mapreduce.job.redacted-properties (fs.s3a.access.key,fs.s3a.secret.key) differs from remote value (fs.s3a.access.key,fs.s3a.secret.key,yarn.app.mapreduce.am.admin.user.env,mapreduce.admin.user.env,hadoop.security.credential.provider.path).
[Dec 18, 2017 10:01:31 PM]: Test succeeded: Fetch dynamic settings (0.024s)
[Dec 18, 2017 10:01:31 PM]: Running test 3/13: Java version
[Dec 18, 2017 10:01:31 PM]: Cluster Java version: 1.8.0_151-b12
[Dec 18, 2017 10:01:31 PM]: Test succeeded: Java version (0.000s)
[Dec 18, 2017 10:01:31 PM]: Running test 4/13: HDFS
[Dec 18, 2017 10:01:31 PM]: Test succeeded: HDFS (0.125s)
[Dec 18, 2017 10:01:31 PM]: Running test 5/13: MapReduce
[Dec 18, 2017 10:01:31 PM]: Test succeeded: MapReduce (0.022s)
[Dec 18, 2017 10:01:31 PM]: Running test 6/13: Radoop temporary directory
[Dec 18, 2017 10:01:31 PM]: Test succeeded: Radoop temporary directory (0.007s)
[Dec 18, 2017 10:01:31 PM]: Running test 7/13: MapReduce staging directory
[Dec 18, 2017 10:01:31 PM]: Test succeeded: MapReduce staging directory (0.040s)
[Dec 18, 2017 10:01:31 PM]: Running test 8/13: Spark staging directory
[Dec 18, 2017 10:01:31 PM]: Test succeeded: Spark staging directory (0.020s)
[Dec 18, 2017 10:01:31 PM]: Running test 9/13: Spark assembly jar existence
[Dec 18, 2017 10:01:31 PM]: Spark assembly jar existence in the local:// file system cannot be checked. Test skipped.
[Dec 18, 2017 10:01:31 PM]: Test succeeded: Spark assembly jar existence (0.000s)
[Dec 18, 2017 10:01:31 PM]: Running test 10/13: UDF jar upload
[Dec 18, 2017 10:01:32 PM]: Remote radoop_hive-v4.jar is up to date.
[Dec 18, 2017 10:01:32 PM]: Test succeeded: UDF jar upload (0.007s)
[Dec 18, 2017 10:01:32 PM]: Running test 11/13: Create permanent UDFs
[Dec 18, 2017 10:01:32 PM]: Remote radoop_hive-v4.jar is up to date.
[Dec 18, 2017 10:01:32 PM]: Test succeeded: Create permanent UDFs (0.025s)
[Dec 18, 2017 10:01:32 PM]: Running test 12/13: HDFS upload
[Dec 18, 2017 10:01:32 PM]: Uploaded test data file size: 5642
[Dec 18, 2017 10:01:32 PM]: Test succeeded: HDFS upload (0.047s)
[Dec 18, 2017 10:01:32 PM]: Running test 13/13: Spark job
[Dec 18, 2017 10:01:32 PM]: Assuming Spark version Spark 2.2.
[Dec 18, 2017 10:01:32 PM] SEVERE: Test failed: Spark job
[Dec 18, 2017 10:01:32 PM]: Cleaning after test: Spark job
[Dec 18, 2017 10:01:32 PM]: Cleaning after test: HDFS upload
[Dec 18, 2017 10:01:32 PM]: Cleaning after test: Create permanent UDFs
[Dec 18, 2017 10:01:32 PM]: Cleaning after test: UDF jar upload
[Dec 18, 2017 10:01:32 PM]: Cleaning after test: Spark assembly jar existence
[Dec 18, 2017 10:01:32 PM]: Cleaning after test: Spark staging directory
[Dec 18, 2017 10:01:32 PM]: Cleaning after test: MapReduce staging directory
[Dec 18, 2017 10:01:32 PM]: Cleaning after test: Radoop temporary directory
[Dec 18, 2017 10:01:32 PM]: Cleaning after test: MapReduce
[Dec 18, 2017 10:01:32 PM]: Cleaning after test: HDFS
[Dec 18, 2017 10:01:32 PM]: Cleaning after test: Java version
[Dec 18, 2017 10:01:32 PM]: Cleaning after test: Fetch dynamic settings
[Dec 18, 2017 10:01:32 PM]: Cleaning after test: Hive connection
[Dec 18, 2017 10:01:32 PM]: Total time: 0.732s
[Dec 18, 2017 10:01:32 PM]: java.lang.IllegalArgumentException: Required AM memory (1024+384 MB) is above the max threshold (1024 MB) of this cluster! Please increase the value of 'yarn.scheduler.maximum-allocation-mb'.
at org.apache.spark.deploy.yarn.Client.verifyClusterResources(Client.scala:311)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:164)
at eu.radoop.datahandler.mapreducehdfs.YarnHandlerLowLevel.runSpark_invoke(YarnHandlerLowLevel.java:813)
at eu.radoop.datahandler.mapreducehdfs.YarnHandlerLowLevel.runSpark_invoke(YarnHandlerLowLevel.java:510)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at eu.radoop.datahandler.mapreducehdfs.MRHDFSHandlerLowLevel$2.run(MRHDFSHandlerLowLevel.java:650)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1807)
at eu.radoop.datahandler.mapreducehdfs.MRHDFSHandlerLowLevel.invokeAs(MRHDFSHandlerLowLevel.java:646)
at sun.reflect.GeneratedMethodAccessor123.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at eu.radoop.datahandler.mapreducehdfs.MapReduceHDFSHandler.invokeAs(MapReduceHDFSHandler.java:1801)
at eu.radoop.datahandler.mapreducehdfs.MapReduceHDFSHandler.invokeAs(MapReduceHDFSHandler.java:1759)
at eu.radoop.datahandler.mapreducehdfs.MapReduceHDFSHandler.lambda$runSpark$26(MapReduceHDFSHandler.java:1021)
at eu.radoop.tools.ExceptionTools.checkOnly(ExceptionTools.java:474)
at eu.radoop.datahandler.mapreducehdfs.MapReduceHDFSHandler.runSpark(MapReduceHDFSHandler.java:1016)
at eu.radoop.datahandler.mapreducehdfs.MapReduceHDFSHandler.runSpark(MapReduceHDFSHandler.java:913)
at eu.radoop.connections.service.test.integration.TestSpark.runTestSparkJob(TestSpark.java:331)
at eu.radoop.connections.service.test.integration.TestSpark.runJobWithVersion(TestSpark.java:218)
at eu.radoop.connections.service.test.integration.TestSpark.call(TestSpark.java:109)
at eu.radoop.connections.service.test.integration.TestSpark.call(TestSpark.java:52)
at eu.radoop.connections.service.test.RadoopTestContext.lambda$runTest$0(RadoopTestContext.java:255)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[Dec 18, 2017 10:01:32 PM] SEVERE: java.lang.IllegalArgumentException: Required AM memory (1024+384 MB) is above the max threshold (1024 MB) of this cluster! Please increase the value of 'yarn.scheduler.maximum-allocation-mb'.
[Dec 18, 2017 10:01:32 PM] SEVERE: The Spark test failed. Please verify your Hadoop and Spark version and check if your assembly jar location is correct. If the job failed, check the logs on the ResourceManager web interface at http://master.c.strange-mason-188717.internal:8088.
[Dec 18, 2017 10:01:32 PM] SEVERE: Test failed: Spark job
[Dec 18, 2017 10:01:32 PM] SEVERE: Integration test for 'cluster (master)' failed.
Best Answer
-
Hi all,
I got it working! I had one slave node with 2 vCPUs and 7.5 GB memory. I went to Cloudera Manager -> YARN -> Configuration and set:
Container Memory (yarn.nodemanager.resource.memory-mb) = 7 GiB
Container Virtual CPU Cores (yarn.nodemanager.resource.cpu-vcores) = 2
Also, I had to copy the jar files to the slave node, which I had missed doing before.
The result:
[Dec 20, 2017 9:39:31 PM]: Integration test for 'cluster3' completed successfully.
Thank you, Peter, for helping out. Your replies on the other posts were a tremendous guide.
Answers
-
Hi,
you may actually have set the Spark Archive path properly (the local:// path seems correct), but the Spark client gives the following error:
"Required AM memory (1024+384 MB) is above the max threshold (1024 MB) of this cluster! Please increase the value of 'yarn.scheduler.maximum-allocation-mb'."
The maximum memory threshold on the cluster is too low to run the Spark test job. 1 GB is not enough: the request is 1024 MB plus 384 MB of overhead. If this is a virtual machine, the RAM setting may have been low during installation, and yarn.scheduler.maximum-allocation-mb may have been calculated from that low value.
Best,
Peter
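The check behind this error can be sketched in a few lines of Python. This is only an illustration, not Spark's actual code: the function names are hypothetical, and the overhead rule (the larger of 384 MB and 10% of the requested memory) is an assumption about Spark 2.x defaults that happens to match the 1024+384 MB in the log.

```python
# Sketch of the memory check behind the "Required AM memory" error.
# Both constants are assumptions about Spark 2.x-on-YARN defaults.
MIN_OVERHEAD_MB = 384    # assumed minimum memory overhead
OVERHEAD_FACTOR = 0.10   # assumed overhead as a fraction of requested memory


def required_am_memory_mb(am_memory_mb, overhead_mb=None):
    """Return the total container size (MB) requested for the application master."""
    if overhead_mb is None:
        overhead_mb = max(MIN_OVERHEAD_MB, int(OVERHEAD_FACTOR * am_memory_mb))
    return am_memory_mb + overhead_mb


def fits_cluster(am_memory_mb, max_allocation_mb):
    """True if the request is within yarn.scheduler.maximum-allocation-mb."""
    return required_am_memory_mb(am_memory_mb) <= max_allocation_mb


if __name__ == "__main__":
    # 1024 MB is the threshold from the log; the larger limits are illustrative.
    for limit_mb in (1024, 2048, 8192):
        print(limit_mb, fits_cluster(1024, limit_mb))
```

With the 1024 MB request from the log, the total is 1408 MB, so the check fails for a 1024 MB threshold and passes once the threshold is at least 1408 MB.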
1 -
Thanks Peter!
I am using Cloudera, and the master node has 8 GB of memory. I increased "yarn.scheduler.maximum-allocation-mb" to 8 GB (from YARN -> Configuration) and restarted YARN, but it didn't work. Then I increased "yarn.nodemanager.resource.memory-mb" to 2 GB, and that got past the error. But now I'm getting another error (shown below). I tried decreasing the Resource Allocation % in the Radoop connection from the default 70% to 50%, but that didn't help.
Thanks,
[Dec 19, 2017 5:43:43 PM]: --------------------------------------------------
[Dec 19, 2017 5:43:43 PM]: Integration test for 'cluster2' started.
[Dec 19, 2017 5:43:43 PM]: Using Radoop version 8.0.0.
[Dec 19, 2017 5:43:43 PM]: Running tests: [Hive connection, Fetch dynamic settings, Java version, HDFS, MapReduce, Radoop temporary directory, MapReduce staging directory, Spark staging directory, Spark assembly jar existence, UDF jar upload, Create permanent UDFs, HDFS upload, Spark job]
[Dec 19, 2017 5:43:43 PM]: Running test 1/13: Hive connection
[Dec 19, 2017 5:43:43 PM]: Hive server 2 connection (master.c.strange-mason-188717.internal:10000) test started.
[Dec 19, 2017 5:43:43 PM]: Test succeeded: Hive connection (0.105s)
[Dec 19, 2017 5:43:43 PM]: Running test 2/13: Fetch dynamic settings
[Dec 19, 2017 5:43:43 PM]: Retrieving required configuration properties...
[Dec 19, 2017 5:43:43 PM]: Successfully fetched property: hive.execution.engine
[Dec 19, 2017 5:43:43 PM]: Successfully fetched property: mapreduce.jobhistory.done-dir
[Dec 19, 2017 5:43:43 PM]: Successfully fetched property: mapreduce.jobhistory.intermediate-done-dir
[Dec 19, 2017 5:43:43 PM]: Successfully fetched property: dfs.user.home.dir.prefix
[Dec 19, 2017 5:43:43 PM]: Could not fetch property dfs.encryption.key.provider.uri
[Dec 19, 2017 5:43:43 PM]: Successfully fetched property: spark.executor.memory
[Dec 19, 2017 5:43:43 PM]: Successfully fetched property: spark.executor.cores
[Dec 19, 2017 5:43:43 PM]: Successfully fetched property: spark.driver.memory
[Dec 19, 2017 5:43:43 PM]: Could not fetch property spark.driver.cores
[Dec 19, 2017 5:43:43 PM]: Successfully fetched property: spark.yarn.executor.memoryOverhead
[Dec 19, 2017 5:43:43 PM]: Successfully fetched property: spark.yarn.driver.memoryOverhead
[Dec 19, 2017 5:43:43 PM]: Successfully fetched property: spark.dynamicAllocation.enabled
[Dec 19, 2017 5:43:43 PM]: Successfully fetched property: spark.dynamicAllocation.initialExecutors
[Dec 19, 2017 5:43:43 PM]: Successfully fetched property: spark.dynamicAllocation.minExecutors
[Dec 19, 2017 5:43:43 PM]: Successfully fetched property: spark.dynamicAllocation.maxExecutors
[Dec 19, 2017 5:43:43 PM]: Could not fetch property spark.executor.instances
[Dec 19, 2017 5:43:43 PM]: The specified local value of mapreduce.job.reduces (1) differs from remote value (-1).
[Dec 19, 2017 5:43:43 PM]: The specified local value of mapreduce.reduce.speculative (false) differs from remote value (true).
[Dec 19, 2017 5:43:43 PM]: The specified local value of mapreduce.job.redacted-properties (fs.s3a.access.key,fs.s3a.secret.key) differs from remote value (fs.s3a.access.key,fs.s3a.secret.key,yarn.app.mapreduce.am.admin.user.env,mapreduce.admin.user.env,hadoop.security.credential.provider.path).
[Dec 19, 2017 5:43:43 PM]: The specified local value of yarn.scheduler.maximum-allocation-mb (8192) differs from remote value (2048).
[Dec 19, 2017 5:43:43 PM]: Test succeeded: Fetch dynamic settings (0.022s)
[Dec 19, 2017 5:43:43 PM]: Running test 3/13: Java version
[Dec 19, 2017 5:43:43 PM]: Cluster Java version: 1.8.0_151-b12
[Dec 19, 2017 5:43:43 PM]: Test succeeded: Java version (0.000s)
[Dec 19, 2017 5:43:43 PM]: Running test 4/13: HDFS
[Dec 19, 2017 5:43:43 PM]: Test succeeded: HDFS (0.117s)
[Dec 19, 2017 5:43:43 PM]: Running test 5/13: MapReduce
[Dec 19, 2017 5:43:43 PM]: Test succeeded: MapReduce (0.043s)
[Dec 19, 2017 5:43:43 PM]: Running test 6/13: Radoop temporary directory
[Dec 19, 2017 5:43:43 PM]: Test succeeded: Radoop temporary directory (0.022s)
[Dec 19, 2017 5:43:43 PM]: Running test 7/13: MapReduce staging directory
[Dec 19, 2017 5:43:43 PM]: Test succeeded: MapReduce staging directory (0.023s)
[Dec 19, 2017 5:43:43 PM]: Running test 8/13: Spark staging directory
[Dec 19, 2017 5:43:43 PM]: Test succeeded: Spark staging directory (0.028s)
[Dec 19, 2017 5:43:43 PM]: Running test 9/13: Spark assembly jar existence
[Dec 19, 2017 5:43:43 PM]: Spark assembly jar existence in the local:// file system cannot be checked. Test skipped.
[Dec 19, 2017 5:43:43 PM]: Test succeeded: Spark assembly jar existence (0.000s)
[Dec 19, 2017 5:43:43 PM]: Running test 10/13: UDF jar upload
[Dec 19, 2017 5:43:43 PM]: Remote radoop_hive-v4.jar is up to date.
[Dec 19, 2017 5:43:43 PM]: Test succeeded: UDF jar upload (0.085s)
[Dec 19, 2017 5:43:43 PM]: Running test 11/13: Create permanent UDFs
[Dec 19, 2017 5:43:43 PM]: Remote radoop_hive-v4.jar is up to date.
[Dec 19, 2017 5:43:44 PM]: Test succeeded: Create permanent UDFs (0.060s)
[Dec 19, 2017 5:43:44 PM]: Running test 12/13: HDFS upload
[Dec 19, 2017 5:43:44 PM]: Uploaded test data file size: 5642
[Dec 19, 2017 5:43:44 PM]: Test succeeded: HDFS upload (0.077s)
[Dec 19, 2017 5:43:44 PM]: Running test 13/13: Spark job
[Dec 19, 2017 5:43:44 PM]: Assuming Spark version Spark 2.2.
[Dec 19, 2017 5:47:44 PM] SEVERE: Test failed: Spark job
[Dec 19, 2017 5:47:44 PM]: Cleaning after test: Spark job
[Dec 19, 2017 5:47:44 PM]: Cleaning after test: HDFS upload
[Dec 19, 2017 5:47:44 PM]: Cleaning after test: Create permanent UDFs
[Dec 19, 2017 5:47:44 PM]: Cleaning after test: UDF jar upload
[Dec 19, 2017 5:47:44 PM]: Cleaning after test: Spark assembly jar existence
[Dec 19, 2017 5:47:44 PM]: Cleaning after test: Spark staging directory
[Dec 19, 2017 5:47:44 PM]: Cleaning after test: MapReduce staging directory
[Dec 19, 2017 5:47:44 PM]: Cleaning after test: Radoop temporary directory
[Dec 19, 2017 5:47:44 PM]: Cleaning after test: MapReduce
[Dec 19, 2017 5:47:44 PM]: Cleaning after test: HDFS
[Dec 19, 2017 5:47:44 PM]: Cleaning after test: Java version
[Dec 19, 2017 5:47:44 PM]: Cleaning after test: Fetch dynamic settings
[Dec 19, 2017 5:47:44 PM]: Cleaning after test: Hive connection
[Dec 19, 2017 5:47:44 PM]: Total time: 240.708s
[Dec 19, 2017 5:47:44 PM] SEVERE: java.util.concurrent.TimeoutException
[Dec 19, 2017 5:47:44 PM] SEVERE: Timeout on the Spark test job. Please verify your Spark Resource Allocation settings on the Advanced Connection Properties window. You can check the logs of the Spark job on the ResourceManager web interface at http://master.c.strange-mason-188717.internal:8088.
[Dec 19, 2017 5:47:44 PM] SEVERE: Test failed: Spark job
[Dec 19, 2017 5:47:44 PM] SEVERE: Integration test for 'cluster2' failed.
0 -
Hi,
good move on the memory settings front. However, the second setting means that effectively only 2 GB is allocated per NodeManager (node), which is quite small in the YARN world. Resource calculations (like the heuristic percentage in the connection) are realistic only when 8 GB of memory is available to jobs (NodeManager) per node.
The timeout may be caused by the job not getting the resources it needed, so it never started. It may be that only the Spark driver got resources, but no executor (worker) did. There are two ways to confirm this: 1) open the Resource Manager web UI, where the running job and its resource allocation are visible; 2) add the Log panel via View -> Show Panel in the Studio Design view, then right-click on it and set the log level to FINE. After the test, the log can show whether the Spark job did not get the resources.
Hope this helps.
Peter
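A rough way to see why no executor could start with only 2 GB per NodeManager is sketched below. This is a simplified model, not YARN's scheduler: the function names are hypothetical, the 1408 MB executor request is an assumed value (the thread does not show spark.executor.memory), and the rounding increment depends on the scheduler configuration, so both 1024 MB and 512 MB are tried.

```python
import math


def round_up(request_mb, increment_mb):
    """Round a container request up to the next multiple of the increment."""
    return math.ceil(request_mb / increment_mb) * increment_mb


def executors_that_fit(node_memory_mb, am_request_mb, executor_request_mb,
                       increment_mb=1024):
    """Estimate how many executors fit on one node after the AM/driver container.

    Memory-only model: vcores and multiple nodes are ignored.
    """
    am_mb = round_up(am_request_mb, increment_mb)
    executor_mb = round_up(executor_request_mb, increment_mb)
    remaining_mb = node_memory_mb - am_mb
    if remaining_mb < 0:
        return 0  # not even the AM container fits
    return remaining_mb // executor_mb


if __name__ == "__main__":
    # 2048 MB: the NodeManager memory tried in this thread.
    # 7168 MB: the 7 GiB setting that eventually worked.
    for node_mb in (2048, 7168):
        for increment_mb in (1024, 512):
            n = executors_that_fit(node_mb, 1408, 1408, increment_mb)
            print(node_mb, increment_mb, n)
```

Under these assumptions, a 2048 MB node has room for the driver container but for no executor, which is consistent with a job that is accepted and then waits until the test times out.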
0 -
Thanks again, Peter.
You are right, I have checked and this is what I found:
Total Resource Preempted: <memory:0, vCores:0>
Total Number of Non-AM Containers Preempted: 0
Total Number of AM Containers Preempted: 0
Resource Preempted from Current Attempt: <memory:0, vCores:0>
Number of Non-AM Containers Preempted from Current Attempt: 0
Aggregate Resource Allocation: 0 MB-seconds, 0 vcore-seconds
Any idea why this is happening? Currently I only have one master node and one slave node. Could that be the problem? I can't see the slave node being used. Will adding more slave nodes solve this?
0 -
How can I adapt this solution to my case? I'm working on Windows with Apache Hadoop. How can I raise YARN's memory capacity?
0 -
Maybe you can try disabling Enable ResourceManager ACLs by setting yarn.acl.enable to false.
0
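On a plain Apache Hadoop installation without Cloudera Manager, the YARN properties discussed in this thread are normally set in yarn-site.xml in the Hadoop configuration directory, followed by a YARN restart. The sketch below only renders such a fragment; the helper name is hypothetical, and the values are taken from this thread as examples and need to be sized for your own machine.

```python
import xml.etree.ElementTree as ET


def yarn_site_fragment(properties):
    """Render a dict of YARN properties as a yarn-site.xml <configuration> element."""
    config = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(config, encoding="unicode")


if __name__ == "__main__":
    # Example values only: 7168 MB and 2 vcores are what worked for the
    # 7.5 GB, 2 vCPU node in this thread; the maximum allocation is illustrative.
    print(yarn_site_fragment({
        "yarn.nodemanager.resource.memory-mb": 7168,
        "yarn.nodemanager.resource.cpu-vcores": 2,
        "yarn.scheduler.maximum-allocation-mb": 7168,
    }))
```

The printed properties would be merged into the existing yarn-site.xml rather than replacing the file.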