Radoop process fails with "The Spark job failed" error

tasbihmr
tasbihmr New Altair Community Member
edited November 5 in Community Q&A

Dear All,

I am using Radoop, connected to a single-node Hadoop cluster. Hive and Spark are also running on the same machine. I ran the Quick Test of the Radoop connection and it was successful.

Next, I started following the Titanic data tutorial to learn more about Radoop and RapidMiner functions. When I run an example that retrieves data from Hive and performs a validation inside a Radoop Nest, using the Spark-based Decision Tree (MLlib binominal) operator, I receive the error shown below. I have included the full log listing. Can you help me understand why the Spark job fails? Have I not installed Spark correctly, is the problem with my process, or is this a connection error?

********************************************************************************************

Dec 12, 2016 11:16:04 PM FINE: Hive query: CREATE VIEW radoop__tmp_maziar_1481571964383_mf8n48v AS SELECT age,passenger_class,sex,no_of_siblings_or_spouses_on_board,no_of_parents_or_children_on_board,passenger_fare,survived FROM radoop__tmp_maziar_1481571909693_kee9ej8 WHERE rndp=0
Dec 12, 2016 11:16:05 PM FINE: Hive query: CREATE VIEW radoop__tmp_maziar_1481571965033_rtiirp4 AS SELECT age,passenger_class,sex,no_of_siblings_or_spouses_on_board,no_of_parents_or_children_on_board,passenger_fare,survived FROM radoop__tmp_maziar_1481571909693_kee9ej8 WHERE rndp=1
Dec 12, 2016 11:16:05 PM FINE: Hive query: DESCRIBE radoop__tmp_maziar_1481571964383_mf8n48v
Dec 12, 2016 11:16:06 PM FINE: Hive query: DESCRIBE radoop__tmp_maziar_1481571965033_rtiirp4
Dec 12, 2016 11:16:06 PM INFO: Decision Tree (MLlib binominal): Materializing input HadoopExampleSet as Parquet.
Dec 12, 2016 11:16:06 PM FINE: Hive query: DROP VIEW IF EXISTS radoop__tmp_maziar_1481571966498a_fwxtjmd
Dec 12, 2016 11:16:06 PM FINE: Hive query: DROP TABLE IF EXISTS radoop__tmp_maziar_1481571966498a_fwxtjmd PURGE
Dec 12, 2016 11:16:07 PM FINE: (Decision Tree (MLlib binominal)) Hive query: CREATE TABLE radoop__tmp_maziar_1481571966498a_fwxtjmd STORED AS PARQUET LOCATION '/tmp/radoop/maziar/tmp_1481571966498_tbf6xaw/tmp_1481571966919_higs6s4' AS SELECT * FROM radoop__tmp_maziar_1481571964383_mf8n48v
Dec 12, 2016 11:16:43 PM INFO: Decision Tree (MLlib binominal): Materialized.
Dec 12, 2016 11:16:43 PM FINE: Hive query: DESCRIBE FORMATTED radoop__tmp_maziar_1481571966498a_fwxtjmd
Dec 12, 2016 11:16:43 PM FINE: Hive query: DESCRIBE FORMATTED radoop__tmp_maziar_1481571966498a_fwxtjmd
Dec 12, 2016 11:16:43 PM FINE: Hive query: DESCRIBE FORMATTED radoop__tmp_maziar_1481571966498a_fwxtjmd
Dec 12, 2016 11:16:43 PM FINE: Hive query: DESCRIBE radoop__tmp_maziar_1481571966498a_fwxtjmd
Dec 12, 2016 11:16:44 PM INFO: Getting radoop-common.jar file from plugin jar...
Dec 12, 2016 11:16:44 PM INFO: Remote radoop-common-7.3.0.jar is up to date.
Dec 12, 2016 11:16:44 PM INFO: Getting radoop-spark20.jar file from plugin jar...
Dec 12, 2016 11:16:44 PM INFO: Remote radoop-spark20-7.3.0.jar is up to date.
Dec 12, 2016 11:16:46 PM FINE: Decision Tree (MLlib binominal): Setting lower driver memory (512 Mb) than requested (2048 Mb), because the request exceeded the cluster's resources.
Dec 12, 2016 11:16:47 PM INFO: Decision Tree (MLlib binominal): Spark application submitted.
Dec 12, 2016 11:16:52 PM FINE: Decision Tree (MLlib binominal): Yarn application state of application_1481571703578_0004: ACCEPTED
Dec 12, 2016 11:16:57 PM FINE: Decision Tree (MLlib binominal): Yarn application state of application_1481571703578_0004: ACCEPTED
Dec 12, 2016 11:17:02 PM FINE: Decision Tree (MLlib binominal): Yarn application state of application_1481571703578_0004: FAILED
Dec 12, 2016 11:17:02 PM INFO: Decision Tree (MLlib binominal): Spark application finished.
Dec 12, 2016 11:17:02 PM INFO: Decision Tree (MLlib binominal): Distributed final state: FAILED
Dec 12, 2016 11:17:02 PM WARNING: Radoop Nest: com.rapidminer.operator.UserError: The Spark job failed.
Dec 12, 2016 11:17:02 PM FINE: Hive query: DROP VIEW IF EXISTS radoop__tmp_maziar_1481571909693_kee9ej8
Dec 12, 2016 11:17:02 PM FINE: THREADS AT CLOSE:
Dec 12, 2016 11:17:02 PM FINE: Deleting temporary directory or file: /tmp/radoop/maziar/tmp_1481571966495_kmwaf9l/
Dec 12, 2016 11:17:02 PM FINE: Deleting temporary directory or file: /tmp/radoop/maziar/tmp_1481571966498_tbf6xaw/
Dec 12, 2016 11:17:02 PM SEVERE: Process failed: The Spark job failed.
Dec 12, 2016 11:17:02 PM SEVERE: Here:
Dec 12, 2016 11:17:02 PM SEVERE: Process[1] (Process)
Dec 12, 2016 11:17:02 PM SEVERE: subprocess 'Main Process'
Dec 12, 2016 11:17:02 PM SEVERE: +- Radoop Nest[1] (Radoop Nest)
Dec 12, 2016 11:17:02 PM SEVERE: subprocess 'Radoop Nest'
Dec 12, 2016 11:17:02 PM SEVERE: +- Retrieve[1] (Retrieve from Hive)
Dec 12, 2016 11:17:02 PM SEVERE: +- Validation[1] (Split Validation)
Dec 12, 2016 11:17:02 PM SEVERE: subprocess 'Training'
Dec 12, 2016 11:17:02 PM SEVERE: ==> | | +- Decision Tree (MLlib binominal)[1] (Decision Tree (MLlib binominal))
Dec 12, 2016 11:17:02 PM SEVERE: subprocess 'Testing'
Dec 12, 2016 11:17:02 PM SEVERE: | +- Apply Model[0] (Apply Model)
Dec 12, 2016 11:17:02 PM SEVERE: | +- Performance[0] (Performance (Classification))
Dec 12, 2016 11:17:02 PM SEVERE: +- Retrieve (2)[0] (Retrieve from Hive)
Dec 12, 2016 11:17:02 PM SEVERE: +- Apply Model (2)[0] (Apply Model)
Dec 12, 2016 11:17:02 PM FINE: Hive query: DROP TABLE IF EXISTS radoop__tmp_maziar_1481571909693_kee9ej8 PURGE
Dec 12, 2016 11:17:03 PM FINE: Hive query: DROP VIEW IF EXISTS radoop__tmp_maziar_1481571964383_mf8n48v
Dec 12, 2016 11:17:04 PM FINE: Hive query: DROP TABLE IF EXISTS radoop__tmp_maziar_1481571964383_mf8n48v PURGE
Dec 12, 2016 11:17:04 PM FINE: Hive query: DROP VIEW IF EXISTS radoop__tmp_maziar_1481571965033_rtiirp4
Dec 12, 2016 11:17:04 PM FINE: Hive query: DROP TABLE IF EXISTS radoop__tmp_maziar_1481571965033_rtiirp4 PURGE
Dec 12, 2016 11:17:04 PM FINE: Hive query: DROP VIEW IF EXISTS radoop__tmp_maziar_1481571966498a_fwxtjmd
Dec 12, 2016 11:17:05 PM FINE: Hive query: DROP TABLE IF EXISTS radoop__tmp_maziar_1481571966498a_fwxtjmd PURGE
Dec 12, 2016 11:18:58 PM FINE: Mnemonic key q not found for action hive.write_query (Execute Query...).
Dec 12, 2016 11:18:59 PM FINE: Hive query: SHOW TABLES

Answers

  • ztoth
    ztoth New Altair Community Member

    Hi,

     

    This probably indicates a connection failure. Could you try running the Spark integration test? In the "Manage Radoop Connections..." dialog, select your connection and click "Full Test...". Select the Customize option and enable only the Spark test (it would be nice to run every test, but let's focus on the Spark integration for now).

    Please share the logs of the test.

     

    Additionally, you can check the logs of application_1481571703578_0004 on the Resource Manager web interface to have a better understanding of the error.
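
    The details shown on the Resource Manager web interface can also be read from its REST API. The following is only a rough sketch, not part of Radoop: it assumes the ResourceManager answers at http://127.0.0.1:8088 (the usual address on a local single-node setup, adjust as needed), and the helper names app_url, summarize and fetch_app are made up for this example.

```python
import json
import urllib.request

# Assumption: ResourceManager web address of a local single-node cluster.
RM_URL = "http://127.0.0.1:8088"


def app_url(rm_url, app_id):
    """Build the ResourceManager REST URL for one YARN application."""
    return f"{rm_url.rstrip('/')}/ws/v1/cluster/apps/{app_id}"


def summarize(payload):
    """Pick the fields that usually explain a failure from the REST response."""
    app = payload.get("app", {})
    return {
        "state": app.get("state"),
        "finalStatus": app.get("finalStatus"),
        "diagnostics": (app.get("diagnostics") or "").strip(),
    }


def fetch_app(rm_url, app_id, timeout=10):
    """Query the ResourceManager for one application (needs a running cluster)."""
    request = urllib.request.Request(
        app_url(rm_url, app_id), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

    Against a live cluster, summarize(fetch_app(RM_URL, "application_1481571703578_0004")) would return the state, final status and diagnostics text of the failed job; the diagnostics field is typically where YARN states why an application failed.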

  • tasbihmr
    tasbihmr New Altair Community Member

    Hi,

    I enabled the Spark job test in the Full Test as you suggested and got the output below. I should mention that my system has only 8 GB of RAM and is severely short on memory, because I am running Hadoop, Hive, and Spark all on the same machine. Could it be that the Spark job needs more resources?

    Regards, Maziar

    **************************************************************************************************************

    [Dec 13, 2016 10:01:48 AM]: Integration test for 'Radoop1' started.
    [Dec 13, 2016 10:01:48 AM]: Using Radoop version 7.3.0.
    [Dec 13, 2016 10:01:48 AM]: Running tests: [Hive connection, Fetch dynamic settings, Java version, HDFS, MapReduce, Radoop temporary directory, MapReduce staging directory, Spark staging directory, Spark assembly jar existence, UDF jar upload, Create permanent UDFs, HDFS upload, Spark job]
    [Dec 13, 2016 10:01:48 AM]: Running test 1/13: Hive connection
    [Dec 13, 2016 10:01:48 AM]: Hive server 2 connection (localhost:10000) test started.
    [Dec 13, 2016 10:01:50 AM]: Test succeeded: Hive connection (2.537s)
    [Dec 13, 2016 10:01:50 AM]: Running test 2/13: Fetch dynamic settings
    [Dec 13, 2016 10:01:50 AM]: Retrieving required configuration properties...
    [Dec 13, 2016 10:01:50 AM]: Successfully fetched property: hive.execution.engine
    [Dec 13, 2016 10:01:50 AM]: Successfully fetched property: yarn.resourcemanager.scheduler.address
    [Dec 13, 2016 10:01:50 AM]: Successfully fetched property: yarn.resourcemanager.resource-tracker.address
    [Dec 13, 2016 10:01:50 AM]: Successfully fetched property: yarn.resourcemanager.admin.address
    [Dec 13, 2016 10:01:50 AM]: Successfully fetched property: yarn.app.mapreduce.am.staging-dir
    [Dec 13, 2016 10:01:50 AM]: Successfully fetched property: mapreduce.jobhistory.done-dir
    [Dec 13, 2016 10:01:50 AM]: Successfully fetched property: mapreduce.jobhistory.intermediate-done-dir
    [Dec 13, 2016 10:01:50 AM]: Successfully fetched property: mapreduce.jobhistory.address
    [Dec 13, 2016 10:01:50 AM]: Successfully fetched property: yarn.scheduler.maximum-allocation-mb
    [Dec 13, 2016 10:01:50 AM]: Successfully fetched property: yarn.scheduler.maximum-allocation-vcores
    [Dec 13, 2016 10:01:50 AM]: Successfully fetched property: dfs.user.home.dir.prefix
    [Dec 13, 2016 10:01:50 AM]: Could not fetch property mapreduce.application.classpath
    [Dec 13, 2016 10:01:50 AM]: Successfully fetched property: dfs.client.use.datanode.hostname
    [Dec 13, 2016 10:01:50 AM]: Could not fetch property dfs.encryption.key.provider.uri
    [Dec 13, 2016 10:01:50 AM]: Test succeeded: Fetch dynamic settings (0.250s)
    [Dec 13, 2016 10:01:50 AM]: Running test 3/13: Java version
    [Dec 13, 2016 10:01:50 AM]: Cluster Java version: 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14
    [Dec 13, 2016 10:01:50 AM]: Test succeeded: Java version (0.013s)
    [Dec 13, 2016 10:01:50 AM]: Running test 4/13: HDFS
    [Dec 13, 2016 10:01:53 AM]: Test succeeded: HDFS (2.262s)
    [Dec 13, 2016 10:01:53 AM]: Running test 5/13: MapReduce
    [Dec 13, 2016 10:01:53 AM]: Test succeeded: MapReduce (0.561s)
    [Dec 13, 2016 10:01:53 AM]: Running test 6/13: Radoop temporary directory
    [Dec 13, 2016 10:01:53 AM]: Test succeeded: Radoop temporary directory (0.183s)
    [Dec 13, 2016 10:01:53 AM]: Running test 7/13: MapReduce staging directory
    [Dec 13, 2016 10:01:54 AM]: Test succeeded: MapReduce staging directory (0.336s)
    [Dec 13, 2016 10:01:54 AM]: Running test 8/13: Spark staging directory
    [Dec 13, 2016 10:01:54 AM]: Test succeeded: Spark staging directory (0.105s)
    [Dec 13, 2016 10:01:54 AM]: Running test 9/13: Spark assembly jar existence
    [Dec 13, 2016 10:01:54 AM]: Spark assembly jar existence in the local:// file system cannot be checked. Test skipped.
    [Dec 13, 2016 10:01:54 AM]: Test succeeded: Spark assembly jar existence (0.001s)
    [Dec 13, 2016 10:01:54 AM]: Running test 10/13: UDF jar upload
    [Dec 13, 2016 10:01:54 AM]: Remote radoop_hive-v4.jar is up to date.
    [Dec 13, 2016 10:01:54 AM]: Test succeeded: UDF jar upload (0.148s)
    [Dec 13, 2016 10:01:54 AM]: Running test 11/13: Create permanent UDFs
    [Dec 13, 2016 10:01:54 AM]: Remote radoop_hive-v4.jar is up to date.
    [Dec 13, 2016 10:01:54 AM]: Test succeeded: Create permanent UDFs (0.117s)
    [Dec 13, 2016 10:01:54 AM]: Running test 12/13: HDFS upload
    [Dec 13, 2016 10:01:55 AM]: Uploaded test data file size: 5642
    [Dec 13, 2016 10:01:55 AM]: Test succeeded: HDFS upload (0.681s)
    [Dec 13, 2016 10:01:55 AM]: Running test 13/13: Spark job
    [Dec 13, 2016 10:05:55 AM] SEVERE: Test failed: Spark job
    [Dec 13, 2016 10:05:55 AM]: Cleaning after test: Spark job
    [Dec 13, 2016 10:05:55 AM]: Cleaning after test: HDFS upload
    [Dec 13, 2016 10:05:55 AM]: Cleaning after test: Create permanent UDFs
    [Dec 13, 2016 10:05:55 AM]: Cleaning after test: UDF jar upload
    [Dec 13, 2016 10:05:55 AM]: Cleaning after test: Spark assembly jar existence
    [Dec 13, 2016 10:05:55 AM]: Cleaning after test: Spark staging directory
    [Dec 13, 2016 10:05:55 AM]: Cleaning after test: MapReduce staging directory
    [Dec 13, 2016 10:05:55 AM]: Cleaning after test: Radoop temporary directory
    [Dec 13, 2016 10:05:55 AM]: Cleaning after test: MapReduce
    [Dec 13, 2016 10:05:55 AM]: Cleaning after test: HDFS
    [Dec 13, 2016 10:05:55 AM]: Cleaning after test: Java version
    [Dec 13, 2016 10:05:55 AM]: Cleaning after test: Fetch dynamic settings
    [Dec 13, 2016 10:05:55 AM]: Cleaning after test: Hive connection
    [Dec 13, 2016 10:05:55 AM]: Total time: 247.438s
    [Dec 13, 2016 10:05:55 AM] SEVERE: java.util.concurrent.TimeoutException
    [Dec 13, 2016 10:05:55 AM] SEVERE: Timeout on the Spark test job. Please verify your Spark Resource Allocation settings on the Advanced Connection Properties window. You can check the logs of the Spark job on the ResourceManager web interface at http://127.0.0.1:8088.
    [Dec 13, 2016 10:05:55 AM] SEVERE: Test failed: Spark job
    [Dec 13, 2016 10:05:55 AM] SEVERE: Integration test for 'Radoop1' failed.

  • phellinger
    phellinger New Altair Community Member

    Hi,

     

    I can confirm that 8 GB is quite low, even if the node were running only the Hadoop services and nothing else.

    You may need to tweak the YARN memory settings to get things to work.

     

    The ResourceManager web interface at http://127.0.0.1:8088 should show you the currently running jobs. I think the test job has not started. It will be in ACCEPTED state until it gets the resources it requests (then it becomes RUNNING). You can kill jobs that are in ACCEPTED or RUNNING state with the command yarn application -kill <applicationId>.
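
    To make the cleanup step concrete, here is a rough sketch that turns a list of applications into the matching kill commands. It assumes the JSON shape returned by the ResourceManager REST endpoint /ws/v1/cluster/apps?states=ACCEPTED,RUNNING; the function name kill_commands and the sample application IDs are made up for illustration, and nothing is actually executed.

```python
def kill_commands(payload):
    """Return one `yarn application -kill` command per listed application.

    `payload` is the parsed JSON from the ResourceManager endpoint
    /ws/v1/cluster/apps?states=ACCEPTED,RUNNING (assumed shape:
    {"apps": {"app": [{"id": ..., "state": ...}, ...]}}, or {"apps": null}
    when nothing matches).
    """
    apps = (payload.get("apps") or {}).get("app") or []
    return [
        ["yarn", "application", "-kill", app["id"]]
        for app in apps
        if app.get("state") in ("ACCEPTED", "RUNNING")
    ]


# Hypothetical sample response, not taken from a real cluster.
sample = {
    "apps": {
        "app": [
            {"id": "application_0000000000000_0001", "state": "ACCEPTED"},
            {"id": "application_0000000000000_0002", "state": "FINISHED"},
        ]
    }
}

for cmd in kill_commands(sample):
    print(" ".join(cmd))
```

    Each returned list can be handed to subprocess.run once you have confirmed it is a job you really want to stop.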

     

    I suggest re-running the test after you have stopped all other jobs on the cluster (Memory Used should show 0 GB). You can also lower the Resource Allocation % in the Radoop connection (Spark settings) from the default 70% to 50% or lower.
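
    The "Memory Used should show 0 GB" check can also be scripted. The sketch below assumes the clusterMetrics object returned by the ResourceManager REST endpoint /ws/v1/cluster/metrics (fields allocatedMB, availableMB, appsRunning, appsPending); the helper names and the sample numbers are made up for illustration.

```python
def cluster_is_idle(metrics):
    """True when no memory is allocated and no application is running or waiting."""
    return (
        metrics.get("allocatedMB", 0) == 0
        and metrics.get("appsRunning", 0) == 0
        and metrics.get("appsPending", 0) == 0
    )


def request_fits(metrics, requested_mb):
    """True when the free YARN memory covers a request of `requested_mb` MB."""
    return metrics.get("availableMB", 0) >= requested_mb


# Hypothetical numbers for a small single-node cluster.
busy = {"allocatedMB": 2048, "availableMB": 1024, "appsRunning": 1, "appsPending": 1}
idle = {"allocatedMB": 0, "availableMB": 3072, "appsRunning": 0, "appsPending": 0}

print(cluster_is_idle(busy), request_fits(busy, 2048))  # False False
print(cluster_is_idle(idle), request_fits(idle, 2048))  # True True
```

    A real Spark submission asks for driver and executor containers plus memory overhead, so request_fits is only a rough first check rather than a guarantee that the job will leave the ACCEPTED state.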

     

    Best,

    Peter

  • 1131600132
    1131600132 New Altair Community Member
    Maybe you can try disabling the "Enable ResourceManager ACLs" option, that is, setting yarn.acl.enable to false.