"Classloader problem integrating Hadoop to Rapidminer"

mbeckmann New Altair Community Member
edited November 5 in Community Q&A
Dear Gentlemen,

First of all, many thanks for your amazing work on RapidMiner. As everyone says, the RapidMiner team is made up of superheroes.

I'm creating an extension to integrate RapidMiner with Hadoop, Mahout, Hive, and so on, and I'm getting the following exception when I try to submit a job:

java.lang.RuntimeException: java.io.IOException: failure to login
        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:546)
        at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:318)
        at com.rapidminer.operator.rmahout.clustering.KMeans.doWork(KMeans.java:116)
        at com.rapidminer.operator.Operator.execute(Operator.java:834)
        at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
        at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:711)
        at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:379)
        at com.rapidminer.operator.rmahout.configuration.MastersNode.doWork(MastersNode.java:51)
        at com.rapidminer.operator.Operator.execute(Operator.java:834)
        at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
        at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:711)
        at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:379)
        at com.rapidminer.operator.Operator.execute(Operator.java:834)
        at com.rapidminer.Process.run(Process.java:925)
        at com.rapidminer.Process.run(Process.java:848)
        at com.rapidminer.Process.run(Process.java:807)
        at com.rapidminer.Process.run(Process.java:802)
        at com.rapidminer.Process.run(Process.java:792)
        at com.rapidminer.gui.ProcessThread.run(ProcessThread.java:63)
Caused by: java.io.IOException: failure to login
        at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:490)
        at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:452)
        at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1494)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1395)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
        at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:542)
        ... 18 more
Caused by: javax.security.auth.login.LoginException: unable to find LoginModule class: org.apache.hadoop.security.UserGroupInformation$HadoopLoginModule
        at javax.security.auth.login.LoginContext.invoke(Unknown Source)
        at javax.security.auth.login.LoginContext.access$000(Unknown Source)
        at javax.security.auth.login.LoginContext$5.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.login.LoginContext.invokeCreatorPriv(Unknown Source)
        at javax.security.auth.login.LoginContext.login(Unknown Source)
        at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:471)

But in fact, the class org.apache.hadoop.security.UserGroupInformation$HadoopLoginModule is inside the extension jar, together with other dependencies that run fine from a public static void main test.


Please find below my Operator.doWork() code:

public void doWork() throws OperatorException {
    ...
    // Point the client at the cluster's NameNode and JobTracker.
    Configuration config = new Configuration();
    config.set("fs.default.name", "hdfs://" + host + ":" + hdfsPort);
    config.set("mapred.job.tracker", host + ":" + mapredPort);

    JobConf job = new JobConf(config);

    // Ship the jar containing the Mahout k-means driver with the job.
    job.setJarByClass(org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.class);
    job.setJobName("K-Means");

    FileInputFormat.setInputPaths(job, new Path("/user/beckmann/testdata"));
    FileOutputFormat.setOutputPath(job, new Path("b"));

    JobClient.runJob(job);
}


As far as I can tell, this is a classloader problem. Even when I put the dependencies inside the RapidMiner\lib directory, things still go wrong.
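For what it's worth, a workaround that sometimes helps when JAAS cannot find a LoginModule from a plugin is to swap the thread context classloader before calling into Hadoop, since JAAS resolves LoginModule classes through that loader rather than through the caller's loader. This is only a sketch under that assumption, not a confirmed fix for this case (the class and names here are illustrative):

```java
// Sketch: JAAS looks up LoginModule implementations via the thread
// context classloader, which inside a plugin may not see the Hadoop
// jars. Swap in the extension's own loader around the Hadoop calls
// and always restore the original afterwards.
public class ContextClassLoaderSwap {
    public static void main(String[] args) {
        ClassLoader original = Thread.currentThread().getContextClassLoader();
        try {
            // In the extension this would be the plugin classloader that
            // actually contains org.apache.hadoop.security.UserGroupInformation.
            Thread.currentThread().setContextClassLoader(
                    ContextClassLoaderSwap.class.getClassLoader());
            // ... build Configuration / JobConf and submit the job here ...
            System.out.println("swapped");
        } finally {
            Thread.currentThread().setContextClassLoader(original);
            System.out.println("restored");
        }
    }
}
```

The try/finally is important: leaving a foreign context classloader installed on a RapidMiner worker thread can cause unrelated failures later.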

Do you have some idea how to fix it?

Thanks in advance,

Answers

  • mbeckmann New Altair Community Member
    Hi!

    I figured out that this problem is not related to classloading and is in fact not a RapidMiner problem.

    The problem lies in a reported bug (https://issues.apache.org/jira/browse/HADOOP-7982), which was fixed in Hadoop 1.1.2.

    When I moved from Hadoop 1.0.4 to 1.1.2, the problem disappeared, and my work is moving ahead.

    I'll let you know when everything is done.

    Best regards,
  • mbeckmann New Altair Community Member
    Hi there,

    The work on the "RapidMiner Hadoop extension" is moving ahead, and it will definitely be a 100% open-source extension, like the other Hadoop-related components before it.
    Unfortunately it will not be ready in time for RCOMM 2013.

    Just to let you know, and to help others avoid pitfalls: the Hadoop-related components have several security constraints, and some class definitions and security contexts must live in the main classloader, not in the plugin classloader; otherwise we'll face strange behavior during plugin execution.

    To work around this, for now I have put all of Hadoop's dependent jars inside rapidminer.jar (as other components did),
    but it would be great to do this the right way and avoid creating a "proprietary" rapidminer.jar.

    Does someone know how to do this?

    Best regards,
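    The classloader constraint described above can be illustrated in isolation: a class that only one classloader knows about is invisible to code that resolves classes through a different loader, which is exactly how JAAS fails to find HadoopLoginModule when it searches the wrong loader. A small self-contained demonstration (the class name is made up for the example):

    ```java
    // Illustration: resolving a class through an unrelated classloader
    // fails even though the class exists and is loadable elsewhere.
    public class LoaderVisibilityDemo {
        public static void main(String[] args) {
            try {
                // The bootstrap loader (null parent) does not see application
                // classes, so this lookup fails -- analogous to JAAS failing
                // to find HadoopLoginModule via the wrong loader.
                Class.forName("LoaderVisibilityDemo", true,
                        Object.class.getClassLoader());
                System.out.println("found");
            } catch (ClassNotFoundException e) {
                System.out.println("not visible to that loader");
            }
        }
    }
    ```

    This is why bundling the Hadoop jars where the resolving loader can see them (here, the main classloader) makes the error go away.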