"Text mining"

nabilsalhi
nabilsalhi New Altair Community Member
edited November 5 in Community Q&A
Hi all

I want to modify the Text plugin to support Arabic text. I started by building the Text plugin sources into a jar file and copying it to the plugins folder in RapidMiner, but I got a strange result: RapidMiner does not show all the text operators. I see only 1 text operator and 3 web operators. I tried both Eclipse and NetBeans as the IDE, and both give the same strange result.
Any help, please?
Thank you
Nabil Salhi

Answers

  • steffen
    steffen New Altair Community Member
    Hello

    This error description is rather general ... you need to be more specific. However...
    • The IDE should not matter
    • I assume that you have read and understood the tutorial.pdf file (also available from the RapidMiner SourceForge page): make sure that operators.xml and META-INF contain all required properties.
    • If none of the above helps, please start RapidMiner using the *.bat file or the corresponding Linux command (instead of the desktop icon) to see detailed error messages explaining why certain operators failed to load. Post that output here so that it is easier for us to find the source of the problem.

    kind regards,

    Steffen
  • nabilsalhi
    nabilsalhi New Altair Community Member
    Dear Steffen

    I think that there is nothing wrong with operators.xml and META-INF.
    Here is the output when I launch RapidMiner using the bat file:

    Thank you
    Best regards

    Nabil

    Using local jre: C:\Program Files\Rapid-I\RapidMiner\jre\bin\java.exe...
    Starting RapidMiner from 'C:\Program Files\Rapid-I\RapidMiner' using classes from 'C:\Program Files\Rapid-I\RapidMiner\lib\rapidminer.jar'...
    G Apr 14, 2009 7:05:37 PM: rapidminer.home is 'C:\Program Files\Rapid-I\RapidMiner'.
    G Apr 14, 2009 7:05:37 PM: Loading operators from 'C:\Program Files\Rapid-I\RapidMiner\lib\plugins\text-plugin.jar'.
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'TextInput': java.lang.NoClassDefFoundError: edu/udo/cs/wvtool/util/WVToolException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'StringTextInput': java.lang.NoClassDefFoundError: edu/udo/cs/wvtool/util/WVToolException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'TextObjectTextInput': java.lang.NoClassDefFoundError: edu/udo/cs/wvtool/util/WVToolException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'SingleTextInput': java.lang.NoClassDefFoundError: edu/udo/cs/wvtool/util/WVToolException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'FeatureExtraction': java.lang.NoClassDefFoundError: edu/udo/cs/wvtool/util/WVToolLogger
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'MashUp': com.rapidminer.operator.OperatorCreationException: Operator cannot be constructed: 'MashUp(com.rapidminer.operator.MashupOperator)': org/jaxen/JaxenException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'LogFileSource': java.lang.NoClassDefFoundError: org/jdom/JDOMException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'Segmenter': com.rapidminer.operator.OperatorCreationException: Operator cannot be constructed: 'Segmenter(com.rapidminer.operator.DocumentSegmenterOperator)': org/jaxen/JaxenException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'TokenReplace': java.lang.NoClassDefFoundError: edu/udo/cs/wvtool/util/WVToolException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'StringTokenizer': java.lang.NoClassDefFoundError: edu/udo/cs/wvtool/util/WVToolException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'NGramTokenizer': java.lang.NoClassDefFoundError: edu/udo/cs/wvtool/util/WVToolException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'TermNGramGenerator': com.rapidminer.operator.OperatorCreationException: Operator cannot be constructed: 'TermNGramGenerator(com.rapidminer.operator.tokenizer.TermNGramGenerator)': edu/udo/cs/wvtool/main/WVTTokenSequence
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'DictionaryStemmer': java.lang.NoClassDefFoundError: edu/udo/cs/wvtool/util/WVToolException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'GermanStemmer': java.lang.NoClassDefFoundError: edu/udo/cs/wvtool/util/WVToolException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'LovinsStemmer': java.lang.NoClassDefFoundError: edu/udo/cs/wvtool/util/WVToolException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'PorterStemmer': java.lang.NoClassDefFoundError: edu/udo/cs/wvtool/util/WVToolException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'SnowballStemmer': java.lang.NoClassDefFoundError: edu/udo/cs/wvtool/util/WVToolException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'ToLowerCaseConverter': java.lang.NoClassDefFoundError: edu/udo/cs/wvtool/util/WVToolException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'EnglishStopwordFilter': java.lang.NoClassDefFoundError: edu/udo/cs/wvtool/util/WVToolException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'GermanStopwordFilter': java.lang.NoClassDefFoundError: edu/udo/cs/wvtool/util/WVToolException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'StopwordFilterFile': java.lang.NoClassDefFoundError: edu/udo/cs/wvtool/util/WVToolException
    G Apr 14, 2009 7:05:38 PM: [Error] Cannot register 'TokenLengthFilter': java.lang.NoClassDefFoundError: edu/udo/cs/wvtool/util/WVToolException
    G Apr 14, 2009 7:05:38 PM: ----------------------------------------------------
    G Apr 14, 2009 7:05:38 PM: Initialization Settings
    G Apr 14, 2009 7:05:38 PM: ----------------------------------------------------
    G Apr 14, 2009 7:05:38 PM: Default system encoding for IO: windows-1256
    G Apr 14, 2009 7:05:38 PM: Load core operators...
    G Apr 14, 2009 7:05:38 PM: Load Weka operators: true
    G Apr 14, 2009 7:05:38 PM: Load JDBC drivers from lib directory: true
    G Apr 14, 2009 7:05:38 PM: Load JDBC drivers from classpath: false
    G Apr 14, 2009 7:05:38 PM: Load plugins: true
    G Apr 14, 2009 7:05:38 PM: Load plugins from 'C:\Program Files\Rapid-I\RapidMiner\lib\plugins'
    G Apr 14, 2009 7:05:38 PM: ----------------------------------------------------
    G Apr 14, 2009 7:05:38 PM: Read rcfile 'C:\Program Files\Rapid-I\RapidMiner\etc\rapidminerrc'.
    G Apr 14, 2009 7:05:38 PM: Trying rcfile 'C:\Program Files\Rapid-I\RapidMiner\etc\rapidminerrc.Windows XP'...skipped
    G Apr 14, 2009 7:05:38 PM: Trying rcfile 'C:\Documents and Settings\ajial\.rapidminer\4_4_0_rapidminerrc'...skipped
    G Apr 14, 2009 7:05:38 PM: Read rcfile 'C:\Documents and Settings\ajial\.rapidminer\4_4_0_rapidminerrc.Windows XP'.
    G Apr 14, 2009 7:05:38 PM: Trying rcfile 'C:\Program Files\Rapid-I\RapidMiner\scripts\rapidminerrc'...skipped
    G Apr 14, 2009 7:05:38 PM: Trying rcfile 'C:\Program Files\Rapid-I\RapidMiner\scripts\rapidminerrc.Windows XP'...skipped
    G Apr 14, 2009 7:05:38 PM: Trying rapidminer.rcfile. Property not specified...skipped
    G Apr 14, 2009 7:05:38 PM: Loading operators from 'operators.xml'.
    G Apr 14, 2009 7:05:42 PM: Loading JDBC driver information from 'etc:jdbc_properties.xml'.
  • steffen
    steffen New Altair Community Member
    Good morning (from Europe ;))

    As far as I can see, the operators fail to load because the libraries they require could not be loaded. If you look in the zipped source file, you will find a directory called "lib" where all required libraries are stored. Since Java does not support jars inside jars, you have to put the unpacked contents of those jars into the final jar file (I suggest comparing the source file with the textplugin*.jar file).
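
    One way to confirm which libraries are missing is to check the rebuilt jar for the class files named in the NoClassDefFoundError messages above. The following is a minimal sketch, not part of RapidMiner or the Text plugin: the class name JarClassCheck and the method missingEntries are hypothetical, and the entry names are taken from the log output posted earlier in this thread.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.jar.JarFile;

// Hypothetical helper: reports which required class files are absent from a jar.
class JarClassCheck {

    // Returns the entries from 'required' that the jar at 'jarPath' does not contain.
    static List<String> missingEntries(String jarPath, List<String> required) throws IOException {
        List<String> missing = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (String entry : required) {
                if (jar.getEntry(entry) == null) {
                    missing.add(entry);
                }
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java JarClassCheck <path-to-plugin-jar>");
            return;
        }
        // Class files named in the error messages of the posted log.
        List<String> required = Arrays.asList(
                "edu/udo/cs/wvtool/util/WVToolException.class",
                "edu/udo/cs/wvtool/util/WVToolLogger.class",
                "edu/udo/cs/wvtool/main/WVTTokenSequence.class",
                "org/jaxen/JaxenException.class",
                "org/jdom/JDOMException.class");
        List<String> missing = missingEntries(args[0], required);
        if (missing.isEmpty()) {
            System.out.println("All listed classes are present.");
        } else {
            for (String entry : missing) {
                System.out.println("Missing: " + entry);
            }
        }
    }
}
```

    Run it with the path to the rebuilt plugin jar as the only argument. Any entry it prints still has to be unpacked from the jars in "lib" into the plugin jar.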

    As an additional note, see here: http://rapid-i.com/rapidforum/index.php/topic,576.0.html

    I have not tried it myself yet, but since the required Ant build file is included in the textplugin source file, this should not be a big deal.

    Hope this was helpful.

    kind regards,

    Steffen
  • Legacy User
    Legacy User New Altair Community Member
    When I debug, I get the following error message. Can anyone please help?


    Error in: TextInput (TextInput) wvtool caused an error: edu.udo.cs.wvtool.util.WVToolException: Could not extract main text from file C:\Program Files\Rapid-I\RapidMiner\lib\plugins\data\A001J\A001J.txt An external program or library has reported an error. Please see the documentation of this program or library for further information.



    Thanks,

    With Regards,

    Thirumal Valavan
  • p_thiru2002
    p_thiru2002 New Altair Community Member
    Hi All

    How can I use an HTML document with a search function in RapidMiner?

    Please help me.


    For example, IE 8 has a search function: if we key in any word, it shows that word in bold and reports how many times it appears in the document (Total Occurrences).



    With Regards,

    Palani Thirumal Valavan
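
    The behaviour described above (count a word and show it in bold) can be sketched outside RapidMiner in plain Java. This is only an illustration under stated assumptions: the class TermHighlighter and its methods are hypothetical, no RapidMiner operator is used, and the input is treated as plain text, so a term that also occurs inside an HTML tag or attribute would be wrapped as well.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts a search term and wraps each match in <b>...</b>.
class TermHighlighter {

    // Builds a case-insensitive pattern that matches the term literally.
    private static Pattern literal(String term) {
        return Pattern.compile(Pattern.quote(term),
                Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    // Returns the number of non-overlapping occurrences of 'term' in 'text'.
    static int countOccurrences(String text, String term) {
        Matcher matcher = literal(term).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // Returns 'text' with every occurrence of 'term' wrapped in bold tags.
    static String highlight(String text, String term) {
        // $0 refers to the whole match, so the original casing is preserved.
        return literal(term).matcher(text).replaceAll("<b>$0</b>");
    }

    public static void main(String[] args) {
        String text = "Text mining of text";
        System.out.println(countOccurrences(text, "text"));
        System.out.println(highlight(text, "text"));
    }
}
```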
  • mksaad
    mksaad New Altair Community Member
    Hello Nabil,

    Why do you need to modify the RM code to support Arabic text? I performed text mining on Arabic without any problem. I just changed the encoding property for the RM system and for the TextInput operator.
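
    To illustrate why the encoding setting matters, here is a small Java sketch, independent of RapidMiner. The class name ArabicEncodingDemo is hypothetical, and it assumes the JDK in use ships the windows-1256 charset (the default IO encoding reported in the log earlier in this thread). It decodes the same bytes with two different charsets; only the matching charset recovers the original Arabic text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: the same bytes decoded with a matching and a mismatching charset.
class ArabicEncodingDemo {

    // Decodes raw bytes using the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Arabic word for "text" (noon, sad), written with Unicode escapes
        // so that the source file encoding does not matter.
        String original = "\u0646\u0635";
        Charset windows1256 = Charset.forName("windows-1256");

        // Bytes as they would be stored in a UTF-8 encoded file.
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        String decodedAsUtf8 = decode(utf8Bytes, StandardCharsets.UTF_8);
        String decodedAs1256 = decode(utf8Bytes, windows1256);

        System.out.println("UTF-8 decoding matches original: " + decodedAsUtf8.equals(original));
        System.out.println("windows-1256 decoding matches original: " + decodedAs1256.equals(original));
    }
}
```

    The same holds in the other direction: a file saved as windows-1256 has to be read as windows-1256, so the encoding configured for the tool needs to match the encoding of the input files.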

    Do you want to add an Arabic stemmer?


    Best Regards,
    --
    Motaz K. Saad