"Log File Source error after upgrade from 4.6 to 5"

Unknown
edited November 5 in Community Q&A
Hello all,

I previously had a process in RM46 that I used to look at Apache web log files; this works fine.
http://dl.dropbox.com/u/5878888/RMFiles/logRM4.xml

I imported this into RM5 and immediately I get the error "Duplicate attribute name: time"

I've made a cut-down version of the process and the various files it requires:
http://dl.dropbox.com/u/5878888/RMFiles/logRM5.xml
http://dl.dropbox.com/u/5878888/RMFiles/apache.xml
http://dl.dropbox.com/u/5878888/RMFiles/robots.txt
http://dl.dropbox.com/u/5878888/RMFiles/logs/smallLog

Of course, I can use RM46 to make the example sets and continue from there in RM5 but it would be nice to know if there is something else I need to do to make it work in RM5 only.

regards,

Andrew

Answers

  • antoine
    antoine New Altair Community Member
    Hi Andrew,

    Like you, I'm getting the error "Duplicate attribute name: time" when importing my web log files in RM5.

    I wanted to know whether you solved your problem: did you manage to run your process in RM5 without the error?
    If so, could you please explain how you did it?
    If not, I would like to know how you get the process to run in RM46.

    I've downloaded your various files to my hard disk and installed RM46. There I don't get the error "Duplicate attribute name: time", but it says "No results produced".

    What I've done so far:

        * downloaded your files (logRM4.xml, apache.xml, robots.txt) into the RM46 folder
        * created a folder named "logs" with the file smallLog in it
        * created a new process in RM46
        * copied and pasted the content of logRM4.xml into the XML tab in RM46
        * ran the process

    The logs:

    P May 20, 2010 7:34:14 PM: Initialising process setup
    P May 20, 2010 7:34:14 PM: [NOTE] No filename given for result file, using stdout for logging results!
    P May 20, 2010 7:34:14 PM: Checking properties...
    P May 20, 2010 7:34:14 PM: Properties are ok.
    P May 20, 2010 7:34:14 PM: Checking process setup...
    P May 20, 2010 7:34:14 PM: Inner operators are ok.
    P May 20, 2010 7:34:14 PM: Checking i/o classes...
    P May 20, 2010 7:34:14 PM: i/o classes are ok.
    P May 20, 2010 7:34:14 PM: Process ok.
    P May 20, 2010 7:34:14 PM: Process initialised
    P May 20, 2010 7:34:14 PM: [NOTE] Process starts
    P May 20, 2010 7:34:14 PM: Process:
      Root[0] (Process)
    P May 20, 2010 7:34:14 PM: Process:
      Root[1] (Process)
    P May 20, 2010 7:34:14 PM: Produced output:
    IOContainer (0 objects):
    P May 20, 2010 7:34:14 PM: [NOTE] Process finished successfully after 0 s

    ------------------------
    What is weird is that I haven't added any operator to the operator tree on the left in RM46. Is that normal?


  • Hello Antoine,

    The rm46 process works; the rm5 version does not. I can't fix it.

    You may be having trouble with the rm46 process because you need to edit the log file source configuration to point to the files. If that's not the problem then you could always post the xml.

    Regards,

    Andrew
  • antoine
    Hi,

    Thanks for your quick reply. In fact, I want to import web logs in order to apply some operators from the web mining extension. I couldn't find the web mining extension in RM46, so I think it is a new feature of RM5. Do you think I could import my web logs in RM46 and then import them into RM5?


    As for my previous question: I did point the operator to the log dir and to the config file named "apache.xml". I have taken the same files as yours, so I don't understand why it runs without producing results! I have the same XML files!

    Could you explain how you did it, step by step?

    Thank you in advance.


  • Hello Antoine,

    I suggest you post your xml. As for using rm46 first and then rm5 second, I was simply going to export the example set to a file and then reimport in rm5.

    Regards,

    Andrew
  • antoine
    Hi again Andrew,


    Fine! Here is my XML:
    <process version="4.6">

      <operator name="Root" class="Process" expanded="yes">
          <parameter key="resultfile" value="/Users/Antoine/Documents/ISEP/Marketing/webmining/rapidminer46/rapidminer-4.6/myresults.res"/>
          <parameter key="logverbosity" value="init"/>
          <parameter key="random_seed" value="2001"/>
          <parameter key="send_mail" value="never"/>
          <parameter key="process_duration_for_mail" value="30"/>
          <parameter key="encoding" value="SYSTEM"/>

          <operator name="LogFileSource" class="LogFileSource">
              <parameter key="config_file" value="apache.xml"/>
              <parameter key="log_dir" value="logs"/>
              <parameter key="dns_lookup" value="false"/>
              <parameter key="robot_filter" value="robots.txt"/>
              <parameter key="filetype_filter" value="\.[ico|gif|jpg|jpeg|css|js|GIF|JPG|png|PNG|jpg|xml]"/>
              <parameter key="only_HTTP_200" value="true"/>
              <list key="browser_matcher">
              </list>
              <list key="os_matcher">
                  <parameter key="windows" value="(.*)(Windows|windows|WINDOWS)(.*)"/>
                  <parameter key="unix" value="(Linux|Unix|unix|UNIX)(.*)"/>
                  <parameter key="mac" value="(.*)(Mac|OS X)(.*)"/>
              </list>
              <list key="language_matcher">
              </list>
              <parameter key="session_timeout" value="400000"/>
          </operator>
      </operator>

    </process>
    My apache.xml file is in the RM46 folder. The same goes for the "logs" folder: it is in the RM46 folder alongside the other folders ("etc", "lib", "licences", ...).
    I pasted this code into the "xml" tab in RM46.

    When I run the process, it asks me to save the process file.
    Here is the code:
    <?xml version="1.0" encoding="MacRoman"?>
    <process version="4.6">

      <operator name="Root" class="Process" expanded="yes">
          <parameter key="logverbosity" value="init"/>
          <parameter key="random_seed" value="2001"/>
          <parameter key="send_mail" value="never"/>
          <parameter key="process_duration_for_mail" value="30"/>
          <parameter key="encoding" value="SYSTEM"/>
      </operator>

    </process>

    I think I've forgotten an important step before running the process in RM46, but I don't know which one!

    Thanks a lot again for being so responsive.
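
    The saved file above contains only the Root operator, which is consistent with the empty IOContainer in the log. A quick way to confirm what a saved process actually contains is to list its operators. The following is a minimal Python sketch, not something from RapidMiner itself; the function name list_operators is hypothetical, and it assumes only the <operator name="..." class="..."> layout visible in the XML posted in this thread.

```python
import xml.etree.ElementTree as ET


def list_operators(process_xml):
    """Return (name, class) for every <operator> element, in document order.

    Hypothetical helper: it relies only on the 'name' and 'class'
    attributes seen in the process XML posted in this thread.
    """
    root = ET.fromstring(process_xml)
    return [(op.get("name"), op.get("class")) for op in root.iter("operator")]


# Trimmed copy of the saved process (XML declaration omitted).
saved = """<process version="4.6">
  <operator name="Root" class="Process" expanded="yes">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
  </operator>
</process>"""

operators = list_operators(saved)
print(operators)

# If every operator is a plain Process container, nothing reads any data.
if all(cls == "Process" for _, cls in operators):
    print("Only the root process is present, so no results can be produced.")
```

    If LogFileSource does not show up in this list, the operator was not kept when the process was saved, whatever was pasted into the XML tab.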
  • land
    land New Altair Community Member
    Hi,
    I'm really excited to hear that there are in fact users of this operator. We had already discussed dropping it completely. But I think the current solution isn't really more useful for you :)
    If there's a bug, please write a detailed description and add it to our bugtracker. We will solve this issue as soon as possible. If you are an enterprise customer with a bug fix guarantee, please use the Online Ticket System to contact us.

    Greetings,
      Sebastian
  • Antoine,

    I edited the process - here it is

    <operator name="Root" class="Process" expanded="yes">
       <parameter key="resultfile" value="/Users/Antoine/Documents/ISEP/Marketing/webmining/rapidminer46/rapidminer-4.6/myresults.res"/>
       <operator name="LogFileSource" class="LogFileSource">
           <parameter key="config_file" value="c:\apache.xml"/>
           <parameter key="log_dir" value="c:\logs"/>
           <parameter key="robot_filter" value="c:\robots.txt"/>
           <parameter key="filetype_filter" value="\.[ico|gif|jpg|jpeg|css|js|GIF|JPG|png|PNG|jpg|xml]"/>
           <parameter key="only_HTTP_200" value="true"/>
           <list key="browser_matcher">
           </list>
           <list key="os_matcher">
             <parameter key="windows" value="(.*)(Windows|windows|WINDOWS)(.*)"/>
             <parameter key="unix" value="(Linux|Unix|unix|UNIX)(.*)"/>
             <parameter key="mac" value="(.*)(Mac|OS X)(.*)"/>
           </list>
           <list key="language_matcher">
           </list>
       </operator>
    </operator>

    It assumes that the files are located on the c:\ drive and that the smallLog file is located in a folder called c:\logs. You will have to put the files there yourself and if you have a Unix machine you might have to do a bit more jiggery pokery.
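
    Since the process only works when these three settings resolve to real files, it can help to check them before running. The sketch below is a minimal Python illustration under stated assumptions: the helper names configured_paths and missing_paths are hypothetical, and the parameter keys are the ones shown in the XML above. It reads the process XML and reports which configured paths do not exist on the current machine.

```python
import os
import xml.etree.ElementTree as ET

# Parameter keys of LogFileSource that hold file or folder paths,
# taken from the process XML posted in this thread.
PATH_KEYS = ("config_file", "log_dir", "robot_filter")


def configured_paths(process_xml):
    """Map each path-like LogFileSource parameter key to its configured value."""
    root = ET.fromstring(process_xml)
    paths = {}
    for op in root.iter("operator"):
        if op.get("class") != "LogFileSource":
            continue
        for param in op.findall("parameter"):
            if param.get("key") in PATH_KEYS:
                paths[param.get("key")] = param.get("value")
    return paths


def missing_paths(paths):
    """Return the keys whose path does not exist on this machine, sorted."""
    return sorted(key for key, value in paths.items() if not os.path.exists(value))


# Raw string so the Windows backslashes are kept literally.
example = r"""<operator name="Root" class="Process" expanded="yes">
  <operator name="LogFileSource" class="LogFileSource">
    <parameter key="config_file" value="c:\apache.xml"/>
    <parameter key="log_dir" value="c:\logs"/>
    <parameter key="robot_filter" value="c:\robots.txt"/>
    <parameter key="only_HTTP_200" value="true"/>
  </operator>
</operator>"""

paths = configured_paths(example)
print(paths)
print("Missing on this machine:", missing_paths(paths))
```

    On a Mac or other Unix machine the c:\ values will all be reported as missing, which is the cue to replace them with full paths that exist there.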

    regards,

    Andrew
  • Sebastian,

    Please don't drop this operator :)

    The error is that a process that worked ok in rm46 fails after import into rm5. It's a regression error - anyone who was using this operator before won't be able to with rm5 and so will not be able to migrate to the new version.

    If I have time, I will try and raise a bug.

    regards,

    Andrew
  • antoine
    Hi Andrew,


    As Sebastian suggested, I've raised the bug. But until it is fixed, I would like to use your solution; however, it is still not working! You say that you edited the process, but except for the paths to robots.txt, the log dir and the apache.xml file, it is exactly the same as the one I posted previously.

    The result is always the same: RM46 tells me that "No results were produced", even with your config file. I don't understand how to get the example set produced in RM46.

      regards,


    Antoine
  • Antoine,

    I know it sounds obvious but where are the files and folders located on your computer? Ensure the full path is entered in the configuration.

    If you are running on a Unix system then there might be another issue and I won't be able to comment.

    regards

    Andrew
  • antoine
    Hi again Andrew,



    I am actually on Mac OS X. I did what you told me to do and also tested it on a Windows computer. It still does the same thing. For the last time, could you explain step by step (or click by click) EXACTLY what you do to import your Apache log?

    Here are some screenshots illustrating, step by step, how I try to get things working:

    1. My folder containing the files (apache.xml, RM4.xml and robots.txt), with the file smallLog opened:
    [screenshot]

    2. RM46 after I started it and clicked on New and then RootProcess:
    [screenshot]

    3. I copied and pasted the content of the RM4.xml file, with the paths to my various files and folders:
    [screenshot]

    4. Then I click on the blue play button; it asks me whether I want to save the process, I select No, and then this appears:
    [screenshot]

    So there are no results!

    Thanks for your time,

    Antoine
  • Antoine,

    Mac - that's awkward  :-\

    Nonetheless, looking at your screenshots I suspect you might not have downloaded the right plug in. The way to check is to go to the "New Operator" tab and type "logfilesource" into the box at the bottom of the screen.

    If you see something under Root->IO->Web then you are OK. In fact you could just drag this over to the root process and away you go. If not then you will have to download and install it. It's more manual than the current version but not difficult at all. My helpful tip is to download everything you can find.

    If you wait a few days, you might find that the Rapid-i chaps will fix it in rm5. They don't have to of course since they are walking the cliff edge of encouraging a vibrant user community versus the need to put beer on the table.

    regards

    Andrew
  • antoine
    Andrew,


    I have good news! It is working!

    I had simply forgotten to add the plugin to RM46. For the record, it is not the "web" or "web mining" plugin: it is the text plugin, which has to be copied into the folder rapid-miner4.6/lib/plugins/, and then, as you said, it adds the Web entry under Root->IO. It was odd, because all along RM46 never raised any error for the operator named "logfilesource".
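
    To check whether a RapidMiner 4.6 installation has a plugin jar in place, something like the following Python sketch could be used. It is only an illustration: the helper name find_plugin_jars is hypothetical, the installation path is a placeholder, and matching on the substring "text" in the jar file name is an assumption, since the exact file name of the text plugin is not given here.

```python
import os


def find_plugin_jars(rapidminer_home, keyword="text"):
    """List .jar files in <rapidminer_home>/lib/plugins whose name contains keyword.

    Hypothetical helper. The lib/plugins location comes from this thread;
    the keyword match on the file name is an assumption.
    """
    plugin_dir = os.path.join(rapidminer_home, "lib", "plugins")
    if not os.path.isdir(plugin_dir):
        return []  # no plugin folder at all
    return sorted(
        name
        for name in os.listdir(plugin_dir)
        if name.lower().endswith(".jar") and keyword in name.lower()
    )


# Placeholder path: replace with the real RapidMiner 4.6 folder.
found = find_plugin_jars("rapidminer-4.6")
if found:
    print("Candidate text plugin jars:", found)
else:
    print("No matching jar in lib/plugins; LogFileSource may be unavailable.")
```

    An empty result only says that no jar with a matching name was found; it does not prove that the operator is missing.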

    So in a word, THANK YOU!

    Now I can import my log in RM46 and save my results as an example set (.dat + .aml). I've tried importing the example set into RM5 through the "Read AML" operator and it seems to work. Is this a good way to import it, or is there a better one?
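
    Given that the original RM5 error was "Duplicate attribute name: time", one cautious extra step before importing is to check the exported .aml file for repeated attribute names. The Python sketch below is an assumption-laden illustration: it assumes the .aml file is XML in which each attribute description carries a name attribute, the helper duplicate_attribute_names is hypothetical, and the sample content (including weblog.dat and the attribute names) is made up for demonstration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_attribute_names(aml_text):
    """Return the sorted names that occur on more than one element.

    Hypothetical helper: it looks at every element that has a 'name'
    attribute, without assuming specific element names.
    """
    root = ET.fromstring(aml_text)
    names = [el.get("name") for el in root.iter() if el.get("name") is not None]
    return sorted(name for name, count in Counter(names).items() if count > 1)


# Made-up sample; a real .aml exported from RM46 will look different.
sample = """<attributeset default_source="weblog.dat">
  <attribute name="time" sourcecol="1" valuetype="nominal"/>
  <attribute name="ip" sourcecol="2" valuetype="nominal"/>
  <attribute name="time" sourcecol="3" valuetype="nominal"/>
</attributeset>"""

print(duplicate_attribute_names(sample))  # prints ['time']
```

    An empty list means no name is repeated in the file.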


    THANK YOU AGAIN
  • Hello Antoine

    Glad it works - if aml and dat work for you then I can't suggest anything better.

    regards

    Andrew
  • land
    Hi,
    I just wanted to announce that I was able to find this bug and fix it. The corrected version will be available with the next update.

    Greetings,
      Sebastian