"Web mining Rapidminer robot_filter"

antoine
antoine New Altair Community Member
edited November 2024 in Community Q&A
Hello all,


I don't know if this is the right place to post my request. I would like to know how you (RapidMiner users who use it as a web usage mining tool) set up your robot_filter file when importing a web log file.

  It works when I type just [g|G]oogle in my robot_filter file, for example. However, I don't really want to do that by hand for a thousand different bots...

So I've tried to find a list that I can paste into my file. The website http://www.robotstxt.org/db/all.txt offers the robots list for download in .txt format.
But apparently RapidMiner doesn't like it: I get many errors due to bad characters and wrong enclosures...

  So what do I have to do to get a proper robots list that RapidMiner can read?


Thank you in advance,


          Antoine

Answers

  • land
    land New Altair Community Member
    Hi Antoine,
    what exactly does RapidMiner complain about? Unfortunately I'm not too familiar with the web mining operators, but I assume the file must consist of regular expressions. In that case you would need to escape the characters that have a special meaning in regular expressions; you will find some advice on this on Google.

    Greetings,
      Sebastian
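
A minimal sketch of the escaping step suggested above, in Python. It rests on two assumptions that the thread does not confirm: that the downloaded list contains lines of the form `robot-name: <value>`, and that the robot_filter file expects one regular expression per line. The function name `build_robot_patterns` is hypothetical and only serves the illustration.

```python
import re


def build_robot_patterns(db_text, field="robot-name"):
    """Return one escaped regex per distinct, non-empty value of `field`.

    Assumption: each record in db_text has lines like "robot-name: Googlebot".
    """
    prefix = field + ":"
    patterns = []
    seen = set()
    for line in db_text.splitlines():
        # Keep only the lines that carry the chosen field.
        if not line.lower().startswith(prefix):
            continue
        value = line[len(prefix):].strip()
        # Skip empty values and duplicates.
        if not value or value in seen:
            continue
        seen.add(value)
        # re.escape neutralises characters such as ( ) + . that would
        # otherwise be interpreted as regex syntax.
        patterns.append(re.escape(value))
    return patterns


sample = "robot-id: abc\nrobot-name: ABC (v1.0)+\nrobot-name:\nrobot-name: ABC (v1.0)+\n"
patterns = build_robot_patterns(sample)
```

Joining the result with newlines and saving it as the robot_filter file would give one escaped pattern per bot, provided the one-pattern-per-line assumption holds for the operator.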
