Get Pages - Error parsing HTTP headers

miguelal
miguelal New Altair Community Member
edited November 5 in Community Q&A
Hi,

When a web server sends a header string that violates the cookie specification, the getHeaderFields() method of the HttpURLConnection class throws an IllegalArgumentException. RapidMiner does not handle this exception, so the "Get Pages" operator in the Web Mining extension fails. I have added a try/catch around that code, and it seems to be working now.

This is the code I modified at line 164 of GetWebPageOperator:

try {
    // getHeaderFields() fails if a cookie has no '=' sign between its name and its value
    for (Entry<String, List<String>> header : connection.getHeaderFields().entrySet()) {
        getLogger().info("Response Header: " + header.getKey() + ": " + header.getValue());
    }
} catch (IllegalArgumentException ex) {
    getLogger().warning("Failed to get HTTP header fields. Error: " + ex.getMessage());
}
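
For anyone who wants to try the same guard outside RapidMiner, here is a minimal standalone sketch. The class name SafeHeaders and the methods readHeaders and logHeaders are hypothetical and are not part of RapidMiner or the Web Mining extension. It uses java.util.logging instead of the operator's getLogger(), and it assumes that falling back to an empty header map is acceptable when the headers cannot be parsed:

```java
import java.net.URLConnection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical helper, not part of RapidMiner: reads response headers defensively.
final class SafeHeaders {
    private static final Logger LOGGER = Logger.getLogger(SafeHeaders.class.getName());

    private SafeHeaders() {
    }

    // Returns the response headers, or an empty map if getHeaderFields()
    // throws an IllegalArgumentException (for example on a malformed cookie).
    static Map<String, List<String>> readHeaders(URLConnection connection) {
        try {
            return connection.getHeaderFields();
        } catch (IllegalArgumentException ex) {
            LOGGER.warning("Failed to get HTTP header fields. Error: " + ex.getMessage());
            return Collections.emptyMap();
        }
    }

    // Logs every header that could be read; logs nothing extra if reading failed.
    static void logHeaders(URLConnection connection) {
        for (Map.Entry<String, List<String>> header : readHeaders(connection).entrySet()) {
            LOGGER.info("Response Header: " + header.getKey() + ": " + header.getValue());
        }
    }
}
```

With this sketch, calling SafeHeaders.logHeaders(connection) on a connection whose headers cannot be parsed writes a single warning instead of aborting the caller.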
I am posting this in case it helps anyone.

Thanks,
Miguel

Answers

  • MariusHelf
    MariusHelf New Altair Community Member
    Hi Miguel,

    Thanks for reporting this. Can you give us a link to a page that reproduces the error?
    Best regards,
    Marius
  • miguelal
    miguelal New Altair Community Member
    Hi Marius,

    I am sorry, but unfortunately I did not keep the URL that was causing this problem. I process lots of different URLs every day, and I tried looking for it (since I know the problem happened on Nov 6th), but I couldn't find it. The only thing I have is the screenshot of the error in RapidAnalytics, which I know isn't going to be of much help to you. :'(

    [image: screenshot of the error in RapidAnalytics]

    Thanks,
    Miguel
  • MariusHelf
    MariusHelf New Altair Community Member
    Ok, thanks for searching in any case :)
    I will forward your error description and the piece of code above to the developers. Let's see if it is of use to them.

    Best regards,
    Marius