Get Pages - Error parsing HTTP headers

User: "miguelal"
New Altair Community Member
Updated by Jocelyn
Hi,

When a web server sends a header string that violates the cookie specification, the getHeaderFields() method of the HttpURLConnection class throws an IllegalArgumentException. RapidMiner does not handle this exception, so the "Get Pages" operator in the Web Mining extension fails. I have added a try/catch around that code, and it seems to be working now.

This is the code I modified at line 164 of the GetWebPageOperator:

// getHeaderFields() fails if there are cookies without the '=' symbol between the name and the value
try {
    for (Entry<String, List<String>> header : connection.getHeaderFields().entrySet()) {
        getLogger().info("Response Header: " + header.getKey() + ": " + header.getValue());
    }
} catch (IllegalArgumentException ex) {
    // Log the problem and continue instead of letting the operator fail
    getLogger().warning("Failed to get HTTP header fields. Error: " + ex.getMessage());
}

I am posting this in case it helps anyone.

Thanks,
Miguel
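
For anyone who wants to try the same idea outside the operator, here is a minimal standalone sketch of the guard described above. It is not the actual RapidMiner code: the class name SafeHeaders, the method readHeaders, and the use of a Supplier (so the logic can be exercised without a live connection) are all assumptions made for illustration. It falls back to an empty map when the header lookup throws IllegalArgumentException.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import java.util.logging.Logger;

// Hypothetical helper, not part of RapidMiner or the Web Mining extension.
class SafeHeaders {
    private static final Logger LOGGER = Logger.getLogger(SafeHeaders.class.getName());

    // Calls the given header source and returns its result. If the call throws
    // IllegalArgumentException (as reported above for malformed cookie headers),
    // a warning is logged and an empty map is returned instead.
    static Map<String, List<String>> readHeaders(Supplier<Map<String, List<String>>> source) {
        try {
            return source.get();
        } catch (IllegalArgumentException ex) {
            LOGGER.warning("Failed to get HTTP header fields. Error: " + ex.getMessage());
            return Collections.emptyMap();
        }
    }
}
```

With an open HttpURLConnection named connection, a call could look like SafeHeaders.readHeaders(connection::getHeaderFields). Returning an empty map keeps the page download going when only the header logging is affected, which matches the intent of the try/catch in the post.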

    User: "MariusHelf"
    New Altair Community Member
    Hi Miguel,

    Thanks for reporting this. Can you give us a link to a page that reproduces the error?
    Best regards,
    Marius
    User: "miguelal"
    New Altair Community Member
    OP
    Hi Marius,

    I am sorry, but unfortunately I did not keep the URL that was causing this problem. I process lots of different URLs every day, and I tried looking for the one (since I know the problem happened on Nov 6th), but I couldn't find it. The only thing I have is the screenshot of the error in RapidAnalytics, which I know isn't going to be of much help to you. :'(

    [image: screenshot of the error in RapidAnalytics]

    Thanks,
    Miguel
    User: "MariusHelf"
    New Altair Community Member
    Ok, thanks for searching in any case :)
    I will forward your error description and the piece of code above to the developers. Let's see if it is of use to them.

    Best regards,
    Marius