Hi,
When a web server sends a header string that violates the cookie specification, the getHeaderFields() method of the HttpURLConnection class throws an IllegalArgumentException. RapidMiner does not handle this exception, so the "Get Pages" operator in the Web Mining extension fails. I have added a try/catch around that code, and it seems to be working now.
This is the code I modified at line 164 of the GetWebPageOperator:
try {
    // getHeaderFields() fails if there are cookies that do not have
    // the '=' symbol between the name and the value
    for (Entry<String, List<String>> header : connection.getHeaderFields().entrySet()) {
        getLogger().info("Response Header:" + header.getKey() + ": " + header.getValue());
    }
} catch (IllegalArgumentException ex) {
    getLogger().warning("Failed to get HTTP header fields. Error: " + ex.getMessage());
}
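For anyone who wants to try the idea outside of RapidMiner, here is a minimal standalone sketch of the same guard. The class name SafeHeaderLogging and the methods safeHeaderFields and logHeaders are hypothetical names I made up for illustration; they are not part of RapidMiner or the Web Mining extension. It also assumes that falling back to an empty header map is acceptable when the headers cannot be parsed.

```java
import java.net.HttpURLConnection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.logging.Logger;

// Hypothetical helper, not RapidMiner code: wraps getHeaderFields() so that
// an IllegalArgumentException is logged instead of propagating to the caller.
public class SafeHeaderLogging {

    private static final Logger LOGGER =
            Logger.getLogger(SafeHeaderLogging.class.getName());

    // Returns the response headers, or an empty map if reading them throws
    // an IllegalArgumentException (assumption: an empty map is a fine fallback).
    static Map<String, List<String>> safeHeaderFields(HttpURLConnection connection) {
        try {
            return connection.getHeaderFields();
        } catch (IllegalArgumentException ex) {
            LOGGER.warning("Failed to get HTTP header fields. Error: " + ex.getMessage());
            return Collections.emptyMap();
        }
    }

    // Logs every header that could be read and returns how many were logged.
    static int logHeaders(HttpURLConnection connection) {
        Map<String, List<String>> headers = safeHeaderFields(connection);
        for (Map.Entry<String, List<String>> header : headers.entrySet()) {
            LOGGER.info("Response Header:" + header.getKey() + ": " + header.getValue());
        }
        return headers.size();
    }
}
```

With this, a caller would use SafeHeaderLogging.logHeaders(connection) where the original loop was. The page can still be processed afterwards, only the header logging is skipped when the headers are malformed.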
I am posting this in case it helps anyone.
Thanks,
Miguel