How to set DTD parameter in FeatureExtraction (rapidminer UI)
skarab
How can I set the DTD parameter in FeatureExtraction? I ask because I keep getting an IOException thrown from FeatureExtraction:
Server returned HTTP response code: 503 for URL:
http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
Regards,
skarab
land
Hi,
I'm sorry, but what exactly are you doing? The easiest way would be to post the process along with a short explanation. And to motivate other users to answer your questions, it could be a smart move to add something like "hello" at the start of your message...
Greetings,
Sebastian
skarab
I am parsing an HTML page. Here is the process:
<operator name="FeatureExtraction" class="FeatureExtraction" breakpoints="before,within,after">
    <list key="texts">
        <parameter key="tmp_file" value="%{parent_path}\tmp%{file_name}\%{file_name}"/>
    </list>
    <parameter key="default_content_type" value="html"/>
    <parameter key="default_content_encoding" value="UTF-8"/>
    <parameter key="default_content_language" value="pl"/>
    <parameter key="use_content_attributes" value="true"/>
    <parameter key="id_attribute_type" value="long"/>
    <list key="attributes">
        <parameter key="html" value="/h:html"/>
    </list>
    <list key="namespaces">
        <!-- I tried to set it in namespaces -->
        <parameter key="html" value="C:\\workspace-rapidminer\xhtml1-transitional.dtd"/>
    </list>
</operator>
land
Hi,
I don't think the namespace is needed, and it is not correctly defined either. So the easiest solution would be to remove this parameter.
Anyway, it is only used for XPath queries on more complicated XML documents. I have never had to use one for HTML.
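To illustrate the point about namespaces: in XPath, a prefix such as h is bound to a namespace URI, not to the path of a DTD file. Below is a minimal sketch in Python rather than RapidMiner, using the standard-library ElementTree. The sample document and the variable names are assumptions made up for illustration; the URI is the standard XHTML namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for illustration.
XHTML = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Example</title></head>"
    "<body><p>Hello</p></body>"
    "</html>"
)

# The prefix "h" is arbitrary; what matters is the namespace URI it maps to.
NAMESPACES = {"h": "http://www.w3.org/1999/xhtml"}

root = ET.fromstring(XHTML)

# Prefixed paths resolve because "h" is bound to the XHTML namespace URI.
title = root.find("h:head/h:title", NAMESPACES)
paragraph = root.find("h:body/h:p", NAMESPACES)

# An unprefixed path finds nothing, since the elements live in a namespace.
unqualified = root.find("head/title")

print(title.text)
print(paragraph.text)
print(unqualified)
```

A file path in the namespaces list would therefore not help the parser locate the DTD; it would only bind the prefix to a string that matches no element.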
Greetings,
Sebastian
skarab
Hi,
Defining the namespace makes no difference in my case; I still get this exception. I am using Java 1.6.0.16 on Vista.
Regards
Skarab
skarab
Hi,
I solved the problem.
First, I removed the original doctype declaration with TextCleaner, using this regex:
<!DOCTYPE html PUBLIC [^>]*>
After that, I prepended a doctype that points to a local copy of the DTD:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "C:\workspace-rapidminer\xhtml1-transitional.dtd" >
I did this with SingleTextObjectInput, setting its text to:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "C:\workspace-rapidminer\xhtml1-transitional.dtd" >%{loop_value}
Here is my brute-force solution (I get an HTML page as a TextObject):
<operator name="TextCleaner" class="TextCleaner">
    <!-- Delete the original doctype so the parser does not fetch the remote DTD -->
    <parameter key="deletion_regex" value="&lt;!DOCTYPE html PUBLIC [^>]*>"/>
</operator>
<operator name="TextObject2ExampleSet" class="TextObject2ExampleSet">
    <parameter key="keep_text_object" value="true"/>
    <parameter key="text_attribute" value="my_doc_text"/>
    <parameter key="label_attribute" value="my_doc_label"/>
</operator>
<operator name="ValueIterator" class="ValueIterator" expanded="yes">
    <parameter key="attribute" value="my_doc_text"/>
    <operator name="SingleTextObjectInput" class="SingleTextObjectInput">
        <!-- Prepend a doctype that points to the local DTD copy -->
        <parameter key="text" value="&lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.0 Transitional//EN&quot; &quot;C:\workspace-rapidminer\xhtml1-transitional.dtd&quot; >%{loop_value}"/>
    </operator>
</operator>
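The same text transformation can be sketched outside RapidMiner. The Python snippet below is only an illustration of the two steps (delete the doctype with the regex from TextCleaner, then prepend a doctype with the local DTD path); the function name use_local_dtd and the sample input are hypothetical and not part of the process.

```python
import re

# Same pattern as the TextCleaner deletion_regex in the process.
DOCTYPE_RE = re.compile(r"<!DOCTYPE html PUBLIC [^>]*>")

# Doctype whose system identifier is the local DTD copy from this thread.
LOCAL_DOCTYPE = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"C:\\workspace-rapidminer\\xhtml1-transitional.dtd" >'
)


def use_local_dtd(html: str) -> str:
    """Remove any existing doctype and prepend the local one (hypothetical helper)."""
    return LOCAL_DOCTYPE + DOCTYPE_RE.sub("", html)


# Made-up input page that still references the remote W3C DTD.
sample = (
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" '
    '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">'
    "<html><body/></html>"
)

result = use_local_dtd(sample)
print(result)
```

After the rewrite the page no longer references the W3C URL, so a parser that honors the doctype should read the DTD from disk instead of requesting it over HTTP.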
Regards,
Wojtek