How do I set the DTD parameter in FeatureExtraction (RapidMiner UI)?
I keep getting an IOException thrown from FeatureExtraction:
Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
Regards,
skarab
I am parsing an HTML page, and here is the code:
<operator name="FeatureExtraction" class="FeatureExtraction" breakpoints="before,within,after">
<list key="texts">
<parameter key="tmp_file" value="%{parent_path}\tmp%{file_name}\%{file_name}"/>
</list>
<parameter key="default_content_type" value="html"/>
<parameter key="default_content_encoding" value="UTF-8"/>
<parameter key="default_content_language" value="pl"/>
<parameter key="use_content_attributes" value="true"/>
<parameter key="id_attribute_type" value="long"/>
<list key="attributes">
<parameter key="html" value="/h:html"/>
</list>
<list key="namespaces">
<!-- I tried to set it in namespaces -->
<parameter key="html" value="C:\\workspace-rapidminer\xhtml1-transitional.dtd"/>
</list>
</operator>
Hi,
I solved the problem.
First, I removed the existing declaration matching
<!DOCTYPE html PUBLIC [^>]*> using TextCleaner.
After that, I prepended a DOCTYPE that points to a local copy of the DTD:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "C:\workspace-rapidminer\xhtml1-transitional.dtd" >
using SingleTextObjectInput:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "C:\workspace-rapidminer\xhtml1-transitional.dtd" >%{loop_value}
Here is my brute-force solution (I get an HTML page as a TextObject):
<operator name="TextCleaner" class="TextCleaner">
<parameter key="deletion_regex" value="&lt;!DOCTYPE html PUBLIC [^&gt;]*&gt;"/>
</operator>
<operator name="TextObject2ExampleSet" class="TextObject2ExampleSet">
<parameter key="keep_text_object" value="true"/>
<parameter key="text_attribute" value="my_doc_text"/>
<parameter key="label_attribute" value="my_doc_label"/>
</operator>
<operator name="ValueIterator" class="ValueIterator" expanded="yes">
<parameter key="attribute" value="my_doc_text"/>
<operator name="SingleTextObjectInput" class="SingleTextObjectInput">
<parameter key="text" value="&lt;!DOCTYPE html PUBLIC &quot;-//W3C//DTD XHTML 1.0 Transitional//EN&quot; &quot;C:\workspace-rapidminer\xhtml1-transitional.dtd&quot; &gt;%{loop_value}"/>
</operator>
</operator>
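For anyone who wants to check the text transformation outside RapidMiner, here is a minimal Java sketch of the same two steps: delete the existing DOCTYPE with the regex used in TextCleaner, then prepend a DOCTYPE whose system identifier is a local file. The class name DoctypeRewriter and the method withLocalDtd are made up for this illustration and are not part of RapidMiner; the DTD path is simply the one from my process.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a RapidMiner class: it only mimics what the
// TextCleaner + SingleTextObjectInput combination does to the page text.
final class DoctypeRewriter {
    // Same pattern as the deletion_regex parameter of TextCleaner.
    private static final Pattern DOCTYPE =
            Pattern.compile("<!DOCTYPE html PUBLIC [^>]*>");

    // Removes any public DOCTYPE declaration and prepends one that
    // references the DTD at dtdPath instead of the W3C URL.
    static String withLocalDtd(String html, String dtdPath) {
        String stripped = DOCTYPE.matcher(html).replaceAll("");
        String doctype = "<!DOCTYPE html PUBLIC "
                + "\"-//W3C//DTD XHTML 1.0 Transitional//EN\" \""
                + dtdPath + "\" >";
        return doctype + stripped;
    }

    public static void main(String[] args) {
        String page = "<!DOCTYPE html PUBLIC "
                + "\"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
                + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
                + "<html><body/></html>";
        // Path taken from the process above; adjust to your own copy of the DTD.
        System.out.println(
                withLocalDtd(page, "C:\\workspace-rapidminer\\xhtml1-transitional.dtd"));
    }
}
```

The point is only that the rewritten text no longer mentions the w3.org URL, so the parser has no reason to fetch the DTD over HTTP. Whether a given parser accepts a bare Windows path as the system identifier is something I only verified in my own setup.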
Regards,
Wojtek
I'm sorry, but what exactly are you doing? It would be easiest to post the process and add a short explanation. And to motivate other users to answer your questions, it could be a smart move to add something like "hello" at the start of your message...
Greetings,
Sebastian