Hi Simon,
the Text Processing Extension contains an operator for extracting information with XPath queries. It's called Generate Extract. If you have stored the contents of a web page in an ExampleSet, you can use this operator to extract the content of an h4 tag as a new attribute. If you take a look at the current version of the Process Documents from Data operator, it lets you select the attributes from which the text should be taken. In this list, you can also assign a weight to each attribute. Combining these two things should suit your needs.
If this does not prove helpful, we could think about implementing some sort of weight applier that assigns weights to tokens if they fulfill some condition.
Greetings,
Sebastian
thanks for your help.
But I've got some problems with the "Generate Extract" operator. More precisely, I'm not getting any results, or rather I'm getting empty results :-)
Maybe I'm using it in the wrong way:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<parameter key="parallelize_main_process" value="true"/>
<process expanded="true" height="746" width="1091">
<operator activated="true" class="text:create_document" expanded="true" height="60" name="Create Document" width="90" x="313" y="165">
<parameter key="text" value="&lt;html&gt; &lt;title&gt;Hallo Titel&lt;/title&gt; &lt;h4&gt;Hallo Überschrift 3&lt;/h4&gt; &lt;h3&gt;Hallo Überschrift 3&lt;/h3&gt; &lt;p&gt;&lt;h4&gt;Ein H4&lt;/h4&gt; &lt;span&gt;in einem Paragraph&lt;/span&gt;&lt;/p&gt; &lt;/html&gt;"/>
</operator>
<operator activated="true" class="text:process_documents" expanded="true" height="94" name="Process Documents" width="90" x="581" y="75">
<process expanded="true" height="724" width="770">
<connect from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:generate_extract" expanded="true" height="60" name="Generate Extract" width="90" x="782" y="75">
<parameter key="source_attribute" value="source_ATTR"/>
<parameter key="query_type" value="XPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries">
<parameter key="title_html" value="//h:title/text()"/>
</list>
<list key="namespaces"/>
</operator>
<connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
<connect from_op="Process Documents" from_port="example set" to_op="Generate Extract" to_port="Example Set"/>
<connect from_op="Generate Extract" from_port="Example Set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
simon
Hi Simon,
the problem with your setup is that the source attribute does not exist. What bothers me is that the operator does not complain about this, but simply doesn't deliver anything. I changed that behavior...
To get the text into an attribute, you can uncheck the create_word_vector parameter of the Process Documents operator and check keep_text instead. A new attribute called text will then be added, containing the text. You can select it as the source attribute of the Generate Extract operator, and then it works as below:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<parameter key="parallelize_main_process" value="true"/>
<process expanded="true" height="746" width="1091">
<operator activated="true" class="text:create_document" expanded="true" height="60" name="Create Document" width="90" x="112" y="75">
<parameter key="text" value="&lt;html&gt; &lt;title&gt;Hallo Titel&lt;/title&gt; &lt;h4&gt;Hallo Überschrift 3&lt;/h4&gt; &lt;h3&gt;Hallo Überschrift 3&lt;/h3&gt; &lt;p&gt;&lt;h4&gt;Ein H4&lt;/h4&gt; &lt;span&gt;in einem Paragraph&lt;/span&gt;&lt;/p&gt; &lt;/html&gt;"/>
</operator>
<operator activated="true" class="text:process_documents" expanded="true" height="94" name="Process Documents" width="90" x="246" y="75">
<parameter key="create_word_vector" value="false"/>
<parameter key="keep_text" value="true"/>
<process expanded="true" height="724" width="770">
<connect from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:generate_extract" expanded="true" height="60" name="Generate Extract" width="90" x="380" y="75">
<parameter key="source_attribute" value="text"/>
<parameter key="query_type" value="XPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries">
<parameter key="title_html" value="//h:title/text()"/>
</list>
<list key="namespaces"/>
</operator>
<connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
<connect from_op="Process Documents" from_port="example set" to_op="Generate Extract" to_port="Example Set"/>
<connect from_op="Generate Extract" from_port="Example Set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Sebastian
Thanks, now it works for me too. But I still have some questions.
Why am I getting just one result here, and not every href entry separated by ";"?
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<parameter key="logverbosity" value="3"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="1"/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="parallelize_main_process" value="false"/>
<process expanded="true" height="629" width="950">
<operator activated="true" class="text:create_document" expanded="true" height="60" name="Create Document" width="90" x="112" y="255">
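<parameter key="text" value="&lt;html&gt; &lt;a href=&quot;1&quot;&gt;Details&lt;/a&gt; &lt;a href=&quot;2&quot;&gt;Details&lt;/a&gt; &lt;a href=&quot;3&quot;&gt;Details&lt;/a&gt; &lt;a href=&quot;4&quot;&gt;Details&lt;/a&gt; &lt;a href=&quot;5&quot;&gt;Details&lt;/a&gt; &lt;a href=&quot;6&quot;&gt;Details&lt;/a&gt; &lt;a href=&quot;7&quot;&gt;Details&lt;/a&gt; &lt;a href=&quot;8&quot;&gt;Details&lt;/a&gt; &lt;a href=&quot;9&quot;&gt;Details&lt;/a&gt; &lt;a href=&quot;0&quot;&gt;Details&lt;/a&gt; &lt;/html&gt; "/>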
<parameter key="add label" value="false"/>
<parameter key="label_type" value="0"/>
</operator>
<operator activated="true" class="text:process_documents" expanded="true" height="94" name="Process Documents" width="90" x="447" y="255">
<parameter key="create_word_vector" value="false"/>
<parameter key="vector_creation" value="0"/>
<parameter key="add_meta_information" value="true"/>
<parameter key="keep_text" value="true"/>
<parameter key="prune_method" value="0"/>
<parameter key="prunde_below_percent" value="3.0"/>
<parameter key="prune_above_percent" value="30.0"/>
<parameter key="prune_below_rank" value="5.0"/>
<parameter key="prune_above_rank" value="5.0"/>
<parameter key="datamanagement" value="7"/>
<parameter key="parallelize_vector_creation" value="false"/>
<process expanded="true" height="629" width="950">
<connect from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:generate_extract" expanded="true" height="60" name="Generate Extract" width="90" x="648" y="255">
<parameter key="source_attribute" value="text"/>
<parameter key="query_type" value="XPath"/>
<list key="string_machting_queries"/>
<parameter key="attribute_type" value="Nominal"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries">
<parameter key="DetailsPage" value="//h:a[text()='Details']/@href"/>
</list>
<list key="namespaces"/>
<parameter key="ignore_CDATA" value="true"/>
<parameter key="assume_html" value="true"/>
<parameter key="value_seperator" value=";"/>
</operator>
<connect from_op="Create Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
<connect from_op="Process Documents" from_port="example set" to_op="Generate Extract" to_port="Example Set"/>
<connect from_op="Generate Extract" from_port="Example Set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
regards
simon
Hi Simon,
as the operator documentation tries to say, if a query results in an enumeration of items, for example "en,de,fr", then these values are separated using the given characters. In any case, to get more than one attribute you have to enter the search expression more than once, each time with its own attribute name. Where should the operator store the second value if you enter only one attribute?
Greetings,
Sebastian
hello sebastian,
Unfortunately I don't understand your suggestion. What I want to achieve is the following.
Given this "html" code:
<html>
<a href="1">Details</a>
<a href="2">Details</a>
<a href="3">Details</a>
<a href="4">Details</a>
<a href="5">Details</a>
<a href="6">Details</a>
<a href="7">Details</a>
<a href="8">Details</a>
<a href="9">Details</a>
<a href="0">Details</a>
</html>
I want to extract all the href values (1,2,3,4,5,6,7,8,9,0).
Now, if I use the following XPath expression:
//a/@href
then from the XPath point of view this query returns all the hrefs.
To check this, you can simply test it at http://www.mizar.dk/XPath/Default.aspx
So my question is: how can I achieve that in RapidMiner?
Hi Simon,
sorry for the late answer, I simply didn't find the time to answer questions here in the forum in the meantime. Here's a process that shows how both ways work. Please keep in mind the restriction that each example of an example set must have the same attributes, so creating attributes depending on the content of a text cannot be done!
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="296" width="480">
<operator activated="true" class="text:create_document" expanded="true" height="60" name="Create Document" width="90" x="3" y="45">
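<parameter key="text" value="&lt;html&gt; &lt;a href=&quot;1&quot;&gt;Details&lt;/a&gt; &lt;a href=&quot;2&quot;&gt;Details&lt;/a&gt; &lt;a href=&quot;3&quot;&gt;Details&lt;/a&gt; &lt;a href=&quot;4&quot;&gt;Details&lt;/a&gt; &lt;a href=&quot;5&quot;&gt;Details&lt;/a&gt; &lt;a href=&quot;6&quot;&gt;Details&lt;/a&gt; &lt;a href=&quot;7&quot;&gt;Details&lt;/a&gt; &lt;a href=&quot;8&quot;&gt;Details&lt;/a&gt; &lt;a href=&quot;9&quot;&gt;Details&lt;/a&gt; &lt;a href=&quot;0&quot;&gt;Details&lt;/a&gt; &lt;/html&gt;"/>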
</operator>
<operator activated="true" class="text:documents_to_data" expanded="true" height="76" name="Documents to Data" width="90" x="112" y="120">
<parameter key="text_attribute" value="text"/>
</operator>
<operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="246" y="120"/>
<operator activated="true" class="text:process_document_from_data" expanded="true" height="76" name="Process Documents from Data" width="90" x="380" y="210">
<parameter key="create_word_vector" value="false"/>
<list key="specify_weights"/>
<process expanded="true" height="585" width="904">
<operator activated="true" class="text:cut_document" expanded="true" height="60" name="Cut Document" width="90" x="112" y="30">
<parameter key="query_type" value="XPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries">
<parameter key="unimportant" value="//a/@href"/>
</list>
<list key="namespaces"/>
<parameter key="assume_html" value="false"/>
<process expanded="true" height="585" width="904">
<operator activated="true" class="text:extract_information" expanded="true" height="60" name="Extract Information" width="90" x="45" y="30">
<parameter key="query_type" value="Regular Expression"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries">
<parameter key="hrefNumber" value="(.*)"/>
</list>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
</operator>
<connect from_port="segment" to_op="Extract Information" to_port="document"/>
<connect from_op="Extract Information" from_port="document" to_port="document 1"/>
<portSpacing port="source_segment" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<connect from_port="document" to_op="Cut Document" to_port="document"/>
<connect from_op="Cut Document" from_port="documents" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:generate_extract" expanded="true" height="60" name="Generate Extract" width="90" x="380" y="75">
<parameter key="source_attribute" value="text"/>
<parameter key="query_type" value="XPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries">
<parameter key="AttributeName1" value="//a[1]"/>
<parameter key="AttributeName2" value="//a[2]"/>
</list>
<list key="namespaces"/>
<parameter key="assume_html" value="false"/>
</operator>
<connect from_op="Create Document" from_port="output" to_op="Documents to Data" to_port="documents 1"/>
<connect from_op="Documents to Data" from_port="example set" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Generate Extract" to_port="Example Set"/>
<connect from_op="Multiply" from_port="output 2" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="example set" to_port="result 2"/>
<connect from_op="Generate Extract" from_port="Example Set" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Greetings,
Sebastian
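For readers who want to see the difference between the two ways outside of RapidMiner, here is a minimal Python sketch. It is not what the operators do internally; it only mimics the shape of the two results using the standard library's xml.etree.ElementTree. The shortened HTML and the use of the href value in both variants are assumptions made for illustration (the process above selects the a elements themselves in the Generate Extract branch).

```python
import xml.etree.ElementTree as ET

# Shortened version of the example document from the thread.
HTML = """<html>
<a href="1">Details</a>
<a href="2">Details</a>
<a href="3">Details</a>
</html>"""

root = ET.fromstring(HTML)
links = root.findall(".//a")  # all <a> elements, in document order

# Way 1 (Generate Extract style): one row per document with a fixed set
# of named columns. Every column needs its own query, so only as many
# values are kept as there are attribute names.
fixed_row = {
    "AttributeName1": links[0].get("href"),
    "AttributeName2": links[1].get("href"),
}

# Way 2 (Cut Document style): one row per match. All rows share the same
# single column, so the number of matches does not matter.
rows_per_match = [{"hrefNumber": a.get("href")} for a in links]

print(fixed_row)
print(rows_per_match)
```

The second shape is the one that scales to an unknown number of links, because every row still has the same attributes.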
Hi,
take a look at the Data to Weights operator. With it you can convert an example set into a weight vector. You could create an example set containing these weights, for example with the logging functionality, and finally turn the log into an ExampleSet using the Log to Data operator.
Greetings,
Sebastian
Hey Sebastian,
thank you for your answer, but I don't get it.
So I have a process like this. In it, I extract some features from an HTML document (for simplicity, generated here by the "Create Document" operator):
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input>
<location/>
</input>
<output>
<location/>
<location/>
</output>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="546" width="1016">
<operator activated="true" class="text:create_document" expanded="true" height="60" name="Create Document" width="90" x="45" y="165">
<parameter key="text" value="&lt;html&gt; &lt;head&gt;&lt;title&gt;Der Titel ist sehr toll&lt;/title&gt;&lt;/head&gt; &lt;a href=&quot;http://f12010.info&quot;&gt;formel1&lt;/a&gt; &lt;a href=&quot;http://dsds-2009.info&quot;&gt;und einen dritten link&lt;/a&gt; &lt;a href=&quot;http://simonknoll.com&quot;&gt;semmel&lt;/a&gt; &lt;title&gt;Wir Haben auch einen zweitet Titel&lt;/title&gt; &lt;/html&gt;"/>
<parameter key="label_type" value="numeric"/>
</operator>
<operator activated="true" class="multiply" expanded="true" height="94" name="Multiply" width="90" x="179" y="165"/>
<operator activated="true" class="text:process_documents" expanded="true" height="94" name="Process Documents (2)" width="90" x="313" y="255">
<parameter key="create_word_vector" value="false"/>
<process expanded="true">
<operator activated="true" class="text:cut_document" expanded="true" height="60" name="Cut Document (2)" width="90" x="394" y="30">
<parameter key="query_type" value="XPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries">
<parameter key="html_linktext" value="//h:a/text()"/>
</list>
<list key="namespaces"/>
<process expanded="true">
<operator activated="true" class="text:extract_information" expanded="true" height="60" name="Extract Information (2)" width="90" x="394" y="30">
<parameter key="query_type" value="Regular Expression"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries">
<parameter key="use_it" value="(.*)"/>
</list>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
</operator>
<connect from_port="segment" to_op="Extract Information (2)" to_port="document"/>
<connect from_op="Extract Information (2)" from_port="document" to_port="document 1"/>
<portSpacing port="source_segment" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<connect from_port="document" to_op="Cut Document (2)" to_port="document"/>
<connect from_op="Cut Document (2)" from_port="documents" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:process_documents" expanded="true" height="94" name="Process Documents" width="90" x="313" y="30">
<parameter key="create_word_vector" value="false"/>
<process expanded="true">
<operator activated="true" class="text:cut_document" expanded="true" height="60" name="Cut Document" width="90" x="246" y="165">
<parameter key="query_type" value="XPath"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries"/>
<list key="regular_region_queries"/>
<list key="xpath_queries">
<parameter key="html_title" value="//h:title/text()"/>
</list>
<list key="namespaces"/>
<process expanded="true">
<operator activated="true" class="text:extract_information" expanded="true" height="60" name="Extract Information" width="90" x="246" y="30">
<parameter key="query_type" value="Regular Expression"/>
<list key="string_machting_queries"/>
<list key="regular_expression_queries">
<parameter key="use_it" value="(.*)"/>
</list>
<list key="regular_region_queries"/>
<list key="xpath_queries"/>
<list key="namespaces"/>
</operator>
<connect from_port="segment" to_op="Extract Information" to_port="document"/>
<connect from_op="Extract Information" from_port="document" to_port="document 1"/>
<portSpacing port="source_segment" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<connect from_port="document" to_op="Cut Document" to_port="document"/>
<connect from_op="Cut Document" from_port="documents" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="generate_id" expanded="true" height="76" name="Generate ID" width="90" x="447" y="30"/>
<operator activated="true" class="generate_id" expanded="true" height="76" name="Generate ID (2)" width="90" x="447" y="255"/>
<operator activated="true" class="union" expanded="true" height="76" name="Union" width="90" x="581" y="120"/>
<connect from_op="Create Document" from_port="output" to_op="Multiply" to_port="input"/>
<connect from_op="Multiply" from_port="output 1" to_op="Process Documents" to_port="documents 1"/>
<connect from_op="Multiply" from_port="output 2" to_op="Process Documents (2)" to_port="documents 1"/>
<connect from_op="Process Documents (2)" from_port="example set" to_op="Generate ID (2)" to_port="example set input"/>
<connect from_op="Process Documents" from_port="example set" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Union" to_port="example set 1"/>
<connect from_op="Generate ID (2)" from_port="example set output" to_op="Union" to_port="example set 2"/>
<connect from_op="Union" from_port="union" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
These extracted features result in the following example set:
Row No.  id   query_key      use_it
-----------------------------------------------------------------
1        1.0  html_title     Der Titel ist sehr toll
2        2.0  html_title     Wir Haben auch einen zweitet Titel
3        1.0  html_linktext  formel1
4        2.0  html_linktext  und einen dritten link
5        3.0  html_linktext  semmel
Now my question: how can I add a weighting for the different features that I extracted (e.g. weight html_title with 2 and html_linktext with 1), which could then result in an example set like this (or however a weighting looks; I added a weight column just to get the point across):
Row No.  id   query_key      use_it                              weight
---------------------------------------------------------------------------------
1        1.0  html_title     Der Titel ist sehr toll             2
2        2.0  html_title     Wir Haben auch einen zweitet Titel  2
3        1.0  html_linktext  formel1                             1
4        2.0  html_linktext  und einen dritten link              1
5        3.0  html_linktext  semmel                              1
Thanks in advance
simon
Hi Simon,
if this weight should depend only on the query_key, this is no problem. Simply use the [tt]Generate Attributes[/tt] operator with [tt]if(query_key="html_title",2,1)[/tt] as the expression. Of course, you can nest the [tt]if(...,...,...)[/tt] expressions as you like.
Kind regards,
Tobias
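As an illustration only, the effect of that expression on the example set from the question can be sketched in Python. The rows are copied from the table in the question; the helper name weight_for is hypothetical and merely mirrors the if(...) expression, it is not part of RapidMiner.

```python
# Rows as in the example set posted above.
rows = [
    {"id": 1.0, "query_key": "html_title", "use_it": "Der Titel ist sehr toll"},
    {"id": 2.0, "query_key": "html_title", "use_it": "Wir Haben auch einen zweitet Titel"},
    {"id": 1.0, "query_key": "html_linktext", "use_it": "formel1"},
    {"id": 2.0, "query_key": "html_linktext", "use_it": "und einen dritten link"},
    {"id": 3.0, "query_key": "html_linktext", "use_it": "semmel"},
]


def weight_for(query_key: str) -> int:
    """Hypothetical stand-in for if(query_key="html_title",2,1)."""
    return 2 if query_key == "html_title" else 1


# Add the new attribute to every row, leaving the existing ones untouched.
weighted = [{**row, "weight": weight_for(row["query_key"])} for row in rows]

for row in weighted:
    print(row["query_key"], row["use_it"], row["weight"])
```

Nesting further conditions corresponds to adding more branches to weight_for.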
Thank you for your advice.
My question now is: how can I feed a k-means algorithm with this data if I want to cluster the documents with respect to the extracted features? If I just give the resulting ExampleSet as input, it clusters every single example on its own. But I want to cluster the documents, not the extractions.
Any advice?
best regards
simon
Maybe I should post a screenshot of an example set.
Here I have an ExampleSet with several examples describing 2 different objects.
Now, if I want to apply a clustering algorithm to this and cluster these 2 objects (in reality there are obviously more than just 2) rather than every single example, how do I have to do it?

best regards
simon knoll
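To make the shape of this question concrete, here is a rough Python sketch of what "cluster the objects, not the examples" would require: the extraction rows have to be collapsed into one row per document before any clustering runs. Everything in it is an assumption for illustration: the doc column, the sample texts, the WEIGHTS table and the function document_vectors are hypothetical, and it says nothing about how RapidMiner does or will do this.

```python
from collections import Counter, defaultdict

# Hypothetical extraction rows; "doc" identifies the object a row belongs to.
rows = [
    {"doc": "A", "query_key": "html_title", "use_it": "formel1 news"},
    {"doc": "A", "query_key": "html_linktext", "use_it": "formel1"},
    {"doc": "B", "query_key": "html_title", "use_it": "casting show"},
    {"doc": "B", "query_key": "html_linktext", "use_it": "show news"},
]

# Assumed weights per feature type, as in the weighting question above.
WEIGHTS = {"html_title": 2, "html_linktext": 1}


def document_vectors(rows, weights):
    """Collapse extraction rows into one weighted token count per document."""
    vectors = defaultdict(Counter)
    for row in rows:
        w = weights.get(row["query_key"], 1)
        for token in row["use_it"].lower().split():
            vectors[row["doc"]][token] += w
    return {doc: dict(counts) for doc, counts in vectors.items()}


vectors = document_vectors(rows, WEIGHTS)
print(vectors)
```

Each entry of vectors is then a single example describing one object, which is the kind of input a clustering algorithm such as k-means would need once the token counts are laid out as columns.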
I included it today in the new Text Processing Extension of RapidMiner 5. The current plugin does not support this, so you might have to wait until we release RapidMiner 5...
Greetings,
Sebastian