[Solved] Problem with Filter Tokens (by Region)
Kallust
New Altair Community Member
Hi,
I'm currently working on my bachelor thesis, in which I examine how companies report on research and development in their annual reports.
Unfortunately, I'm completely inexperienced with RapidMiner.
My current approach is as follows:
1. Load the annual reports into RapidMiner (I have them as .txt files).
2. Tokenize the reports into sentences, using .:?! as the split characters.
3. Use Filter Tokens (by Region) to extract the sentence before and after a sentence containing a keyword (tokens before = 1 and tokens after = 1).
4. Save the result in an .xlsx file for further editing.
Steps 1 and 2 work as they should; my problem is with step 3.
I don't get any results out of the region filter. It just seems to do nothing. It also forces me to enter both a string and a regular expression, and the difference isn't clear to me (I don't know if this is important). I think this guy had a similar problem: http://rapid-i.com/rapidforum/index.php/topic,6021.0.html
I used an online XML validator and it didn't find an error.
I hope you can help me with my problem.
My code looks as follows:
[code]
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.015">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="text:process_document_from_file" compatibility="5.3.002" expanded="true" height="76" name="Process Documents from Files" width="90" x="179" y="120">
        <list key="text_directories">
          <parameter key="input" value="C:\Users\Administrator\Documents\Studium\Bachelorarbeit\RapidMiner\Test\Quelle"/>
        </list>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="5.3.002" expanded="true" height="60" name="Tokenize" width="90" x="179" y="120">
            <parameter key="mode" value="specify characters"/>
            <parameter key="characters" value=".:?!"/>
          </operator>
          <operator activated="true" class="text:filter_tokens_by_regions" compatibility="5.3.002" expanded="true" height="60" name="Filter Tokens (by Region)" width="90" x="380" y="120">
            <parameter key="string" value="Programm"/>
            <parameter key="regular_expression" value="Programm"/>
          </operator>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_op="Filter Tokens (by Region)" to_port="document"/>
          <connect from_op="Filter Tokens (by Region)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="write_excel" compatibility="5.3.015" expanded="true" height="76" name="Write Excel" width="90" x="380" y="165">
        <parameter key="excel_file" value="C:\Users\Administrator\Documents\Studium\Bachelorarbeit\RapidMiner\Test\Ergebnis.xlsx"/>
        <parameter key="file_format" value="xlsx"/>
        <parameter key="sheet_name" value="RapidMiner Data0"/>
      </operator>
      <connect from_op="Process Documents from Files" from_port="example set" to_op="Write Excel" to_port="input"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>
[/code]
Answers
I figured out the solution myself. The problem was that I was searching with "equals" and "contains" at the same time. With whole sentences as tokens, an "equals" search for a single word could never match, because the word never equals the entire sentence.
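For anyone who runs into the same thing, here is a minimal Python sketch of why "equals" finds nothing when the tokens are whole sentences, while "contains" does. It is only an illustration of the logic, not RapidMiner's implementation: the helper names `split_sentences` and `filter_by_region`, the `mode` argument, and the sample text are all made up for this example, and the sketch assumes the matching sentence is kept together with its neighbours.

```python
import re


def split_sentences(text):
    """Split on the characters . : ? ! (as in the Tokenize step) and drop empty pieces."""
    return [part.strip() for part in re.split(r"[.:?!]", text) if part.strip()]


def filter_by_region(tokens, keyword, mode="contains", before=1, after=1):
    """Hypothetical region filter: keep every matching token plus its neighbours.

    mode="equals"   -> the token must be identical to the keyword
    mode="contains" -> the keyword must occur somewhere inside the token
    """
    if mode == "equals":
        hits = [i for i, token in enumerate(tokens) if token == keyword]
    else:
        hits = [i for i, token in enumerate(tokens) if keyword in token]

    keep = set()
    for i in hits:
        start = max(0, i - before)
        stop = min(len(tokens), i + after + 1)
        keep.update(range(start, stop))
    return [tokens[i] for i in sorted(keep)]


# Made-up sample text; "Programm" is the keyword from the process above.
text = "We invest heavily. The Programm started in 2010. It will continue! More later."
sentences = split_sentences(text)

equals_result = filter_by_region(sentences, "Programm", mode="equals")
contains_result = filter_by_region(sentences, "Programm", mode="contains")

print(equals_result)    # no sentence is exactly "Programm"
print(contains_result)  # the matching sentence with one neighbour on each side
```

With sentence tokens, the "equals" variant returns an empty list, which matches the "it just seems to do nothing" behaviour described in the question.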