I need help: I have a log file (a text file of about 500 lines) and I need to count how often each statement occurs

Ahmedte1234

The lines in the file look like this:

1 Jan 10:00 the chassis normal status

1 Jan 10:30 log I'd lost

1 Jan 12:30 interface down

1 Jan 1:00 power off system

2 Jan 11:00 the high temperature

2 Jan 2:00 the user log in successfully

There are a lot of statements like these; some are useful and some are not.

The output should look like this:

Down appears 10 times

Power off appears 1 time

Interface down appears 3 times

I also need an algorithm that finds the most frequent words and how many times each appears in the file, and a way to reduce the file using a certain pattern.


Accepted answers

varunm1

You can go through the FP-Growth tutorial in RapidMiner. Type "FP" in the operator search box, then drag and drop the FP-Growth operator into your process. If you click on the operator, its tutorials appear in the Help window. There is also a tutorial on the academy here: https://academy.rapidminer.com/learn/video/text-association-rules
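To show what frequent-itemset mining does with log lines like these, here is a minimal Python sketch. It is not the RapidMiner FP-Growth operator and not FP-Growth itself: it uses a simpler level-wise (Apriori-style) search, which finds the same frequent itemsets on small inputs. The sample lines are made up in the style of the question, and the names `line_to_items` and `frequent_itemsets` are hypothetical helpers. It also assumes every line starts with day, month, and time, which are dropped.

```python
import re
from itertools import combinations

# Made-up sample lines in the style of the log in the question (not real data).
LOG_LINES = [
    "1 Jan 12:30 interface down",
    "1 Jan 1:00 power off system",
    "2 Jan 09:15 interface down",
    "2 Jan 11:00 the high temperature",
    "3 Jan 08:00 interface down",
]

def line_to_items(line):
    """Drop the leading day/month/time fields and return the message words as a set."""
    parts = line.split(maxsplit=3)
    message = parts[3] if len(parts) > 3 else ""
    return frozenset(w.lower() for w in re.split(r"[^A-Za-z]+", message) if w)

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; min_support is an absolute line count."""
    # Level 1: count how many lines contain each single word.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(current)
    k = 2
    while current:
        # Build size-k candidates by joining frequent (k-1)-itemsets.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Keep only candidates whose (k-1)-subsets are all frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        result.update(current)
        k += 1
    return result

transactions = [line_to_items(line) for line in LOG_LINES]
result = frequent_itemsets(transactions, min_support=2)
for items, n in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(" ".join(sorted(items)), "appears in", n, "lines")
```

Each line is treated as one transaction, so a word pair such as "interface" and "down" is counted once per line that contains both words. Raising `min_support` filters out the rare statements.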

All comments

varunm1

Hello @Ahmedte1234

First, install the "Text Processing" and "Web Mining" extensions from the Marketplace in RapidMiner. To count how often words repeat in your document, you first need to read your text file into RapidMiner. Then you can use the XML code below to extract details about your data, attaching your own text file instead of the one referenced in this XML.

To use the XML, open a new blank process in RapidMiner and enable the XML panel via VIEW --> Show Panel --> XML in the menu bar. Copy the code from here, paste it into the XML panel of the new process, and click the green tick mark, which loads the process. Once it is loaded, delete the "Retrieve files" operator and connect your own file, imported into RapidMiner, in its place.

I also attached the result of the process based on some of the data you provided. The term occurrences column gives the number of times each word is repeated in your file. There are also multiple community samples that help in understanding how TF-IDF works.

<?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve files" width="90" x="112" y="85">
<parameter key="repository_entry" value="//Local Repository/RapidMIner/files"/>
</operator>
<operator activated="true" class="nominal_to_text" compatibility="9.2.001" expanded="true" height="82" name="Nominal to Text" width="90" x="246" y="136">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="nominal"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="file_path"/>
<parameter key="block_type" value="single_value"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="single_value"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="447" y="136">
<parameter key="create_word_vector" value="true"/>
<parameter key="vector_creation" value="TF-IDF"/>
<parameter key="add_meta_information" value="true"/>
<parameter key="keep_text" value="false"/>
<parameter key="prune_method" value="none"/>
<parameter key="prune_below_percent" value="3.0"/>
<parameter key="prune_above_percent" value="30.0"/>
<parameter key="prune_below_rank" value="0.05"/>
<parameter key="prune_above_rank" value="0.95"/>
<parameter key="datamanagement" value="double_sparse_array"/>
<parameter key="data_management" value="auto"/>
<parameter key="select_attributes_and_weights" value="false"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="179" y="85">
<parameter key="mode" value="non letters"/>
<parameter key="characters" value=".:"/>
<parameter key="language" value="English"/>
<parameter key="max_token_length" value="3"/>
</operator>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve files" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
<connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
<connect from_op="Process Documents from Data" from_port="word list" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
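For comparison outside RapidMiner, the same term-occurrence idea can be sketched in a few lines of Python. This is only an illustration, not what the process above runs internally: the sample lines are copied from the question, `tokenize` and `term_occurrences` are hypothetical helper names, and splitting on non-letters is assumed to roughly mirror the "non letters" tokenize mode.

```python
import re
from collections import Counter

# Sample lines copied from the question; a real run would read the log file instead.
LOG_LINES = [
    "1 Jan 10:00 the chassis normal status",
    "1 Jan 10:30 log I'd lost",
    "1 Jan 12:30 interface down",
    "1 Jan 1:00 power off system",
    "2 Jan 11:00 the high temperature",
    "2 Jan 2:00 the user log in successfully",
]

def tokenize(line):
    """Split on anything that is not a letter and lowercase the tokens."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", line) if t]

def term_occurrences(lines):
    """Count how many times each token occurs across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(tokenize(line))
    return counts

counts = term_occurrences(LOG_LINES)
for word, n in counts.most_common(5):
    print(f"{word} appears {n} time(s)")
```

Because digits are treated as separators, the timestamps disappear, but the month name is still counted; filtering such tokens with a stop-word list would be a natural next step.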

Hope this helps. Please let me know if you are looking for something different.

Ahmedte1234

Good, but for my research I need to use an algorithm such as Apriori or FP-Growth.

varunm1