

Accepted Answer
Updated by Marco_Barradas
Ilyas,
Just adjust the "filter by regex" string in the parameters, or remove it if the folder contains only the files you need.
The error shown tells me that there are no files with the .docx (Word document) extension in your folder. If your files are .txt (text files), just change .docx to .txt in the filter.
If you need further help, please type @ followed by my name and I'll receive an e-mail alert with the latest update on your post.
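As a minimal sketch of that change, the two filter parameters of the Loop Files operator could look like the fragment below. The parameter keys are the ones that appear in the process XML posted in this thread; the escaped dot (`\.`) is an added assumption to make the pattern match only names ending in ".txt", and the plain `(?i).*txt` form would follow the original pattern more literally.

```xml
<!-- Fragment only: goes inside the Loop Files operator, replacing the .docx filter. -->
<parameter key="filter_type" value="regex"/>
<!-- (?i) = case-insensitive, .* = any file name, \.txt = literal ".txt" ending (assumed tightening) -->
<parameter key="filter_by_regex" value="(?i).*\.txt"/>
```

The same value can be typed into the "filter by regex" field in the operator's parameter panel instead of editing the XML.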
@MarcoBarradas,
Thank you again for the direction. I can't get the Loop Files operator to see txt files. Could you please help?
In summary, I still cannot make the process run. I have 10 separate txt files (one for each of the 10 interviews I conducted). I also have a single combined txt file for all the interviews. Which is better to use: the individual txt files or the single large file?





Hi @Ilyas ,
rmp files are nothing other than XML files, so you posted the right thing. You can open the XML panel to make the export easier, but that's a minor thing.
Attached is an updated process. You had no operator inside Loop Files that actually reads the files, so I added a Read Document for txt files.
Best,
Martin
Ilyas,
Your basic setup would be a Loop Files (to grab all your documents) with a Read Document inside of it.
The (?i).*docx filter tells the operator to use only the files that have a docx extension; the (?i) part makes the match case-insensitive.



<?xml version="1.0" encoding="UTF-8"?>
<process version="9.9.002">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.9.002" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="-1"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="concurrency:loop_files" compatibility="9.9.002" expanded="true" height="82" name="Loop Folder With Files" width="90" x="179" y="85">
        <parameter key="filter_type" value="regex"/>
        <parameter key="filter_by_regex" value="(?i).*docx"/>
        <parameter key="recursive" value="false"/>
        <parameter key="enable_macros" value="false"/>
        <parameter key="macro_for_file_name" value="file_name"/>
        <parameter key="macro_for_file_type" value="file_type"/>
        <parameter key="macro_for_folder_name" value="folder_name"/>
        <parameter key="reuse_results" value="false"/>
        <parameter key="enable_parallel_execution" value="true"/>
        <process expanded="true">
          <operator activated="true" class="text:read_document" compatibility="9.3.001" expanded="true" height="68" name="Read Document" width="90" x="246" y="34">
            <parameter key="extract_text_only" value="true"/>
            <parameter key="use_file_extension_as_type" value="true"/>
            <parameter key="content_type" value="txt"/>
            <parameter key="encoding" value="SYSTEM"/>
          </operator>
          <connect from_port="file object" to_op="Read Document" to_port="file"/>
          <connect from_op="Read Document" from_port="output" to_port="output 1"/>
          <portSpacing port="source_file object" spacing="0"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="operator_toolbox:lda" compatibility="2.11.000" expanded="true" height="124" name="Extract Topics from Documents (LDA)" width="90" x="380" y="85">
        <parameter key="number_of_topics" value="10"/>
        <parameter key="show_optimization_settings" value="false"/>
        <parameter key="use_alpha_heuristics" value="true"/>
        <parameter key="alpha_sum" value="0.1"/>
        <parameter key="use_beta_heuristics" value="true"/>
        <parameter key="beta" value="0.01"/>
        <parameter key="optimize_hyperparameters" value="true"/>
        <parameter key="optimize_interval_for_hyperparameters" value="10"/>
        <parameter key="iterations" value="1000"/>
        <parameter key="top_words_per_topic" value="5"/>
        <parameter key="stopword language" value="english"/>
        <parameter key="reproducible" value="false"/>
        <parameter key="enable_logging" value="false"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="include_meta_data" value="true"/>
      </operator>
      <connect from_op="Loop Folder With Files" from_port="output 1" to_op="Extract Topics from Documents (LDA)" to_port="col"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>


@mschmitz Thank you Martin.
I am finally making progress.
My first file in the Local Repository (All Transcripts Combined) includes everything. That is all I need to use. So can I do the below?
