[SOLVED] Process for concatenating files?
dara
Hi,
Once you have imported your files via one of the various import operators or one of the import wizards (Files -> Import Data), you can concatenate the resulting example sets with the "Append" operator.
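To show how this differs from file concatenation: as I understand it, "Append" stacks the rows (examples) of several example sets into one, and it expects them to share the same attributes. The Python sketch below only models that idea with lists of dictionaries. `append_example_sets` is a hypothetical name and is not part of RapidMiner.

```python
from typing import Dict, List, Optional, Set

Row = Dict[str, object]


def append_example_sets(*example_sets: List[Row]) -> List[Row]:
    """Stack the rows of several example sets into one combined example set.

    Hypothetical model of the "Append" idea: every row must have the same
    attributes (column names) as the first row seen, otherwise ValueError.
    """
    combined: List[Row] = []
    expected: Optional[Set[str]] = None
    for example_set in example_sets:
        for row in example_set:
            attributes = set(row)
            if expected is None:
                expected = attributes
            elif attributes != expected:
                raise ValueError(
                    f"attribute mismatch: {sorted(attributes)} vs {sorted(expected)}"
                )
            combined.append(dict(row))  # copy so the inputs are not shared
    return combined
```

For example, `append_example_sets([{"id": 1, "text": "a"}], [{"id": 2, "text": "b"}])` returns both rows in a single list, while rows with different attribute names raise a `ValueError`. Plain file concatenation, by contrast, just joins text and has no notion of rows or attributes.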
Regards,
Marco
Thanks, but I have already imported the files, both remotely from a database and from my local desktop.
I do not need to concatenate example sets; I actually need to concatenate text files, as in plain file concatenation.
Here is why:
1. We want to associate a series of files with each other, e.g. all HR documents for one employee.
2. We want to perform text classification on the entire series of documents, not just on one.
3. So I thought a crude setup would be to concatenate all the files for one employee and treat the result as a single file.
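Step 3 could be sketched outside RapidMiner roughly as follows. This is a minimal sketch that assumes a hypothetical naming convention where each file name starts with the employee ID and an underscore (e.g. `E123_contract.txt`); `employee_id_of` is a placeholder to adapt to however the files are actually linked to an employee.

```python
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def employee_id_of(path: Path) -> str:
    # Hypothetical naming convention: "<employee_id>_<document>.txt"
    return path.name.split("_", 1)[0]


def concatenate_per_employee(directory: str, separator: str = "\n") -> Dict[str, str]:
    """Combine all .txt files in `directory` into one text per employee."""
    parts: Dict[str, List[str]] = defaultdict(list)
    # Sorting makes the order of each employee's documents reproducible.
    for path in sorted(Path(directory).glob("*.txt")):
        parts[employee_id_of(path)].append(path.read_text(encoding="utf-8"))
    # The separator keeps the last word of one file from fusing with the next.
    return {employee: separator.join(texts) for employee, texts in parts.items()}
```

Each value in the returned dictionary is one employee's merged text, which can then be treated as a single document for classification.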
D
Hi,
To read multiple files for text classification, you can use the "Process Documents from Files" operator. There should be plenty of help available in the forums, because text mining questions are pretty common.
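As far as I know, "Process Documents from Files" takes a list of directories, labels every document with the class name assigned to its directory, and turns each document into a word vector (with weighting options such as TF-IDF). The Python sketch below imitates only the core idea, using lowercase word tokens and raw term counts, and assumes UTF-8 plain-text files. `documents_from_directories` and its `text_directories` mapping are hypothetical names for illustration.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    # Lowercase and keep runs of word characters (letters, digits, underscore).
    return re.findall(r"\w+", text.lower())


def documents_from_directories(text_directories: Dict[str, str]) -> List[Tuple[str, Counter]]:
    """Turn every .txt file into (class label, term counts), labelled by its directory."""
    examples: List[Tuple[str, Counter]] = []
    for label, directory in text_directories.items():
        for path in sorted(Path(directory).glob("*.txt")):
            counts = Counter(tokenize(path.read_text(encoding="utf-8")))
            examples.append((label, counts))
    return examples
```

Here each file becomes one example; if all of an employee's documents were first merged into one file, that employee would become one example.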
Regards,
Marco
Hi,
I quickly created a process that reads files, concatenates their contents and writes the result to another file. See if you can adapt it to your needs.
Enjoy,
Matthew
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.005">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="text:create_document" compatibility="5.3.000" expanded="true" height="60" name="Create Document" width="90" x="112" y="75">
<parameter key="text" value="This is to initialize the content of Remember/Recall operators. "/>
</operator>
<operator activated="true" class="remember" compatibility="5.3.005" expanded="true" height="60" name="Remember (2)" width="90" x="246" y="75">
<parameter key="name" value="doc"/>
<parameter key="io_object" value="Document"/>
</operator>
<operator activated="true" class="loop_files" compatibility="5.3.005" expanded="true" height="60" name="Loop Files" width="90" x="380" y="165">
<parameter key="directory" value="/Users/mdc/Texts"/>
<process expanded="true">
<operator activated="true" class="text:read_document" compatibility="5.3.000" expanded="true" height="60" name="Read Document (2)" width="90" x="112" y="120">
<parameter key="file" value="%{file_path}"/>
</operator>
<operator activated="true" class="recall" compatibility="5.3.005" expanded="true" height="60" name="Recall" width="90" x="112" y="30">
<parameter key="name" value="doc"/>
<parameter key="io_object" value="Document"/>
</operator>
<operator activated="true" class="text:combine_documents" compatibility="5.3.000" expanded="true" height="94" name="Combine Documents (2)" width="90" x="313" y="120"/>
<operator activated="true" class="remember" compatibility="5.3.005" expanded="true" height="60" name="Remember" width="90" x="447" y="120">
<parameter key="name" value="doc"/>
<parameter key="io_object" value="Document"/>
</operator>
<connect from_op="Read Document (2)" from_port="output" to_op="Combine Documents (2)" to_port="documents 2"/>
<connect from_op="Recall" from_port="result" to_op="Combine Documents (2)" to_port="documents 1"/>
<connect from_op="Combine Documents (2)" from_port="document" to_op="Remember" to_port="store"/>
<portSpacing port="source_file object" spacing="0"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
</process>
</operator>
<operator activated="true" class="recall" compatibility="5.3.005" expanded="true" height="60" name="Recall (2)" width="90" x="514" y="75">
<parameter key="name" value="doc"/>
<parameter key="io_object" value="Document"/>
</operator>
<operator activated="true" class="text:write_document" compatibility="5.3.000" expanded="true" height="76" name="Write Document" width="90" x="648" y="75">
<parameter key="file" value="/Users/matthewgarong/concatenated_text.txt"/>
</operator>
<connect from_op="Create Document" from_port="output" to_op="Remember (2)" to_port="store"/>
<connect from_op="Recall (2)" from_port="result" to_op="Write Document" to_port="document"/>
<connect from_op="Write Document" from_port="document" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
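A rough Python equivalent of the process above, for anyone who wants the same result outside RapidMiner: loop over the files in a directory, collect each file's text (the job of Read Document, Combine Documents and the Remember/Recall pair), and write everything once at the end. It differs from the XML in a few assumed details: files are read in name order, they are assumed to be UTF-8 plain text, texts are joined with a newline, and nothing is seeded up front. As far as I can tell, the XML version starts from the Create Document text, so that sentence also ends up at the beginning of its output file.

```python
from pathlib import Path
from typing import List


def concatenate_files(directory: str, output_file: str, separator: str = "\n") -> str:
    """Read every regular file in `directory`, join the texts, write them to `output_file`.

    Keep `output_file` outside `directory`, otherwise a later run would read it back in.
    """
    files = sorted(p for p in Path(directory).iterdir() if p.is_file())
    parts: List[str] = []  # running collection, like the remembered "doc" object
    for path in files:
        # Read Document + Recall + Combine Documents + Remember, in one step.
        parts.append(path.read_text(encoding="utf-8"))
    combined = separator.join(parts)
    # Final Recall + Write Document.
    Path(output_file).write_text(combined, encoding="utf-8")
    return combined
```

For example, `concatenate_files("/Users/mdc/Texts", "concatenated_text.txt")` mirrors the input directory used in the process; pass `separator=""` to join the texts with nothing in between.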
Thanks, Matthew.
It works! Impressed.
Could you tell me how to turn this into a separate, self-contained process, like an IO box that I can use in other processes? I am not sure how to do this in general, i.e. how to build my own processes out of existing ones.
Dara
To make it a separate process, save it and call it from your process with the 'Execute Process' operator. I have not tried this, though.
You can also add it to your process directly: just copy and paste the operators into your process (at the top level or inside a 'Subprocess' operator). Do this in the Process window, not in the XML.
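Both options come down to composition: a finished process becomes a single step inside a bigger one. The Python sketch below is only a loose analogy; the `Pipeline` class and its steps are hypothetical and say nothing about how RapidMiner runs processes internally.

```python
from typing import Any, Callable, List

Step = Callable[[Any], Any]


class Pipeline:
    """An ordered list of steps; a Pipeline is callable, so it can itself be a step."""

    def __init__(self, steps: List[Step]) -> None:
        self.steps = steps

    def __call__(self, data: Any) -> Any:
        for step in self.steps:
            data = step(data)  # each step's output feeds the next, like a port connection
        return data


# Reusable block, comparable to a saved process or a 'Subprocess' operator.
concatenate = Pipeline([lambda texts: "\n".join(texts)])

# Parent "process" that uses the block as one of its steps.
prepare_for_classification = Pipeline([concatenate, str.lower])
```

Calling `prepare_for_classification(["A", "B"])` runs the nested block first and then the next step, much like an embedded process feeding the operators after it.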
Matthew
Thanks, mdc.
Got it to work. I really appreciate everyone's help.
D