Thanks, I have already imported the files, both remotely from a database and from my local desktop.
I do not need to concatenate Example Sets; I need to concatenate the actual text files, as in file concatenation.
Here is why:
1. We want to associate a series of files with each other, e.g. the HR documents for one employee.
2. We want to perform text classification on the entire series of documents, not just on one.
3. So I thought a crude setup would be to concatenate all the files for one employee and treat the result as a single file.
D
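Outside RapidMiner, the crude setup from step 3 could be sketched roughly as follows. This is only an illustration, not part of any process posted in this thread. It assumes a hypothetical naming scheme in which each file is called `<employee_id>_<anything>.txt`, and the function name `concatenate_per_employee` is made up for the example.

```python
from collections import defaultdict
from pathlib import Path


def concatenate_per_employee(src_dir, out_dir, sep="\n"):
    """Join all .txt files that share an employee-ID prefix into one file each."""
    src, out = Path(src_dir), Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Group files by the part of the name before the first underscore.
    # ASSUMPTION: files are named <employee_id>_<anything>.txt
    groups = defaultdict(list)
    for path in sorted(src.glob("*.txt")):
        employee_id = path.stem.split("_", 1)[0]
        groups[employee_id].append(path)

    # Write one concatenated file per employee and remember where it went.
    written = {}
    for employee_id, paths in groups.items():
        text = sep.join(p.read_text(encoding="utf-8") for p in paths)
        target = out / f"{employee_id}.txt"
        target.write_text(text, encoding="utf-8")
        written[employee_id] = target
    return written
```

The output directory should differ from the input directory, otherwise a second run would pick up the concatenated files as input. Each resulting file can then be read as a single document for classification.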
Hi,
I quickly created a process that reads the files, concatenates their contents and writes the result to another file. See if you can adapt it to your needs.
Enjoy,
Matthew
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.005">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="text:create_document" compatibility="5.3.000" expanded="true" height="60" name="Create Document" width="90" x="112" y="75">
<parameter key="text" value="This is to initialize the content of Remember/Recall operators. "/>
</operator>
<operator activated="true" class="remember" compatibility="5.3.005" expanded="true" height="60" name="Remember (2)" width="90" x="246" y="75">
<parameter key="name" value="doc"/>
<parameter key="io_object" value="Document"/>
</operator>
<operator activated="true" class="loop_files" compatibility="5.3.005" expanded="true" height="60" name="Loop Files" width="90" x="380" y="165">
<parameter key="directory" value="/Users/mdc/Texts"/>
<process expanded="true">
<operator activated="true" class="text:read_document" compatibility="5.3.000" expanded="true" height="60" name="Read Document (2)" width="90" x="112" y="120">
<parameter key="file" value="%{file_path}"/>
</operator>
<operator activated="true" class="recall" compatibility="5.3.005" expanded="true" height="60" name="Recall" width="90" x="112" y="30">
<parameter key="name" value="doc"/>
<parameter key="io_object" value="Document"/>
</operator>
<operator activated="true" class="text:combine_documents" compatibility="5.3.000" expanded="true" height="94" name="Combine Documents (2)" width="90" x="313" y="120"/>
<operator activated="true" class="remember" compatibility="5.3.005" expanded="true" height="60" name="Remember" width="90" x="447" y="120">
<parameter key="name" value="doc"/>
<parameter key="io_object" value="Document"/>
</operator>
<connect from_op="Read Document (2)" from_port="output" to_op="Combine Documents (2)" to_port="documents 2"/>
<connect from_op="Recall" from_port="result" to_op="Combine Documents (2)" to_port="documents 1"/>
<connect from_op="Combine Documents (2)" from_port="document" to_op="Remember" to_port="store"/>
<portSpacing port="source_file object" spacing="0"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
</process>
</operator>
<operator activated="true" class="recall" compatibility="5.3.005" expanded="true" height="60" name="Recall (2)" width="90" x="514" y="75">
<parameter key="name" value="doc"/>
<parameter key="io_object" value="Document"/>
</operator>
<operator activated="true" class="text:write_document" compatibility="5.3.000" expanded="true" height="76" name="Write Document" width="90" x="648" y="75">
<parameter key="file" value="/Users/matthewgarong/concatenated_text.txt"/>
</operator>
<connect from_op="Create Document" from_port="output" to_op="Remember (2)" to_port="store"/>
<connect from_op="Recall (2)" from_port="result" to_op="Write Document" to_port="document"/>
<connect from_op="Write Document" from_port="document" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
To make it a separate process, save it and call it from your own process using the 'Execute Process' operator. I have not tried this, though.
You can also add it to your process directly: just copy and paste it into your process (at the top level or inside a 'Subprocess' operator). Do this in the Process window, not in the XML.
Matthew
Once you have imported your files via one of the import operators or one of the import wizards (Files -> Import Data), you can concatenate example sets with the "Append" operator.
Regards,
Marco