[SOLVED] Process for concatenating files?

dara
dara New Altair Community Member
edited November 5 in Community Q&A
Is there any process for concatenating files?

Answers

  • Marco_Boeck
    Hi,

    once you have imported your files via one of the various import operators or one of the import wizards (Files -> Import Data), you can concatenate example sets via the "Append" operator.

    Regards,
    Marco
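For readers less familiar with RapidMiner, the idea behind appending example sets can be sketched in Python. This is an illustrative analogy, not RapidMiner code: an "example set" is modelled here as a hypothetical `(columns, rows)` pair, and `append_example_sets` is a made-up helper that stacks the rows of all inputs and, as a simplifying assumption, insists on identical columns in the same order.

```python
from typing import List, Tuple

# Hypothetical stand-in for an example set: (column names, list of row tuples)
ExampleSet = Tuple[List[str], List[tuple]]


def append_example_sets(*sets: ExampleSet) -> ExampleSet:
    """Stack example sets row-wise; every input must have the same columns."""
    if not sets:
        raise ValueError("need at least one example set")
    columns = sets[0][0]
    rows: List[tuple] = []
    for cols, data in sets:
        # Simplification: require the same column names in the same order
        if cols != columns:
            raise ValueError(f"column mismatch: {cols} != {columns}")
        rows.extend(data)
    return columns, rows


a = (["name", "score"], [("ann", 3), ("bob", 5)])
b = (["name", "score"], [("cy", 4)])
columns, rows = append_example_sets(a, b)
print(columns, rows)  # ['name', 'score'] [('ann', 3), ('bob', 5), ('cy', 4)]
```

This sketch says nothing about how the Append operator itself handles attribute types or roles; the operator's documentation is the place to check its exact compatibility rules.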
  • dara
    Thanks. I have already imported the files, both remotely from a database and from my local desktop.

    I do not need to concatenate example sets; I actually need to concatenate text files, as in plain file concatenation.

    Here is why:

    1. We want to associate a series of files to each other e.g. HR documents for one employee
    2. We want to perform text classification on the entire series of documents, not just one
    3. So I thought a crude setup would be to concatenate all the files for one employee and treat the result as a single file

    D
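The setup described above (one combined file per employee) could also be prepared outside RapidMiner before import. Below is a minimal Python sketch, assuming a hypothetical naming convention in which each file is called `<employee_id>_<document>.txt`; the function name `concatenate_per_employee` and the blank-line separator are likewise illustrative choices, not anything from the original posts.

```python
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def concatenate_per_employee(src_dir: Path, out_dir: Path, sep: str = "\n\n") -> Dict[str, Path]:
    """Write one combined .txt file per employee and return {employee_id: output path}.

    Assumes (hypothetically) that every input file is named "<employee_id>_<document>.txt".
    """
    groups: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(src_dir.glob("*.txt")):
        employee_id = path.stem.split("_", 1)[0]  # text before the first underscore
        groups[employee_id].append(path)

    out_dir.mkdir(parents=True, exist_ok=True)
    written: Dict[str, Path] = {}
    for employee_id, paths in groups.items():
        # Join the documents in file-name order, separated by `sep`
        text = sep.join(p.read_text(encoding="utf-8") for p in paths)
        target = out_dir / f"{employee_id}.txt"
        target.write_text(text, encoding="utf-8")
        written[employee_id] = target
    return written


# Small self-contained demo using a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "hr"
    src.mkdir()
    (src / "e01_contract.txt").write_text("Contract text", encoding="utf-8")
    (src / "e01_review.txt").write_text("Review text", encoding="utf-8")
    (src / "e02_contract.txt").write_text("Other contract", encoding="utf-8")
    result = concatenate_per_employee(src, Path(tmp) / "combined")
    combined = {emp: p.read_text(encoding="utf-8") for emp, p in result.items()}

print(combined)
```

Each combined file can then be imported as a single document per employee for classification.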
  • Marco_Boeck
    Hi,

    to read multiple files for text classification, you can use the "Process Documents from Files" operator. There should be plenty of help available in the forums because text mining questions are pretty common ;)

    Regards,
    Marco
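Conceptually, this kind of text processing turns each file into one document and each document into a row of word counts. The Python below is only a rough sketch of that idea under simplifying assumptions (lowercasing, splitting on non-letters, raw term counts, and a class label taken from a label-to-directory mapping); it is not how the operator is implemented, and `documents_to_vectors` is a hypothetical helper.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path
from typing import Dict, List, Tuple

Row = Tuple[str, str, List[int]]  # (file name, class label, term counts)


def documents_to_vectors(class_dirs: Dict[str, Path]) -> Tuple[List[str], List[Row]]:
    """Turn every .txt file into a term-count vector labelled with its class."""
    docs = []
    for label, directory in class_dirs.items():
        for path in sorted(directory.glob("*.txt")):
            # Very simple tokenizer: lowercase, keep runs of ASCII letters only
            tokens = re.findall(r"[a-z]+", path.read_text(encoding="utf-8").lower())
            docs.append((path.name, label, Counter(tokens)))
    # Shared vocabulary across all documents, in alphabetical order
    vocabulary = sorted(set().union(*(counts for _, _, counts in docs)))
    rows = [(name, label, [counts[t] for t in vocabulary]) for name, label, counts in docs]
    return vocabulary, rows


with tempfile.TemporaryDirectory() as tmp:
    praise = Path(tmp, "praise")
    praise.mkdir()
    complaint = Path(tmp, "complaint")
    complaint.mkdir()
    (praise / "a.txt").write_text("Great work, great team", encoding="utf-8")
    (complaint / "b.txt").write_text("Poor work", encoding="utf-8")
    vocabulary, rows = documents_to_vectors({"praise": praise, "complaint": complaint})

print(vocabulary)
for row in rows:
    print(row)
```

A real pipeline would normally add stemming, stop-word filtering, or TF-IDF weighting; raw counts are used here only to keep the sketch short.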
  • mdc
    Hi,

    I quickly put together a process that reads files, concatenates their contents, and writes the result to another file. See if you can adapt it to your needs.

    enjoy,
    Matthew


    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.005">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="text:create_document" compatibility="5.3.000" expanded="true" height="60" name="Create Document" width="90" x="112" y="75">
            <!-- Note: this initial text is kept as the first part of the concatenated output -->
            <parameter key="text" value="This is to initialize the content of Remember/Recall operators.&#10;&#10;"/>
          </operator>
          <operator activated="true" class="remember" compatibility="5.3.005" expanded="true" height="60" name="Remember (2)" width="90" x="246" y="75">
            <parameter key="name" value="doc"/>
            <parameter key="io_object" value="Document"/>
          </operator>
          <operator activated="true" class="loop_files" compatibility="5.3.005" expanded="true" height="60" name="Loop Files" width="90" x="380" y="165">
            <!-- Folder whose files are concatenated; change this to your own path -->
            <parameter key="directory" value="/Users/mdc/Texts"/>
            <process expanded="true">
              <operator activated="true" class="text:read_document" compatibility="5.3.000" expanded="true" height="60" name="Read Document (2)" width="90" x="112" y="120">
                <parameter key="file" value="%{file_path}"/>
              </operator>
              <operator activated="true" class="recall" compatibility="5.3.005" expanded="true" height="60" name="Recall" width="90" x="112" y="30">
                <parameter key="name" value="doc"/>
                <parameter key="io_object" value="Document"/>
              </operator>
              <operator activated="true" class="text:combine_documents" compatibility="5.3.000" expanded="true" height="94" name="Combine Documents (2)" width="90" x="313" y="120"/>
              <operator activated="true" class="remember" compatibility="5.3.005" expanded="true" height="60" name="Remember" width="90" x="447" y="120">
                <parameter key="name" value="doc"/>
                <parameter key="io_object" value="Document"/>
              </operator>
              <connect from_op="Read Document (2)" from_port="output" to_op="Combine Documents (2)" to_port="documents 2"/>
              <connect from_op="Recall" from_port="result" to_op="Combine Documents (2)" to_port="documents 1"/>
              <connect from_op="Combine Documents (2)" from_port="document" to_op="Remember" to_port="store"/>
              <portSpacing port="source_file object" spacing="0"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="recall" compatibility="5.3.005" expanded="true" height="60" name="Recall (2)" width="90" x="514" y="75">
            <parameter key="name" value="doc"/>
            <parameter key="io_object" value="Document"/>
          </operator>
          <operator activated="true" class="text:write_document" compatibility="5.3.000" expanded="true" height="76" name="Write Document" width="90" x="648" y="75">
            <!-- Output file; change this to a writable path on your machine -->
            <parameter key="file" value="/Users/matthewgarong/concatenated_text.txt"/>
          </operator>
          <connect from_op="Create Document" from_port="output" to_op="Remember (2)" to_port="store"/>
          <connect from_op="Recall (2)" from_port="result" to_op="Write Document" to_port="document"/>
          <connect from_op="Write Document" from_port="document" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
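For comparison, the same loop (read every file in a folder, add its text to a running document, write the result out) looks roughly like this in plain Python. This is a sketch, not part of the RapidMiner process: `concatenate_directory` is a hypothetical name, files are assumed to be UTF-8 text read in name order, and the newline separator is an assumption rather than a description of what Combine Documents does. Unlike the process above, it starts from an empty document, so no initialization text ends up in the output.

```python
import tempfile
from pathlib import Path
from typing import List


def concatenate_directory(directory: Path, output_file: Path, separator: str = "\n") -> str:
    """Concatenate the text of every file in `directory` and write it to `output_file`."""
    parts: List[str] = []  # plays the role of the remembered "doc" object
    for path in sorted(p for p in directory.iterdir() if p.is_file()):
        parts.append(path.read_text(encoding="utf-8"))  # one Loop Files iteration
    combined = separator.join(parts)
    output_file.write_text(combined, encoding="utf-8")  # like Write Document
    return combined


with tempfile.TemporaryDirectory() as tmp:
    texts = Path(tmp, "Texts")
    texts.mkdir()
    (texts / "1.txt").write_text("first", encoding="utf-8")
    (texts / "2.txt").write_text("second", encoding="utf-8")
    out = Path(tmp, "concatenated_text.txt")
    result = concatenate_directory(texts, out)
    on_disk = out.read_text(encoding="utf-8")

print(repr(result))  # 'first\nsecond'
```

Keep the output file outside the input folder; otherwise a second run would read the previous result back in.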

  • dara
    Thanks, Matthew

    It works! Impressed.

    Could you tell me how to turn this into a separate, self-contained process, like an I/O box that I can reuse in other processes? I am not sure how to do this in general, i.e. how to build my own processes out of existing ones.

    Dara
  • mdc
    To make it a separate process, save it and call it from your process using the 'Execute Process' operator. I have not tried this, though.
    You can also add it to your process directly: just copy and paste it into your process (at the top level or inside a 'Subprocess' operator). Do this in the Process window, not in the XML view.

    Matthew
  • dara
    Thanks, mdc

    Got it to work. Really appreciate everyone's help.
    D