Extracting the data from a file. [SOLVED]

JEdward
edited November 5 in Community Q&A
Hello,

I want to extract the data from a file so that I can then use it as an example set.
Basically, the flow would be:
Open File -> {extract file data as blob} -> use blob data as example set.

So far, the closest I have found is the Execute Script operator, but how would I refer to the output of the 'Open File' operator within a Groovy script?

Thanks,
John.

Answers

  • Marco_Boeck
    Hi,

    you can see how it could be done in the following process:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.013">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="open_file" compatibility="5.3.013" expanded="true" height="60" name="Open File" width="90" x="45" y="30">
            <parameter key="filename" value="C:\Users\xyz\Test.txt"/>
          </operator>
          <operator activated="true" class="execute_script" compatibility="5.3.013" expanded="true" height="76" name="Execute Script" width="90" x="179" y="30">
            <parameter key="script" value="import javax.swing.JOptionPane;&#10;&#10;import com.rapidminer.operator.nio.file.SimpleFileObject;&#10;import com.rapidminer.operator.nio.file.RepositoryBlobObject&#10;&#10;SimpleFileObject fileObject = input[0];&#10;// fileObject.getFile() returns the File which can be used to read it&#10;System.out.println(fileObject.getFile());&#10;&#10;// for Blobs use this:&#10;//RepositoryBlobObject blob = input[0];&#10;&#10;return fileObject;"/>
          </operator>
          <connect from_op="Open File" from_port="file" to_op="Execute Script" to_port="input 1"/>
          <connect from_op="Execute Script" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Regards,
    Marco
  • MariusHelf
    You may get the same result without any scripting by chaining Read Document -> Documents to Data.
    I'm not sure, though, how the encoding handles binary files.

    Best regards,
    Marius
  • JEdward
    Thanks for this, Marco, that's perfect!

    I actually fudged it by putting the file path in a macro and reading the file within the Groovy script with the line:
    f = new File("%{file_path}")

    It works, but I much prefer your version, as it should be more flexible overall and might offer a speed improvement since it's executing Java directly.
    I'll test both ways and compare.

    Best,
    John.
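Both approaches in the thread end up with a `java.io.File`. Marco's script gets it from `fileObject.getFile()`, and John's builds it with `new File("%{file_path}")`. Reading that file's contents is then ordinary Java, and the same `java.nio` calls should also work inside a Groovy script. This is a standalone sketch that uses no RapidMiner classes. The temporary file and the names `ReadFileContents`, `readBytes` and `readText` are hypothetical, and `readText` assumes the file is UTF-8 text. Turning the result into an example set is not shown.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper class; not part of the RapidMiner API.
public class ReadFileContents {

    // Reads the whole file into memory as raw bytes (the "blob").
    static byte[] readBytes(File file) throws IOException {
        return Files.readAllBytes(file.toPath());
    }

    // Decodes the bytes as text; ASSUMES the file is UTF-8 encoded.
    static String readText(File file) throws IOException {
        return new String(readBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for the one chosen in Open File.
        File file = File.createTempFile("example", ".txt");
        file.deleteOnExit();
        Files.write(file.toPath(), "a,b\n1,2\n".getBytes(StandardCharsets.UTF_8));

        byte[] blob = readBytes(file);
        String text = readText(file);
        System.out.println(blob.length);               // 8
        System.out.println(text.split("\n").length);   // 2
    }
}
```

Reading everything into memory keeps the sketch short. For very large files, a streaming reader would be the safer choice.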
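Marius's doubt about binary files can be illustrated outside RapidMiner. When bytes that are not valid UTF-8 are decoded as UTF-8 text, Java's `String(byte[], Charset)` constructor replaces the invalid sequences with U+FFFD, so the original bytes cannot be recovered. Base64-encoding the bytes instead round-trips them exactly. This is a standalone sketch with hypothetical names (`BinaryAsText`, `roundTripUtf8`, `roundTripBase64`). Whether Read Document decodes files the way `roundTripUtf8` does is an assumption that has not been checked here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; ASSUMES a text-based path behaves like roundTripUtf8.
public class BinaryAsText {

    // Naive approach: treat the bytes as UTF-8 text, then encode them back.
    static byte[] roundTripUtf8(byte[] data) {
        String text = new String(data, StandardCharsets.UTF_8);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Lossless approach: store the bytes as Base64 text, then decode them back.
    static byte[] roundTripBase64(byte[] data) {
        String text = Base64.getEncoder().encodeToString(data);
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // 0x89, 0xFF and 0xFE are not valid UTF-8 here, so they get replaced.
        byte[] binary = {(byte) 0x89, 'P', 'N', 'G', (byte) 0xFF, (byte) 0xFE};
        System.out.println(Arrays.equals(binary, roundTripUtf8(binary)));   // false
        System.out.println(Arrays.equals(binary, roundTripBase64(binary))); // true
    }
}
```

Plain ASCII text survives both round trips, so the difference only shows up with genuinely binary content.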