"Read CSV to example set"

User: "Monaco"
New Altair Community Member
Updated by Jocelyn
Hi,

I'm just beginning to experiment with RapidMiner and I'm having trouble with the "Read CSV" operator.
I can connect its output to a result port ("res") and see the ExampleSet, but when I feed it into other operators that require an example set as input, no data seems to be available. Is this a limitation of Read CSV, or is there a way to make the data available as an example set?
Regards.

    User: "haddock"
    Hi, and welcome!

    Start RapidMiner and go to Help -> Tutorial; that will load runnable examples, so you get some idea of what RM can and cannot do. Believe me, it saves time in the long run!

    User: "colo"
    Hi Monaco,

    if your operator delivers an example set to the result port of the process, it will do the same for other operators. Did you check the connection from the output port of "Read CSV" to the input port of the following operator? You might want to post your process (the code from the XML tab) here so we can spot possible mistakes in the process design.

    Regards
    Matthias
    User: "Monaco"
    Hi Colo,

    Many thanks for your quick reply.
    Here is the code (nothing fancy). It doesn't work with Read CSV, but it works well with Read Excel or Retrieve.
    When you modify a file that has been stored as a data table in the repository, do you know how to update that data table automatically?

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <process expanded="true" height="426" width="673">
          <operator activated="true" class="read_csv" compatibility="5.1.006" expanded="true" height="60" name="Read CSV" width="90" x="45" y="120">
            <parameter key="csv_file" value="D:\Data.csv"/>
            <parameter key="date_format" value="yyyyMMdd"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <parameter key="locale" value="French"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Date.true.date.id"/>
              <parameter key="1" value="Data.true.integer.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="series:windowing" compatibility="5.1.002" expanded="true" height="76" name="Windowing" width="90" x="179" y="30">
            <parameter key="horizon" value="1"/>
            <parameter key="window_size" value="1"/>
            <parameter key="create_label" value="true"/>
            <parameter key="label_attribute" value="Data"/>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Windowing" to_port="example set input"/>
          <connect from_op="Windowing" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    User: "Monaco"
    Hi Haddock,

    Thank you for your advice. I studied this tutorial last week, and the resource is indeed amazingly powerful and educational. But I haven't found an answer to my current problem. I've posted the code, but I don't think it will help. You can try it yourself with a very simple CSV file: when you hover the mouse cursor over the operator's output port, it shows "number of examples = -1".
    Regards
    User: "IngoRM"
    Ahem, just a quick question: did you actually execute the process (i.e. press the "Play" icon in the toolbar)? Does it work then?

    Cheers,
    Ingo
    User: "Monaco"
    Hi Ingo,

    When I execute the process, it displays the data fine (even though the number of examples is shown as -1). But when I add a Windowing operator, which requires the number of examples to be greater than the horizon (set to 1), it fails.
    Cheers
    User: "IngoRM"
    OK, then try the following:

    1. Load the data with "Read CSV", add a "Store" operator and save the data set directly to your repository.
    2. Drag the freshly saved data from your repository into the process (it will be turned into a new operator named "Retrieve", which loads the data for you from the repository).
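
    For illustration, the two steps might look roughly like this as RapidMiner 5.1 process XML, written as two separate processes. This is a sketch under assumptions, not an export from RapidMiner: the operator classes "store" and "retrieve", their "repository_entry" parameter, the port names "input" and "output", and the repository path //LocalRepository/Data are assumed; the Read CSV parameters are abbreviated from the process posted above.

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <process expanded="true">
          <!-- Step 1: read the CSV file (other parameters as in the process above) -->
          <operator activated="true" class="read_csv" compatibility="5.1.006" expanded="true" name="Read CSV">
            <parameter key="csv_file" value="D:\Data.csv"/>
          </operator>
          <!-- Store the example set in the repository (hypothetical entry path) -->
          <operator activated="true" class="store" compatibility="5.1.006" expanded="true" name="Store">
            <parameter key="repository_entry" value="//LocalRepository/Data"/>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Store" to_port="input"/>
        </process>
      </operator>
    </process>

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.006">
      <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
        <process expanded="true">
          <!-- Step 2: load the stored copy instead of the CSV file (same hypothetical entry path) -->
          <operator activated="true" class="retrieve" compatibility="5.1.006" expanded="true" name="Retrieve">
            <parameter key="repository_entry" value="//LocalRepository/Data"/>
          </operator>
          <!-- Windowing settings unchanged from the posted process -->
          <operator activated="true" class="series:windowing" compatibility="5.1.002" expanded="true" name="Windowing">
            <parameter key="horizon" value="1"/>
            <parameter key="window_size" value="1"/>
            <parameter key="create_label" value="true"/>
            <parameter key="label_attribute" value="Data"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Windowing" to_port="example set input"/>
          <connect from_op="Windowing" from_port="example set output" to_port="result 1"/>
        </process>
      </operator>
    </process>

    The idea is to rerun the first process whenever the CSV file changes; the second process then works on the stored copy, for which meta data should be available at design time.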

    Try again with this data set loaded with "Retrieve". Expected behaviour: everything works as expected. Reason for your confusion: search the forum for "Repository" and "meta data". Best solution for you: book a training at Rapid-I; it will definitely help  :D
    This would probably also be the best option if you do not know what I mean by "Repository"  ;D

    Cheers,
    Ingo

    P.S. (for the more experienced readers here...): I never expected that this (definitely very unique and innovative) feature of RapidMiner called "meta data propagation" would cause so much confusion for some users. I am open to all suggestions on how we could make the difference between "meta data" and "actual data" clearer, and explain why it is sometimes impossible to provide meta data (as for CSV files...).
    User: "colo"
    Hi Monaco,

    just to be sure... you didn't use the "Window Document" operator after "Read CSV", did you? Which operators did you try?
    I was hoping you would post your process with this second operator so we could spot possible problems ;)

    Regards
    Matthias
    User: "Monaco"
    Hey Ingo,

    I just read your post at http://rapid-i.com/rapidforum/index.php/topic,2902.msg11559.html#msg11559
    I update my CSV files frequently, which is why I don't use the repository (unless there is a way to update it easily and automatically).
    I don't understand why the same data can be output when it is in XLS format but not when it is in CSV format. Fortunately I have found alternative ways to deal with this issue properly, but I would have preferred (it's not crucial) to output directly from Read CSV.
    Many thanks for your support.

    Best regards.
    User: "dragoljub"
    Read CSV should pass the data correctly, assuming you have set all the attribute types and special attributes correctly. Most of the time Read CSV just produces the raw data; you still need to set things like labels and other special attributes. Also, your values may not be read in as reals or integers but imported as some wrong data type like polynominal. This can cause all kinds of problems. It might just be easier to run an import process right before you run your analysis, to make sure your data is perfect.
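
    As an illustration of setting types and roles at import time, here is a fragment of the "Read CSV" operator from the process posted above. The entry format name.selected.value_type.role is inferred from the values in that process (e.g. "Date.true.date.id"), not taken from documentation, and switching "Data" to the value type "real" is only an example.

    <operator activated="true" class="read_csv" compatibility="5.1.006" expanded="true" height="60" name="Read CSV" width="90" x="45" y="120">
      <parameter key="csv_file" value="D:\Data.csv"/>
      <!-- other parameters as in the process above -->
      <list key="data_set_meta_data_information">
        <!-- column "Date": imported, value type date, special role id -->
        <parameter key="0" value="Date.true.date.id"/>
        <!-- column "Data": imported, value type real (assumed type name), regular attribute -->
        <parameter key="1" value="Data.true.real.attribute"/>
      </list>
    </operator>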

    -Gagi
    User: "SKOM"
    I've just run into a similar problem, with "Read CSV" showing number of examples = -1 and one of the subsequent nodes not working. Since apparently it's a feature and not a bug  :P , shouldn't the operator description include something like "recommended for use with the Store and Retrieve operators"?

    Best,
    PK
    User: "MariusHelf"
    Good idea, we should probably promote the complete repository-based approach more to our users and explain why it is often easier to use than file-based approaches.

    Best regards,
    Marius