"Stream Database operator: metadata?"

camielcoenen
camielcoenen New Altair Community Member
edited November 5 in Community Q&A
Hi,

I am working with a large dataset (approx. 250,000 rows and 300+ columns) that is loaded in a MySQL database table, and I would like to use the Stream Database operator to bring this dataset into a process. However, unlike the Read Database operator, the Stream Database operator doesn't output the metadata, which makes it impossible to use other operators such as Select Attributes in the steps following Stream Database. I am using RapidMiner 5.1.

Answers

  • Matthias
    Matthias New Altair Community Member
    Hi,

    I think none of the Import Data operators can prepare the metadata up front, because RM can only read the metadata once you start the process.
    The easiest way is to save the dataset to the repository with the Store operator. You then have fast access to the dataset through the Retrieve operator, and the metadata is always available.

    Greetings

    Matthias
  • camielcoenen
    camielcoenen New Altair Community Member
    Well, the "Read Database" operator does prepare the metadata, even when a process has not been started or run yet. The "Stream Database" operator does not. So why this difference? Yes, I can use the Store operator, but that is basically the same as using the "Read Database" operator. The "Stream Database" operator has the caching features I need.

    Greetings,

    Camiel
  • land
    land New Altair Community Member
    Hi,
    Let me put it this way: do you use the Community Edition?

    Greetings,
      Sebastian
  • camielcoenen
    camielcoenen New Altair Community Member
    Yes, I do use the Community Edition. Does it make a difference for the Stream Database operator?

    Thanks,

    Camiel
  • land
    land New Altair Community Member
    Hi,
    Currently not, but as a Community Edition user you simply have to wait until someone has idle time to fix it. As an enterprise customer, your wishes would carry a "little" more weight with us. Not to mention that we could hire more people to help us with coding if you became an enterprise customer.
    Anyway, I think that handling large amounts of data will become an enterprise feature sooner or later, so I wouldn't bet on the Stream Database improvements making it into the Community Edition.

    Greetings,
      Sebastian

  • camielcoenen
    camielcoenen New Altair Community Member
    Thanks,

    Is it a JDBC connection issue that needs to be fixed? The "Read Database" operator, on the other hand, is working fine.

    Nevertheless, I would like to know how to handle a large dataset in the RapidMiner Community Edition. What kind of operators can be used to make the dataset more manageable? Are there tutorials or samples on how to do this?

    Greetings,

    Camiel
  • land
    land New Altair Community Member
    Hi,
    Aggregate the data before loading it. Split the data set before loading it. Where possible, cluster on samples first and then apply the model in batches...

    Well, everything depends on your problem. But the basic idea is to use only samples or batches where possible, or to compress the data before it is even loaded.

    Greetings,
      Sebastian
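
To make the advice above concrete, here is a minimal sketch of the three ideas (aggregating in the database, reading in batches, and working on a sample) done outside RapidMiner, before the data is imported. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for the MySQL server, and the table `measurements`, its columns, and all function names are hypothetical. With MySQL the same queries would go through a MySQL driver instead of `sqlite3`.

```python
import sqlite3
from typing import Iterator, List, Tuple

Row = Tuple[int, str, float]


def make_demo_db(n_rows: int = 1000) -> sqlite3.Connection:
    """Build a small in-memory table (hypothetical stand-in for the MySQL table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE measurements (id INTEGER PRIMARY KEY, grp TEXT, value REAL)"
    )
    rows = [(i, "g%d" % (i % 4), float(i)) for i in range(n_rows)]
    conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


def aggregate_in_db(conn: sqlite3.Connection) -> List[Tuple[str, int, float]]:
    """Aggregate before loading: let the database reduce the rows to one per group."""
    cur = conn.execute(
        "SELECT grp, COUNT(*), AVG(value) FROM measurements "
        "GROUP BY grp ORDER BY grp"
    )
    return cur.fetchall()


def iter_batches(conn: sqlite3.Connection, batch_size: int = 250) -> Iterator[List[Row]]:
    """Apply in batches: fetch a fixed number of rows at a time, not the whole table."""
    # Selecting only the needed columns also helps when the table is very wide.
    cur = conn.execute("SELECT id, grp, value FROM measurements ORDER BY id")
    while True:
        batch = cur.fetchmany(batch_size)
        if not batch:
            break
        yield batch


def sample_every_nth(conn: sqlite3.Connection, n: int = 10) -> List[Row]:
    """Use a sample: keep every n-th row (a simple systematic sample by id)."""
    cur = conn.execute(
        "SELECT id, grp, value FROM measurements WHERE id % ? = 0 ORDER BY id",
        (n,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = make_demo_db()
    print("groups after aggregation:", len(aggregate_in_db(conn)))
    print("batch sizes:", [len(b) for b in iter_batches(conn)])
    print("sample size:", len(sample_every_nth(conn)))
```

The reduced result of any of these steps (the aggregate, a single batch, or the sample) could then be written to a smaller table or file and imported from there, which keeps the amount of data inside the process manageable.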