Export wordlist into Database

guitarslinger
guitarslinger New Altair Community Member
edited November 2024 in Community Q&A
Hi,

I am trying to export a wordlist into a database table or a csv.
How can I do this?

The standard operators only accept example sets as inputs.
Can I convert a wordlist into an example set?

Thx in advance,
Martin

Answers

  • colo
    colo New Altair Community Member
    Hi Martin,

    The operator "WordList to Data" should help you with that. ;)

    Greetings,
    Matthias
  • guitarslinger
    guitarslinger New Altair Community Member
    Oh, thanks...  :D

    Thank god there are no stupid questions... :)
  • alejandro_mauro
    alejandro_mauro New Altair Community Member
    Hi!!

    I am new to RapidMiner and I am trying to do the same thing, but I have run into a problem when exporting to CSV after first using WordList to Data.

    I have words with "Total Occurrences" of 100 or more, but after exporting to CSV I only get those under 100.

    For example, my wordlist contains:
    Word     Total Occurrences
    Rx       327
    Dg       100
    Viene    96

    When exporting to CSV, I don't get "Dg", for example, which has 100 occurrences. I only get the words from "Viene" downward.

    I don't understand why the CSV seems to treat the "Total Occurrences" column as a percentage and does not show values of 100 or more.

    Does anyone have an idea how to solve this?
  • colo
    colo New Altair Community Member
    Hi,

    I don't experience this problem. The following process is working fine for me:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
       <process expanded="true" height="607" width="787">
         <operator activated="true" class="web:get_webpage" compatibility="5.0.3" expanded="true" height="60" name="Get Page" width="90" x="45" y="30">
           <parameter key="url" value="http://www.microsoft.com/en/us/default.aspx"/>
           <parameter key="random_user_agent" value="true"/>
           <list key="query_parameters"/>
         </operator>
         <operator activated="true" class="text:process_documents" compatibility="5.0.6" expanded="true" height="94" name="Process Documents" width="90" x="179" y="30">
           <process expanded="true" height="607" width="787">
             <operator activated="true" class="text:tokenize" compatibility="5.0.6" expanded="true" height="60" name="Tokenize" width="90" x="45" y="30"/>
             <connect from_port="document" to_op="Tokenize" to_port="document"/>
             <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
             <portSpacing port="source_document" spacing="0"/>
             <portSpacing port="sink_document 1" spacing="0"/>
             <portSpacing port="sink_document 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="text:wordlist_to_data" compatibility="5.0.6" expanded="true" height="76" name="WordList to Data" width="90" x="313" y="75"/>
         <operator activated="true" class="write_csv" compatibility="5.0.8" expanded="true" height="60" name="Write CSV" width="90" x="447" y="75">
           <parameter key="csv_file" value="C:\test.csv"/>
         </operator>
         <connect from_op="Get Page" from_port="output" to_op="Process Documents" to_port="documents 1"/>
         <connect from_op="Process Documents" from_port="example set" to_port="result 1"/>
         <connect from_op="Process Documents" from_port="word list" to_op="WordList to Data" to_port="word list"/>
         <connect from_op="WordList to Data" from_port="example set" to_op="Write CSV" to_port="input"/>
         <connect from_op="Write CSV" from_port="through" to_port="result 2"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
       </process>
     </operator>
    </process>
    I am using the latest version available through Subversion, which may include relevant fixes that the official release doesn't have yet. If the problem persists for you, you could try converting the attributes that contain the word counts to nominal values ("Numerical to Polynominal" operator) and hope that no conversion to a percentage value takes place.

    Regards,
    Matthias
  • land
    land New Altair Community Member
    Hi,
    Does this only occur in the written CSV file, or already in the example set? Set a breakpoint to find out what the example set contains.

    Greetings,
      Sebastian
  • up201708850
    up201708850 New Altair Community Member
    Add a "Process Documents from Data" operator between "WordList to Data" and "Write Database".
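
To follow up on Sebastian's question about whether the rows are missing from the written file itself, one option is to inspect the exported CSV outside RapidMiner. The following is only a minimal sketch: the column names "word" and "total", the ";" delimiter, and the helper name rows_at_or_above are assumptions for illustration, so adjust them to match the header line of your own file.

```python
import csv
import io


def rows_at_or_above(csv_text, count_column="total", threshold=100, delimiter=";"):
    """Return (word, count) pairs whose count is >= threshold.

    The column names and the delimiter are assumptions; check the header
    line of the file that Write CSV actually produced.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    hits = []
    for row in reader:
        count = float(row[count_column])
        if count >= threshold:
            hits.append((row["word"], count))
    return hits


# Sample data built from the counts quoted in this thread.
sample = "word;total\nRx;327\nDg;100\nViene;96\n"
print(rows_at_or_above(sample))
```

For a real export, read the file's contents into a string first and pass that in. If the words with 100 or more occurrences show up here, the file is fine and the problem lies in how it is being viewed; if they are absent, compare against the example set at a breakpoint.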