Question on large number of attributes
jngai
New Altair Community Member
I am new to RM.
I would like to start a project to build a neural network.
In the training data, each instance has 10 parameters, and each parameter takes its value from a pool of 500 non-English phrases. There will be thousands of instances, one per row in Excel.
My first thought is to convert these into 500 true/false variables, each indicating whether a phrase is present.
I am not sure this is the right approach, and I wonder whether RM can handle this many parameters. Also, does RM support non-English text? I believe it is in Unicode (I am not very familiar with this either).
I would appreciate it if anyone could point me in the right direction or answer my concerns.
Thanks in advance
Answers
> I would like to start a project to build a neural network.
Are you sure a neural network is the method you need? For text mining, Naive Bayes or an SVM may perform better.
> In the training data, each instance has 10 parameters, and each parameter takes its value from a pool of 500 non-English phrases. There will be thousands of instances, one per row in Excel.
> My first thought is to convert these into 500 true/false variables, each indicating whether a phrase is present.
This sounds like a good idea. This process shows you how to do that:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="505" width="949">
<operator activated="true" class="generate_data" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
<parameter key="number_examples" value="1000"/>
</operator>
<operator activated="true" class="discretize_by_frequency" expanded="true" height="94" name="Discretize" width="90" x="179" y="30">
<parameter key="number_of_bins" value="100"/>
</operator>
<operator activated="true" class="nominal_to_binominal" expanded="true" height="94" name="Nominal to Binominal" width="90" x="380" y="30"/>
<connect from_op="Generate Data" from_port="output" to_op="Discretize" to_port="example set input"/>
<connect from_op="Discretize" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
<connect from_op="Nominal to Binominal" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
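For readers who want to see the true/false idea from the question outside RM, here is a minimal Python sketch. It is not what the process above runs internally; the phrases, the four-entry pool (standing in for the pool of 500), and the helper name `to_flags` are all made up for illustration. Note that this sketch only records whether a phrase occurs in an instance, not which of the 10 parameters it occurred in.

```python
# Hypothetical stand-in for the pool of 500 phrases (made-up values).
phrase_pool = ["你好", "谢谢", "再见", "早安"]

# Hypothetical instances: each one lists the phrases it contains.
instances = [
    ["你好", "谢谢"],
    ["再见"],
    ["早安", "你好", "再见"],
]


def to_flags(instance, pool):
    """Return one True/False flag per pool phrase: is it present in the instance?"""
    present = set(instance)
    return [phrase in present for phrase in pool]


# One row of flags per instance, one column per phrase in the pool.
encoded = [to_flags(inst, phrase_pool) for inst in instances]

for row in encoded:
    print(row)
```

With a pool of 500 phrases each row would simply have 500 flags; a few thousand such rows is a small table by most standards.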
> I am not sure this is the right approach, and I wonder whether RM can handle this many parameters. Also, does RM support non-English text? I believe it is in Unicode (I am not very familiar with this either).
RM does support different encodings. You can set the encoding with the "encoding" parameter in many Read operators.
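If it is unclear what naming an encoding does in practice, the following Python sketch may help. It only illustrates the general idea behind an "encoding" setting and is not RM code; the file name `phrases.csv` and the phrases are assumptions made up for the example. The point is that non-English text survives a round trip as long as the file is written and read with the same encoding, here UTF-8.

```python
import csv
import os
import tempfile

# Made-up non-English phrases, two per row.
rows = [["你好", "谢谢"], ["再见", "早安"]]

# Write a small UTF-8 CSV file to stand in for data exported from a spreadsheet.
path = os.path.join(tempfile.mkdtemp(), "phrases.csv")
with open(path, "w", encoding="utf-8", newline="") as f:
    csv.writer(f).writerows(rows)

# Read it back, naming the encoding explicitly rather than
# relying on the platform default.
with open(path, encoding="utf-8", newline="") as f:
    loaded = list(csv.reader(f))

print(loaded == rows)
```

If the phrases come back garbled when you read your own file, the encoding chosen for reading most likely does not match the one the file was saved with.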
I hope this helps,
Ciao Sebastian