<operator name="Root" class="Process" expanded="yes"> <operator name="ExampleSetGenerator" class="ExampleSetGenerator"> <parameter key="number_of_attributes" value="100"/> <parameter key="target_function" value="sum classification"/> </operator> <operator name="IdTagging" class="IdTagging"> </operator> <operator name="IOMultiplier" class="IOMultiplier"> <parameter key="io_object" value="ExampleSet"/> </operator> <operator name="CorpusBasedWeightingForPositive" class="CorpusBasedWeighting"> <parameter key="class_to_characterize" value="positive"/> </operator> <operator name="AttributeWeightSelectionForPositive" class="AttributeWeightSelection"> <parameter key="weight_relation" value="top k"/> </operator> <operator name="IOSelector" class="IOSelector"> <parameter key="io_object" value="ExampleSet"/> <parameter key="select_which" value="2"/> </operator> <operator name="CorpusBasedWeightingForNegative" class="CorpusBasedWeighting"> <parameter key="class_to_characterize" value="negative"/> </operator> <operator name="AttributeWeightSelectionForNegative" class="AttributeWeightSelection"> <parameter key="weight_relation" value="top k"/> </operator> <operator name="ExampleSetJoin" class="ExampleSetJoin"> </operator></operator>
<operator name="Feature Selection" class="OperatorChain" expanded="yes"> <operator name="IOMultiplier" class="IOMultiplier"> <parameter key="io_object" value="ExampleSet"/> <parameter key="number_of_copies" value="2"/> </operator> <operator name="CorpusBasedWeighting (prof)" class="CorpusBasedWeighting"> <parameter key="class_to_characterize" value="purely professional"/> </operator> <operator name="AttributeWeightSelection (prof)" class="AttributeWeightSelection"> <parameter key="weight_relation" value="top k"/> </operator> <operator name="IOSelector" class="IOSelector"> <parameter key="io_object" value="ExampleSet"/> <parameter key="select_which" value="2"/> </operator> <operator name="CorpusBasedWeighting (pers)" class="CorpusBasedWeighting"> <parameter key="class_to_characterize" value="purely personal"/> </operator> <operator name="AttributeWeightSelection (pers)" class="AttributeWeightSelection"> <parameter key="weight_relation" value="top k"/> </operator> <operator name="ExampleSetJoin" class="ExampleSetJoin"> </operator> <operator name="IOSelector (2)" class="IOSelector"> <parameter key="io_object" value="ExampleSet"/> <parameter key="select_which" value="3"/> </operator> <operator name="CorpusBasedWeighting (pers/prof)" class="CorpusBasedWeighting"> <parameter key="class_to_characterize" value="personal, but in professional context"/> </operator> <operator name="AttributeWeightSelection (pers/prof)" class="AttributeWeightSelection"> <parameter key="weight_relation" value="top k"/> </operator> <operator name="ExampleSetJoin (2)" class="ExampleSetJoin"> </operator> </operator>
Unfortunately, the "IOSelector" operator outputs an IOObject even though I've set io_object to ExampleSet, whereas the operator "CorpusBasedWeighting (pers)" expects an ExampleSet. Do you have an idea where the problem is?
Because I have three classes, I need to join three example sets. Is it right that ExampleSetJoin can only join two ExampleSets? If so, is my "workaround" above correct?
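To make the chained-join idea concrete outside the tool, here is a minimal Python sketch. It assumes a join that takes exactly two tables keyed by example ID; it is not RapidMiner code and makes no claim about how ExampleSetJoin works internally. The function name `join_by_id` and the sample data are hypothetical.

```python
from functools import reduce


def join_by_id(left, right):
    """Inner-join two tables keyed by example ID, merging their attributes."""
    return {
        ex_id: {**left[ex_id], **right[ex_id]}
        for ex_id in left
        if ex_id in right
    }


# Hypothetical per-class attribute subsets, each keyed by example ID.
prof = {1: {"meeting": 0.9}, 2: {"meeting": 0.1}}
pers = {1: {"holiday": 0.0}, 2: {"holiday": 0.7}}
pers_prof = {1: {"lunch": 0.2}, 2: {"lunch": 0.5}}

# Three tables are combined with two pairwise joins: (prof + pers) + pers_prof.
joined = reduce(join_by_id, [prof, pers, pers_prof])
print(joined)
```

Under that assumption, a two-input join only needs to be applied once more to the intermediate result, which is what the second ExampleSetJoin in the process above is meant to do.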
As you pointed out, the Genetic Feature Selection and the Brute Force operators both have an option to set the number of attributes selected. Unfortunately, both always result in a memory overflow. The dataset I am processing is probably too big.
Regarding your tip about the maximal fitness: what is the fitness criterion? Is it a squared correlation coefficient (which would be over 0.8, for example), or can I find the fitness somewhere in the log?
<operator name="Root" class="Process" expanded="yes"> <operator name="ExampleSetGenerator" class="ExampleSetGenerator"> <parameter key="number_examples" value="200"/> <parameter key="target_function" value="sum classification"/> </operator> <operator name="NoiseGenerator" class="NoiseGenerator"> <parameter key="label_noise" value="0.0"/> <list key="noise"> </list> <parameter key="random_attributes" value="5"/> </operator> <operator name="GeneticAlgorithm" class="GeneticAlgorithm" expanded="yes"> <parameter key="maximal_fitness" value="0.95"/> <parameter key="maximum_number_of_generations" value="50"/> <parameter key="population_size" value="2"/> <operator name="XValidation" class="XValidation" expanded="yes"> <parameter key="sampling_type" value="shuffled sampling"/> <operator name="JMySVMLearner" class="JMySVMLearner"> </operator> <operator name="OperatorChain" class="OperatorChain" expanded="yes"> <operator name="ModelApplier" class="ModelApplier"> <list key="application_parameters"> </list> </operator> <operator name="ClassificationPerformance" class="ClassificationPerformance"> <parameter key="accuracy" value="true"/> <list key="class_weights"> </list> <parameter key="classification_error" value="true"/> <parameter key="main_criterion" value="accuracy"/> <parameter key="spearman_rho" value="true"/> </operator> </operator> </operator> </operator></operator>
Last one: isn't there a feature that lists the best-correlating features (with the cumulative squared correlation), as is done in a forward-stepping regression?
<operator name="Root" class="Process" expanded="yes"> <operator name="ExampleSetGenerator" class="ExampleSetGenerator"> <parameter key="number_examples" value="200"/> <parameter key="target_function" value="sum classification"/> </operator> <operator name="NoiseGenerator" class="NoiseGenerator"> <parameter key="label_noise" value="0.0"/> <list key="noise"> </list> <parameter key="random_attributes" value="5"/> </operator> <operator name="FeatureSelection" class="FeatureSelection" expanded="yes"> <operator name="CFSFeatureSetEvaluator" class="CFSFeatureSetEvaluator"> </operator> </operator></operator>
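For readers unfamiliar with forward-stepping regression, the following is a minimal Python sketch of the listing asked about above: features are added greedily, and the cumulative squared correlation (R²) of the least-squares fit is reported after each step. It is an illustration only, not RapidMiner code, and it says nothing about what the FeatureSelection operator or CFSFeatureSetEvaluator do internally. The function name `forward_selection` and the toy data are hypothetical.

```python
def forward_selection(columns, target, max_features=None, tol=1e-12):
    """Greedy forward selection; returns [(feature_name, cumulative_r2), ...]."""

    def center(values):
        mean = sum(values) / len(values)
        return [v - mean for v in values]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    residual = center(target)
    total_ss = dot(residual, residual)
    if total_ss == 0:
        return []  # constant target: nothing to explain

    remaining = {name: center(col) for name, col in columns.items()}
    limit = max_features or len(remaining)
    steps = []

    while remaining and len(steps) < limit:
        # Pick the feature that removes the most residual variance.
        best_name, best_gain = None, tol
        for name, col in remaining.items():
            norm = dot(col, col)
            if norm <= tol:
                continue  # nothing left of this feature after orthogonalization
            gain = dot(residual, col) ** 2 / norm
            if gain > best_gain:
                best_name, best_gain = name, gain
        if best_name is None:
            break  # no remaining feature improves the fit

        chosen = remaining.pop(best_name)
        norm = dot(chosen, chosen)

        # Project the chosen direction out of the residual ...
        coef = dot(residual, chosen) / norm
        residual = [r - coef * c for r, c in zip(residual, chosen)]
        # ... and out of every remaining feature, so later gains are incremental.
        for name, col in remaining.items():
            k = dot(col, chosen) / norm
            remaining[name] = [x - k * c for x, c in zip(col, chosen)]

        steps.append((best_name, 1 - dot(residual, residual) / total_ss))

    return steps


# Toy data: the target is exactly 2*a + b, and c is unrelated.
columns = {
    "a": [1, 2, 3, 4, 5],
    "b": [1, 0, 1, 0, 1],
    "c": [0, 1, 0, 0, 1],
}
target = [2 * a + b for a, b in zip(columns["a"], columns["b"])]

for name, r2 in forward_selection(columns, target):
    print(name, round(r2, 4))
```

Orthogonalizing the remaining features against each chosen one makes every step's gain equal to the drop in residual sum of squares from refitting the full model, which is what forward stepwise regression uses to rank candidates.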
One final comment on the software for all those who read through the posts here: although I ask questions here, it is probably the most comprehensive software package on this topic I have seen so far.
<operator name="Root" class="Process" expanded="yes"> <operator name="DatabaseExampleSource" class="DatabaseExampleSource"> <parameter key="database_url" value="jdbc:mysql://localhost:3306/mail"/> <parameter key="id_attribute" value="id"/> <parameter key="label_attribute" value="label"/> <parameter key="query" value="SELECT * FROM `temp`"/> <parameter key="username" value="root"/> </operator> <operator name="IOMultiplier" class="IOMultiplier"> <parameter key="io_object" value="ExampleSet"/> <parameter key="number_of_copies" value="2"/> </operator> <operator name="Pers/Prof Attributes" class="OperatorChain" expanded="no"> <operator name="StringTextInput (4)" class="StringTextInput" expanded="no"> <parameter key="filter_nominal_attributes" value="true"/> <list key="namespaces"> </list> <operator name="StringTokenizer (4)" class="StringTokenizer"> </operator> </operator> <operator name="CorpusBasedWeightingForPersProf" class="CorpusBasedWeighting"> <parameter key="class_to_characterize" value="personal, but in professional context"/> </operator> <operator name="AttributeWeightSelectionForPersProf" class="AttributeWeightSelection"> <parameter key="k" value="2"/> <parameter key="weight_relation" value="top k"/> </operator> </operator> <operator name="IOSelector" class="IOSelector"> <parameter key="io_object" value="ExampleSet"/> <parameter key="select_which" value="2"/> </operator> <operator name="Professional Attributes" class="OperatorChain" expanded="no"> <operator name="StringTextInput (3)" class="StringTextInput" expanded="no"> <parameter key="filter_nominal_attributes" value="true"/> <list key="namespaces"> </list> <operator name="StringTokenizer (3)" class="StringTokenizer"> </operator> </operator> <operator name="CorpusBasedWeightingForProfessional" class="CorpusBasedWeighting"> <parameter key="class_to_characterize" value="purely professional"/> </operator> <operator name="AttributeWeightSelectionForProfessional" class="AttributeWeightSelection"> <parameter 
key="k" value="2"/> <parameter key="weight_relation" value="top k"/> </operator> </operator> <operator name="ExampleSetJoin" class="ExampleSetJoin"> </operator> <operator name="IOSelector 2" class="IOSelector"> <parameter key="io_object" value="ExampleSet"/> <parameter key="select_which" value="2"/> </operator> <operator name="Personal Attributes" class="OperatorChain" expanded="no"> <operator name="StringTextInput" class="StringTextInput" expanded="no"> <parameter key="filter_nominal_attributes" value="true"/> <list key="namespaces"> </list> <operator name="StringTokenizer" class="StringTokenizer"> </operator> </operator> <operator name="CorpusBasedWeightingForPersonal" class="CorpusBasedWeighting"> <parameter key="class_to_characterize" value="purely personal"/> </operator> <operator name="AttributeWeightSelectionForPersonal" class="AttributeWeightSelection"> <parameter key="k" value="2"/> <parameter key="weight_relation" value="top k"/> </operator> </operator> <operator name="ExampleSetJoin (3)" class="ExampleSetJoin"> </operator> <operator name="ExampleVisualizer" class="ExampleVisualizer" breakpoints="after"> </operator></operator>
<operator name="Root" class="Process" expanded="yes"> <operator name="CSVExampleSource" class="CSVExampleSource"> <parameter key="filename" value="C:\Dokumente und Einstellungen\Mierswa\Eigene Dateien\rm_workspace\keywords.txt"/> <parameter key="id_name" value="ID"/> <parameter key="label_name" value="LABEL"/> </operator> <operator name="StringTextInput (4)" class="StringTextInput" expanded="no"> <parameter key="filter_nominal_attributes" value="true"/> <list key="namespaces"> </list> <parameter key="remove_original_attributes" value="true"/> <operator name="StringTokenizer (4)" class="StringTokenizer"> </operator> </operator> <operator name="IOMultiplier" class="IOMultiplier"> <parameter key="io_object" value="ExampleSet"/> <parameter key="number_of_copies" value="2"/> </operator> <operator name="Pers/Prof Attributes" class="OperatorChain" expanded="no"> <operator name="CorpusBasedWeightingForPersProf" class="CorpusBasedWeighting"> <parameter key="class_to_characterize" value="personal, but in professional context"/> </operator> <operator name="AttributeWeightSelectionForPersProf" class="AttributeWeightSelection"> <parameter key="k" value="2"/> <parameter key="weight_relation" value="bottom k"/> </operator> </operator> <operator name="IOSelector" class="IOSelector"> <parameter key="io_object" value="ExampleSet"/> <parameter key="select_which" value="2"/> </operator> <operator name="Professional Attributes" class="OperatorChain" expanded="no"> <operator name="CorpusBasedWeightingForProfessional" class="CorpusBasedWeighting"> <parameter key="class_to_characterize" value="purely professional"/> </operator> <operator name="AttributeWeightSelectionForProfessional" class="AttributeWeightSelection"> <parameter key="k" value="2"/> <parameter key="weight_relation" value="bottom k"/> </operator> </operator> <operator name="ExampleSetJoin" class="ExampleSetJoin"> </operator> <operator name="IOSelector 2" class="IOSelector"> <parameter key="io_object" value="ExampleSet"/> 
<parameter key="select_which" value="2"/> </operator> <operator name="Personal Attributes" class="OperatorChain" expanded="no"> <operator name="CorpusBasedWeightingForPersonal" class="CorpusBasedWeighting"> <parameter key="class_to_characterize" value="purely personal"/> </operator> <operator name="AttributeWeightSelectionForPersonal" class="AttributeWeightSelection"> <parameter key="k" value="2"/> <parameter key="weight_relation" value="bottom k"/> </operator> </operator> <operator name="ExampleSetJoin (3)" class="ExampleSetJoin"> </operator> <operator name="ExampleVisualizer" class="ExampleVisualizer"> </operator></operator>
<operator name="Root" class="Process" expanded="yes"> <operator name="WeightCreationData" class="ExampleSetGenerator"> <parameter key="target_function" value="sum classification"/> </operator> <operator name="ExampleSet2AttributeWeights" class="ExampleSet2AttributeWeights"> </operator> <operator name="AttributeWeightsWriter" class="AttributeWeightsWriter"> <parameter key="attribute_weights_file" value="C:\home\ingo\rm_workspace\selection_weights.wgt"/> </operator> <operator name="DataConsumer" class="IOConsumer"> <parameter key="io_object" value="ExampleSet"/> </operator> <operator name="WeightsConsumer" class="IOConsumer" breakpoints="after"> <parameter key="io_object" value="AttributeWeights"/> </operator> <operator name="AttributeWeightsLoader" class="AttributeWeightsLoader"> <parameter key="attribute_weights_file" value="C:\home\ingo\rm_workspace\selection_weights.wgt"/> </operator> <operator name="WeightApplicationData" class="ExampleSetGenerator" breakpoints="after"> <parameter key="number_of_attributes" value="10"/> <parameter key="target_function" value="sum classification"/> </operator> <operator name="AttributeWeightSelection" class="AttributeWeightSelection"> </operator></operator>