Dear All,
How can I make a smooth learning curve, i.e. one that shows the result averaged over many runs?
edit:
The training ratio (the maximum fraction of the data that should be used for training) doesn't seem to work.
Looking at the results, the maximum fraction is 0.95, even though training_ratio was set to 0.2.
When I change the training ratio to 0.6, nothing changes!
edit:
I found 07_meta\04_LearningCurve.xml and modified it as follows:
<?xml version="1.0" encoding="windows-1252"?>
<process version="4.6">
  <operator name="Root" class="Process" expanded="yes">
    <description text="This process plots the learning curve, i.e. the performance with respect to the number of examples which is used for learning."/>
    <parameter key="logverbosity" value="warning"/>
    <parameter key="random_seed" value="2004"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <operator name="ArffExampleSource" class="ArffExampleSource">
      <parameter key="data_file" value="D:\wessel\Desktop\CYT_rest.arff"/>
      <parameter key="label_attribute" value="class"/>
      <parameter key="datamanagement" value="double_array"/>
      <parameter key="decimal_point_character" value="."/>
      <parameter key="sample_ratio" value="1.0"/>
      <parameter key="sample_size" value="-1"/>
      <parameter key="local_random_seed" value="-1"/>
    </operator>
    <operator name="LearningCurve" class="LearningCurve" expanded="yes">
      <parameter key="training_ratio" value="0.5"/>
      <parameter key="step_fraction" value="0.01"/>
      <parameter key="start_fraction" value="-1.0"/>
      <parameter key="sampling_type" value="shuffled sampling"/>
      <parameter key="local_random_seed" value="-1"/>
      <operator name="W-J48" class="W-J48">
        <parameter key="keep_example_set" value="false"/>
        <parameter key="U" value="false"/>
        <parameter key="C" value="0.25"/>
        <parameter key="M" value="2.0"/>
        <parameter key="R" value="false"/>
        <parameter key="B" value="false"/>
        <parameter key="S" value="false"/>
        <parameter key="L" value="false"/>
        <parameter key="A" value="false"/>
      </operator>
      <operator name="ApplierChain" class="OperatorChain" expanded="yes">
        <operator name="ModelApplier" class="ModelApplier">
          <parameter key="keep_model" value="false"/>
          <list key="application_parameters">
          </list>
          <parameter key="create_view" value="false"/>
        </operator>
        <operator name="Performance" class="Performance">
          <parameter key="keep_example_set" value="false"/>
          <parameter key="use_example_weights" value="true"/>
        </operator>
      </operator>
      <operator name="ProcessLog" class="ProcessLog">
        <list key="log">
          <parameter key="fraction" value="operator.LearningCurve.value.fraction"/>
          <parameter key="performance" value="operator.LearningCurve.value.performance"/>
        </list>
        <parameter key="sorting_type" value="none"/>
        <parameter key="sorting_k" value="100"/>
        <parameter key="persistent" value="false"/>
      </operator>
    </operator>
  </operator>
</process>
But this does not give the results I want.
The learning curve is way too chaotic!
It seems the results do not get averaged over different runs:
http://student.science.uva.nl/~wluijben/learning_curve_in_need_of_smoothing.jpg
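Not a RapidMiner answer, but the averaging I'm after could be sketched in Python roughly like this (the nearest-centroid learner and the fixed 70/30 split are illustrative assumptions, not anything from the process above): repeat the learning-curve loop over several reshuffles of the data and average the performance at each fraction, which is what should smooth the curve.

```python
import numpy as np

def nearest_centroid_train(X, y):
    # toy stand-in for a real learner: remember the mean of each class
    classes = np.unique(y)
    return classes, np.array([X[y == c].mean(axis=0) for c in classes])

def nearest_centroid_eval(model, X, y):
    # accuracy of predicting the class with the nearest centroid
    classes, centroids = model
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return float((classes[d.argmin(axis=1)] == y).mean())

def averaged_learning_curve(X, y, fractions, n_runs=10, seed=2004):
    """Mean and std of performance at each training fraction over n_runs shuffles."""
    rng = np.random.default_rng(seed)
    scores = np.zeros((n_runs, len(fractions)))
    n = len(y)
    for r in range(n_runs):
        idx = rng.permutation(n)
        cut = int(0.7 * n)  # assumed fixed 70/30 train/test split per run
        train_idx, test_idx = idx[:cut], idx[cut:]
        for j, f in enumerate(fractions):
            # train on the first f-fraction of this run's shuffled training pool
            sub = train_idx[: max(1, int(f * len(train_idx)))]
            model = nearest_centroid_train(X[sub], y[sub])
            scores[r, j] = nearest_centroid_eval(model, X[test_idx], y[test_idx])
    # averaging across the n_runs shuffles is what smooths the curve
    return scores.mean(axis=0), scores.std(axis=0)
```

Plotting the returned means (with the stds as error bars) against the fractions gives the smoothed curve.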
Old question:
How can I make a learning curve?
Let's say I have a dataset of 100 examples, which I wish to split into 10 folds.
In normal cross-validation there are 10 runs, each training on 9 folds and testing on 1,
which results in one averaged result plus its standard deviation.
Now I wish to do an extra iteration inside each run,
which varies the number of folds used for training.
So this should result in
N result averages plus
N standard deviations, one for each number of training folds used.
(Preferably it should output the amount of training data used, not the number of folds.)
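The scheme above could be sketched in Python along these lines (the fold splitting and the dummy majority-class learner are illustrative assumptions, not an existing RapidMiner operator): for each of the k cross-validation runs, hold out one test fold and train on 1 up to k-1 of the remaining folds, then report mean and standard deviation per training-set size.

```python
import numpy as np

def majority_train(X, y):
    # dummy learner: predict the most frequent class in the training data
    vals, counts = np.unique(y, return_counts=True)
    return vals[counts.argmax()]

def majority_eval(model, X, y):
    return float((y == model).mean())

def cv_learning_curve(X, y, k, train_fn, eval_fn, seed=2004):
    """For each of k runs, test on one fold and train on 1..k-1 of the others.
    Returns one (avg_train_size, mean, std) triple per number of training folds,
    aggregated over the k runs."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    results = {m: [] for m in range(1, k)}  # m = number of training folds
    for t in range(k):                      # fold t is this run's test fold
        test_idx = folds[t]
        train_folds = [f for i, f in enumerate(folds) if i != t]
        for m in range(1, k):
            train_idx = np.concatenate(train_folds[:m])
            model = train_fn(X[train_idx], y[train_idx])
            results[m].append((len(train_idx),
                               eval_fn(model, X[test_idx], y[test_idx])))
    out = []
    for m, pairs in results.items():
        sizes, scores = zip(*pairs)
        out.append((int(np.mean(sizes)),
                    float(np.mean(scores)),
                    float(np.std(scores))))
    return out
```

Because the training size is recorded directly, the output is expressed in examples used rather than in folds, as preferred above.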
Regards,
Wessel