"Where to check different Neural-Net behaviour between RM and Weka NN"

michaelhecht · New Altair Community Member
edited November 5 in Community Q&A
Hello,

even though there may be a workaround for the neural net problems (see the other topic ;-) ), I still have another
problem with the RM-NN.

---------------------
By the way: in my opinion, an NN is the fastest way to check whether there are relations in the data at all,
without applying an overly sophisticated learning workflow or risking overfitting. That is why I like to use an
NN as a first approach :-) (this answers the other topic and describes only my intentions with NN)
---------------------

Now I have the problem that RM-NN behaves totally wrong (with 4.3 and 4.4) while Weka-NN behaves as expected.
I applied the same parameters to both in the same workflow and deactivated Weka-NN's normalization, since I
normalize manually beforehand. While RM-NN produces a totally confusing result (not a local minimum or anything
like that), Weka-NN reproduces the data acceptably. Since the data set is quite large (300 kB zipped) I cannot
post it here. Posting only the workflow doesn't make much sense, does it? (see below)

So: who can help me, and how?

Here is the workflow (without data):

<operator name="Root" class="Process" expanded="yes">
    <parameter key="logverbosity" value="init"/>
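    <!-- training phase: read and normalize the data, train the net, and store both models -->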
    <operator name="MemoryCleanUp" class="MemoryCleanUp">
    </operator>
    <operator name="ExampleSource" class="ExampleSource">
        <parameter key="attributes" value="C:\temp\wwdaten.aml"/>
        <parameter key="datamanagement" value="double_array"/>
    </operator>
    <operator name="FeatureNameFilter" class="FeatureNameFilter">
        <parameter key="skip_features_with_name" value="Xhyd"/>
    </operator>
    <operator name="CSVExampleSetWriter" class="CSVExampleSetWriter">
        <parameter key="csv_file" value="C:\temp\wwexamples.csv"/>
    </operator>
    <operator name="Numerical2Real" class="Numerical2Real">
    </operator>
    <operator name="Normalization" class="Normalization">
        <parameter key="return_preprocessing_model" value="true"/>
    </operator>
    <operator name="ModelWriter" class="ModelWriter">
        <parameter key="model_file" value="C:\temp\normalizeww.mod"/>
        <parameter key="output_type" value="XML"/>
    </operator>
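    <!-- train the net on the normalized data (the same parameters were given to Weka-NN) -->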
    <operator name="NeuralNet" class="NeuralNet">
        <parameter key="learning_rate" value="0.31"/>
        <parameter key="momentum" value="0.21"/>
        <parameter key="training_cycles" value="500"/>
    </operator>
    <operator name="ModelWriter (2)" class="ModelWriter">
        <parameter key="model_file" value="C:\temp\nnww.mod"/>
    </operator>
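    <!-- application phase: re-read the exported data, reload the stored normalization and net models, and apply them in turn -->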
    <operator name="CSVExampleSource" class="CSVExampleSource">
        <parameter key="filename" value="C:\temp\wwexamples.csv"/>
        <parameter key="label_column" value="11"/>
    </operator>
    <operator name="Numerical2Real (2)" class="Numerical2Real">
    </operator>
    <operator name="ModelLoader" class="ModelLoader">
        <parameter key="model_file" value="C:\temp\normalizeww.mod"/>
    </operator>
    <operator name="ModelApplier" class="ModelApplier">
        <list key="application_parameters">
        </list>
    </operator>
    <operator name="ModelLoader (2)" class="ModelLoader">
        <parameter key="model_file" value="C:\temp\nnww.mod"/>
    </operator>
    <operator name="ModelApplier (2)" class="ModelApplier">
    </operator>
</operator>

Answers

  • IngoRM · New Altair Community Member
    Hi,

    first of all: thanks for insisting on the RM NN. Because of this we have checked it again on more data sets, and it seems that one or several of our changes for version 4.4 indeed made the NN perform worse on several data sets (unfortunately this did not show up on the 8 (!) data sets we used for testing those changes  >:( ). So thanks for pointing this out again and again until we finally re-checked it and were able to reproduce your problems.

    The most likely candidate for both the decrease in generalization capability and the increase in runtime is our change of the underlying NN library. So the conclusion, of course, is that we will check the NN again in detail and see whether we can identify the error and fix it on our side. In the worst case we could at least change the NN (library) back, although there were several other issues with the old NN learner, which was actually the reason why we decided to include another one (the new NeuralNetSimple operator). I will let you know once we find out what is going wrong with the NN here and what the final decision will be.

    By the way: in my opinion, an NN is the fastest way to check whether there are relations in the data at all,
    without applying an overly sophisticated learning workflow or risking overfitting. That is why I like to use an
    NN as a first approach :-) (this answers the other topic and describes only my intentions with NN)
    I do not agree here  ;)

    Actually, NNs are known to be among the learning methods most likely to produce overfitting. From a historical point of view, exactly this overfitting problem led to the integration of structural risk minimization and to most parts of statistical learning theory as it is used today. What makes things even harder is that NNs are really sensitive to parameter changes, more so than most other methods I am aware of. So I would never say that an NN is a good method for a first quick test without a high risk of overfitting.

    And that is especially true if you only check the performance on the training data, as you did in your process. I would suggest making a real comparison, e.g. using cross-validation (with the same random seed!), to get an idea of which method works better on your data. And please, please, do not use the NN but the NeuralNetSimple learner instead until we have fixed the NN issues  ;)


    So a fair comparison would look like:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="C:\Dokumente und Einstellungen\Mierswa\Eigene Dateien\rm_workspace\sample\data\ripley-set.aml"/>
        </operator>
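        <!-- cross validation of RapidMiner's NeuralNetSimple; the fixed random seed determines the folds -->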
        <operator name="XValidation" class="XValidation" expanded="no">
            <parameter key="keep_example_set" value="true"/>
            <parameter key="local_random_seed" value="100"/>
            <operator name="NeuralNetSimple" class="NeuralNetSimple">
                <parameter key="default_hidden_layer_size" value="4"/>
                <parameter key="training_cycles" value="500"/>
                <parameter key="learning_rate" value="0.3"/>
                <parameter key="momentum" value="0.2"/>
                <parameter key="error_epsilon" value="0.0"/>
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Performance" class="Performance">
                </operator>
            </operator>
        </operator>
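        <!-- identical cross validation of Weka's MLP: same seed, hence the same folds -->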
        <operator name="XValidation (2)" class="XValidation" expanded="no">
            <parameter key="local_random_seed" value="100"/>
            <operator name="W-MultilayerPerceptron" class="W-MultilayerPerceptron">
                <parameter key="H" value="4"/>
            </operator>
            <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier (2)" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Performance (2)" class="Performance">
                </operator>
            </operator>
        </operator>
    </operator>
    The data set used is the Ripley data which is delivered together with RM (in the sample/data subdirectory). The default parameters of Weka's MLP have been used here for both NN learners (learning rate: 0.3, momentum: 0.2, max. error: 0, max. iterations: 500). The result:

    RapidMiner's NeuralNetSimple: 85.20% +/- 6.94%
    Weka's W-MultilayerPerceptron: 84.00% +/- 8.00%

    Not too much of a difference, huh? At least no significant one  ;)
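
    If you want to make the "no significant difference" claim explicit, you can hand both performance vectors to a statistical test. Below is a minimal sketch, under the assumption that the T-Test operator (and its alpha parameter) is available in your RapidMiner version; the two cross validations are the ones from above, just without keep_example_set so that only the two performance vectors are passed on:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="C:\Dokumente und Einstellungen\Mierswa\Eigene Dateien\rm_workspace\sample\data\ripley-set.aml"/>
        </operator>
        <!-- first performance estimate: RapidMiner's NeuralNetSimple -->
        <operator name="XValidation" class="XValidation" expanded="no">
            <parameter key="local_random_seed" value="100"/>
            <operator name="NeuralNetSimple" class="NeuralNetSimple">
                <parameter key="default_hidden_layer_size" value="4"/>
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Performance" class="Performance">
                </operator>
            </operator>
        </operator>
        <!-- second performance estimate: Weka's MLP on the same folds -->
        <operator name="XValidation (2)" class="XValidation" expanded="no">
            <parameter key="local_random_seed" value="100"/>
            <operator name="W-MultilayerPerceptron" class="W-MultilayerPerceptron">
                <parameter key="H" value="4"/>
            </operator>
            <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier (2)" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Performance (2)" class="Performance">
                </operator>
            </operator>
        </operator>
        <!-- assumed operator: tests whether the mean performances differ significantly at level alpha -->
        <operator name="T-Test" class="T-Test">
            <parameter key="alpha" value="0.05"/>
        </operator>
    </operator>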


    So thanks again for pointing out the NN issue; we will let you know what's happening there as soon as possible. Until then, I would like to ask all users to use the operator "NeuralNetSimple" - which, despite its name, works quite well on a wide range of data sets - or of course the W-MLP operator.

    Cheers,
    Ingo


    P.S.: By the way: did you work on classification problems (as most of our checks did) or on regression problems?
  • earmijo · New Altair Community Member
    Hi Ingo:
    Thanks for the effort you put into making the program better. The kind of support we get in this forum from the Rapid-I developers (for a product we get for free) would put many commercial software providers to shame.
  • michaelhecht · New Altair Community Member
    Hello,

    well, I'm not such a sophisticated user but rather at the beginning of exploring the data mining world.
    For me, NN is the safest way, since I know that with 20, 40 or even 60 weights but 10,000 or more
    data rows with 10 or more attributes, it is quite difficult to overfit these data, isn't it?

    Even though I know that using meta-methods is a must for good data mining, I do not feel very comfortable
    applying them in RM, since it is not so easy to understand how the meta-models interact with
    the models and the data sets. More detailed documentation would be very welcome (I don't expect you
    to write a book like Weka's - even though I would buy one if it were available :-) ).

    So I will inspect your example in detail to learn as much as possible - thank you. For the NN issue
    I opened a new topic with data. Maybe someone will take pity on it and give me a hint.
  • IngoRM · New Altair Community Member
    Hi,

    thanks, earmijo, for your really kind words! Hearing things like that from time to time really motivates us.

    For me, NN is the safest way, since I know that with 20, 40 or even 60 weights but 10,000 or more
    data rows with 10 or more attributes, it is quite difficult to overfit these data, isn't it?
    No. Although the curse of dimensionality naturally leads to a higher tendency to overfit for most learning methods, there is no such simple rule of thumb. On the contrary, whether a method overfits depends more on an appropriate model class together with an appropriate set of learning parameters: a k-NN learner with a smaller k tends to overfit more than one with a larger k, high values of C for an SVM also increase the overfitting risk, and the same is true for a larger number of training cycles for neural nets.
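
    To see the training-cycles effect on your own data, you can wrap the cross validation from my post above into a parameter grid and watch the estimated performance degrade once the net trains for too long. A minimal sketch, assuming the GridParameterOptimization operator and its comma-separated value lists; the cycle counts are, of course, arbitrary:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="C:\Dokumente und Einstellungen\Mierswa\Eigene Dateien\rm_workspace\sample\data\ripley-set.aml"/>
        </operator>
        <operator name="GridParameterOptimization" class="GridParameterOptimization" expanded="yes">
            <list key="parameters">
                <!-- hypothetical grid: each value of training_cycles is cross-validated separately -->
                <parameter key="NeuralNetSimple.training_cycles" value="50,200,1000,5000"/>
            </list>
            <operator name="XValidation" class="XValidation" expanded="yes">
                <parameter key="local_random_seed" value="100"/>
                <operator name="NeuralNetSimple" class="NeuralNetSimple">
                    <parameter key="default_hidden_layer_size" value="4"/>
                </operator>
                <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                    <operator name="ModelApplier" class="ModelApplier">
                        <list key="application_parameters">
                        </list>
                    </operator>
                    <operator name="Performance" class="Performance">
                    </operator>
                </operator>
            </operator>
        </operator>
    </operator>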

    For neural networks, things are particularly difficult: since you can hardly interpret the model, you do not even know without a fair error estimation whether the net was likely to overfit. And what makes things worse: if the structure of the net is complex enough and the net gets enough training time, a neural net will overfit any data set, independent of the number of dimensions or data tuples.
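
    And that fair error estimation is cheap to get: estimate the accuracy once on the training data itself and once with the cross validation from above, and compare the two numbers; a large gap is the overfitting. A minimal sketch for the training-data half, assuming the learner's keep_example_set parameter so that the ModelApplier still sees the training examples:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="C:\Dokumente und Einstellungen\Mierswa\Eigene Dateien\rm_workspace\sample\data\ripley-set.aml"/>
        </operator>
        <operator name="NeuralNetSimple" class="NeuralNetSimple">
            <!-- assumed parameter: keep the training set available for the applier -->
            <parameter key="keep_example_set" value="true"/>
            <parameter key="training_cycles" value="500"/>
        </operator>
        <!-- applying the model to its own training data yields the (optimistic) training performance -->
        <operator name="ModelApplier" class="ModelApplier">
            <list key="application_parameters">
            </list>
        </operator>
        <operator name="Performance" class="Performance">
        </operator>
    </operator>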

    About the meta learning: I would not say that using it is a must for good data mining. There is only one must: check all model types, starting with the simplest ones, and use the best one in terms of generalization capability, understandability if desired, and finally runtime if scalability is an issue (see the sketch below). And talking about DM recipes: if you don't have success with a diverse set of model types, try to "invent" better representations of your data (read: features). The input space is most often much more important than the selection of the "best" learning method. But these are just my 2c, just ignore them if you like  ;)
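
    For the "starting with the simplest ones" part: the comparison pattern from my post above works for any learner; just exchange the inner learner and keep the same random seed so all candidates see the same folds. A minimal sketch with a NaiveBayes baseline (assuming that operator is available in your version):

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="C:\Dokumente und Einstellungen\Mierswa\Eigene Dateien\rm_workspace\sample\data\ripley-set.aml"/>
        </operator>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <parameter key="local_random_seed" value="100"/>
            <!-- the simple baseline; exchange this operator to test other model types -->
            <operator name="NaiveBayes" class="NaiveBayes">
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <list key="application_parameters">
                    </list>
                </operator>
                <operator name="Performance" class="Performance">
                </operator>
            </operator>
        </operator>
    </operator>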

    Cheers,
    Ingo