I set up an SVM using nu-SVR in RM.
As a test, I trained it on a sparse data set containing 1000 records.
Then, I tested it against a new data set of about 14 records.
Every record of the test set returned the exact same prediction. This seems highly unlikely since there are over 140 dimensions to the SVM and a significant amount of variation in the data.
My guess is that I'm not loading the sparse data correctly for testing.
I can't seem to discover where my error is. Maybe someone here can offer some help/suggestions.
Here is the training XML
<?xml version="1.0" encoding="MacRoman"?>
<process version="4.3">
  <operator name="Root" class="Process" expanded="yes">
    <operator name="SparseFormatExampleSource" class="SparseFormatExampleSource">
      <parameter key="data_file" value="/Users/noah/train_sparse.txt"/>
      <parameter key="dimension" value="140"/>
      <parameter key="format" value="yx"/>
    </operator>
    <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
      <parameter key="attribute_name_regex" value="label"/>
      <parameter key="condition_class" value="is_nominal"/>
      <parameter key="process_special_attributes" value="true"/>
      <operator name="NominalNumbers2Numerical" class="NominalNumbers2Numerical">
      </operator>
    </operator>
    <operator name="LibSVMLearner" class="LibSVMLearner">
      <parameter key="C" value="100.0"/>
      <parameter key="gamma" value="0.1"/>
      <parameter key="keep_example_set" value="true"/>
      <parameter key="svm_type" value="nu-SVR"/>
    </operator>
    <operator name="ModelWriter" class="ModelWriter">
      <parameter key="model_file" value="/Users/noah/sparse_small.mod"/>
    </operator>
    <operator name="ModelApplier" class="ModelApplier">
      <list key="application_parameters">
      </list>
      <parameter key="create_view" value="true"/>
      <parameter key="keep_model" value="true"/>
    </operator>
    <operator name="RegressionPerformance" class="RegressionPerformance">
      <parameter key="absolute_error" value="true"/>
      <parameter key="keep_example_set" value="true"/>
      <parameter key="prediction_average" value="true"/>
      <parameter key="relative_error" value="true"/>
      <parameter key="relative_error_lenient" value="true"/>
      <parameter key="root_mean_squared_error" value="true"/>
    </operator>
  </operator>
</process>
Here are 2 rows of training data
0.99307958477511 1:2 2:12 3:0.982609455619486 4:0 5:14 6:5 7:0.8 8:0.0348258706467662 9:201 10:0.0496977837474815 11:1489 12:1 13:1 14:0.00477630731561417 15:133 16:10.81 17:5.5 101:1 116:1 117:1 119:1 125:1
0.989655172413817 1:3 2:2 3:0.973641810178274 4:0 5:63 6:3 7:1 8:0.0631443298969072 9:776 10:0.0769704433497537 11:1624 12:1 13:0.5 14:0.0049596226732805 15:123 16:-0.09 17:6 101:1 116:1 117:1 119:1 125:1
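To sanity-check the "every prediction is identical" symptom outside RM, here is a rough Python sketch of what an RBF kernel would give for the two training rows above (with the wrapped lines rejoined). It assumes the learner uses an RBF kernel, exp(-gamma * ||a - b||^2), with the gamma=0.1 from my XML. The kernel type is not set in my process, so that part is an assumption, and `parse_yx` and `rbf` are helper names made up for this sketch, not anything from RM or LibSVM.

```python
import math

def parse_yx(line):
    """Parse 'label idx:val idx:val ...' into (label, {idx: val})."""
    tokens = line.split()
    label = float(tokens[0])
    features = {}
    for tok in tokens[1:]:
        idx, val = tok.split(":")
        features[int(idx)] = float(val)
    return label, features

def rbf(a, b, gamma):
    """Assumed RBF kernel on sparse dicts; missing indices count as 0."""
    sq_dist = sum((a.get(i, 0.0) - b.get(i, 0.0)) ** 2 for i in set(a) | set(b))
    return math.exp(-gamma * sq_dist)

# The two training rows from the post, wrapped lines rejoined.
row1 = ("0.99307958477511 1:2 2:12 3:0.982609455619486 4:0 5:14 6:5 7:0.8 "
        "8:0.0348258706467662 9:201 10:0.0496977837474815 11:1489 12:1 13:1 "
        "14:0.00477630731561417 15:133 16:10.81 17:5.5 101:1 116:1 117:1 119:1 125:1")
row2 = ("0.989655172413817 1:3 2:2 3:0.973641810178274 4:0 5:63 6:3 7:1 "
        "8:0.0631443298969072 9:776 10:0.0769704433497537 11:1624 12:1 13:0.5 "
        "14:0.0049596226732805 15:123 16:-0.09 17:6 101:1 116:1 117:1 119:1 125:1")

_, x1 = parse_yx(row1)
_, x2 = parse_yx(row2)

# Attributes 9 and 11 alone differ by hundreds, so the squared distance is huge
# and the kernel value underflows to 0.0 for this pair.
k12 = rbf(x1, x2, gamma=0.1)
print(k12)
```

If the kernel value is effectively 0 between a test record and every support vector, an SVR prediction reduces to its constant offset term, which would look exactly like one repeated number. Whether that is what is happening here depends on the kernel actually in use and on whether the attributes are scaled before training.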
Here is the test XML
<?xml version="1.0" encoding="MacRoman"?>
<process version="4.3">
  <operator name="Root" class="Process" expanded="yes">
    <operator name="SparseFormatExampleSource" class="SparseFormatExampleSource">
      <parameter key="data_file" value="/Users/noah/test.txt"/>
      <parameter key="dimension" value="141"/>
      <parameter key="format" value="yx"/>
    </operator>
    <operator name="ModelLoader" class="ModelLoader">
      <parameter key="model_file" value="/Users/noah/sparse_c4_1000.mod"/>
    </operator>
    <operator name="ModelApplier" class="ModelApplier">
      <list key="application_parameters">
      </list>
      <parameter key="create_view" value="true"/>
      <parameter key="keep_model" value="true"/>
    </operator>
  </operator>
</process>
Here are 2 rows of test data
1:0 2:14 3:0.979392741314451 4:0.0909090909090909 5:28 6:22 7:0.227272727272727 8:0.0436046511627907 9:1376 10:0.0735090152565881 11:1442 12:0 13:2 14:0.0104266852405951 15:133 16:9.64 17:8.09 103:1 116:1 117:1 119:1 125:1
1:0 2:1 3:0.980626115895827 4:0.0357142857142857 5:20 6:28 7:0.178571428571429 8:0.0338541666666667 9:768 10:0.0653008962868118 11:781 12:0.321428571428571 13:0.2 14:0.0067155135256289 15:130 16:6.64 17:8.32 102:1 111:1 117:1 119:1 125:1
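To test my guess about loading, here is a small Python sketch that checks whether a line starts with a label and reports its largest attribute index, so it can be compared with the `dimension` parameter. It assumes the yx format means "label first, then index:value pairs", which is my reading and not something I have confirmed. `check_yx_line` is a hypothetical helper, not part of RM.

```python
def check_yx_line(line, dimension):
    """Return (has_label, max_index, fits_dimension) for one sparse line."""
    tokens = line.split()
    # Assumption: in a label-first layout the first token is a bare number,
    # not an 'index:value' pair.
    has_label = bool(tokens) and ":" not in tokens[0]
    pairs = tokens[1:] if has_label else tokens
    max_index = max((int(tok.split(":")[0]) for tok in pairs), default=0)
    return has_label, max_index, max_index <= dimension

# First training row and first test row from the post, wrapped lines rejoined.
train_row = ("0.99307958477511 1:2 2:12 3:0.982609455619486 4:0 5:14 6:5 7:0.8 "
             "8:0.0348258706467662 9:201 10:0.0496977837474815 11:1489 12:1 13:1 "
             "14:0.00477630731561417 15:133 16:10.81 17:5.5 101:1 116:1 117:1 119:1 125:1")
test_row = ("1:0 2:14 3:0.979392741314451 4:0.0909090909090909 5:28 6:22 "
            "7:0.227272727272727 8:0.0436046511627907 9:1376 10:0.0735090152565881 "
            "11:1442 12:0 13:2 14:0.0104266852405951 15:133 16:9.64 17:8.09 "
            "103:1 116:1 117:1 119:1 125:1")

train_check = check_yx_line(train_row, dimension=140)  # dimension from the training XML
test_check = check_yx_line(test_row, dimension=141)    # dimension from the test XML
print("train:", train_check)
print("test: ", test_check)
```

On the rows pasted above, the training row is reported as starting with a label and the test row is not, so with format set to yx the first test token (1:0) may be getting read as the label. The two processes also declare different dimensions (140 and 141), which might be worth ruling out as well.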