Build a decision tree using Python and embed it in RapidMiner

User: 10383721
New Altair Community Member
Updated by Jocelyn

Hi guys,

I am doing a project where I need to create a decision tree using Python and then embed it in RapidMiner using the Execute Python operator.

These are screenshots of my process:

[Screenshot: Screen Shot 2017-12-12 at 11.14.02.png]

[Screenshot: Subprocess in Cross Validation (Screen Shot 2017-12-12 at 11.14.16.png)]

This is my code for the decision tree:

import pandas as pd
# sklearn.cross_validation is deprecated (and removed in newer releases); use model_selection
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# rm_main is a mandatory function;
# the number of arguments has to match the number of input ports (can be none)
def rm_main(data):
    # import data from an Excel file
    # (the example set arriving on the input port as 'data' is not used)
    file = '04_Class_4.1_german-credit-decoded.xlsx'
    xl = pd.ExcelFile(file)
    print(xl.sheet_names)

    # load a sheet into a DataFrame
    gr_raw = xl.parse('RapidMiner Data')

    # create arrays for the features, X, and the response variable, y
    y = gr_raw['Credit Rating=Good'].values
    X = gr_raw.drop('Credit Rating=Good', axis=1).values

    # split data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=50)

    # build a decision tree classifier using the Gini index
    clf_gini = DecisionTreeClassifier(criterion='gini', random_state=50, max_depth=10, min_samples_leaf=5)
    clf_gini.fit(X_train, y_train)

    return clf_gini

When executed, it gives me an error, and I am not sure which part of this code I should leave out for it to execute successfully.
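Here is one possible way around the error. It is a sketch under assumptions, not a confirmed fix.

In RapidMiner's Execute Python operator, the example set connected to the input port arrives in rm_main as a pandas DataFrame. The script can therefore train on `data` instead of opening an Excel file by a relative path, which may not resolve from wherever RapidMiner runs the script.

The sketch makes these assumptions:
- The label column is still named 'Credit Rating=Good'.
- All other columns are numerical.
- `LABEL` is a name introduced here for illustration only.

It also drops train_test_split. The operator sits inside Cross Validation, whose Training subprocess already supplies a training fold.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical constant: assumes the label column keeps the name used in the
# original Excel sheet and that all other incoming columns are numerical.
LABEL = 'Credit Rating=Good'

def rm_main(data):
    # 'data' is the example set from the operator's input port, already a
    # pandas DataFrame, so no file needs to be opened inside the script.
    y = data[LABEL]
    X = data.drop(columns=[LABEL])

    # No train_test_split: inside Cross Validation, the Training subprocess
    # already receives only the training fold.
    clf_gini = DecisionTreeClassifier(criterion='gini', random_state=50,
                                      max_depth=10, min_samples_leaf=5)
    clf_gini.fit(X, y)
    return clf_gini
```

Outside RapidMiner, the function can be checked by calling rm_main on any DataFrame that contains the label column.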

Would appreciate any advice or help on this! 

Thank you.

Regards, 

Azmir F

    User: 10383721
    New Altair Community Member
    OP
    Accepted Answer

    Thanks guys for the solutions you have provided. I have managed to come up with my own solution. 

    I did not know that scikit-learn needs numerical data to build and apply the model. So I modified my process and used the Execute Python operator twice, once in the Training subprocess and once in the Testing subprocess of Cross Validation. I used the Numerical to Binominal operator after the second Execute Python operator.

    Note that I have renamed the two Execute Python operators to Build Model and Apply Model.

    This is my updated process:

    [Screenshot: Screen Shot 2017-12-14 at 14.42.32.png]

    [Screenshot: Cross Validation Subprocess (Screen Shot 2017-12-14 at 14.42.45.png)]

    My Python script for Build Model is as follows:

    from sklearn.tree import DecisionTreeClassifier

    def rm_main(data):
        # build a decision tree on the numerical attributes only
        X = data[['Age', 'Duration in month', 'Installment rate in % of disposable income', 'Credit Amount', 'Present residence since', 'Number of existing credits', 'Number of dependents']]
        # label as a 1-D Series, the usual shape for a single target
        y = data['Credit Rating']
        clf = DecisionTreeClassifier(min_samples_split=20, max_depth=10, random_state=99)
        clf.fit(X, y)

        # the fitted model leaves through the output port and is fed to Apply Model
        return clf
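The Build Model script keeps only numerical attributes because scikit-learn's DecisionTreeClassifier cannot fit on string-valued (nominal) columns. A hedged alternative, not part of the process above, works in two steps:
- One-hot encode the nominal attributes with pandas.get_dummies.
- Reindex the scoring data to the training columns so both sides line up.

RapidMiner's Nominal to Numerical operator can do a similar conversion before the data reaches Python. The data frames and the 'Purpose' and 'Housing' columns below are hypothetical stand-ins for the credit data.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical data: 'Purpose' and 'Housing' stand in for nominal attributes,
# 'Credit Rating' for a 0/1 label.
train = pd.DataFrame({
    'Age': [25, 40, 35, 50, 23, 60],
    'Purpose': ['car', 'tv', 'car', 'business', 'tv', 'car'],
    'Housing': ['own', 'rent', 'own', 'own', 'free', 'rent'],
    'Credit Rating': [1, 0, 1, 0, 1, 1],
})
score = pd.DataFrame({
    'Age': [30, 45],
    'Purpose': ['tv', 'education'],   # 'education' never appears in training
    'Housing': ['own', 'rent'],
})

def encode(df, columns=None):
    """One-hot encode the string columns; optionally align to a given column list."""
    encoded = pd.get_dummies(df)
    if columns is not None:
        # Drop indicator columns unseen in training and add missing ones as 0.
        encoded = encoded.reindex(columns=columns, fill_value=0)
    return encoded

X_train = encode(train.drop(columns=['Credit Rating']))
clf = DecisionTreeClassifier(random_state=99).fit(X_train, train['Credit Rating'])

X_score = encode(score, columns=X_train.columns)
print(clf.predict(X_score))
```

Reindexing with fill_value=0 has two effects:
- Indicator columns for categories that never appeared in training are dropped.
- Training categories absent from the scoring data are added as zero columns.

So the column set always matches what the tree was fitted on.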

    My Python script for Apply Model is as follows:

    from sklearn.tree import DecisionTreeClassifier

    def rm_main(model, data):
        # select the same numerical attributes the model was trained on
        X = data[['Age', 'Duration in month', 'Installment rate in % of disposable income', 'Credit Amount', 'Present residence since', 'Number of existing credits', 'Number of dependents']]
        data['prediction'] = model.predict(X)

        # set the role of the prediction attribute to "prediction"
        data.rm_metadata['prediction'] = (None, 'prediction')
        return data

    Let me know if you have another relevant solution or a better script to produce a more stable model.
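On the question of a more stable model, one common approach is to choose max_depth and min_samples_split by cross-validated grid search. These are the two parameters the Build Model script sets by hand. It also helps to look at the spread of the fold scores, not just the mean. This is offered as a sketch, not a tested recommendation for this dataset.

Assumptions in the sketch:
- make_classification generates stand-in data; to use it for real, swap in the seven numerical attributes and the Credit Rating label.
- The grid values are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: 1000 rows, 7 numerical features, binary label
# (assumption: replace with the credit attributes and 'Credit Rating').
X, y = make_classification(n_samples=1000, n_features=7, n_informative=4,
                           random_state=99)

# Arbitrary candidate values for the two hand-tuned parameters
param_grid = {
    'max_depth': [3, 5, 7, 10],
    'min_samples_split': [10, 20, 50],
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=99)
search = GridSearchCV(DecisionTreeClassifier(random_state=99), param_grid,
                      cv=cv, scoring='accuracy')
search.fit(X, y)

best = search.best_index_
print('best parameters:', search.best_params_)
print('mean accuracy: %.3f, std across folds: %.3f' % (
    search.cv_results_['mean_test_score'][best],
    search.cv_results_['std_test_score'][best]))
```

A small standard deviation across the folds is one rough sign that the chosen settings are not overly sensitive to which rows land in the training data.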

    Thank you.

    Regards,

    Azmir F