Linear Regression using p-value

User: "RapidRavi"
New Altair Community Member
Updated by Jocelyn
I'm trying to do Linear Regression, and I want to create a process that excludes features from the training set based on their p-values. So, if a column's p-value is greater than 0.05, remove/ignore that column and repeat the process until we are left with only statistically significant columns for our model.

Can someone guide me on how to do this, or point me to any existing documentation?

    User: "varunm1"
    New Altair Community Member
    Updated by varunm1
    Hello @RapidRavi

    Did you try setting the feature selection parameter of the Linear Regression operator to "T-test"? With alpha set to 0.05, the operator keeps only the variables with p < 0.05.

    Please let us know if you need any more information.

    User: "YYH"
    Altair Employee
    Hi @RapidRavi,

    Suppose you are building an optimization process for feature selection that shaves off the less important variables. You can use the “Forward Selection”, “Backward Elimination”, or other feature engineering operators. Note, however, that these operators select variables by the performance measure of the predictive model (RMSE, AUC, accuracy, precision, recall, F-score, etc.), not by p-value.


    In RapidMiner, only the Generalized Linear Model and Logistic Regression operators return a table of p-values, so we can build an iterative loop that selects important variables according to the p-values from GLM/LR. Other non-linear or ensemble regression models may not have p-values. My attached process shows a simplified version (no iteration) of this idea: it drops the variables with non-significant p-values from a GLM. You will need the Converters extension from the Marketplace to run the process. The converter is used to extract the p-values from the linear regression model to shave off the non-significant attributes...

    Cheers,

    YY
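
For readers who want to see the iterative version of the idea discussed in this thread, here is a minimal sketch in plain Python, outside RapidMiner. It is not the attached process and does not use any RapidMiner API. It assumes ordinary least squares with an intercept, two-sided t-tests on the coefficients, and a threshold of alpha = 0.05; the helper names (`invert`, `t_two_sided_p`, `ols_p_values`, `backward_eliminate`) and the synthetic data are made up for illustration. On each pass it refits the model and drops the single column with the largest p-value, stopping once every remaining column has p <= alpha.

```python
import math
import random

def invert(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting.
    No handling of singular (perfectly collinear) inputs."""
    n = len(A)
    M = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [row[n:] for row in M]

def t_two_sided_p(t, df):
    """Two-sided p-value for Student's t, via Simpson integration of the density."""
    t = abs(t)
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    n = 2 * (1000 + int(50 * t))  # even number of intervals, finer for large |t|
    h = t / n
    area = pdf(0.0) + pdf(t) + sum((4 if i % 2 else 2) * pdf(i * h)
                                   for i in range(1, n))
    return max(0.0, 1.0 - 2.0 * area * h / 3.0)

def ols_p_values(X, y):
    """Fit y ~ intercept + X by least squares; return p-values of the X columns."""
    rows = [[1.0] + list(r) for r in X]
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    df = n - k
    sigma2 = sum(e * e for e in resid) / df
    # Skip index 0 (the intercept); the standard error is sqrt(sigma2 * inv[i][i]).
    return [t_two_sided_p(beta[i] / math.sqrt(sigma2 * inv[i][i]), df)
            for i in range(1, k)]

def backward_eliminate(data, target, alpha=0.05):
    """data maps column name -> list of floats. Repeatedly drop the column
    with the largest p-value until every remaining p-value is <= alpha."""
    cols = [c for c in data if c != target]
    while cols:
        X = list(zip(*(data[c] for c in cols)))
        pvals = ols_p_values(X, data[target])
        worst = max(range(len(cols)), key=lambda i: pvals[i])
        if pvals[worst] <= alpha:
            break
        cols.pop(worst)
    return cols

# Synthetic example: y depends on x1 and x2 only; "noise" is unrelated.
random.seed(0)
n_rows = 200
data = {
    "x1": [random.gauss(0, 1) for _ in range(n_rows)],
    "x2": [random.gauss(0, 1) for _ in range(n_rows)],
    "noise": [random.gauss(0, 1) for _ in range(n_rows)],
}
data["y"] = [3 * a - 2 * b + random.gauss(0, 1)
             for a, b in zip(data["x1"], data["x2"])]
kept = backward_eliminate(data, "y")
print(kept)
```

Dropping one column per pass and refitting matters because p-values change once a correlated column is removed. Keep in mind that a purely random column will still pass a 0.05 threshold about 5% of the time, so the surviving columns are not guaranteed to be meaningful.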