Prediction modeling for text analysis

lambamanika07
lambamanika07 New Altair Community Member
edited November 5 in Community Q&A

I am trying to perform prediction modeling on text resources. I chose 272 resources for training and 116 for testing, but only 190 of the training resources and 80 of the test resources were actually modeled, and the accuracy, precision and recall values shown cover only those. I want to get these results for all of the data. Please help.

Best Answers

  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Answer ✓

    Hi @lambamanika07,

    I don't understand exactly what you want to do or what you performed exactly. Are your training and test datasets both labeled?

    Given the information provided, I suggest you perform a cross validation with your 272 training resources to build a model ==> you will get the performance (accuracy, recall, precision) of your model based on your 272 training resources.

    Then apply this model to your 116 (labeled?) test resources with a Performance operator ==> this way you can measure the performance of your built model on "unseen" data. The process looks like this:

    text_training_test_data.png
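
    For illustration only, here is a rough sketch of this first workflow in Python/scikit-learn terms (the thread itself is about RapidMiner operators, so the TF-IDF + LinearSVC pipeline and the load_* helpers below are assumptions made for the example, not the original process):

    ```python
    # Hypothetical sketch: cross-validate on the 272 training resources, then apply
    # the trained model to the 116 test resources and measure performance on them.
    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import cross_val_score
    from sklearn.metrics import classification_report

    # Hypothetical loaders returning (list_of_document_texts, list_of_labels)
    train_texts, train_labels = load_training_resources()   # the 272 labeled training resources
    test_texts, test_labels = load_test_resources()         # the 116 labeled test resources

    model = make_pipeline(TfidfVectorizer(), LinearSVC())

    # 1) Cross validation on the training resources -> performance estimate of the model
    cv_scores = cross_val_score(model, train_texts, train_labels, cv=10, scoring="accuracy")
    print("10-fold CV accuracy on the 272 training resources:", cv_scores.mean())

    # 2) Train on all 272 resources, then apply the model to the test resources and
    #    measure its performance on this "unseen" data (Apply Model + Performance)
    model.fit(train_texts, train_labels)
    predictions = model.predict(test_texts)
    print(classification_report(test_labels, predictions))  # precision / recall per class
    ```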

     

    or

    you can perform a cross validation with your 388 resources (272 training + 116 test) to build a better model ==> you will get the performance (accuracy, recall, precision) of your model based on your 388 "training" resources, and you can then apply this model to future unseen data. The process looks like this:

    text_training_test_data_2.png
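
    Continuing the hypothetical sketch above (same imports, pipeline and assumed data), the second option simply pools everything and cross-validates on the full set:

    ```python
    # Pool all 388 labeled resources (272 training + 116 test) and cross-validate on them.
    all_texts = list(train_texts) + list(test_texts)
    all_labels = list(train_labels) + list(test_labels)

    cv_scores = cross_val_score(model, all_texts, all_labels, cv=10, scoring="accuracy")
    print("10-fold CV accuracy on all 388 resources:", cv_scores.mean())

    # Fit on everything so the model can be applied to future unseen documents
    model.fit(all_texts, all_labels)
    ```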

    For a better response, can you share your process and your data source, please?

     

    Regards,

     

    Lionel

     

  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Answer ✓

    Hi @lambamanika07 again,

    To complete my response, the Cross Validation sub-process looks like this:

    text_training_test_data_3.png
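
    Using the same hypothetical data and pipeline as in the sketches above, each fold of that sub-process roughly corresponds to: fit the learner on the Training side, then Apply Model and Performance on the Testing side. An illustrative (not authoritative) per-fold equivalent:

    ```python
    # Rough per-fold equivalent of the Cross Validation sub-process (illustration only).
    import numpy as np
    from sklearn.base import clone
    from sklearn.model_selection import StratifiedKFold
    from sklearn.metrics import accuracy_score

    texts = np.array(train_texts, dtype=object)
    labels = np.array(train_labels)

    fold_accuracies = []
    for train_idx, test_idx in StratifiedKFold(n_splits=10).split(texts, labels):
        fold_model = clone(model)                             # fresh learner (Training side)
        fold_model.fit(texts[train_idx], labels[train_idx])   # train on the fold's training part
        predictions = fold_model.predict(texts[test_idx])     # Apply Model (Testing side)
        fold_accuracies.append(accuracy_score(labels[test_idx], predictions))  # Performance

    print("Average fold accuracy:", sum(fold_accuracies) / len(fold_accuracies))
    ```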

     

    Regards,

     

    Lionel


Answers

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    Are you using Cross Validation? Post your process using the < / > option.

  • Thomas_Ott
    Thomas_Ott New Altair Community Member

    @lambamanika07 I would not build a text classification model the way you've shown. I would do it the way @lionelderkrikor shows. Also, if the LinearSVM doesn't give good results, I would try Naive Bayes and/or Deep Learning. You could even use a Stacking or Voting operator.
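
    For illustration only, a rough scikit-learn analogue of that comparison could look like the sketch below (the vectorizer, learners and data names are assumptions carried over from the earlier sketches, not the RapidMiner operators themselves):

    ```python
    # Hypothetical comparison of single learners vs. a voting ensemble for text classification.
    from sklearn.pipeline import make_pipeline
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.ensemble import VotingClassifier
    from sklearn.model_selection import cross_val_score

    candidates = {
        "LinearSVM": make_pipeline(TfidfVectorizer(), LinearSVC()),
        "Naive Bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
        "Voting (SVM + NB)": make_pipeline(
            TfidfVectorizer(),
            VotingClassifier(
                estimators=[("svm", LinearSVC()), ("nb", MultinomialNB())],
                voting="hard",  # majority vote, analogous to a Vote operator
            ),
        ),
    }

    # train_texts / train_labels are the hypothetical labeled resources from the sketches above.
    for name, candidate in candidates.items():
        scores = cross_val_score(candidate, train_texts, train_labels, cv=10, scoring="accuracy")
        print(name, round(scores.mean(), 3))
    ```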