
"A problem about the output_word_list of TextInput"

User: "gfyang"
New Altair Community Member
Updated by Jocelyn
Hi,

I want to do classification on a text set with 5 categories. Here is the code to input the text:

OperatorChain textInput;
IOContainer container;

textInput = (OperatorChain) OperatorService.createOperator("TextInput");
List<String[]> para = new ArrayList<String[]>();
String[] para1 = {"graphics", "c:/data/Reuters/acq"};
String[] para2 = {"hardware", "c:/data/Reuters/corn"};
String[] para3 = {"hardware", "c:/data/Reuters/crude"};
String[] para4 = {"hardware", "c:/data/Reuters/earn"};
String[] para5 = {"hardware", "c:/data/Reuters/grain"};
para.add(para1);
para.add(para2);
para.add(para3);
para.add(para4);
para.add(para5);
textInput.setListParameter("texts", para);
textInput.setParameter("prune_below", "3");
textInput.setParameter("output_word_list", "d:/test/word.list");

// some preprocessing for text

container = textInput.apply(new IOContainer());

Here is a fragment of the output file "word.list":

@number_of_documents 80
@number_of_classes 2
bank,8,5,3
aim,3,3,0
ltd,11,7,4
... ...
Why are there only two classes in "word.list"?

BTW, what is the meaning of the last number, listed after each term in "word.list"?

Sincerely yours,
gfyang

    User: "haddock"
    New Altair Community Member
    Hi,

    Looks to me like there are only two classes ( "hardware" and "graphics" as a wild and crazy guess  ;) ), but what do I know ?
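
To make the point concrete: in the posted code, the first element of each `{label, directory}` pair is the class label, and only two distinct strings ("graphics" and "hardware") are used across the five directories. The sketch below does not call RapidMiner at all; it only counts distinct labels in a list shaped like the one passed to `setListParameter("texts", ...)`. The class name `LabelCheck`, its helper methods, and the corrected label names (taken from the directory names) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LabelCheck {

    /** Collects the distinct class labels (first element of each {label, directory} pair). */
    static Set<String> distinctLabels(List<String[]> texts) {
        Set<String> labels = new LinkedHashSet<>();
        for (String[] entry : texts) {
            labels.add(entry[0]);
        }
        return labels;
    }

    /** The label/directory pairs exactly as given in the original post. */
    static List<String[]> originalTexts() {
        List<String[]> texts = new ArrayList<>();
        texts.add(new String[] {"graphics", "c:/data/Reuters/acq"});
        texts.add(new String[] {"hardware", "c:/data/Reuters/corn"});
        texts.add(new String[] {"hardware", "c:/data/Reuters/crude"});
        texts.add(new String[] {"hardware", "c:/data/Reuters/earn"});
        texts.add(new String[] {"hardware", "c:/data/Reuters/grain"});
        return texts;
    }

    /** Hypothetical fix: one label per directory. The label names are an assumption. */
    static List<String[]> correctedTexts() {
        List<String[]> texts = new ArrayList<>();
        texts.add(new String[] {"acq", "c:/data/Reuters/acq"});
        texts.add(new String[] {"corn", "c:/data/Reuters/corn"});
        texts.add(new String[] {"crude", "c:/data/Reuters/crude"});
        texts.add(new String[] {"earn", "c:/data/Reuters/earn"});
        texts.add(new String[] {"grain", "c:/data/Reuters/grain"});
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(distinctLabels(originalTexts()));   // prints [graphics, hardware]
        System.out.println(distinctLabels(correctedTexts()));  // prints [acq, corn, crude, earn, grain]
    }
}
```

A check like this before the list is handed to the operator would flag the copy-and-paste slip: five directories but only two labels.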

    User: "gfyang"
    New Altair Community Member
    OP
    Hi, haddock

    You are right.  ;D This is my foolish mistake. Sorry.
    User: "haddock"
    New Altair Community Member
    Hola gfyang,

    As they say, the devil is always in the detail  >:(
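
The second question in the original post (what the last number after each term means) is not answered in the replies shown here. Judging only from the three sample rows, the first number equals the sum of the remaining ones (8 = 5 + 3, 3 = 3 + 0, 11 = 7 + 4), which would be consistent with a total count followed by one count per class. That reading is an inference from the sample, not documented behavior of the operator. A minimal sketch that checks the pattern, with a hypothetical class name `WordListRow`:

```java
public class WordListRow {

    /** Parses the integers that follow the term in a row such as "bank,8,5,3". */
    static int[] parseCounts(String line) {
        String[] parts = line.split(",");
        int[] counts = new int[parts.length - 1];
        for (int i = 1; i < parts.length; i++) {
            counts[i - 1] = Integer.parseInt(parts[i].trim());
        }
        return counts;
    }

    /** Returns true if the first count equals the sum of all the following counts. */
    static boolean firstIsSumOfRest(int[] counts) {
        int sum = 0;
        for (int i = 1; i < counts.length; i++) {
            sum += counts[i];
        }
        return counts.length > 0 && counts[0] == sum;
    }

    public static void main(String[] args) {
        // The three rows quoted in the original post.
        String[] rows = {"bank,8,5,3", "aim,3,3,0", "ltd,11,7,4"};
        for (String row : rows) {
            System.out.println(row + " -> " + firstIsSumOfRest(parseCounts(row)));
        }
    }
}
```

If the reading holds, a word list built from five distinct labels would presumably carry five trailing counts per term, but the thread does not confirm this.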