"A problem about the output_word_list of TextInput"

gfyang New Altair Community Member
edited November 5 in Community Q&A
Hi,

I want to classify a set of texts into 5 categories. Here is the code that loads the texts:

OperatorChain textInput;
IOContainer container;

textInput = (OperatorChain) OperatorService.createOperator("TextInput");
List<String[]> para = new ArrayList<String[]>();
String[] para1 = {"graphics", "c:/data/Reuters/acq"};
String[] para2 = {"hardware", "c:/data/Reuters/corn"};
String[] para3 = {"hardware", "c:/data/Reuters/crude"};
String[] para4 = {"hardware", "c:/data/Reuters/earn"};
String[] para5 = {"hardware", "c:/data/Reuters/grain"};
para.add(para1);
para.add(para2);
para.add(para3);
para.add(para4);
para.add(para5);
textInput.setListParameter("texts", para);
textInput.setParameter("prune_below", "3");
textInput.setParameter("output_word_list", "d:/test/word.list");

// some preprocessing for text

container = textInput.apply(new IOContainer());

Here is a fragment of the output file "word.list":

@number_of_documents 80
@number_of_classes 2
bank,8,5,3
aim,3,3,0
ltd,11,7,4
... ...
Why are there only two classes in "word.list"?

BTW, what is the meaning of the last number, listed after each term in "word.list"?

Sincerely yours,
gfyang

Answers

  • haddock
    haddock New Altair Community Member
    Hi,

    Looks to me like there are only two classes ("hardware" and "graphics", as a wild and crazy guess  ;) ), but what do I know?

  • gfyang
    gfyang New Altair Community Member
    Hi, haddock

    You are right.  ;D This is my foolish mistake. Sorry.
  • haddock
    haddock New Altair Community Member
    Hola gfyang,

    As they say, the devil is always in the detail  >:(
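
For anyone landing here with the same symptom, the following is a minimal sketch in plain Java (no RapidMiner calls) of the two checks discussed in this thread. It counts the distinct labels in the {label, directory} pairs from the question, and it tests whether the first number on each sample "word.list" line equals the sum of the numbers after it. The class and method names (WordListCheck, countDistinctLabels, firstEqualsSumOfRest) are made up for illustration. Reading the first number as a total and the following numbers as one count per class is only an inference from the three sample lines, not documented behaviour of TextInput.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class, not part of RapidMiner.
final class WordListCheck {

    // Counts the distinct class labels, i.e. the first element of each {label, directory} pair.
    static int countDistinctLabels(List<String[]> texts) {
        Set<String> labels = new LinkedHashSet<>();
        for (String[] pair : texts) {
            labels.add(pair[0]);
        }
        return labels.size();
    }

    // For a line such as "bank,8,5,3", returns true if the first number
    // equals the sum of the numbers that follow it.
    // Assumption: fields are comma-separated and the term comes first.
    static boolean firstEqualsSumOfRest(String line) {
        String[] fields = line.split(",");
        int first = Integer.parseInt(fields[1].trim());
        int sum = 0;
        for (int i = 2; i < fields.length; i++) {
            sum += Integer.parseInt(fields[i].trim());
        }
        return first == sum;
    }

    public static void main(String[] args) {
        // Same pairs as in the question: one "graphics" label, four "hardware" labels.
        List<String[]> texts = new ArrayList<>();
        texts.add(new String[] {"graphics", "c:/data/Reuters/acq"});
        texts.add(new String[] {"hardware", "c:/data/Reuters/corn"});
        texts.add(new String[] {"hardware", "c:/data/Reuters/crude"});
        texts.add(new String[] {"hardware", "c:/data/Reuters/earn"});
        texts.add(new String[] {"hardware", "c:/data/Reuters/grain"});
        System.out.println("distinct labels: " + countDistinctLabels(texts));

        // Sample lines copied from the "word.list" fragment in the question.
        String[] sample = {"bank,8,5,3", "aim,3,3,0", "ltd,11,7,4"};
        for (String line : sample) {
            System.out.println(line + " -> " + firstEqualsSumOfRest(line));
        }
    }
}
```

With the pairs from the question, the label count is 2, which matches "@number_of_classes 2". Giving each directory its own label (for example the directory name) would presumably yield five classes. All three sample lines pass the sum check (8 = 5 + 3, 3 = 3 + 0, 11 = 7 + 4), which is consistent with the reading above but does not prove it.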