"A problem about the output_word_list of TextInput"

gfyang
gfyang New Altair Community Member
edited November 2024 in Community Q&A
Hi,

I want to do classification on a text set with 5 categories. Here is the code to input the text:

OperatorChain textInput;
IOContainer container;

textInput = (OperatorChain) OperatorService.createOperator("TextInput");
List<String[]> para = new ArrayList<String[]>();
String[] para1 = {"graphics", "c:/data/Reuters/acq"};
String[] para2 = {"hardware", "c:/data/Reuters/corn"};
String[] para3 = {"hardware", "c:/data/Reuters/crude"};
String[] para4 = {"hardware", "c:/data/Reuters/earn"};
String[] para5 = {"hardware", "c:/data/Reuters/grain"};
para.add(para1);
para.add(para2);
para.add(para3);
para.add(para4);
para.add(para5);
textInput.setListParameter("texts", para);
textInput.setParameter("prune_below", "3");
textInput.setParameter("output_word_list", "d:/test/word.list");

// some preprocessing for text

container = textInput.apply(new IOContainer());

Here is a fragment of the output file "word.list":

@number_of_documents 80
@number_of_classes 2
bank,8,5,3
aim,3,3,0
ltd,11,7,4
... ...
Why are there only two classes in "word.list"?

BTW, what is the meaning of the last number listed after each term in "word.list"?

Sincerely yours,
gfyang
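
On the second question, the thread itself does not say what the numbers mean. One thing that can be checked from the posted fragment alone is that the first number on each line equals the sum of the numbers after it (8 = 5 + 3, 3 = 3 + 0, 11 = 7 + 4). That would be consistent with a total followed by one count per class, but this is only an observation about the three lines shown, not a documented description of the file format. A minimal sketch of that check, where the class and method names are made up for illustration:

```java
public class WordListCheck {

    // Returns true when the first count on a "term,n1,n2,..." line
    // equals the sum of the remaining counts.
    static boolean firstIsSumOfRest(String line) {
        String[] parts = line.split(",");
        int first = Integer.parseInt(parts[1].trim());
        int sum = 0;
        for (int i = 2; i < parts.length; i++) {
            sum += Integer.parseInt(parts[i].trim());
        }
        return first == sum;
    }

    public static void main(String[] args) {
        // The three term lines quoted in the question.
        String[] fragment = {"bank,8,5,3", "aim,3,3,0", "ltd,11,7,4"};
        for (String line : fragment) {
            System.out.println(line + " -> " + firstIsSumOfRest(line));
        }
    }
}
```

If that reading is right, the last number would be the count for the second class, but only the tool's own documentation can confirm it.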

Answers

  • haddock
    haddock New Altair Community Member
    Hi,

    Looks to me like there are only two classes ("hardware" and "graphics", as a wild and crazy guess ;) ), but what do I know?

  • gfyang
    gfyang New Altair Community Member
    Hi, haddock

    You are right.  ;D This is my foolish mistake. Sorry.
  • haddock
    haddock New Altair Community Member
    Hola gfyang,

    As they say, the devil is always in the detail  >:(
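
To make the point of the answers concrete: the class label is the first element of each pair in the "texts" list, and the posted list uses only two distinct labels. The sketch below is plain Java with no RapidMiner calls. It counts the distinct labels in the list as posted and in a relabeled version. The replacement labels (taken from the directory names) and the names `LabelCheck`, `asPosted`, and `relabeled` are assumptions for illustration; the original poster did not say which labels were intended.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LabelCheck {

    // Collects the distinct class labels (first element of each pair),
    // keeping the order in which they first appear.
    static Set<String> distinctLabels(List<String[]> texts) {
        Set<String> labels = new LinkedHashSet<>();
        for (String[] entry : texts) {
            labels.add(entry[0]);
        }
        return labels;
    }

    // The list exactly as posted in the question: two distinct labels.
    static List<String[]> asPosted() {
        List<String[]> para = new ArrayList<>();
        para.add(new String[] {"graphics", "c:/data/Reuters/acq"});
        para.add(new String[] {"hardware", "c:/data/Reuters/corn"});
        para.add(new String[] {"hardware", "c:/data/Reuters/crude"});
        para.add(new String[] {"hardware", "c:/data/Reuters/earn"});
        para.add(new String[] {"hardware", "c:/data/Reuters/grain"});
        return para;
    }

    // Assumed fix: one label per directory, named after the directory.
    static List<String[]> relabeled() {
        List<String[]> para = new ArrayList<>();
        para.add(new String[] {"acq", "c:/data/Reuters/acq"});
        para.add(new String[] {"corn", "c:/data/Reuters/corn"});
        para.add(new String[] {"crude", "c:/data/Reuters/crude"});
        para.add(new String[] {"earn", "c:/data/Reuters/earn"});
        para.add(new String[] {"grain", "c:/data/Reuters/grain"});
        return para;
    }

    public static void main(String[] args) {
        System.out.println(distinctLabels(asPosted()));  // [graphics, hardware]
        System.out.println(distinctLabels(relabeled())); // [acq, corn, crude, earn, grain]
    }
}
```

A quick check like this before calling `setListParameter("texts", para)` catches copy-and-paste label mistakes early.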
