
"text mining output"

User: "sheridany"
Updated by Jocelyn
I am trying to text mine 30K email excerpts collated into one file. I know something is wrong because the frequency counts for words that I would expect to be frequent are coming up as zero.

id | id | integer | avg = 1 +/- 0 | [1.000 ; 1.000] | 0.0
label | label | nominal | mode = bp (1), least = bp (1) | bp (1) | 0.0
regular | Dear | real | avg = 0 +/- 0 | [0.000 ; 0.000] | 0.0
regular | Wells | real | avg = 0 +/- 0 | [0.000 ; 0.000] | 0.0
regular | Fargo | real | avg = 0 +/- 0 | [0.000 ; 0.000] | 0.0
regular | online | real | avg = 0 +/- 0 | [0.000 ; 0.000] | 0.0
regular | bill | real | avg = 0 +/- 0 | [0.000 ; 0.000] | 0.0
regular | transactions | real | avg = 0 +/- 0 | [0.000 ; 0.000] | 0.0
regular | National | real | avg = 0 +/- 0 | [0.000 ; 0.000] | 0.0
regular | Benefit | real | avg = 0 +/- 0 | [0.000 ; 0.000] | 0.0
regular | Life | real | avg = 0 +/- 0 | [0.000 ; 0.000] | 0.0
regular | Insurance | real | avg = 0 +/- 0 | [0.000 ; 0.000] | 0.0
regular | Company | real | avg = 0 +/- 0 | [0.000 ; 0.000] | 0.0
regular | another | real | avg = 0 +/- 0 | [0.000 ; 0.000] | 0.0
regular | Both | real | avg = 0 +/- 0 | [0.000 ; 0.000] | 0.0
regular | were | real | avg = 0 +/- 0 | [0.000 ; 0.000] | 0.0
regular | deducted | real | avg = 0 +/- 0 | [0.000 ; 0.000] | 0.0
regular | checking | real | avg = 0 +/- 0 | [0.000 ; 0.000] | 0.0

The log also references an issue with the example set at the end even though I have set it to overwrite.

P Aug 13, 2009 11:57:53 AM: Process:
  Root[1] (Process)
  +- TextInput[1] (TextInput)
  |  +- StringTokenizer[2] (StringTokenizer)
  |  +- StopwordFilterFile[2] (StopwordFilterFile)
  |  +- TokenLengthFilter[0] (TokenLengthFilter)
  +- ExampleSetWriter[1] (ExampleSetWriter)
P Aug 13, 2009 11:57:53 AM: [Warning] TextInput: Warning: Encoding  unknown. Using default.
P Aug 13, 2009 11:57:56 AM: [Warning] TextInput: The original example example set already contains an attribute named "label". This is likely to cause trouble. Please rename the attribute in the original example set.
P Aug 13, 2009 11:57:56 AM: [Warning] TextInput: There is a term that equals the class attribute, renaming it
P Aug 13, 2009 11:57:56 AM: [Warning] TextInput: Warning: Encoding  unknown. Using default.
P Aug 13, 2009 11:57:59 AM: Process:
  Root[1] (Process)
  +- TextInput[1] (TextInput)
  |  +- StringTokenizer[2] (StringTokenizer)
  |  +- StopwordFilterFile[2] (StopwordFilterFile)
  |  +- TokenLengthFilter[2] (TokenLengthFilter)
  +- ExampleSetWriter[1] (ExampleSetWriter)
P Aug 13, 2009 11:57:59 AM: Produced output:
IOContainer (1 objects):
SimpleExampleSet:
1 examples,
34729 regular attributes,
special attributes = {
    id = #0: id (integer/single_value)
    label = #34730: label (nominal/single_value)/values=[bp]
}
(created by TextInput)
P Aug 13, 2009 11:57:59 AM: [NOTE] Process finished successfully after 5 s
G Aug 13, 2009 11:57:59 AM: [NOTE] Cannot use plotter 'Scatter Matrix': Data table must have between 0 and 50 columns, was 34730.
G Aug 13, 2009 11:57:59 AM: [NOTE] Cannot use plotter 'Survey': Data table must have between 0 and 100 columns, was 34730.
G Aug 13, 2009 11:58:00 AM: [NOTE] Cannot use plotter 'Andrews Curves': Data table must have between 0 and 1000 columns, was 34730.
G Aug 13, 2009 11:58:00 AM: [NOTE] Cannot use plotter 'Quartile Color Matrix': Data table must have between 0 and 100 columns, was 34730.
G Aug 13, 2009 11:58:00 AM: [NOTE] Cannot use plotter 'RadViz': Data table must have between 0 and 1000 columns, was 34730.
G Aug 13, 2009 11:58:00 AM: [NOTE] Cannot use plotter 'GridViz': Data table must have between 0 and 10000 columns, was 34730.

Lastly, how can I use visualization to see frequent terms, words, etc.?
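
For comparison, here is a minimal sketch, outside RapidMiner, of what a frequent-terms view could look like in plain Python. It mirrors the tokenize, stop-word filter, and token-length filter steps from the process above, but it is not the TextInput operator's implementation. The stop-word set, the function names, and the two sample emails are all hypothetical, and it assumes each email is treated as its own document rather than as one collated text.

```python
import re
from collections import Counter

# Hypothetical stop-word set; the process above loads its list from a file.
STOPWORDS = {"the", "a", "and", "to", "of", "from", "my", "was", "were"}


def tokenize(text, min_len=3):
    """Lowercase, split on non-letters, drop stop words and short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if len(t) >= min_len and t not in STOPWORDS]


def term_frequencies(documents, min_len=3):
    """Count how often each surviving token occurs across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc, min_len))
    return counts


def text_bar_chart(counts, top_n=10, width=40):
    """Return one text line per term, with a bar scaled to the top count."""
    top = counts.most_common(top_n)
    if not top:
        return []
    peak = top[0][1]
    lines = []
    for term, n in top:
        bar = "#" * max(1, round(width * n / peak))
        lines.append(f"{term:<15} {n:>5} {bar}")
    return lines


if __name__ == "__main__":
    # Made-up sample emails, one string per document.
    emails = [
        "Dear Wells Fargo, both online bill transactions were deducted.",
        "Dear National Benefit Life Insurance Company, another bill was deducted.",
    ]
    for line in text_bar_chart(term_frequencies(emails)):
        print(line)
```

Passing a list with one string per email keeps the counts per term meaningful; the same counts could be handed to any plotting tool instead of the text bars.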
