How can I use document metadata in function expressions?
Hi,
I have a very simple process with 3 main operators.
1) Get Page
2) Extract Content
3) Process Documents
Under "Process Documents" there are two sub-processes: Tokenize and Aggregate Token Length.
Everything works fine in terms of creating the tokens (keywords) with their total occurrences. However, I'm trying to calculate the density of each word and include it as a custom attribute. I have the token_number metadata, which holds the number of keywords, but I cannot seem to access that information. How can I achieve this so that my example set result looks similar to this?
Thanks,
It sounds like that is the output from the wordlist. If you also connect the example set output port and examine it, you should see each token as its own attribute, with a value based on the term frequency (the percentage of the document's tokens that the particular token represents).
If you set the vector creation parameter of your "Process Documents" operator to term frequency, isn't that calculating what you are looking for?
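For reference, here is a rough sketch of the density calculation as described above: each token's count divided by the total number of tokens in the document. It is plain Python rather than a RapidMiner process, and the `keyword_density` function and its simple regex tokenizer are made up for illustration. The values RapidMiner produces may differ if the operator applies its own normalization.

```python
import re
from collections import Counter


def keyword_density(text):
    """Return {token: count / total_tokens} for a single document.

    Hypothetical helper for illustration; not part of RapidMiner.
    """
    # Assumed tokenizer: lowercase the text and keep runs of letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    total = len(tokens)  # plays the role of the token_number metadata
    if total == 0:
        return {}
    counts = Counter(tokens)
    # Density = occurrences of the token / total tokens in the document.
    return {token: count / total for token, count in counts.items()}


density = keyword_density("the cat sat on the mat")
for token, value in sorted(density.items()):
    print(f"{token}: {value:.3f}")
```

With this definition, the densities for one document always sum to 1, which is a quick sanity check on whatever values you get back.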