A program to recognize and reward our most engaged community members
Wanttoknow wrote:However there is one aspect that I would like to see during the vector creation of words in documents and that is the byte addresses per word occurence as a key to distinguish one word occurence from another. This would require a whole new representation of the wordlist where every occurence is displayed with a byte address/word location in stead of the aggregated number of occurences per word per document.This would open up a new range of possibilities such as determining what other words or terms are found in proximity of a certain word/term. This would be of great value to determine the context of documents.Of course I would be glad to know if this would already be possible with some combination of current operators ::)