(.*_.*){2,}
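To make it clearer what that pattern accepts, here is a minimal sketch. It assumes the expression is applied as a whole-string match and that n-grams are joined with underscores (as the n-gram code in this thread does), with no underscores inside the individual tokens. The class and method names are made up for illustration only. Under those assumptions the pattern matches terms containing at least two underscores, i.e. n-grams of three or more tokens.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library: shows which terms the pattern accepts.
class NGramLengthFilter {
    // The pattern from the post: the group must repeat at least twice,
    // and each repetition has to consume one underscore.
    static final Pattern AT_LEAST_TWO_UNDERSCORES = Pattern.compile("(.*_.*){2,}");

    // True if the whole term matches, i.e. it contains two or more underscores.
    static boolean hasAtLeastThreeTokens(String term) {
        return AT_LEAST_TWO_UNDERSCORES.matcher(term).matches();
    }

    public static void main(String[] args) {
        for (String term : new String[] {"word1", "word1_word2", "word1_word2_word3"}) {
            System.out.println(term + " -> " + hasAtLeastThreeTokens(term));
        }
    }
}
```

Only the last of the three sample terms matches. Note that a single token that happens to contain two underscores would match as well, so this only works as a length filter if the tokens themselves are underscore-free.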
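int maxLength = getParameterAsInt(PARAMETER_MAX_LENGTH);
int minLength = getParameterAsInt(PARAMETER_MIN_LENGTH);
// For every start position i, build the n-grams of length minLength..maxLength.
for (int i = 0; i < tokenList.size(); i++) {
    // j is the n-gram length minus one.
    for (int j = minLength - 1; j < maxLength; j++) {
        StringBuilder s = new StringBuilder();
        // Only build the n-gram if it fits inside the token list.
        if (i + j < tokenList.size()) {
            for (int z = i; z < i + j + 1; z++) {
                s.append(tokenList.get(z));
                // Join tokens with '_', but do not append a trailing separator.
                if (z != i + j) {
                    s.append('_');
                }
            }
        }
        if (s.length() > 0) {
            ngrams.add(new Token(s.toString(), tokenList.get(i)));
        }
    }
}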
types.add(new ParameterTypeInt(PARAMETER_MIN_LENGTH, "The minimal length of the ngrams.", 1, (int) Double.POSITIVE_INFINITY, 2, false));
Any response on this? I have a similar use case where the order of words within an n-gram doesn't matter, and I want "word1-word2-word3" to be grouped together with "word2-word1-word3".
Is that possible?
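One generic way to get order-insensitive grouping, sketched here in plain Java rather than as an existing operator: split each n-gram on its separator, sort the parts, and join them again, so that every ordering of the same words ends up with the same key. The class and method names are hypothetical, and the "-" separator is taken from the example in the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library: groups n-grams regardless of word order.
class UnorderedNGrams {
    // Canonical key: split on the separator, sort the words, join them again.
    static String canonicalKey(String ngram, String separator) {
        String[] parts = ngram.split(Pattern.quote(separator));
        Arrays.sort(parts);
        return String.join(separator, parts);
    }

    // Counts n-grams by canonical key, so permutations of the same words are merged.
    static Map<String, Integer> countUnordered(List<String> ngrams, String separator) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String ngram : ngrams) {
            counts.merge(canonicalKey(ngram, separator), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> ngrams = Arrays.asList(
                "word1-word2-word3", "word2-word1-word3", "word3-word4-word5");
        // prints {word1-word2-word3=2, word3-word4-word5=1}
        System.out.println(countUnordered(ngrams, "-"));
    }
}
```

The same canonicalisation step could be applied to the generated n-gram terms before counting them; whether and where that can be hooked into a given process is something this sketch does not cover.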
The link posted above doesn't work.
Can't you use the Extract Information operator for this?