Hi miners,
I need to understand the inner logic of the 'Binning by Entropy' operator (I do understand the standalone algorithm itself). It seems to me that in many cases it tries to minimize the final number of bins, which results in at most 2 bins for most variables in certain datasets. This may often be appropriate, but very often it is not granular enough.
Think of customer age in credit risk applications. Traditionally, the younger the customer, the riskier he is, with a slight upward trend in risk in the oldest age group as well. Technically, 2 bins may be the minimum that works here, but such a binning ignores how risk is distributed across more granular age groups. With weight of evidence binning, we often see distributions like this (here the blue trend goes steadily down across the age groups, so it could easily be represented by as few as 2 bins):

Do I understand correctly that this is how the operator actually works, i.e. that it tries to minimise the number of bins? Could future versions offer more control over the parameters, such as specifying a desired minimum number of bins?
Also, a side question: has anyone heard of an implementation of weight of evidence / information value algorithms and binning for RM?
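To be concrete about what I mean by weight of evidence and information value, here is a minimal Python sketch of the usual textbook formulas. It is not RapidMiner code; the function name woe_iv and the per-bin counts are made up purely for illustration, and I use the convention WoE = ln(share of goods / share of bads), which some sources write the other way round.

```python
import math

def woe_iv(counts):
    """Compute weight of evidence per bin and the total information value.

    counts: list of (goods, bads) tuples, one per bin (hypothetical input format).
    Convention assumed here: WoE = ln(share of goods / share of bads).
    """
    total_good = sum(g for g, _ in counts)
    total_bad = sum(b for _, b in counts)
    woe = []
    iv = 0.0
    for goods, bads in counts:
        if goods == 0 or bads == 0:
            # WoE is undefined for an empty class; such bins are usually merged first
            raise ValueError("each bin needs at least one good and one bad")
        dist_good = goods / total_good
        dist_bad = bads / total_bad
        w = math.log(dist_good / dist_bad)
        woe.append(w)
        iv += (dist_good - dist_bad) * w
    return woe, iv

# Made-up counts for three age bins (youngest first), illustration only
age_bins = [(80, 20), (90, 10), (95, 5)]
woe, iv = woe_iv(age_bins)
print(woe, iv)
```

With counts like these, WoE rises steadily with age, which is exactly the kind of per-bin risk profile that a 2-bin result hides.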
Many thanks.