Polynominal Value Reduction
seanv507
Hi
I would like to replicate a process I have already built in Python/scikit-learn/R.
I am working on advertising click-through rate (CTR) prediction: millions of rows and about 5 polynominal features, each with up to 1000 distinct values (e.g. feature = Website, Country, etc.).
The feature data is "skewed", i.e. many values have very few instances in the data while a few values have very many. I therefore want to restrict each polynominal feature to the values that change CTR significantly from the base CTR, and replace the "long tail" with a single "NA" category per feature.
Is there any way of doing this within RapidMiner?
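For reference, here is a rough sketch in plain Python of the reduction described above. It is not RapidMiner and not necessarily the exact test used in the original process: it assumes a simple z-test of each value's CTR against the base CTR, and the names `significant_values`, `collapse_long_tail`, `z_threshold` and `min_count` are made up for illustration.

```python
import math
from collections import Counter

def significant_values(values, clicks, z_threshold=2.0, min_count=30):
    """Return the category values whose CTR differs clearly from the base CTR.

    values: list of category IDs, clicks: list of 0/1 labels (same length).
    z_threshold and min_count are illustrative assumptions, not tuned values.
    """
    base_ctr = sum(clicks) / len(values)
    shown = Counter(values)
    clicked = Counter(v for v, c in zip(values, clicks) if c)
    keep = set()
    for value, n in shown.items():
        if n < min_count:
            continue  # too few rows to judge, goes to the long tail
        # Standard error of a proportion if this value had the base CTR.
        se = math.sqrt(base_ctr * (1 - base_ctr) / n)
        if se == 0:
            continue
        z = (clicked[value] / n - base_ctr) / se
        if abs(z) >= z_threshold:
            keep.add(value)
    return keep

def collapse_long_tail(values, keep, na_label="NA"):
    """Replace every value outside `keep` with a single NA category."""
    return [v if v in keep else na_label for v in values]

# Toy data: "a" has a high CTR, "b" a low CTR, "d" is close to the base CTR,
# and "c" has only a handful of rows.
values = ["a"] * 1000 + ["b"] * 1000 + ["d"] * 1000 + ["c"] * 5
clicks = ([1] * 100 + [0] * 900) + ([1] * 10 + [0] * 990) \
    + ([1] * 55 + [0] * 945) + ([1] + [0] * 4)
keep = significant_values(values, clicks)
reduced = collapse_long_tail(values, keep)
```

In this toy data only "a" and "b" survive; "d" (CTR close to base) and "c" (too few rows) both end up in the "NA" category.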
fras
Hi,
As far as I understand the problem, I would do two things first:
- get a sample of your data (reduce the rows to about 1%)
- apply the operator "NominalToBinominal"
Then analyse how sparse your data is.
For more advice, some example data would be useful.
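As a rough idea of what "analyse how sparse your data is" could mean for one nominal feature, here is a small Python sketch (outside RapidMiner). The function name `sparsity_summary`, the `min_count` cutoff and the toy website list are assumptions for illustration only.

```python
from collections import Counter

def sparsity_summary(values, min_count=10):
    """Summarise how much of a nominal feature sits in rarely seen values.

    min_count is an arbitrary illustrative cutoff for "rare".
    """
    counts = Counter(values)
    rare = [v for v, n in counts.items() if n < min_count]
    rare_rows = sum(counts[v] for v in rare)
    return {
        "distinct_values": len(counts),
        "rare_values": len(rare),
        "rare_row_share": rare_rows / len(values),
    }

# Toy feature column: one dominant website and two rare ones.
websites = ["google.com"] * 90 + ["yahoo.com"] * 8 + ["cnbc.com"] * 2
summary = sparsity_summary(websites)
```

A high `rare_values` count combined with a low `rare_row_share` would suggest that most binominal columns carry very little data.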
seanv507
CTR data is "unbalanced", i.e. there is only a ~1% chance of a click. So subsampling is good, but I have to do it only on the non-click class and then reweight that class in the training algorithm (e.g. if the data contains 100 clicks and 100000 non-clicks, I am happy to subsample the non-clicks).
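To make the subsampling idea concrete, here is a minimal Python sketch. It assumes the training algorithm accepts per-row weights; `downsample_negatives` and `keep_rate` are hypothetical names, and the 1% rate is only an example.

```python
import random

def downsample_negatives(rows, labels, keep_rate, seed=0):
    """Keep every click, keep each non-click with probability keep_rate.

    Kept non-clicks get weight 1 / keep_rate so that the weighted class
    balance still matches the original data in expectation.
    """
    rng = random.Random(seed)
    out_rows, out_labels, out_weights = [], [], []
    for row, y in zip(rows, labels):
        if y == 1:
            out_rows.append(row)
            out_labels.append(1)
            out_weights.append(1.0)
        elif rng.random() < keep_rate:
            out_rows.append(row)
            out_labels.append(0)
            out_weights.append(1.0 / keep_rate)
    return out_rows, out_labels, out_weights

# Toy data matching the example: 100 clicks, 100000 non-clicks.
labels = [1] * 100 + [0] * 100000
rows = list(range(len(labels)))  # stand-in for real feature rows
s_rows, s_labels, s_weights = downsample_negatives(rows, labels, keep_rate=0.01)
n_negatives = s_labels.count(0)
```

All 100 clicks are kept with weight 1, and roughly 1000 non-clicks remain, each with weight 100.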
The feature data is just IDs: WebsiteID, AdID, etc. (e.g. google.com=1, yahoo.com=2, cnbc.com=3, ...), so there is no description of the website.
So yes, I want to apply NominalToBinominal, but then (or at the same time, or before) I want to filter out those binominals, e.g. certain websites, for which there is little training data.
(See e.g. http://www.kaggle.com/about/papers ... click-through rate.)
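In Python terms, what I mean by filtering while converting to binominals is roughly the following sketch. The helper names `fit_vocabulary` and `to_binominal`, the `min_count` cutoff and the toy IDs are all made up for illustration; here the filter is a plain row-count threshold.

```python
from collections import Counter

def fit_vocabulary(ids, min_count):
    """Columns to create: every ID seen at least min_count times, plus "NA"."""
    counts = Counter(ids)
    frequent = sorted(v for v, n in counts.items() if n >= min_count)
    return frequent + ["NA"]

def to_binominal(ids, columns):
    """One 0/1 indicator per column; IDs without a column map to "NA"."""
    known = set(columns)
    rows = []
    for v in ids:
        label = v if v in known else "NA"
        rows.append([1 if label == col else 0 for col in columns])
    return rows

# Toy WebsiteID column: IDs 3 and 4 have too little training data.
website_ids = [1, 1, 1, 2, 2, 2, 2, 3, 4]
columns = fit_vocabulary(website_ids, min_count=3)
matrix = to_binominal(website_ids, columns)
```

Here `columns` is `[1, 2, "NA"]`, so websites 3 and 4 share the single "NA" indicator instead of each getting their own column.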