Good evening, ladies and gentlemen:
I am puzzled by a particular problem, and I would greatly appreciate it if anyone could point me in the right direction to learn how to solve it.
What I am hoping to do is use data mining to identify patients who could benefit from a cancer screening test that would not be beneficial to the general public. I am treating this as a classification problem with two groups: "potential cancer patient" and "not a potential cancer patient". However, I want the algorithm to be biased, in a sense. I'm OK with it labelling 100 disease-free people as potential cancer patients (false positives); what matters most is minimizing the number of people who actually have the disease but whom the algorithm classifies as disease-free (false negatives). Falsely triggering the screening test does very little harm, but skipping the screening test in someone with the disease does a great deal of harm.

There is obviously a balance to strike here, because the small harm done by screening healthy people adds up if you have to screen too many of them to find one case of disease. That's basically the problem I'm working on: finding the right way to identify the people who will benefit.
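The trade-off described above can be made concrete with standard decision theory for asymmetric costs. The sketch below assumes the model outputs a calibrated probability p that a patient has the disease, and uses hypothetical costs (a missed case counted as 100 times worse than an unnecessary screening). The names COST_FALSE_NEGATIVE, COST_FALSE_POSITIVE, decision_threshold and total_cost are illustrative only, not from any library. Flagging a patient has expected cost (1 - p) * c_FP and not flagging has expected cost p * c_FN, so screening is the cheaper choice whenever p > c_FP / (c_FP + c_FN).

```python
# Hypothetical costs (assumption, not from real clinical data): missing a true
# cancer case is treated as 100 times as harmful as an unnecessary screening.
COST_FALSE_NEGATIVE = 100.0
COST_FALSE_POSITIVE = 1.0


def decision_threshold(cost_fp: float, cost_fn: float) -> float:
    """Probability above which flagging for screening has lower expected cost.

    Flagging costs (1 - p) * cost_fp in expectation; not flagging costs
    p * cost_fn. Flag when p * cost_fn > (1 - p) * cost_fp, which
    rearranges to p > cost_fp / (cost_fp + cost_fn).
    """
    return cost_fp / (cost_fp + cost_fn)


def total_cost(labels, flags, cost_fp, cost_fn):
    """Sum misclassification costs over a labelled sample (label 1 = has disease)."""
    cost = 0.0
    for y, flagged in zip(labels, flags):
        if y == 1 and not flagged:
            cost += cost_fn  # missed cancer (false negative)
        elif y == 0 and flagged:
            cost += cost_fp  # unnecessary screening (false positive)
    return cost


if __name__ == "__main__":
    t = decision_threshold(COST_FALSE_POSITIVE, COST_FALSE_NEGATIVE)
    print(f"flag anyone with estimated risk above {t:.4f}")
```

With these assumed costs the script prints a cut-off of 0.0099, i.e. screen anyone whose estimated risk is roughly 1% or higher. The cost ratio plays the role of the "balance" described above: a smaller ratio raises the cut-off and screens fewer healthy people, and total_cost can be used to compare candidate cut-offs on labelled data.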
So, I've learned my way around the basics, but at this point, do I need to learn to write my own algorithm? Or are there algorithms with parameters I can set to bias them in various ways, so that I can evaluate the results of those biases?
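On the second question: many off-the-shelf classifiers already expose such a knob. In scikit-learn, for example, classifiers such as LogisticRegression, DecisionTreeClassifier and SVC accept a class_weight parameter that makes errors on one class count more during training. The self-contained sketch below illustrates the idea with a hand-rolled one-feature decision stump rather than any library; the function name fit_weighted_stump, its positive_weight parameter and the toy risk scores are all hypothetical. It picks the cut-off that minimizes weighted training error, where each missed case costs positive_weight and each unnecessary screening costs 1.

```python
from typing import Sequence


def fit_weighted_stump(xs: Sequence[float], ys: Sequence[int],
                       positive_weight: float = 1.0) -> float:
    """Return the cut-off t for the rule "flag if x >= t" that minimises
    weighted training error (hypothetical helper, not a library API).

    A missed positive (false negative) costs positive_weight;
    a flagged negative (false positive) costs 1.
    """
    # Try every observed value as a cut-off, plus +inf ("flag nobody").
    candidates = sorted(set(xs)) + [float("inf")]
    best_t, best_err = float("inf"), float("inf")
    for t in candidates:
        err = 0.0
        for x, y in zip(xs, ys):
            flagged = x >= t
            if y == 1 and not flagged:
                err += positive_weight  # missed case
            elif y == 0 and flagged:
                err += 1.0  # unnecessary screening
        # Strict "<" keeps the lowest cut-off among ties, i.e. flags more people.
        if err < best_err:
            best_t, best_err = t, err
    return best_t


if __name__ == "__main__":
    # Hypothetical risk scores (one feature) and labels (1 = has disease).
    xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    ys = [0, 0, 1, 0, 0, 0, 1, 0, 1, 1]
    for weight in (1.0, 10.0):
        t = fit_weighted_stump(xs, ys, positive_weight=weight)
        fn = sum(1 for x, y in zip(xs, ys) if y == 1 and x < t)
        fp = sum(1 for x, y in zip(xs, ys) if y == 0 and x >= t)
        print(f"positive_weight={weight}: cut-off {t}, "
              f"missed cases {fn}, unneeded screens {fp}")
```

On the toy data, positive_weight=1.0 chooses a cut-off of 7 (one missed case, one unneeded screen), while positive_weight=10.0 drops the cut-off to 3 (no missed cases, four unneeded screens). Sweeping the weight and comparing these counts on held-out data, rather than on the training data as done here, is one way to evaluate the effect of different biases.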
Any assistance you can provide, and especially pointers to resources where I can learn about this topic in depth, will be met with my sincere gratitude.