"Beginner Machine Learning Question"
Say I want to predict the price of an automobile from its attributes. Assume that I know things such as tire size, date of manufacture, number of doors, etc. I could throw all of these attributes into a decision tree learner and hope it finds some relation to the cost of the car. But can I get a better result by using relations that I already know about the attributes?
For example, assume that I don't know how much horsepower the engine produces, but I do know attributes that correlate with horsepower, such as engine displacement, number of cylinders, and number of gears in the transmission. Although I don't know the horsepower, assume that I can roughly calculate it from these parameters. Doesn't it make more sense to isolate these attributes from the others and use them exclusively to build a model for engine horsepower, whose output can then be supplied to a higher-layer learner that tries to figure out how horsepower and other factors affect an automobile's price?
Obviously, if I have no idea how the attributes relate, it's probably better to just feed them all into one learning algorithm. But if I know how certain attributes relate, it seems like a better approach to isolate the attributes into groups, build a model for what each group represents, and then use these sub-models to train another model. This would be a hierarchy of learning: going from detailed attributes (number of cylinders, engine displacement, gears in transmission) to predict higher-level attributes (horsepower, torque), and finally predicting the price of the car from these higher-level attributes (horsepower, quality of interior, carmaker's reputation, etc.).
Is this a good approach? The idea is to use relationships that I already know to direct the learning process.
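To make the idea concrete, here is a rough sketch of the two-stage setup I have in mind. Everything in it is an assumption for illustration: the `estimate_horsepower` formula and its coefficients are made up, the four cars are invented data, and a one-variable least-squares line stands in for the higher-level learner (a decision tree or anything else could take its place).

```python
# Stage 1: a hand-written sub-model built from prior knowledge.
# The formula and coefficients are invented for illustration only.
def estimate_horsepower(displacement_l, cylinders, gears):
    return 45.0 * displacement_l + 8.0 * cylinders + 2.0 * gears


# Stage 2: a higher-level learner that sees the derived feature
# instead of the raw engine attributes. A simple least-squares line
# is used here as a stand-in for the real learner.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy training data (invented):
# (displacement in litres, cylinders, gears, price)
cars = [
    (1.6, 4, 5, 18000),
    (2.0, 4, 6, 22000),
    (3.0, 6, 6, 31000),
    (5.0, 8, 8, 52000),
]

# Replace the three raw engine attributes with one derived feature.
hp = [estimate_horsepower(d, c, g) for d, c, g, _ in cars]
prices = [p for *_, p in cars]
slope, intercept = fit_line(hp, prices)


def predict_price(displacement_l, cylinders, gears):
    # The price model only ever sees the estimated horsepower.
    return intercept + slope * estimate_horsepower(displacement_l, cylinders, gears)
```

In a fuller version the other attributes (doors, tire size, and so on) would be passed to the second stage alongside the estimated horsepower, rather than horsepower being the only input.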
Second question: what if I don't know how to calculate horsepower from those low-level attributes, and only know that those attributes are related?
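For this second case, one thing I could imagine (purely a guess on my part, not something I know to be the right method) is collapsing the related group into a single composite score without any formula, for example by standardizing each attribute and averaging. The sketch below does that with invented data; `composite_score` and `engine_score` are hypothetical names, and the averaging is just a simple stand-in for a proper dimensionality-reduction step.

```python
from statistics import mean, pstdev


def standardize(values):
    # Rescale a column to zero mean and unit (population) standard deviation.
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def composite_score(columns):
    # Average the standardized attributes car by car, giving one
    # unit-free score per car for the whole group of related attributes.
    z_columns = [standardize(col) for col in columns]
    return [mean(row) for row in zip(*z_columns)]


# Invented attribute columns for four cars.
displacement = [1.6, 2.0, 3.0, 5.0]
cylinders = [4, 4, 6, 8]
gears = [5, 6, 6, 8]

# One "engine size" feature that could replace the three raw columns
# when training the price model.
engine_score = composite_score([displacement, cylinders, gears])
```

The score has no physical meaning like horsepower does; it only encodes my belief that the three attributes move together.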