Mining for stock entry rules
Mark_Knecht
New Altair Community Member
Hi,
Like many before me I'm a complete newbie to RapidMiner. This is my first post. Very impressive program and I'm very happy that it's available as Open Source. I'm running it on my Gentoo 64-bit machine and so far it seems to be working well.
Now, I've never used data mining before and am likely going to ask all the wrong things so please be kind. Don't worry too much. I can take a punch if I do something stupid.
OK, the initial task I've set for myself is to see if RapidMiner can extract a set of (for now) *ENTRY* rules that would help with day trading. I've prepared a data file with OHLCV data as well as a number of technical indicators that I currently use. The file is in csv format and I seem to be able to read it in successfully using either the ExampleSource or the CSVExampleSource operator.
Having done that I've managed to use a couple of operators as preprocessors, etc. For instance I can apply the Normalization operator and the model will play. However as soon as I add a RuleLearner I get a message in the message window that says "[Fatal] Process failed: Input example set does not have a label attribute". My first question is what, in general, do I need to do to get past this problem?
The larger question I have is how, in general, do I describe the sort of criteria I would find acceptable in the rules RapidMiner mines? For instance, say I'm asking it to find a rule for going long a futures contract, I'd like a rule that did something like this:
1) Sometime in the next 30 bars there is a potential for a 2% gain. (Might be measured using high or possibly close.) Using 5 minute data that's about 2 1/2 hours which is good for a day trade.
2) Within whatever number of bars the required gain is developing there is no bar with more than a 1% drawdown or the entry would be considered a failure. (Must be measured using low.)
If I can mine out a rule like that then I'd like to understand what percentage of the time the rule works.
Assuming I can find a rule like this then I'll address exit rules later. I.e. - this is (for now) only about using indicators to start the trade.
For now I have a recent data set of 76,000 examples. I tried to load 400,000 samples but RapidMiner said I ran out of memory. (4GB - I'm surprised but I guess it's possible since it happened!) ;-)
Thanks in advance and I look forward to becoming a contributing member of the group.
Cheers,
Mark
Answers
Overnight I started to wonder if it's my responsibility to put something in the data set that identifies where the sort of entries are that I'm looking for. Is this the standard thing to do?
For instance, looking backward in the data I can identify locations where there *was* enough gain to be of interest and also where there *was not* more drawdown than I might want to tolerate. This could be just a couple more columns in the spreadsheet and isn't a big deal to add. I'm thinking that this sort of information might make it much easier for RapidMiner to understand what I'm looking for?
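A rough sketch of how such a column could be computed outside RapidMiner, assuming the entry criteria from the first post (a 2% gain measured on the high within the next 30 bars, and no bar whose low is more than 1% below the entry). The function name `label_entries`, the choice of the bar's close as the entry price, and the decision to check the drawdown before the gain within the same bar are all assumptions made for illustration, not anything RapidMiner prescribes.

```python
def label_entries(highs, lows, closes, horizon=30, gain=0.02, max_drawdown=0.01):
    """Return one label per bar: 1 = acceptable long entry, 0 = not acceptable,
    None = not enough future bars to decide.

    Assumption: the entry price is the close of the bar being labeled.
    """
    n = len(closes)
    labels = []
    for i in range(n):
        entry = closes[i]
        target = entry * (1 + gain)          # price that counts as the required gain
        stop = entry * (1 - max_drawdown)    # price below which the entry has failed
        label = None
        last = min(i + horizon, n - 1)
        for j in range(i + 1, last + 1):
            # Check the low first: the order of high and low inside one bar is
            # unknown, so treating the drawdown as happening first is conservative.
            if lows[j] < stop:
                label = 0
                break
            if highs[j] >= target:
                label = 1
                break
        # The full window was available and the target was never reached.
        if label is None and i + horizon <= n - 1:
            label = 0
        labels.append(label)
    return labels


# Tiny made-up series, with a 3-bar horizon so the result is easy to check by hand.
closes = [100, 100.5, 101, 103, 98, 98]
highs = [100, 101, 101.5, 103, 99, 98.5]
lows = [100, 99.5, 100.5, 101, 97, 97.5]
print(label_entries(highs, lows, closes, horizon=3))
# -> [1, 1, 0, 0, None, None]
```

The resulting column could be written back into the csv next to the indicators. Rows labeled `None` have too little future data to judge and would probably be dropped before learning.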
If this has all been discussed elsewhere in the forums please point me toward the right threads. I've been searching and haven't found much but I'm sure my search terms aren't the best.
Thanks,
Mark
Hi Mark,
Welcome on board!
A little remark on the memory issue: take a look at the memory monitor to check whether your available memory is initialized correctly. If you start the .jar file directly, Java might restrict the available memory to 64 MB or something like that. If you use the start script, this should be avoided.
The second thing to mention is that you are using a generic data mining program. It provides a great deal of functionality, but it is not tailored to mining stock data. So you will have to help the program find what it should analyze.
There is a group of operators called supervised learners, including the RuleLearner you mentioned. Because they are supervised, they need a number of examples to learn from. That's the reason why RapidMiner calls a table row an "example". But to apply such a learner, you need labeled examples. The label tells the learner what the target of the analysis should be. This might very well be one of the three values "stable", "up", or "down", indicating whether the stock value will stay as it was, rise, or fall.
You should include such information in your data and mark the corresponding column as the label via the "label" parameter of the input operator.
If you need a more detailed introduction to data mining, you could visit one of our training courses.
Greetings,
Sebastian
Sebastian,
Thanks. I'm excited about learning. I'm just an old guy who likes computer programs. Engineering degree so some math but I'm not approaching this from an academic perspective. I just picked stocks as a way to get rich if I learned anything! ;-)
Also, I come from the Open Source world and primarily use nothing but Linux so I expect to share whatever generic information I discover along the way. It's one reason I decided to try RapidMiner and actually from a learning perspective would prefer a generic tool over one where someone has decided ahead of time what I should see/do/think.
Ok, thanks for the pointer to the start-up script. That seems to have fixed the memory issue, at least according to the monitor. I was stuck at 81MB starting the .jar file. I now have 494MB. I've not loaded my large file yet but I suspect that problem is fixed.
I clearly don't have a handle on attributes yet so I'm working on that. I found this thread among others. Much to learn.
http://rapid-i.com/rapidforum/index.php?topic=379.0
In terms of describing to the program what I'm looking for in the data let's assume a simple set of closing data and an extra calculation in the spreadsheet that looks 2 bars back:
8000 ?
8000 ?
8000 0
8010 10
8020 20
8030 20
8040 20
8050 20
8050 10
8050 0
Let's assume I take the 2nd column and filter everything above 15 as "up", everything below -15 as "down" and everything that's left as "stable". Now I have some information, but it's misaligned in terms of *predicting* where the price will go. If I shift that data forward two rows in the spreadsheet now it would seem to give the info I need. Is this the idea?
I am assuming that I would shift the results column but not shift any of the other columns in the spreadsheet that have technical indicators, and then let RapidMiner look for rules that might point toward going long when some set of rules tell it to do so? I'm not clear if this would be an example of supervised or unsupervised learning?
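The shifting idea above can be sketched in a few lines of Python using the ten closing prices from the example. The function name `future_direction_labels` is made up for illustration. The code assumes that "shifting forward" means each row gets the class of the price change over the following two bars, while the indicator columns stay where they are. Because the label column is supplied to the learner, this would fall under the supervised setup Sebastian describes.

```python
def future_direction_labels(closes, lookahead=2, threshold=15):
    """Label each row by the price change over the next `lookahead` bars.

    Rows too close to the end of the data get None, since their future
    is not known yet.
    """
    labels = []
    for i in range(len(closes)):
        if i + lookahead >= len(closes):
            labels.append(None)
            continue
        change = closes[i + lookahead] - closes[i]
        if change > threshold:
            labels.append("up")
        elif change < -threshold:
            labels.append("down")
        else:
            labels.append("stable")
    return labels


closes = [8000, 8000, 8000, 8010, 8020, 8030, 8040, 8050, 8050, 8050]
for close, label in zip(closes, future_direction_labels(closes)):
    print(close, label)
```

With this data the first two rows come out as "stable", the next four as "up", the following two as "stable", and the last two rows have no label because there are no two bars after them.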
I would absolutely LOVE to come take a course one day. However, I am in California and it seems you are in Germany. Any chance you might teach in the U.S., and especially on the West Coast, one day? I'm in Silicon Valley so I suspect there's a good audience for such things here.
Again, thanks for the info. It's helpful.
Cheers,
Mark