Automating a RM5 Process

Question

Hello,

I am interested in automating an RM5 process.

I have created a very simple RM5 application with the operators Read Model and Read Database, which both feed into Apply Model, followed by Write Database.

What I need is a way to automate this process.  My database gets updated once a minute and I have 10 different data sources and models.  Also I write predictions to 10 different tables.

I would like to loop through this application continuously and change only these parameters: the input file of Read Model, the SQL statement of Read Database, and the output table of Write Database.

I have tried the various process control loops without success. Should I create 10 different applications with the various configurations and launch them through a Scheduled Task (the Windows equivalent of a cron job), or is there a better solution?
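The loop I have in mind can be sketched in plain Java as follows. This is only an illustration of the idea, not RapidMiner code: the class JobConfig, the ProcessRunner hook, and all file, SQL, and table names are hypothetical placeholders, and the runner here only prints what it would execute.

```java
import java.util.ArrayList;
import java.util.List;

public class ProcessLoop {

    /** One set of parameters for a single run of the process. */
    static final class JobConfig {
        final String modelFile;
        final String sqlStatement;
        final String outputTable;

        JobConfig(String modelFile, String sqlStatement, String outputTable) {
            this.modelFile = modelFile;
            this.sqlStatement = sqlStatement;
            this.outputTable = outputTable;
        }
    }

    /** Hypothetical hook: whatever actually starts the process with these parameters. */
    interface ProcessRunner {
        void run(JobConfig config);
    }

    /** Builds one configuration per data source; all names are placeholders. */
    static List<JobConfig> buildConfigs(int count) {
        List<JobConfig> configs = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            configs.add(new JobConfig(
                    "model_" + i + ".mod",
                    "SELECT * FROM source_" + i,
                    "predictions_" + i));
        }
        return configs;
    }

    /** Runs every configuration once, in order. */
    static void runAll(List<JobConfig> configs, ProcessRunner runner) {
        for (JobConfig config : configs) {
            runner.run(config);
        }
    }

    public static void main(String[] args) {
        // Stand-in runner that only prints what it would execute.
        runAll(buildConfigs(10), config -> System.out.println(
                config.modelFile + " | " + config.sqlStatement + " -> " + config.outputTable));
    }
}
```

Repeating runAll once a minute would cover all 10 sources with a single process definition instead of 10 separate applications.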

I am also working with the RM5 beta and have been unable to run this application from the command line (the OS is Windows XP).

Thanks in advance,
Cleo

Cleo · Answer

This works perfectly.

Thanks for your help.

Cheers,
Cleo

land · Answer

Hi Cleo,
the first two steps could be handled entirely by the operators of the time series extension, which we will publish with the final version.
The last step can be done without scripting, using only the Construct Attribute operator. It handles if-conditions, even nested ones, so it should be possible to derive the nominal target value with it.

As an example for your script, I will quote the still unfinished tutorial:
Let’s assume we have the following situation: we get data from a machine that counts the seconds since it was switched on. Each entry in its log file carries this time stamp. Unfortunately, the other data sources we are going to use don’t have this relative time stamp, so we have to transform the relative format into a regular date and time format. Since RapidMiner doesn’t provide an operator for this particular problem, we decide to write a small script. The problem doesn’t seem worth the effort of building a complete extension, because we can’t believe there are many other such stupid machines around that don’t have an integrated clock. We build a simple process, which should do the trick:

Image 1: A simple process for applying a script
As a first step we load the data and then directly apply our script. As a last step we will do some date adjustment, but we will come back to this later. After loading, we have an ExampleSet consisting of a number of attributes describing the machine’s state. They are called att1, att2 to att500. The time stamp is contained in an attribute named relative time. In the script we can ignore the state attributes and focus on this single attribute.

And here's the resulting code after two types of explanations:
import com.rapidminer.tools.Ontology;

// The first input of the operator is the ExampleSet to transform.
ExampleSet exampleSet = input[0];
Attributes attributes = exampleSet.getAttributes();
Attribute sourceAttribute = attributes.get("relative time");

// Create a date_time attribute that reuses the data column of the source attribute.
String newName = "date(" + sourceAttribute.getName() + ")";
Attribute targetAttribute = AttributeFactory.createAttribute(newName, Ontology.DATE_TIME);
targetAttribute.setTableIndex(sourceAttribute.getTableIndex());
attributes.addRegular(targetAttribute);
attributes.remove(sourceAttribute);

// Both attributes share one column, so reading targetAttribute yields the original
// value in seconds; multiply by 1000 to store it as milliseconds.
for (Example example : exampleSet) {
	double timeStampValue = example.getValue(targetAttribute);
	example.setValue(targetAttribute, timeStampValue * 1000);
}

return exampleSet;
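To make the arithmetic easier to follow outside of RapidMiner, here is a small stand-alone Java sketch of the same conversion. It assumes that the later date adjustment amounts to adding the moment the machine was switched on; that moment, and the names RelativeTime, toMillis and toAbsolute, are made up for illustration and are not part of the tutorial.

```java
import java.time.Instant;

public class RelativeTime {

    /** Converts seconds since switch-on to milliseconds, as the script's loop does. */
    static double toMillis(double relativeSeconds) {
        return relativeSeconds * 1000;
    }

    /** Shifts the relative offset by the (assumed known) switch-on time. */
    static Instant toAbsolute(Instant switchedOnAt, double relativeSeconds) {
        return switchedOnAt.plusMillis((long) toMillis(relativeSeconds));
    }

    public static void main(String[] args) {
        // Hypothetical switch-on time, chosen only for this example.
        Instant switchedOnAt = Instant.parse("2010-01-01T00:00:00Z");
        System.out.println(toAbsolute(switchedOnAt, 90));
    }
}
```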

Hope that will help you.

Greetings,
  Sebastian

Cleo · Answer

Hello Sebastian,

Thanks again! With a couple of simple modifications, your RM5 file suits my needs perfectly.

The data I am working with is a time series, and I am attempting to preprocess it in three ways:
1)	Moving average: the average of the last 2 values of col2 and col3, i.e. Average(col2(t), col2(t+1), col3(t), col3(t+1))
2)	Percent change: [col1(t) - col1(t-1)] / col1(t) * 100
3)	Custom binomial result: if (col1(t) - col2(t+x)) > const1 becomes true before (col1(t) - col3(t+x)) > const2, then Result = Yes, else Result = No; i.e. which statement is true first
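To pin down what I mean by cases 1 and 2, here is a minimal sketch in plain Java, with arrays standing in for the columns. The class and method names are made up, both columns are assumed to have the same length, and I am assuming that rows without a neighbour (the last row for case 1, the first row for case 2) should simply get NaN.

```java
public class SeriesPreprocessing {

    /** Case 1: mean of col2(t), col2(t+1), col3(t), col3(t+1); NaN where t+1 does not exist. */
    static double[] pairAverage(double[] col2, double[] col3) {
        double[] result = new double[col2.length];
        for (int t = 0; t < col2.length; t++) {
            if (t + 1 < col2.length) {
                result[t] = (col2[t] + col2[t + 1] + col3[t] + col3[t + 1]) / 4.0;
            } else {
                result[t] = Double.NaN;
            }
        }
        return result;
    }

    /** Case 2: (col1(t) - col1(t-1)) / col1(t) * 100; NaN for the first row. */
    static double[] percentChange(double[] col1) {
        double[] result = new double[col1.length];
        for (int t = 0; t < col1.length; t++) {
            if (t == 0) {
                result[t] = Double.NaN;
            } else {
                result[t] = (col1[t] - col1[t - 1]) / col1[t] * 100;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        double[] change = percentChange(new double[] {100, 200, 100});
        for (double value : change) {
            System.out.println(value);
        }
    }
}
```

Note that the percent change divides by the current value col1(t), exactly as in the formula above, rather than by the previous value.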

So far I have done this and other preprocessing in the database, but I think RM would be better at it. I believe cases 1 and 2 are achievable with standard RM operators, but I feel case 3 will require custom coding.
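For case 3, the logic I am after looks roughly like this in plain Java. It is a sketch under several assumptions of my own: x starts at 1 and runs to the end of the series, a tie (both conditions becoming true at the same x) counts as No, and a row for which neither condition ever becomes true also gets No. The names FirstCrossing and label are made up.

```java
public class FirstCrossing {

    /**
     * Returns "Yes" if (col1[t] - col2[t+x]) > const1 becomes true at a smaller x
     * than (col1[t] - col3[t+x]) > const2, otherwise "No".
     */
    static String label(double[] col1, double[] col2, double[] col3,
                        int t, double const1, double const2) {
        for (int x = 1; t + x < col1.length; x++) {
            boolean firstHolds = (col1[t] - col2[t + x]) > const1;
            boolean secondHolds = (col1[t] - col3[t + x]) > const2;
            if (secondHolds) {
                return "No";   // second condition came first, or both at once (assumed tie rule)
            }
            if (firstHolds) {
                return "Yes";  // first condition came strictly first
            }
        }
        return "No";           // neither condition was ever met (assumption)
    }

    public static void main(String[] args) {
        double[] col1 = {10, 10, 10, 10};
        double[] col2 = {10, 9, 5, 5};
        double[] col3 = {10, 10, 10, 2};
        System.out.println(label(col1, col2, col3, 0, 3, 5));
    }
}
```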

I have unsuccessfully tried to implement a “Hello world” Groovy example from http://groovy.codehaus.org/.
If possible, I would appreciate a small example script in Groovy. I have included the pseudocode of an example which I could adapt to my own needs. Assume the Execute Script operator is working with an ExampleSet loaded by a Read Excel operator from a file with one sheet and the numbers 1, 2, 3, 4, 5 in column A.

Step 1) Load the ExampleSet from the Read Excel operator
Step 2) Loop over each row:
Step 2a) Print the value of the attribute Column A for the row (to a log file, the screen, or anywhere else, for debugging purposes)
Step 2b) Call a function, passing it the row value, and have the function return row + 10
Step 3) Return to RM the new ExampleSet containing two columns, A and A+10, i.e. (1,11), (2,12), (3,13), (4,14), (5,15)
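In plain Java, without any RapidMiner classes, the steps above would look something like the sketch below. An array stands in for column A and a list of pairs stands in for the returned ExampleSet; the names AddTenExample, addTen and process are made up. What I am missing is how to do the same against the ExampleSet inside the script operator.

```java
import java.util.ArrayList;
import java.util.List;

public class AddTenExample {

    /** Step 2b: the function that receives a row value and returns value + 10. */
    static double addTen(double value) {
        return value + 10;
    }

    /** Steps 2 and 3: loop over the rows, print each value, and collect (A, A+10) pairs. */
    static List<double[]> process(double[] columnA) {
        List<double[]> rows = new ArrayList<>();
        for (double value : columnA) {
            System.out.println("A = " + value);           // step 2a: debug output
            rows.add(new double[] {value, addTen(value)});
        }
        return rows;
    }

    public static void main(String[] args) {
        // Step 1 stand-in: the values that would come from the Excel sheet.
        for (double[] row : process(new double[] {1, 2, 3, 4, 5})) {
            System.out.println(row[0] + ", " + row[1]);
        }
    }
}
```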

If you would like, I could give you some feedback on the tutorial you are writing.

Thanks  again,
Cleo