"Create Time Cluster Template for association algorithm"
freeman
New Altair Community Member
Hello everyone,
It is easier if I first explain what I have. Here we go:
In my company there is a small server farm with 5 servers.
The servers run one big CRM system.
There is also a monitoring system that checks the availability of the servers.
If an error occurs, the monitoring system recognizes it, creates a message for the support team and stores the failure in a DB2 data warehouse.
The monitoring system also stores the status of many components, such as CPU_Usage and Memory_Usage, for every server every 15 minutes.
The idea now is to do analytical tasks on the data warehouse. I want to see whether there is a dependency between a failure that occurs on one server and the CPU usage on another one.
The vision is to get a rule like this: "If it is Monday at 8:15 and Server1 CPU_Usage > 85% and Server2 CPU_Usage > 91%, then there is a probability of 77% that failureX will occur."
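In association-rule terms, the "77%" in such a rule most likely corresponds to the rule's confidence: of all 15-minute snapshots in which the conditions hold, the share in which failureX also occurred. Support is the share of all snapshots that contain a given set of items. A minimal Java sketch, assuming each snapshot has already been turned into a set of item strings; the class name and item names such as "Srv1_CPU>85" are hypothetical:

```java
import java.util.List;
import java.util.Set;

// Sketch: support and confidence of a rule "antecedent => consequent".
// Each transaction is one monitoring snapshot, given as a set of items.
// Class and item names are hypothetical.
final class RuleMetrics {

    // Share of transactions that contain every item in 'items'.
    static double support(List<Set<String>> transactions, Set<String> items) {
        if (transactions.isEmpty()) {
            return 0.0;
        }
        long hits = transactions.stream().filter(t -> t.containsAll(items)).count();
        return (double) hits / transactions.size();
    }

    // confidence(A => B) = count(A and B) / count(A); 0 if A never occurs.
    static double confidence(List<Set<String>> transactions,
                             Set<String> antecedent, Set<String> consequent) {
        long withAntecedent = transactions.stream()
                .filter(t -> t.containsAll(antecedent))
                .count();
        if (withAntecedent == 0) {
            return 0.0;
        }
        long withBoth = transactions.stream()
                .filter(t -> t.containsAll(antecedent) && t.containsAll(consequent))
                .count();
        return (double) withBoth / withAntecedent;
    }

    public static void main(String[] args) {
        List<Set<String>> snapshots = List.of(
                Set.of("Mon08:15", "Srv1_CPU>85", "Srv2_CPU>91", "failureX"),
                Set.of("Mon08:15", "Srv1_CPU>85", "Srv2_CPU>91"),
                Set.of("Mon08:15", "Srv1_CPU>85", "Srv2_CPU>91", "failureX"),
                Set.of("Tue10:00", "failureX"));
        Set<String> lhs = Set.of("Mon08:15", "Srv1_CPU>85", "Srv2_CPU>91");
        Set<String> rhs = Set.of("failureX");
        // failureX follows in 2 of the 3 snapshots that match the conditions
        System.out.println("confidence = " + confidence(snapshots, lhs, rhs));
        System.out.println("support(failureX) = " + support(snapshots, rhs));
    }
}
```

A rule is usually only reported when both its support and its confidence exceed user-chosen minimums.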
As a first step I want to create a time cluster template like this:
[img=http://img13.imageshack.us/img13/3774/templateg.th.jpg]
In that scheme I can store the number of failures, and also the mean CPU_Usage of every server.
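The template image is not reproduced here, so the following is only a guess at its layout: one cell per weekday and 15-minute slot (96 slots per day), matching the "Monday at 8:15" granularity of the rule above. A minimal Java sketch that assigns each measurement to such a cell and keeps the mean CPU_Usage per server; the class and method names are hypothetical:

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.util.HashMap;
import java.util.Map;

// Sketch of a time cluster template: every measurement is assigned to a
// (weekday, 15-minute slot) cell and the mean CPU_Usage per server is kept
// for each cell. The cell layout is an assumption, not the original template.
final class TimeSlotTemplate {

    // One template cell, e.g. MONDAY + slot 33 (= 08:15 to 08:29).
    record Slot(DayOfWeek day, int quarterOfDay) {
        static Slot of(LocalDateTime ts) {
            int quarter = (ts.getHour() * 60 + ts.getMinute()) / 15; // 0..95
            return new Slot(ts.getDayOfWeek(), quarter);
        }
    }

    // server -> slot -> {sum, count}, so the mean can be computed at any time
    private final Map<String, Map<Slot, double[]>> sums = new HashMap<>();

    void add(String server, LocalDateTime ts, double cpuUsage) {
        double[] acc = sums.computeIfAbsent(server, s -> new HashMap<>())
                           .computeIfAbsent(Slot.of(ts), k -> new double[2]);
        acc[0] += cpuUsage; // sum
        acc[1] += 1;        // count
    }

    // Mean CPU_Usage of a server in the slot that contains ts, or NaN if no data.
    double mean(String server, LocalDateTime ts) {
        double[] acc = sums.getOrDefault(server, Map.of()).get(Slot.of(ts));
        return acc == null ? Double.NaN : acc[0] / acc[1];
    }

    public static void main(String[] args) {
        TimeSlotTemplate t = new TimeSlotTemplate();
        t.add("Srv1", LocalDateTime.of(2009, 6, 1, 8, 15), 80.0); // a Monday
        t.add("Srv1", LocalDateTime.of(2009, 6, 8, 8, 20), 90.0); // next Monday, same slot
        System.out.println(t.mean("Srv1", LocalDateTime.of(2009, 6, 15, 8, 25))); // 85.0
    }
}
```

Keeping running sums and counts instead of the raw values means the template can be filled in a single pass over the exported monitoring data.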
With these tables I can look at a failure timestamp, compare the CPU_Usage at that moment with the "normal" CPU_Usage and decide whether it is in the normal range or not (for every server). In the end I want a table that stores the failure timestamp plus, for each server, a "y" if its CPU_Usage is in the abnormal range and an "n" if it is normal. On this table I want to run an association algorithm to get a rule like the one described above.
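The post does not say how "normal" is defined. A minimal Java sketch, assuming that a value counts as abnormal when it differs from the slot's mean CPU_Usage by more than a fixed tolerance (a hypothetical parameter; a multiple of the standard deviation would be a common alternative), producing the "y"/"n" row for one failure timestamp:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: build the "y"/"n" row for one failure timestamp.
// Assumption: a server is abnormal ("y") if its CPU_Usage differs from the
// normal (mean) value of that time slot by more than 'tolerance'.
final class FailureFlags {

    static Map<String, String> flags(Map<String, Double> cpuAtFailure,
                                     Map<String, Double> normalCpu,
                                     double tolerance) {
        Map<String, String> row = new LinkedHashMap<>(); // keeps the input's server order
        for (Map.Entry<String, Double> e : cpuAtFailure.entrySet()) {
            Double normal = normalCpu.get(e.getKey());
            // No reference value for this server: treat it as normal.
            boolean abnormal = normal != null
                    && Math.abs(e.getValue() - normal) > tolerance;
            row.put(e.getKey(), abnormal ? "y" : "n");
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Double> atFailure = new LinkedHashMap<>();
        atFailure.put("Srv1", 92.0);
        atFailure.put("Srv2", 40.0);
        Map<String, Double> normal = Map.of("Srv1", 55.0, "Srv2", 45.0);
        System.out.println(flags(atFailure, normal, 20.0)); // {Srv1=y, Srv2=n}
    }
}
```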
Here I want to describe my procedure (red "V"s are variables):
[img=http://img26.imageshack.us/img26/8751/modellc.th.jpg]
My question is whether I have to write a tool to do the extraction into the time slots, or whether there is an easier way to get what I want.
Thanks for all your help
Chris
P.S.
I know that this analysis is not very interesting, because a failure in a server farm has many causes. But it's just a first experiment for me to get into data mining.
Answers
-
Hi everyone,
is there someone who can help me?
Or is my English so bad that no one understands me?
Thanks
-
Chris wrote: "My question is whether I have to write a tool to do the extraction into the time slots, or whether there is an easier way to get what I want."
Your English is fine, no problem! You can easily do this with RapidMiner; analysing computer logs is pretty standard, but you will need to work through the manuals and examples. As ever, the "easier" way is to get someone else to do it, but then you would expect to pay. Your call.
-
Hi Haddock,
thanks for your reply. I only have two weeks to finish my task,
so I just wanted to know how best to use this short time:
- getting deeper into RM, or
- writing a tool in Java that creates the table I need, as described.
I hope you are right and I can do everything in RM.
Now I will read the manuals and work through the examples.
Thanks for your help
Chris
-
Hi Chris,
Just alter the inputs for the association rules example, and see what you get! Shuffling the data around is easy enough when you know how, but a little daunting to start with, so don't hesitate to post here if you get bogged down. Good luck.
-
Hi everyone,
I wrote a Java application that creates the CSV files I need.
Now I have a CSV file for every failure with the following columns:
Srv1 Srv2 Srv3 Srv4
1 1 0 1
1 0 1 1
...
A 0 means the CPU value is not in the normal range.
A 1 means the CPU value is normal.
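The Java application itself is not shown in the thread. A rough sketch of what writing one such file could look like, assuming a comma separator and one boolean per server (true = normal); the class and method names are hypothetical:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Sketch: one CSV file per failure, one column per server, one row per
// observation; 1 = CPU value in the normal range, 0 = outside it.
// Separator, layout and names are assumptions for illustration.
final class FailureCsv {

    static String toCsv(List<String> servers, List<boolean[]> rows) {
        StringBuilder sb = new StringBuilder(String.join(",", servers)).append('\n');
        for (boolean[] row : rows) {
            if (row.length != servers.size()) {
                throw new IllegalArgumentException("row length does not match header");
            }
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    sb.append(',');
                }
                sb.append(row[i] ? '1' : '0');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    static void write(Path file, List<String> servers, List<boolean[]> rows) throws IOException {
        Files.write(file, toCsv(servers, rows).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        List<String> servers = List.of("Srv1", "Srv2", "Srv3", "Srv4");
        List<boolean[]> rows = List.of(
                new boolean[] {true, true, false, true},
                new boolean[] {true, false, true, true});
        System.out.print(toCsv(servers, rows));
        // prints:
        // Srv1,Srv2,Srv3,Srv4
        // 1,1,0,1
        // 1,0,1,1
    }
}
```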
I now want to use every file as input for FP_Growth.
Before doing that, I convert the numerical values to binominal.
Now I have the following table as input for FP_Growth:
Srv1 Srv2 Srv3 Srv4
true true false true
true false true true
...
FP_Growth gives a wrong result.
It shows me the frequent items containing the "false" value (these are not the frequent items).
I then negated the input table for FP_Growth:
Srv1 Srv2 Srv3 Srv4
false false true false
false true false false
...
but again FP_Growth returns the items that are not frequent.
Does anyone have an idea what I can do to get the correct frequent items?
I need them as input for the association algorithm.
Thanks for your help
Chris
-
Hi Chris,
Good to see you're making progress; here's a small example that may help. Hope so.
<operator name="Root" class="Process" expanded="yes">
    <!-- Generates an example set with 4 random numerical attributes and a label. -->
    <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
        <parameter key="target_function" value="random"/>
        <parameter key="number_of_attributes" value="4"/>
    </operator>
    <!-- Removes the attribute named "label" (special attributes are filtered too). -->
    <operator name="FeatureNameFilter" class="FeatureNameFilter">
        <parameter key="filter_special_features" value="true"/>
        <parameter key="skip_features_with_name" value="label"/>
    </operator>
    <!-- Maps each value to class "0" (up to 0.0) or "1" (up to 99999.0); the breakpoint pauses here so the result can be inspected. -->
    <operator name="UserBasedDiscretization" class="UserBasedDiscretization" breakpoints="after">
        <list key="classes">
            <parameter key="0" value="0.0"/>
            <parameter key="1" value="99999.0"/>
        </list>
    </operator>
    <!-- Turns every nominal value into its own true/false attribute. -->
    <operator name="Nominal2Binominal" class="Nominal2Binominal">
    </operator>
    <!-- Finds frequent itemsets, then derives association rules from them. -->
    <operator name="FPGrowth" class="FPGrowth">
    </operator>
    <operator name="AssociationRuleGenerator" class="AssociationRuleGenerator">
    </operator>
</operator>
Good luck! Please let us know how you get on.
PS For messing around with the rules you could look at http://rapid-i.com/rapidforum/index.php/topic,778.0.html
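The key step in the process above appears to be Nominal2Binominal: as far as I know, it creates a separate true/false attribute for every value of a nominal attribute, so the values "0" and "1" of each server both become items of their own, instead of FP-Growth seeing only one value per attribute. The same idea, sketched in plain Java (exhaustive counting of 1- and 2-itemsets, not FP-Growth; all names are hypothetical), turns each 0/1 row into items such as "Srv2=0":

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: every (server, value) pair becomes its own item, e.g. "Srv2=0",
// then all itemsets of size 1 and 2 reaching a minimum support are counted.
// Plain exhaustive counting for illustration, not FP-Growth.
final class ValueItems {

    static List<List<String>> toTransactions(List<String> servers, List<int[]> rows) {
        List<List<String>> tx = new ArrayList<>();
        for (int[] row : rows) {
            if (row.length != servers.size()) {
                throw new IllegalArgumentException("row length does not match servers");
            }
            List<String> items = new ArrayList<>();
            for (int i = 0; i < servers.size(); i++) {
                items.add(servers.get(i) + "=" + row[i]); // server AND its state
            }
            tx.add(items);
        }
        return tx;
    }

    // Itemsets ("a" or "a & b") with support >= minSupport, sorted by name.
    static Map<String, Double> frequent(List<List<String>> tx, double minSupport) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> t : tx) {
            for (int i = 0; i < t.size(); i++) {
                counts.merge(t.get(i), 1, Integer::sum);
                for (int j = i + 1; j < t.size(); j++) {
                    // items are in server order, so each pair gets one stable name
                    counts.merge(t.get(i) + " & " + t.get(j), 1, Integer::sum);
                }
            }
        }
        Map<String, Double> result = new TreeMap<>();
        counts.forEach((itemset, c) -> {
            double support = (double) c / tx.size();
            if (support >= minSupport) {
                result.put(itemset, support);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        List<String> servers = List.of("Srv1", "Srv2", "Srv3", "Srv4");
        List<int[]> rows = List.of(
                new int[] {1, 1, 0, 1},
                new int[] {1, 0, 1, 1},
                new int[] {1, 0, 0, 1});
        frequent(toTransactions(servers, rows), 0.6)
                .forEach((itemset, s) -> System.out.println(itemset + "  " + s));
    }
}
```

Items ending in "=0" then stand for servers outside their normal CPU range, and rules can be built from the frequent itemsets that contain them.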
-
Hi Haddock,
many thanks for your example.
Now I get the results I needed (association rules that describe which servers could be responsible for a failure).
For now this is enough for me. Next I am going to write my bachelor's thesis about knowledge discovery in Tivoli data warehouses for the early detection of errors in Siebel CRM systems.
Thanks a lot for your help. ;D
Chris