Downloading a webpage every 5 minutes?

MikeR
New Altair Community Member
Hi everybody,
I'm new to this forum, so I hope I have posted this in the right place.
I am doing my bachelor thesis about an online forum and therefore want to monitor the activity on it.
At the bottom of the front page, www.lydmaskinen.dk, the number of people online at that moment is shown.
Does anyone know a way I can download this information every 5 minutes over a given time period?
I thought about downloading the whole source code/webpage every 5 minutes and afterwards logging the data manually in an Excel spreadsheet.
There might of course be a much cleverer way of doing this, but I consider that a luxury problem at the moment.
Does anyone know a simple way of doing this?
Thanks,
- Mike(DK)
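The plan described above can be scripted without special tools: save the whole page every 5 minutes and read the numbers out afterwards. Below is a minimal Python sketch that is not part of RapidMiner. It assumes Python 3 and that the front page can be fetched with a plain HTTP GET. The names `fetch` and `snapshot_loop` and the output folder are hypothetical. The download and sleep functions are parameters, so the loop can be tested without network access.

```python
import time
import urllib.request
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, List

URL = "http://www.lydmaskinen.dk/index.php"  # front page mentioned in the thread


def fetch(url: str = URL) -> bytes:
    """Download the raw page bytes; no parsing happens here."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def snapshot_loop(
    out_dir: Path,
    count: int,
    interval_s: float = 300.0,                # 5 minutes between downloads
    fetch_page: Callable[[], bytes] = fetch,  # injectable for testing
    sleep: Callable[[float], None] = time.sleep,
) -> List[Path]:
    """Save `count` timestamped HTML snapshots into `out_dir` (hypothetical helper)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved: List[Path] = []
    for i in range(count):
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        # The running index keeps names unique even within the same second.
        path = out_dir / f"snapshot_{stamp}_{i:04d}.html"
        try:
            path.write_bytes(fetch_page())
            saved.append(path)
        except OSError as exc:  # URLError and timeouts are OSError subclasses
            print(f"{stamp}: download failed ({exc}); continuing")
        if i < count - 1:  # no pause after the last snapshot
            sleep(interval_s)
    return saved
```

For example, `snapshot_loop(Path("snapshots"), count=288)` would cover 24 hours at 5-minute spacing (12 * 24 = 288). Each download adds to the pause, so the spacing drifts slightly over long periods. An OS scheduler such as cron or Windows Task Scheduler, running one download per trigger, is usually the more robust choice.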
Answers
Hello Mike,
you can use the Web Mining and Text Mining extensions to get this information. It works quite well with a small regular expression.
Attached is a process that extracts the number of registered users. Getting the number of guests is straightforward.
You can run this process automatically on a RapidMiner Server. You can then store the information directly in a repository and work with it. By the way, there is an academic program that would allow you to get a RapidMiner Server for your thesis. If you need more information, just email me: mschmitz@rapidminer.com
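Without a RapidMiner Server, a plain CSV file is a low-tech place to keep the observations. It can also replace typing numbers into Excel by hand, because Excel opens CSV files directly. The minimal Python sketch below assumes the counts have already been extracted. The function name `append_counts`, the file name `activity.csv` and the column names are hypothetical choices.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional

FIELDS = ["timestamp_utc", "registered", "guests"]  # hypothetical column names


def append_counts(csv_path: Path, registered: int, guests: int,
                  when: Optional[datetime] = None) -> None:
    """Append one observation; write the header row only when the file is new."""
    if when is None:
        when = datetime.now(timezone.utc)  # UTC avoids daylight-saving ambiguity
    is_new = not csv_path.exists() or csv_path.stat().st_size == 0
    with csv_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(FIELDS)
        writer.writerow([when.isoformat(timespec="seconds"), registered, guests])
```

Calling `append_counts(Path("activity.csv"), registered=13, guests=44)` once per 5-minute run builds up a time series with one row per observation.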
Best,
Martin
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.2.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="web:get_webpage" compatibility="5.3.002" expanded="true" height="60" name="Get Page" width="90" x="112" y="30">
        <parameter key="url" value="http://www.lydmaskinen.dk/index.php"/>
        <list key="query_parameters"/>
        <list key="request_properties"/>
      </operator>
      <operator activated="true" class="web:extract_html_text_content" compatibility="5.3.002" expanded="true" height="60" name="Extract Content" width="90" x="246" y="30"/>
      <operator activated="true" class="text:documents_to_data" compatibility="6.1.000" expanded="true" height="76" name="Documents to Data" width="90" x="380" y="30">
        <parameter key="text_attribute" value="Data"/>
      </operator>
      <operator activated="true" class="text:generate_extract" compatibility="6.1.000" expanded="true" height="60" name="Generate Extract" width="90" x="514" y="30">
        <parameter key="source_attribute" value="Data"/>
        <parameter key="query_type" value="Regular Expression"/>
        <list key="string_machting_queries"/>
        <list key="regular_expression_queries">
          <parameter key="registered users" value=".*users online\D*([0-9]+)\sregistered.*guests.*"/>
        </list>
        <list key="regular_region_queries"/>
        <list key="xpath_queries"/>
        <list key="namespaces"/>
        <list key="index_queries"/>
        <list key="jsonpath_queries"/>
      </operator>
      <connect from_op="Get Page" from_port="output" to_op="Extract Content" to_port="document"/>
      <connect from_op="Extract Content" from_port="document" to_op="Documents to Data" to_port="documents 1"/>
      <connect from_op="Documents to Data" from_port="example set" to_op="Generate Extract" to_port="Example Set"/>
      <connect from_op="Generate Extract" from_port="Example Set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
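The Generate Extract step of the process above can also be reproduced outside RapidMiner, for example on HTML snapshots saved earlier. Getting the guest count only needs one more capture group. One caveat applies to any such pattern: a single `[0-9]` group captures only one digit. With a greedy `.*` in front, a pattern like `.*users online.*([0-9])\sregistered` returns just the last digit of a count such as 13, so the sketch uses `\d+` instead. The footer wording is an assumption inferred from the English keywords in the thread's regex, not copied from the live site. The names `ONLINE_RE` and `parse_online_counts` are hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed footer wording (hypothetical), based on the keywords in the thread's
# regex, e.g. "In total there are 57 users online :: 13 registered, 0 hidden and 44 guests"
ONLINE_RE = re.compile(
    r"(\d+)\s+users\s+online\D*(\d+)\s+registered.*?(\d+)\s+guests",
    re.IGNORECASE | re.DOTALL,  # DOTALL lets .*? cross line breaks in page text
)


def parse_online_counts(text: str) -> Optional[Dict[str, int]]:
    """Return total/registered/guest counts, or None if the footer is absent."""
    match = ONLINE_RE.search(text)
    if match is None:
        return None
    total, registered, guests = (int(g) for g in match.groups())
    return {"total": total, "registered": registered, "guests": guests}
```

On the assumed sample line above, `parse_online_counts` returns `{'total': 57, 'registered': 13, 'guests': 44}`. The lazy `.*?` before the guest count makes the search stop at the first number that is directly followed by "guests".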