[SOLVED] Using k-means clustering on web log data

I have a data set from a web access log file, and I'm interested in finding clusters of similar items in it. (I'm an absolute beginner at data mining.) So far I have consulted many research papers in the same problem domain:

An Efficient Approach for Clustering Web Access Patterns from Web Logs
http://www.sersc.org/journals/IJAST/vol5/1.pdf

Classifying the user intent of web queries using k-means clustering
http://faculty.ist.psu.edu/jjansen/academic/jansen_user_intent_kmeans.pdf

I want to use k-means clustering to cluster web pages. Although these papers discuss the algorithm, they do not specify how the input data set should be provided. k-means calculates similarity between data points using Euclidean distance. So how should I normalize my data set so it can be mined with k-means, since URLs cannot be used directly? Any help or a good reference on this?

Example dataset (p1..pn are different web pages):

p1,p2,p3,p4
p1,p2
p1,p5,p6,p7
p1,p2,p3,p5

ighyboo

Hi Star,

I'm not an expert but the way I would approach the problem is to create a table with p1...pn as columns and individual users as rows.
The values filling the table would be the count of how many times a page has been visited by the user.

UserID  p1  p2  p3  ..
User1   1   1   1   1
User2   1   1   0   0
User3   1   0   0   0
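As a rough sketch of that idea, the snippet below turns the example dataset from the question into a user-by-page count table and runs a very small k-means on it. Several things here are assumptions for illustration only: each line of the example is treated as one user's visits, the helper names (`build_count_matrix`, `squared_distance`, `kmeans`) are made up, k is set to 2, and the centroids are simply seeded with the first k rows rather than chosen randomly.

```python
from collections import Counter

# Example dataset from the question. Treating each line as the pages
# visited by one user is an assumption made for this sketch.
visits = [
    ["p1", "p2", "p3", "p4"],
    ["p1", "p2"],
    ["p1", "p5", "p6", "p7"],
    ["p1", "p2", "p3", "p5"],
]


def build_count_matrix(visits):
    """Return (pages, matrix): one row per user, one column per page,
    each cell holding how many times that user visited that page."""
    pages = sorted({page for row in visits for page in row})
    matrix = []
    for row in visits:
        counts = Counter(row)  # missing pages count as 0
        matrix.append([float(counts[page]) for page in pages])
    return pages, matrix


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, iterations=20):
    """Plain Lloyd's algorithm. Seeding the centroids with the first k
    rows is a simplification that keeps the example deterministic."""
    centroids = [list(row) for row in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: attach each row to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(row, centroids[c]))
            for row in rows
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


pages, matrix = build_count_matrix(visits)
labels, centroids = kmeans(matrix, k=2)

for user_index, (row, label) in enumerate(zip(matrix, labels), start=1):
    print(f"User{user_index} {row} -> cluster {label}")
```

Once the table is in this numeric form, the hand-written loop could be swapped for a library implementation; the point of the sketch is only that the page-count columns give k-means something it can compute Euclidean distances on.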

Just an idea..

star

Hi ighyboo,

Thanks for the reply, this is exactly what I ended up doing.