"Mining online reviews for sentiment analysis"

Question

I am trying to capture reviews of a specific product from Amazon in order to do sentiment analysis, applying a classification model to predict positive or negative attitudes. Two questions:

1) Getting the data: how do I limit the crawl to just the reviews? The reviews for the product span several pages, and each page link looks like this:
http://www.amazon.com/Rainbow-Loom-Twistz-Bandz/product-reviews/B00DMC6KAC/ref=cm_cr_pr_btm_link_2?ie=UTF8&pageNumber=2&showViewpoints=0&sortBy=byRankDescending

...with the pageNumber value in the link changing from page to page, of course. I want to crawl only these pages, but each review page has many other links, e.g. to amazon.com, to online ads, etc. Is there a wildcard character (like *) that I can use in place of the page number to specify that only these links should be crawled?

2) How can I get each individual review (there are several on a page) into its own text document, or perhaps its own field in a database record, so that it can be classified?
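For question 1: URL rules in a crawler are usually written as regular expressions rather than shell-style wildcards, so the "*" you are after is `.*` (any run of characters), and `\d+` matches any page number. The Python sketch below is not RapidMiner itself; it only tests a candidate pattern against sample URLs so you can check it before putting it into a crawling rule. The pattern, the assumption that your crawler accepts regular expressions, and the cart URL are illustrative assumptions.

```python
import re

# Hypothetical rule: follow only review pages for product B00DMC6KAC.
# ".*" plays the role of the "*" wildcard; "\d+" matches any page number.
REVIEW_PAGE = re.compile(
    r"https?://www\.amazon\.com/.*/product-reviews/B00DMC6KAC/.*pageNumber=\d+.*"
)


def is_review_page(url: str) -> bool:
    """Return True if the whole URL matches the review-page pattern."""
    return REVIEW_PAGE.fullmatch(url) is not None


urls = [
    "http://www.amazon.com/Rainbow-Loom-Twistz-Bandz/product-reviews/B00DMC6KAC/"
    "ref=cm_cr_pr_btm_link_2?ie=UTF8&pageNumber=2&showViewpoints=0&sortBy=byRankDescending",
    "http://www.amazon.com/Rainbow-Loom-Twistz-Bandz/product-reviews/B00DMC6KAC/"
    "ref=cm_cr_pr_btm_link_3?ie=UTF8&pageNumber=3&showViewpoints=0&sortBy=byRankDescending",
    "http://www.amazon.com/gp/cart/view.html",  # hypothetical non-review link
]

for url in urls:
    print(is_review_page(url), url)
```

If the first review page's URL has no `pageNumber` parameter, this pattern would skip it; dropping the `pageNumber=\d+` part and keeping only `/product-reviews/B00DMC6KAC/` is a looser alternative.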
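For question 2, a minimal sketch of the idea in Python (standard library only): parse each downloaded review page, pull out the text of every element that wraps a single review, and write each review to its own database row. It assumes, hypothetically, that each review body sits in a `<div class="reviewText">`; the real markup has to be checked in the page source and may change over time. The sample page and the `review` table layout are made up for illustration.

```python
import sqlite3
from html.parser import HTMLParser


class ReviewExtractor(HTMLParser):
    """Collect the text of every <div> whose class list contains REVIEW_CLASS.

    REVIEW_CLASS is a hypothetical marker; inspect the real page source to
    find the element that wraps a single review body.
    """

    REVIEW_CLASS = "reviewText"

    def __init__(self):
        super().__init__()
        self.reviews = []   # one string per review
        self._depth = 0     # nesting depth of <div>s inside the current review
        self._parts = []    # text fragments of the review being read

    def handle_starttag(self, tag, attrs):
        if self._depth and tag == "br":
            self._parts.append(" ")  # treat line breaks as word separators
        if tag != "div":
            return
        if self._depth:
            self._depth += 1  # nested <div> inside a review
        elif self.REVIEW_CLASS in (dict(attrs).get("class") or "").split():
            self._depth = 1
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                # Collapse whitespace and store the finished review.
                self.reviews.append(" ".join("".join(self._parts).split()))

    def handle_data(self, data):
        if self._depth:
            self._parts.append(data)


def store_reviews(conn, product_id, reviews):
    """Write each review to its own row so it can be classified later."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS review (id INTEGER PRIMARY KEY, product TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO review (product, body) VALUES (?, ?)",
        [(product_id, body) for body in reviews],
    )
    conn.commit()


sample_page = """
<html><body>
  <a href="http://www.amazon.com/">Amazon</a>
  <div class="reviewText">Great colours, my kids love it.</div>
  <div class="ad">Buy more bands!</div>
  <div class="reviewText">Broke after <b>two</b> days.<br/>Not worth it.</div>
</body></html>
"""

parser = ReviewExtractor()
parser.feed(sample_page)

conn = sqlite3.connect(":memory:")
store_reviews(conn, "B00DMC6KAC", parser.reviews)
for row in conn.execute("SELECT id, body FROM review"):
    print(row)
```

Keeping one row per review makes each review an independent document for the classifier; writing one text file per review instead would work the same way.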

sourabhchoudhar · Answer

Hi Marius,

I want to use RapidMiner to search the web (social media and forums, blogs, search engines, news websites, news blogs, etc.) for relevant information about a specific keyword or name. Could you please help me with how to do this?

Thanks

Sourabh

sourabhchoudhar · Answer

Hi Marius,

Thanks for your suggestions. I am trying out combinations of filters and crawling rules. As soon as I manage to do exactly what I want, I will share it on the forum.

Regards

Sourabh

MariusHelf · Answer

Hi Sourabh,

That depends entirely on the websites: you have to define the correct crawling rules, possibly combined with filters applied to the retrieved documents afterwards.
Unfortunately there is no general rule; you really have to look into the structure of each website.

Best regards,
Marius
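The advice to combine crawling rules with filters on the retrieved documents can be sketched, outside RapidMiner, as a keyword filter over pages that have already been crawled: keep a page only if its text mentions one of the search terms as a whole word, ignoring case. The example.com URLs, the page texts, and the keyword are hypothetical.

```python
import re


def filter_documents(documents, keywords):
    """Keep only (url, text) pairs whose text mentions any keyword as a whole word."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b", re.IGNORECASE
    )
    return [(url, text) for url, text in documents if pattern.search(text)]


# Hypothetical pages returned by a crawl.
crawled = [
    ("http://forum.example.com/t/1", "Has anyone tried the Rainbow Loom kit?"),
    ("http://news.example.com/a/2", "Local weather forecast for the weekend."),
    ("http://blog.example.com/p/3", "My review of rainbow loom bracelets."),
]

for url, _ in filter_documents(crawled, ["rainbow loom"]):
    print(url)
```

The crawling rules decide which pages are fetched at all; a content filter like this one then discards fetched pages that turn out to be irrelevant, which is why the two are usually combined.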