I am trying to capture the reviews of a specific product from Amazon so that I can do sentiment analysis on them, applying a classification model to predict whether each review is positive or negative. Two questions:
1) Regarding getting the data: how do I limit the crawl to just the reviews? The reviews for the product span several pages, and each page link looks like this:
http://www.amazon.com/Rainbow-Loom-Twistz-Bandz/product-reviews/B00DMC6KAC/ref=cm_cr_pr_btm_link_2?ie=UTF8&pageNumber=2&showViewpoints=0&sortBy=byRankDescending
...with the pageNumber value in the link changing to match the page number. I want to crawl just these pages, but each review page has lots of other links, e.g. to amazon.com, to online ads, etc. Is there a wildcard character (like *) that I can use in place of the page number to specify that I want to crawl only these links?
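To make concrete what I mean by a wildcard, the filter I have in mind would behave roughly like the Python sketch below. The regular expression and the names `REVIEW_PAGE` and `is_review_page` are my own guesses for illustration, not something taken from a particular crawler, and it assumes every review page keeps the `/product-reviews/B00DMC6KAC/` path and a `pageNumber` parameter as in the link above.

```python
import re

# Assumption: every review page URL contains "/product-reviews/B00DMC6KAC/"
# followed somewhere by a "pageNumber=<digits>" query parameter.
# \d+ plays the role of the "*" wildcard for the page number.
REVIEW_PAGE = re.compile(r"/product-reviews/B00DMC6KAC/.*[?&;]pageNumber=\d+")


def is_review_page(url: str) -> bool:
    """Return True if the URL looks like one of the paginated review pages."""
    return REVIEW_PAGE.search(url) is not None


if __name__ == "__main__":
    links = [
        "http://www.amazon.com/Rainbow-Loom-Twistz-Bandz/product-reviews/"
        "B00DMC6KAC/ref=cm_cr_pr_btm_link_2?ie=UTF8&pageNumber=2"
        "&showViewpoints=0&sortBy=byRankDescending",
        "http://www.amazon.com/",
    ]
    # Keep only the links that match the review-page pattern.
    for link in links:
        if is_review_page(link):
            print(link)
```

Whatever crawler I end up using, I would want it to follow only the links for which a check like this returns true.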
2) How can I get each individual review (there are several on a page) into its own text document, or perhaps its own field in a database record, so that it can be classified?
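For the database option, this is roughly the end result I am picturing, sketched with Python's built-in `sqlite3` module. It only covers the storage side and assumes the review texts have already been pulled out of the page somehow, which is the part I do not know how to do. The table layout, the `store_reviews` name, and the sample strings are placeholders I made up.

```python
import sqlite3


def store_reviews(conn: sqlite3.Connection, product_id: str, reviews: list[str]) -> int:
    """Insert one row per review and return the total number of stored rows."""
    # Hypothetical schema: one record per review, with the text in its own field.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS reviews ("
        "id INTEGER PRIMARY KEY, product_id TEXT NOT NULL, body TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO reviews (product_id, body) VALUES (?, ?)",
        [(product_id, text) for text in reviews],
    )
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM reviews").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Placeholder strings standing in for reviews extracted from one page.
    sample = ["placeholder review text one", "placeholder review text two"]
    print(store_reviews(conn, "B00DMC6KAC", sample))
```

Each row's `body` field would then be the input to the classifier.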