"How to split a tokenized document into several documents?"

MSEMIS New Altair Community Member
edited November 2024 in Altair RapidMiner
Hello,

We download a page from the web with the "Get Page" operator and extract the relevant data with "Extract Content". Afterwards, we want to tokenize each extracted div as a separate document.
Is there a way to do this?
"Cut Document" requires regular expressions, but every div has different content (movie reviews), and the HTML tags have already been removed by that point.
So we are wondering how to define a splitting point between the individual reviews.
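
To make the goal concrete, here is a rough sketch in plain Python (outside RapidMiner) of the behaviour we are after: split on the div boundaries while the tags are still present, then strip the tags and tokenize each piece separately. The class name "review", the sample HTML, and the helper names are made up for illustration; the real page will look different.

```python
from html.parser import HTMLParser


class DivSplitter(HTMLParser):
    """Collect the text of each <div class="review"> as its own document."""

    def __init__(self, css_class="review"):  # "review" is a hypothetical class name
        super().__init__()
        self.css_class = css_class
        self.depth = 0        # nesting depth inside the current review div
        self.parts = []       # text fragments of the current review
        self.documents = []   # one plain-text string per review

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1   # nested div inside a review
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1    # a new review starts here
            self.parts = []

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # Review div closed: normalize whitespace and emit one document.
                self.documents.append(" ".join(" ".join(self.parts).split()))

    def handle_data(self, data):
        if self.depth > 0:
            self.parts.append(data)


def split_reviews(html, css_class="review"):
    """Return one plain-text document per matching div."""
    parser = DivSplitter(css_class)
    parser.feed(html)
    parser.close()
    return parser.documents


# Made-up sample page with two reviews and one unrelated div.
sample = """
<div id="header">Site navigation</div>
<div class="review"><p>Great movie, <b>loved</b> it.</p></div>
<div class="review"><div class="meta">by Ann</div><p>Too long and boring.</p></div>
"""

documents = split_reviews(sample)
# Naive whitespace tokenization, one token list per review.
tokens = [doc.lower().split() for doc in documents]
```

The point of the sketch is only the order of the steps: the split happens before the tags are thrown away, so no regular expression over the review text itself is needed.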

Thanks in advance
