Coreference resolution with RapidMiner: how to begin?

Unknown
edited November 5 in Altair RapidMiner
Dear All,

I have been playing with RM for some time, but now it's time to do something real, and I don't quite know how to proceed. The task is direct nominal coreference resolution, i.e. clustering the mentions in a text into sets that refer to the same entity, given a series of training documents whose mentions are already properly clustered.

To keep it as simple as possible, I guess we can leave text processing out of the process entirely and represent the data as a table with tokens in rows and attributes in columns (the attributes being the usual properties, from gender and number up to some more complex ones).

Issue 1: does such a representation make sense? How can we represent different documents (with another attribute, a document number?) and clusters (with a cluster number?) How should validation be organized? If the samples are documents, not just tokens, how should the clusters be represented? Please advise.
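
To make the question concrete, here is a minimal sketch of the layout I have in mind, written in plain Python rather than as an RM process. The column names (doc_id, mention_id, cluster_id), the toy rows and both helper functions are assumptions made up for illustration, not anything RM prescribes. Cluster numbers are only meaningful within one document, and the split keeps each document whole.

```python
from collections import defaultdict

# Hypothetical toy data: one row per token/mention. All names and values
# are illustrative assumptions, not an RM format.
ROWS = [
    {"doc_id": 1, "mention_id": 0, "token": "Mary", "gender": "f", "number": "sg", "cluster_id": 0},
    {"doc_id": 1, "mention_id": 1, "token": "teacher", "gender": "f", "number": "sg", "cluster_id": 0},
    {"doc_id": 1, "mention_id": 2, "token": "books", "gender": "n", "number": "pl", "cluster_id": 1},
    {"doc_id": 2, "mention_id": 0, "token": "John", "gender": "m", "number": "sg", "cluster_id": 0},
    {"doc_id": 2, "mention_id": 1, "token": "man", "gender": "m", "number": "sg", "cluster_id": 0},
    {"doc_id": 2, "mention_id": 2, "token": "car", "gender": "n", "number": "sg", "cluster_id": 1},
]


def clusters_by_document(rows):
    """Collect mention ids per cluster, keyed by (doc_id, cluster_id).

    Keying on the pair keeps cluster 0 of document 1 separate from
    cluster 0 of document 2.
    """
    clusters = defaultdict(list)
    for row in rows:
        clusters[(row["doc_id"], row["cluster_id"])].append(row["mention_id"])
    return dict(clusters)


def split_by_document(rows, test_docs):
    """Split rows so that every document lands entirely in train or in test."""
    train = [row for row in rows if row["doc_id"] not in test_docs]
    test = [row for row in rows if row["doc_id"] in test_docs]
    return train, test


CLUSTERS = clusters_by_document(ROWS)
TRAIN, TEST = split_by_document(ROWS, test_docs={2})
```

With this layout a document is just the set of rows sharing a doc_id, so validation folds would be formed over document numbers rather than over individual rows. Whether that is the right way to do it is exactly what I am asking.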

Issue 2: how should the process be organized to make it work? Can you suggest anything?

Best,
Andreas