"How to calculate the distance between two text documents?"

gfyang New Altair Community Member
edited November 5 in Community Q&A
Hi,

Suppose there are two text documents, d1 and d2. I can build a vector for each of them and read the vectors through an Iterator<Example>. How can I then calculate the distance or similarity between them, for example the cosine distance? Does RM provide an operator or function for this?

Thank you very much.

Sincerely yours,
gfyang

Answers

  • fischer
    fischer New Altair Community Member
    Hi,

    Yes. Please look into com.rapidminer.tools.math.similarity.DistanceMeasure.

    Cheers,
    Simon
  • gfyang
    gfyang New Altair Community Member
    Hi,

    Thanks a lot for the reply. However, it is still not clear enough to me. Could you please give some Java code?
    I tried the following, but it failed:

    ExampleSet ex=...
    Example ex1 = ex.getExample(1);
    Example ex2 = ex.getExample(2);
    DistanceMeasure myDis = new DistanceMeasure();
    double dis = myDis.calculateDistance(ex1, ex2);
    It reported that DistanceMeasure could not be instantiated.

    Thank you.

    Sincerely yours,
    gfyang
  • fischer
    fischer New Altair Community Member
    Hi,

    DistanceMeasure is abstract. You can only instantiate its subclasses.

    Also, if you are using a distance measure in an operator, try using a DistanceMeasureHelper.

    Cheers,
    Simon
  • gfyang
    gfyang New Altair Community Member
    Hi,

    I see. The subclasses work well. Thank you.

    Sincerely yours,
    gfyang
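
For readers who hit the same "could not be instantiated" error, the point made in the thread (an abstract measure cannot be created with `new`, but a concrete subclass can) might be sketched roughly as follows. This is a plain-Java illustration only and does not use the RapidMiner API: `DistanceSketch`, `VectorMeasure`, and `CosineMeasure` are hypothetical names, the method takes `double[]` vectors rather than `Example` objects, and cosine distance is assumed here to mean one minus the cosine similarity.

```java
public class DistanceSketch {

    /** Hypothetical stand-in for an abstract distance measure (not the RapidMiner class). */
    abstract static class VectorMeasure {
        /** Returns a distance between two vectors of equal length. */
        abstract double calculateDistance(double[] a, double[] b);
    }

    /** Concrete subclass: cosine distance, assumed to be 1 - cosine similarity. */
    static class CosineMeasure extends VectorMeasure {
        @Override
        double calculateDistance(double[] a, double[] b) {
            if (a.length != b.length) {
                throw new IllegalArgumentException("Vectors must have the same length");
            }
            double dot = 0.0;
            double normA = 0.0;
            double normB = 0.0;
            for (int i = 0; i < a.length; i++) {
                dot += a[i] * b[i];     // accumulate the dot product
                normA += a[i] * a[i];   // squared length of a
                normB += b[i] * b[i];   // squared length of b
            }
            if (normA == 0.0 || normB == 0.0) {
                return Double.NaN;      // cosine is undefined for a zero vector
            }
            return 1.0 - dot / Math.sqrt(normA * normB);
        }
    }

    public static void main(String[] args) {
        double[] d1 = {1, 2, 0};
        double[] d2 = {2, 4, 0};
        double[] d3 = {0, 0, 3};

        // VectorMeasure m = new VectorMeasure(); // would not compile: the class is abstract
        VectorMeasure m = new CosineMeasure();    // instantiate a concrete subclass instead

        System.out.println("d1 vs d2: " + m.calculateDistance(d1, d2));
        System.out.println("d1 vs d3: " + m.calculateDistance(d1, d3));
    }
}
```

In RapidMiner itself the same pattern applies, as described above: pick one of the concrete subclasses of DistanceMeasure that ships with the library rather than writing your own, and check its Javadoc for any initialization it expects before calculateDistance is called.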