Building Similarity Matrix

User: "tutur"
New Altair Community Member
Updated by Jocelyn
Hi all,

My problem is as follows: given two groups of documents, I want to compute the cosine similarity and output a similarity matrix with all possible comparisons. The matrix should be labelled with the document names (not the terms).
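Outside RapidMiner, the computation being asked for can be sketched in plain Python. This is a minimal illustration under stated assumptions, not what the RapidMiner operators do internally: it uses simple lowercase tokenization and raw term counts (the Process Documents operators typically weight terms with TF-IDF instead), and the helper names `tokenize`, `cosine` and `similarity_matrix` are hypothetical. Rows are documents from the first group, columns are documents from the second, and both are keyed by document name.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters: a crude, hypothetical stand-in
    # for the tokenizing / case-transforming steps in Process Documents.
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    # Cosine similarity of two sparse term-count vectors (Counters).
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


def similarity_matrix(group_a, group_b):
    # group_a / group_b: {document name: text}.
    # Returns {name_a: {name_b: similarity}}: every cross-group comparison,
    # labelled by document names rather than terms.
    vec_a = {name: Counter(tokenize(text)) for name, text in group_a.items()}
    vec_b = {name: Counter(tokenize(text)) for name, text in group_b.items()}
    return {a: {b: cosine(va, vb) for b, vb in vec_b.items()}
            for a, va in vec_a.items()}


group_a = {"a1.txt": "The cat sat.", "a2.txt": "Dogs bark loudly."}
group_b = {"b1.txt": "the cat sat", "b2.txt": "A dog barks loudly"}
m = similarity_matrix(group_a, group_b)
for name_a, row in m.items():
    print(name_a, {name_b: round(sim, 2) for name_b, sim in row.items()})
```

Without stemming, "dogs"/"dog" and "bark"/"barks" count as different terms here, so only "loudly" links a2.txt and b2.txt; a stemming step would raise that score.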

For the moment, the operator pipeline is:

Process Documents from Files -> Data to Similarity

My questions:

1) Is it OK to use the Process Documents from Files operator and create 2 entries in its text directories list, one for each set of documents to compare? (So I would have 2 class names, i.e. 2 folders of documents to compare.)

2) Which operator lets me visualize a document similarity matrix?

Any advice is very much appreciated!

    User: "Andrew2"
    New Altair Community Member
    hello tutur

    I don't quite understand what you mean by your first question, but the Process Documents from Files operator does work like that.
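    To illustrate the folder-per-group layout from the first question, here is a hypothetical Python sketch that reads one sub-folder per group and tags each document with its folder name as the class label, much as each entry in the text directories list pairs a class name with a folder. The function name, the `root/<class>/<file>.txt` layout and the `*.txt` filter are assumptions; keeping the file name next to the label is what later makes a name-labelled matrix possible.

```python
from pathlib import Path


def load_labelled_documents(root):
    # Hypothetical layout: root/<class_name>/<document>.txt, one sub-folder
    # per group of documents to compare.
    docs = []
    for class_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for doc_path in sorted(class_dir.glob("*.txt")):
            docs.append({
                "label": class_dir.name,  # folder name becomes the class
                "name": doc_path.name,    # file name kept for labelling later
                "text": doc_path.read_text(encoding="utf-8"),
            })
    return docs
```

    Sorting the folders and files just makes the output order deterministic; the grouping itself comes only from which folder a file sits in.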

    For the second question, use the Data to Similarity Data operator and plot the result with the Block Plotter. You will have to do some lookup and replacement work to get the document names instead of the IDs.
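    The reshaping behind such a plot can be sketched in Python, assuming the similarity data arrives as rows of (first ID, second ID, similarity), which matches the First_ID / Second_Id axes mentioned further down the thread. The sketch pivots that long list into a square grid, the shape a block plot displays. `pairs_to_matrix` is a hypothetical helper; the symmetric fill works whether each pair is listed once or in both directions.

```python
def pairs_to_matrix(pairs):
    # pairs: iterable of (first_id, second_id, similarity) rows.
    # Returns (ids, grid): grid[i][j] is the similarity between ids[i] and
    # ids[j], or None where no row covered that pair (e.g. the diagonal).
    pairs = list(pairs)  # allow generators, which we iterate twice
    ids = sorted({first for first, _, _ in pairs} |
                 {second for _, second, _ in pairs})
    index = {doc_id: i for i, doc_id in enumerate(ids)}
    grid = [[None] * len(ids) for _ in ids]
    for first, second, sim in pairs:
        i, j = index[first], index[second]
        grid[i][j] = sim
        grid[j][i] = sim  # cosine similarity is symmetric
    return ids, grid


rows = [(1, 2, 0.8), (1, 3, 0.1), (2, 3, 0.4)]
ids, grid = pairs_to_matrix(rows)
for doc_id, row in zip(ids, grid):
    print(doc_id, row)
```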

    regards

    Andrew
    User: "tutur"
    New Altair Community Member
    OP
    Hello Andrew,

    Thank you for the reply!

    I asked the first question because I saw another way of processing documents, with a pipeline like this:

    Loop Files (with Read Document as the nested process) -> Process Documents

    I was wondering whether there is any fundamental difference from just using the Process Documents from Files operator.

    The Data to Similarity Data operator is indeed what I need, but I can't find any information on how to replace the IDs with document names. Could you please suggest something?
    For the moment, the Block Plotter uses x-Axis: First_ID, y-Axis: Second_Id.
    Maybe it's possible to replace them in the Data View table?

    Thanks again,


    User: "Andrew2"
    New Altair Community Member
    Hello tutur

    The Process Documents operators are basically the same.

    You could use the Join operator to match the document IDs in the similarity data with the IDs in the original document data. You would need some renaming and Set Role steps.

    The Map operator might be easier.
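    Whether done with Join or Map, the replacement amounts to looking each ID up in a table of (ID, document name) pairs taken from the original document data. Here is a minimal Python sketch of that lookup; the function name and example file names are hypothetical, and in RapidMiner the equivalent would come from the Join (plus renaming / Set Role) or Map operators rather than from code.

```python
def replace_ids_with_names(pairs, id_to_name):
    # pairs: (first_id, second_id, similarity) rows from the similarity step.
    # id_to_name: lookup built from the original documents (hypothetical).
    # Equivalent to joining on the ID and keeping the name column instead.
    return [(id_to_name[first], id_to_name[second], sim)
            for first, second, sim in pairs]


documents = [(1, "report_a.txt"), (2, "report_b.txt"), (3, "memo.txt")]
lookup = dict(documents)
named = replace_ids_with_names([(1, 2, 0.8), (2, 3, 0.4)], lookup)
print(named)
# [('report_a.txt', 'report_b.txt', 0.8), ('report_b.txt', 'memo.txt', 0.4)]
```

    A missing ID raises a `KeyError` here, which is roughly what an inner join would silently drop; that difference is worth keeping in mind when choosing between the two operators.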

    regards

    Andrew