Select column with non-zero value
ElenaVet
Hi everybody!
I've calculated TF-IDF with "Process Documents from Data" and obtained a matrix with one word per column and one document body per row, where every cell contains the TF-IDF value. Now I filter by cluster, created with k-means, and I want to see only the columns with non-zero values. I first thought of summing the values of every column (with Aggregate) and keeping only those with a sum greater than zero, but I suspect that summing TF-IDF values is a mistake and would distort the whole analysis. Can you please suggest a way to keep only the columns with at least one value different from zero?
Thank you so much!
Telcontar120
Have you tried looking at the cluster centroid output? This is essentially giving you the average value for each cluster for each attribute. You should be able to filter that more easily.
If you don't want to use that approach, you would need to loop over each cluster, do an Aggregation using the Max function and remove those attributes that have a max value of zero.
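For readers who want to see the logic outside of the operator chain, here is a minimal sketch in plain Python (not RapidMiner) of the "max per cluster, drop the zeros" idea. The toy rows, the token names, and the helper `nonzero_columns_per_cluster` are all made up for illustration. It relies on TF-IDF values being non-negative, so a column maximum above zero means at least one non-zero value in that cluster.

```python
from collections import defaultdict

# Hypothetical toy data: one dict per document, holding its cluster label
# and the TF-IDF value of each token (names invented for illustration).
rows = [
    {"cluster": "cluster_0", "aapl": 0.8, "bank": 0.0, "oil": 0.0},
    {"cluster": "cluster_0", "aapl": 0.3, "bank": 0.0, "oil": 0.0},
    {"cluster": "cluster_1", "aapl": 0.0, "bank": 0.5, "oil": 0.0},
]


def nonzero_columns_per_cluster(rows, label_key="cluster"):
    """Return {cluster: [tokens whose max TF-IDF in that cluster is > 0]}."""
    max_by_cluster = defaultdict(dict)
    for row in rows:
        maxima = max_by_cluster[row[label_key]]
        for token, value in row.items():
            if token == label_key:
                continue
            # Track the running maximum of each token within this cluster.
            maxima[token] = max(maxima.get(token, 0.0), value)
    # Keep only tokens that appear (value > 0) at least once in the cluster.
    return {
        cluster: sorted(t for t, m in maxima.items() if m > 0)
        for cluster, maxima in max_by_cluster.items()
    }


kept = nonzero_columns_per_cluster(rows)
print(kept)
```

Using the maximum rather than the sum avoids adding TF-IDF values together; it only tests whether a token occurs in the cluster at all.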
ElenaVet
Hi @Telcontar120, thank you for your answer! I found the cluster centroid output, as you suggested, but I don't really understand what the value in each cell means. Could you explain it, please? I have attached a screenshot of my results.
Telcontar120
Cluster centroids are showing the average value of the word vector metric (using whatever parameter metric you selected such as TF-IDF) for each cluster for each attribute. You can see, for instance, the cluster that has the highest value for the token "aapl" is cluster 12. You can use this to understand what attributes are most dominant for any particular cluster by sorting and filtering. You can also compute differences between clusters if you like.
I noticed you have a lot of clusters. This can sometimes make interpretation difficult, so you should probably also consider whether you really need this many distinct clusters. Alternatively, you could try an approach other than k-means, such as LDA.
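To make the meaning of the centroid cells concrete, here is a small sketch in plain Python (not RapidMiner). It assumes a centroid is simply the per-cluster mean of each token's TF-IDF value, as described in this reply. The data and the helpers `centroids`, `top_tokens`, and `strongest_cluster` are hypothetical names chosen for the example.

```python
from collections import defaultdict

# Hypothetical toy data: cluster label plus TF-IDF value per token.
rows = [
    {"cluster": "cluster_0", "aapl": 0.5, "bank": 0.0, "oil": 0.25},
    {"cluster": "cluster_0", "aapl": 0.25, "bank": 0.0, "oil": 0.25},
    {"cluster": "cluster_1", "aapl": 0.0, "bank": 0.5, "oil": 0.125},
]


def centroids(rows, label_key="cluster"):
    """Average every token's value within each cluster."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        cluster = row[label_key]
        counts[cluster] += 1
        for token, value in row.items():
            if token != label_key:
                sums[cluster][token] += value
    return {
        cluster: {t: s / counts[cluster] for t, s in token_sums.items()}
        for cluster, token_sums in sums.items()
    }


def top_tokens(centroid, n=2):
    """Tokens with the highest centroid value, i.e. most dominant in the cluster."""
    return sorted(centroid, key=centroid.get, reverse=True)[:n]


def strongest_cluster(cents, token):
    """Cluster whose centroid has the highest value for the given token."""
    return max(cents, key=lambda c: cents[c].get(token, 0.0))


cents = centroids(rows)
print(cents["cluster_0"])                # per-token averages for cluster_0
print(top_tokens(cents["cluster_0"]))    # dominant tokens in cluster_0
print(strongest_cluster(cents, "aapl"))  # cluster where "aapl" scores highest
```

A token whose centroid value is zero in a cluster never occurs in that cluster, so the same table can also be used to filter out the all-zero columns.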
MartinLiebig
Hi,
To add one more thought: the operator Extract Cluster Centroid gives you that table as an example set to work with.
Cheers,
Martin