Automatic Text Signal Finder for Binary Response

User: "Noobie"
New Altair Community Member
Updated by Jocelyn
I have 2 datasets:

Dataset 1 - this has the response variable (1 or 0) and some potential categorical predictors. Each entity has exactly one record (let's call them entities A to Z).

Dataset 2 - this has thousands of records with lots of text for each entity. Each entity could have thousands of rows, each containing paragraphs of information.

I want to predict the response in Dataset 1 based on the text information in Dataset 2. So here is what I think should happen next:

1) Concatenate the thousands of rows for each entity in Dataset 2 so that the resulting table has one row per entity (with a ton of text per record).

2) Join Dataset 1 with the concatenated Dataset 2 on entity ID.
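
To make the two steps concrete, here is a minimal sketch in plain Python. The column names (`entity_id`, `segment`, `text`, `response`) and the toy rows are made up for illustration and are not taken from the real datasets. It uses a left join so that every Dataset 1 record survives even if it has no text in Dataset 2; that choice is an assumption, not a requirement.

```python
from collections import defaultdict

# Hypothetical stand-ins for the two datasets; all names and rows are invented.
dataset1 = [
    {"entity_id": "A", "segment": "retail", "response": 1},
    {"entity_id": "B", "segment": "wholesale", "response": 0},
]
dataset2 = [
    {"entity_id": "A", "text": "Late payment reported."},
    {"entity_id": "B", "text": "Account in good standing."},
    {"entity_id": "A", "text": "Customer disputed the invoice."},
]


def concatenate_text(rows, sep=" "):
    """Step 1: collapse many text rows into one string per entity."""
    chunks = defaultdict(list)
    for row in rows:
        chunks[row["entity_id"]].append(row["text"])
    return {entity: sep.join(parts) for entity, parts in chunks.items()}


def join_on_entity(entities, text_by_entity):
    """Step 2: left join, keeping every Dataset 1 record even without text."""
    return [
        {**rec, "text": text_by_entity.get(rec["entity_id"], "")}
        for rec in entities
    ]


merged = join_on_entity(dataset1, concatenate_text(dataset2))
for rec in merged:
    print(rec["entity_id"], rec["response"], rec["text"])
```

The result is one record per entity carrying the response, the categorical predictors, and a single long text field. One thing this sketch makes visible: concatenation keeps row order but throws away row boundaries, so anything that depends on individual rows would have to be handled before this step.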

Assuming the above is correct so far (please correct me if there is a better way, as I haven't done this yet), I am wondering whether there is an ML algorithm that could find all the words, phrases, or fuzzy combinations that are predictive of the response variable in Dataset 1. Please advise!

Thanks!
