Aggregating Categorical Values - Music Genre

zsteiner
zsteiner New Altair Community Member
edited November 5 in Community Q&A
I'm working with a dataset that has multiple genres per entry. For example, one row might have g1 = rap, g2 = demotrack, g3 = polish trap. None of these genres can be said to be the "primary" genre, so all of them need to be retained. I am trying to train a model on this set to predict the genre, but I'm having a hard time finding a way to make a single genre column that holds multiple values per row. Is there a way to do this? Any suggestions are appreciated, and I'm happy to clarify.

Answers

  • lionelderkrikor
    lionelderkrikor New Altair Community Member
    Hi @zsteiner,

    Can you share a sample of your dataset and, based on that sample, give an example of what you want to obtain?

    Thank you.

    Regards,

    Lionel
  • zsteiner
    zsteiner New Altair Community Member
    edited February 2019
    @lionelderkrikor
    Happy to. My goal is to find song attributes that can be used to predict song/artist genre. A single song/artist pair can be described by more than one genre at a time, with none being more "correct" than another. In the attached data sample, for instance, Empire of the Sun can be categorized as electropop, indietronic, and new rave simultaneously. This is why the genres should be listed together in a single field rather than as "genre_1", "genre_2": there is no inherent order among them.

    I want to train a model to predict the genre of a song using all of an artist's genres as target variables. However, if I combine them all into one "genre" column, the model will treat each combination, however similar, as a different target. For example, it will treat the genre arrays [rock, grunge, nu-metal] and [nu-metal, grunge, indie-rock] as totally distinct responses, despite their being virtually identical.
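    The problem described above can be made concrete with a small Python sketch (not from the thread; the helper names `as_single_label` and `jaccard` are hypothetical). Joining the genres into one string turns every combination, and even every ordering, into its own class, while treating them as unordered sets shows how much two entries overlap. Jaccard similarity is just one illustrative overlap measure.

```python
def as_single_label(genres):
    # Joining the genres in their listed order yields one opaque class label
    return "|".join(genres)


def jaccard(a, b):
    # Fraction of distinct genres the two lists share; order is ignored
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


first = ["rock", "grunge", "nu-metal"]
second = ["nu-metal", "grunge", "indie-rock"]

print(as_single_label(first) == as_single_label(second))  # False: two unrelated classes
print(as_single_label(["rock", "grunge"]) == as_single_label(["grunge", "rock"]))  # False: reordering alone changes the label
print(jaccard(first, second))  # 0.5: 2 shared genres out of 4 distinct ones
```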

    I'm looking for a way to train a model using all of a song's genres but receive only a single genre as the prediction output. So, is there a way to store several distinct genres in a single column without them being treated as a single value?
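    One common representation for several unordered genres per row, which the thread itself does not propose, is multi-hot encoding: instead of one genre column holding a combination, each genre gets its own 0/1 column. A minimal Python sketch using genres mentioned in this thread (the helper name `multi_hot` is hypothetical):

```python
def multi_hot(rows):
    """Encode lists of genres as one 0/1 column per genre (hypothetical helper)."""
    # A sorted vocabulary gives a fixed column order, independent of list order
    vocab = sorted({g for genres in rows for g in genres})
    matrix = []
    for genres in rows:
        present = set(genres)
        matrix.append([1 if g in present else 0 for g in vocab])
    return vocab, matrix


rows = [
    ["electropop", "indietronic", "new rave"],
    ["rock", "grunge", "nu-metal"],
    ["nu-metal", "grunge", "indie-rock"],
]
vocab, matrix = multi_hot(rows)
print(vocab)
# ['electropop', 'grunge', 'indie-rock', 'indietronic', 'new rave', 'nu-metal', 'rock']
for bits in matrix:
    print(bits)
# [1, 0, 0, 1, 1, 0, 0]
# [0, 1, 0, 0, 0, 1, 1]
# [0, 1, 1, 0, 0, 1, 0]
```

    The two grunge rows now differ in only two of seven columns (indie-rock and rock) instead of being unrelated labels, and each column can serve as a separate yes/no target.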
  • rfuentealba
    rfuentealba New Altair Community Member

    I would do this:
    • Put each song from your training data on several rows, one per genre, like this:
    Deep Purple, Child In Time, Rock, ...
    Deep Purple, Child In Time, Ballad, ...
    Deep Purple, Smoke on the Water, Rock, ...
    • Create a list of genres (selecting the genre attribute and filtering out duplicates might do the job).
    • Use loops to train one or a few algorithms per genre (e.g., one for rock, one for pop, one for jazz...). You could use "Validate" and "Optimize" to get the best results for each. Naïve Bayes is probably a good choice.
    • Loop over the examples in your testing data and apply all of these models to each one. "Loop Examples" will let you collect, for each song, the list of genres it can be classified as.
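    The steps above follow what is usually called binary relevance: one yes/no model per genre, with a song's predicted genres being every genre whose model says "yes". As a rough illustration outside RapidMiner, here is a minimal Python sketch of the same steps. It assumes made-up songs with two categorical features (`key`, `meter`) and uses a hand-rolled categorical Naive Bayes with add-one smoothing as a stand-in for the Naive Bayes operator; the data and all names (`CategoricalNB`, `predict_genres`) are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up training data (hypothetical): (title, categorical features, genres)
songs = [
    ("track1", {"key": "A minor", "meter": "5/4"}, ["Jazz", "Rock"]),
    ("track2", {"key": "A minor", "meter": "4/4"}, ["Rock"]),
    ("track3", {"key": "C major", "meter": "4/4"}, ["Cumbia"]),
    ("track4", {"key": "G major", "meter": "4/4"}, ["Cumbia", "Pop"]),
    ("track5", {"key": "A minor", "meter": "5/4"}, ["Jazz", "Classical"]),
    ("track6", {"key": "D minor", "meter": "3/4"}, ["Classical"]),
]


class CategoricalNB:
    """Tiny Naive Bayes for categorical features, with add-one smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        self.counts = defaultdict(Counter)  # (feature, class) -> value counts
        self.values = defaultdict(set)      # feature -> distinct values seen
        for feats, label in zip(X, y):
            for f, v in feats.items():
                self.counts[(f, label)][v] += 1
                self.values[f].add(v)
        return self

    def score(self, feats, label):
        # Unnormalized posterior: P(label) * product of P(value | label)
        n_c = self.class_counts[label]
        p = n_c / self.n
        for f, v in feats.items():
            p *= (self.counts[(f, label)][v] + 1) / (n_c + len(self.values[f]))
        return p

    def predict(self, feats):
        return max(self.class_counts, key=lambda c: self.score(feats, c))


# Step 1: one row per (song, genre) pair
rows = [(title, feats, g) for title, feats, genres in songs for g in genres]

# Step 2: the distinct list of genres
genre_list = sorted({g for _, _, g in rows})

# Group the exploded rows back into features and a genre set per song
features, labels = {}, defaultdict(set)
for title, feats, g in rows:
    features[title] = feats
    labels[title].add(g)
titles = list(features)

# Step 3: train one yes/no model per genre
models = {}
for genre in genre_list:
    X = [features[t] for t in titles]
    y = [genre in labels[t] for t in titles]
    models[genre] = CategoricalNB().fit(X, y)


def predict_genres(feats):
    # Step 4: apply every per-genre model and keep the genres it says "yes" to
    return [g for g, model in models.items() if model.predict(feats)]


print(predict_genres({"key": "A minor", "meter": "5/4"}))  # ['Jazz', 'Rock']
```

    Because each genre has its own model, a song can come back with several genres (or none), and the order of genres in the training data no longer matters.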

    So, if a song is in A minor and in 5/4, it will never ever be a Cumbia, but it can be Jazz, Rock or Classical.
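    In Naive Bayes terms, a per-genre model scores a song roughly as follows. This is a sketch assuming the two categorical features from the example (key and meter) and add-one (Laplace) smoothing; N_c is the number of training songs in class c and V_f the number of distinct values of feature f:

```latex
\mathrm{score}(c) = P(c)\,\hat{P}(\text{key} = \text{A minor} \mid c)\,\hat{P}(\text{meter} = 5/4 \mid c),
\qquad
\hat{P}(f = v \mid c) = \frac{\mathrm{count}(f = v,\, c) + 1}{N_c + V_f}

% If no Cumbia training song is in 5/4:
\hat{P}(\text{meter} = 5/4 \mid \text{Cumbia}) = \frac{0 + 1}{N_{\text{Cumbia}} + V_{\text{meter}}}
```

    Without smoothing that estimate is exactly 0, which is the "never ever" case; with smoothing it stays small but non-zero, so a single unseen value lowers a genre's score sharply without vetoing it outright.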

    I've done a few things like this in the past, and the approach works reasonably well. Hint: it's not a 5-minute job but more like a 3-hour one.

    Hope this helps. Will elaborate more once I get my AC adapter.

    Rodrigo.