Removing Stopwords Using a Dictionary

Hyram
Hyram New Altair Community Member
edited November 5 in Community Q&A
Hi
I am using my own dictionary to remove stopwords. On closer inspection, words like "is" are not being removed, although they are in the dictionary. Any clue as to why this is happening?
Thanks,
Hyram

Answers

  • kayman
    kayman New Altair Community Member
    Can you share your process? No need to add data, just the process itself.
  • Hyram
    Hyram New Altair Community Member
    edited June 2020
    Yes sure, thanks @kayman
    Attached

    For the dictionary, I am using the NLTK stopwords list. I am not sure whether my encoding setting is right.
  • Hyram
    Hyram New Altair Community Member
    @kayman thanks for looking. Some answers to your questions:
    1. I am using 'non-letters' to tokenise my words and it seems to work, so no full sentences come through as tokens;
    2. Correct, I transform to lower case;
    3. Correct, I filter by a minimum length of 2, i.e. any token with fewer than 2 characters is removed;
    4. You have a good point, as I have not checked this. I basically cut and pasted the list into a Word doc.

    I initially used 'Filter Stopwords (English)', but it was excluding words like 'like', which I wanted to keep.
    Thanks!
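
For anyone following along, here is a rough Python sketch of the preprocessing steps listed above (tokenise on non-letters, lower-case, drop short tokens, then apply a custom stopword list). It is not the actual process from this thread: the function name `preprocess`, the order of the steps, and the tiny sample stopword list are assumptions made for illustration only. One thing it does show is that "is" has exactly 2 characters, so a minimum-length filter of 2 lets it through and only the dictionary can remove it.

```python
import re


def preprocess(text, stopwords, min_length=2):
    """Tokenise on non-letters, lower-case, drop short tokens, drop stopwords.

    Hypothetical helper for illustration; the step order is an assumption.
    """
    # Split on any run of characters that are not ASCII letters.
    tokens = re.split(r"[^A-Za-z]+", text)
    # Discard empty strings produced by the split and lower-case the rest.
    tokens = [t.lower() for t in tokens if t]
    # Keep tokens with at least `min_length` characters ("is" survives here).
    tokens = [t for t in tokens if len(t) >= min_length]
    # Normalise the dictionary entries before comparing against tokens.
    stop = {w.strip().lower() for w in stopwords}
    return [t for t in tokens if t not in stop]


# Tiny sample list, NOT the full NLTK list; 'like' is deliberately left out.
sample_stopwords = ["is", "the", "a", "to"]
print(preprocess("This is the thing I like to do", sample_stopwords))
# prints ['this', 'thing', 'like', 'do']
```

If the dictionary is read correctly, "is" disappears while "like" stays, which is the behaviour asked for above.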
  • Hyram
    Hyram New Altair Community Member
    Thanks @kayman
    Really appreciate your help! I will try what the operator notes suggest, which is in line with what you are saying about the txt format.
  • Hyram
    Hyram New Altair Community Member
    @kayman
    Your suggestion re file format worked. Thank you!
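
In case it helps others with the same problem, below is a small Python sketch of one way to produce a plain-text dictionary file instead of a Word document. It assumes the dictionary should be a UTF-8 `.txt` file with one stopword per line; that layout is an assumption here, so check the operator notes for the exact requirement. The helper names `write_stopword_file` and `read_stopword_file`, the `keep` parameter, and the sample words are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def write_stopword_file(words, path, keep=()):
    """Write stopwords as plain UTF-8 text, one per line (assumed format).

    Words listed in `keep` are left out of the file so they are not filtered.
    """
    keep_set = {w.strip().lower() for w in keep}
    # Trim whitespace, lower-case, de-duplicate, and drop the words to keep.
    cleaned = sorted({w.strip().lower() for w in words if w.strip()} - keep_set)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned


def read_stopword_file(path):
    """Read the file back to confirm it contains only plain words."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


# Sample words only. With NLTK installed and its stopwords corpus downloaded,
# nltk.corpus.stopwords.words("english") could supply the full list instead.
sample_words = ["is", "The", " a ", "like", "is"]

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "stopwords.txt")
    write_stopword_file(sample_words, target, keep=["like"])
    print(read_stopword_file(target))
    # prints ['a', 'is', 'the']
```

Reading the file back is a quick check that no formatting from a word processor has crept in and that the chosen encoding matches what was written.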