pdf to database

User: "r_esmaeilzadeh1"
New Altair Community Member
Updated by Jocelyn
Hello everyone,
I am a new member and have studied the software, but I have a problem:
I need to read a lot of PDFs, delete their reference sections, categorize them by year of publication, and then do text mining to find the most frequent words.
How can I do that?
Thanks in advance for your guidance.

    User: "kayman"
    New Altair Community Member
    If your PDFs are text-based (not scanned), you can use the Read Document operator and select PDF as the format. This converts the PDF to plain text.

    Next, you can use the replace operators with a regex to strip what you don't need, then use the document to data operators for the mining part.
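
For anyone who wants to see the logic of these steps spelled out, here is a rough Python sketch of the same idea outside the software. It is not the operator workflow itself, only an illustration under several assumptions: the PDF text has already been extracted to plain strings, the reference section begins at a line containing only "References" or "Bibliography", and the publication year is supplied alongside each text. The function names, the regex, and the small stop-word list are all hypothetical and would need adjusting to the real documents.

```python
import re
from collections import Counter

# Assumption: the reference section starts at a line that contains only
# "References" or "Bibliography" (optionally numbered) and runs to the end.
REFERENCES_HEADING = re.compile(
    r"^\s*(?:\d+\.?\s*)?(?:references|bibliography)\s*$",
    re.IGNORECASE | re.MULTILINE,
)

# Assumption: a tiny illustrative stop-word list; a real one would be longer.
STOPWORDS = {"the", "and", "for", "with", "that", "this", "are", "was"}


def strip_references(text):
    """Drop everything from the last references heading onward."""
    matches = list(REFERENCES_HEADING.finditer(text))
    if not matches:
        return text  # no heading found, leave the text untouched
    return text[: matches[-1].start()]


def word_counts(text):
    """Count lowercase alphabetic words, skipping stop words and short tokens."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)


def top_words_by_year(docs, n=10):
    """docs: iterable of (year, text) pairs. Returns {year: [(word, count), ...]}."""
    totals = {}
    for year, text in docs:
        cleaned = strip_references(text)
        totals.setdefault(year, Counter()).update(word_counts(cleaned))
    return {year: counter.most_common(n) for year, counter in totals.items()}
```

Calling `top_words_by_year` with a list of `(year, text)` pairs gives the most frequent words per publication year. Using the last matching heading is a deliberate guess that the reference list sits at the end of each paper; if a paper has appendices after its references, the pattern would need to be changed.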