Text extraction of key themes/words from a series of PDF files

pimlico35

Hi Folks,

I'm new to this and struggling a little bit.

I'd like some simple, explicit steps to help me achieve what I want to do, which is:

I have a series of reports, mostly PDFs.

- I want to extract key themes or words that recur throughout the reports, for example 'serious accident' or 'safety'
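Recurring phrases such as 'serious accident' are two-word terms (bigrams), so a frequency count has to look at adjacent word pairs as well as single words. A minimal Python sketch of that idea, outside RapidMiner, assuming the report text is already available as a plain string and using a small hand-picked stopword list (the stopword list and the `count_terms` name are hypothetical, not from the original question):

```python
import re
from collections import Counter

# Hypothetical, minimal stopword list used only for illustration.
STOPWORDS = {"a", "an", "and", "at", "by", "for", "in", "is", "of",
             "on", "or", "the", "to", "was", "were", "with"}


def count_terms(text: str) -> Counter:
    """Count single words plus adjacent word pairs (bigrams), skipping stopwords."""
    # Lowercase and keep only runs of letters a-z as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Single-word terms, e.g. "safety".
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    # Two-word terms, e.g. "serious accident". Pairs containing a stopword are
    # skipped so "of the" never shows up as a theme. Pairs may span sentences.
    counts.update(
        f"{a} {b}"
        for a, b in zip(tokens, tokens[1:])
        if a not in STOPWORDS and b not in STOPWORDS
    )
    return counts


if __name__ == "__main__":
    sample = ("A serious accident occurred. Safety checks after the "
              "serious accident improved safety.")
    counts = count_terms(sample)
    print(counts["serious accident"], counts["safety"])  # 2 2
```

Counting phrases and single words in one `Counter` keeps a single ranked list of candidate themes; longer phrases would follow the same pattern with wider windows.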

What I have done so far is put all these files into a new repository. I have tried using operators to read through the files, tokenise them, etc., but I'm getting lost in translation, so to speak.
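The read → tokenise → filter steps mentioned here can be illustrated outside RapidMiner with a short Python sketch. It is only one possible approach under stated assumptions: it uses the third-party `pypdf` package (`PdfReader` and `page.extract_text()`) to read the PDF text layer directly, so no conversion to Word is involved, and the stopword list, minimum token length and function names are hypothetical choices.

```python
import re
from pathlib import Path

# Hypothetical, deliberately small English stopword list; real projects use fuller lists.
STOPWORDS = {"a", "an", "and", "are", "as", "at", "be", "by", "for", "from",
             "in", "is", "it", "of", "on", "or", "that", "the", "to", "was",
             "were", "with"}


def pdf_to_text(path: Path) -> str:
    """Return the text layer of one PDF (assumes `pip install pypdf`)."""
    from pypdf import PdfReader  # imported lazily so preprocess() works without pypdf

    reader = PdfReader(path)
    # Scanned, image-only pages have no text layer and come back empty; those need OCR.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def preprocess(text: str, min_length: int = 3) -> list[str]:
    """Lowercase, split into a-z tokens, and drop stopwords and short tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and len(t) >= min_length]


def load_reports(folder: str) -> dict[str, list[str]]:
    """Map every .pdf file name in `folder` (any case of extension) to its token list."""
    pdfs = sorted(p for p in Path(folder).iterdir() if p.suffix.lower() == ".pdf")
    return {p.name: preprocess(pdf_to_text(p)) for p in pdfs}
```

For example, `preprocess("The Safety report: SAFETY first, of course.")` gives `["safety", "report", "safety", "first", "course"]`, and `load_reports("reports")` (a hypothetical folder name) would return one such token list per PDF.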

- I'm not sure whether I have to convert the PDFs into Word files first, if that makes it easier to get them into RapidMiner, but that seems to defeat the whole purpose...

- I then want a document or table of these extracted, commonly occurring words so I can see how often each one is used. Later I can also check the least used words in the same output...
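For the output table, one option is to record for each term both its total count and the number of reports it appears in, sorted so the most used terms come first and the least used come last. A hedged Python sketch, assuming each report has already been reduced to a list of tokens (the input shape, the function names and the CSV column names are hypothetical):

```python
import csv
from collections import Counter


def frequency_table(docs: dict[str, list[str]]) -> list[tuple[str, int, int]]:
    """Return (term, total count, number of documents containing it), most frequent first."""
    total = Counter()
    doc_freq = Counter()
    for tokens in docs.values():
        total.update(tokens)
        doc_freq.update(set(tokens))  # count each term at most once per document
    # Descending total count; ties broken alphabetically so the order is stable.
    return sorted(((t, total[t], doc_freq[t]) for t in total),
                  key=lambda row: (-row[1], row[0]))


def write_csv(rows: list[tuple[str, int, int]], path: str) -> None:
    """Save the table so it can be opened in a spreadsheet."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["term", "total_count", "document_count"])
        writer.writerows(rows)


if __name__ == "__main__":
    docs = {"report1.pdf": ["safety", "serious", "accident", "safety"],
            "report2.pdf": ["safety", "inspection"]}
    rows = frequency_table(docs)
    print(rows[0])   # ('safety', 3, 2)   most used
    print(rows[-1])  # ('serious', 1, 1)  least used (ties sorted alphabetically)
```

Calling `write_csv(rows, "term_frequencies.csv")` would save the full table. The `document_count` column helps separate a term used heavily in one report from one that recurs across many.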

I would really appreciate any help, or pointers to videos that cover this step by step.

Thanks so much!
