Topic Modeling for PDF files

Karissa

Hello everyone,

I want to read several PDF files (business reports) and analyze them. So far I have been using the Read Document operator, because I haven't found a better one yet.
I want to run topic modeling on the files to find the relevant topics. Pre-processing is done with the operators Tokenize, Transform Cases, Filter Stopwords, Filter Tokens by Length and Stem. For the topic modeling itself I have found two operators: Extract Topics from Documents (LDA) and Extract Topics from Data (LDA). Unfortunately, neither works properly for me.
Extract Topics from Documents (LDA) needs a collection as input, and I don't know how to get one.
Extract Topics from Data (LDA) needs a text attribute, and again I don't know how to get one.
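
For readers who want to see what the pre-processing chain described above does to a text, here is a minimal Python sketch. It is not the RapidMiner implementation: the stopword list is a tiny illustrative sample, `naive_stem` is a crude suffix stripper standing in for a real stemmer, and all function names are hypothetical. The last helper joins the tokens back into one string, which is roughly what a "text attribute" per document would hold.

```python
import re

# Tiny illustrative stopword list (assumption); real lists are much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "for", "on", "was", "were"}


def naive_stem(token):
    """Strip a few common English suffixes (crude stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip if at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text, min_len=3, max_len=25):
    """Mirror the operator chain: tokenize, lowercase, filter, stem."""
    tokens = re.findall(r"[A-Za-z]+", text)                        # Tokenize (letters only)
    tokens = [t.lower() for t in tokens]                           # Transform Cases
    tokens = [t for t in tokens if t not in STOPWORDS]             # Filter Stopwords
    tokens = [t for t in tokens if min_len <= len(t) <= max_len]   # Filter Tokens by Length
    return [naive_stem(t) for t in tokens]                         # Stem


def to_text_attribute(tokens):
    """Join tokens back into a single string, one text value per document."""
    return " ".join(tokens)
```

Calling `preprocess("The Revenues of the company were growing in 2020.")` returns `["revenu", "company", "grow"]`. An empty input string yields an empty list, which is worth keeping in mind when the later topic modeling step returns only null values.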

Accordingly, I have these two questions:
1) Is there an operator I can use to read in multiple PDF files?
2) What is the best operator for Topic Modeling and how do I implement it?

I have created the process below. It runs, but I only get null values as results. Does anyone have a tip for me?

Image: https://us.v-cdn.net/6038102/uploads/editor/c8/v9lmn8rp8nrg.png

Many thanks for the help

All comments

MartinLiebig

Hey,

I think what you want to do is use Loop Files to loop over your files, with Read Document inside the loop. What you get back is a collection of documents, which you can then process as needed.

Cheers,

Martin
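
The "loop over files, read one document per iteration, collect the results" pattern suggested above can be sketched in Python as follows. This is only an analogy to the Loop Files / Read Document combination, not how the operators are implemented. The names `loop_files` and `read_plain_text` are hypothetical, and the default reader handles plain text only; for PDFs you would pass in your own extraction function as `read_document`.

```python
from pathlib import Path


def read_plain_text(path):
    """Default reader (assumption): treat the file as UTF-8 plain text."""
    return path.read_text(encoding="utf-8", errors="replace")


def loop_files(folder, pattern="*.txt", read_document=read_plain_text):
    """Read every matching file in `folder` and return a collection of documents.

    Each document is a dict with the file name and its extracted text.
    Files are visited in sorted order so the result is reproducible.
    """
    documents = []
    for path in sorted(Path(folder).glob(pattern)):
        documents.append({"name": path.name, "text": read_document(path)})
    return documents
```

Keeping the reader as a parameter means the looping logic stays the same whichever file format is read inside the loop.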

Karissa

Thank you @MartinLiebig. The Loop Files operator worked.

The process runs through, but all results are zero/null. What could be the reason for this?

Image: https://us.v-cdn.net/6038102/uploads/editor/h3/sugllyajr43o.png

Many thanks

MartinLiebig

Hi,

Most likely the texts are empty for some reason?

BR,

Martin
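
A quick way to test the "empty texts" suspicion is to list the documents whose extracted text is blank before running any topic modeling. The sketch below assumes each document is a dict with `name` and `text` keys; that layout and the function name are illustrative, not something from the original process.

```python
def find_empty_documents(documents):
    """Return the names of documents whose text is empty or whitespace only."""
    return [doc["name"] for doc in documents if not doc["text"].strip()]
```

If every document shows up in this list, the problem lies in the reading step rather than in the LDA step. One common cause (not confirmed in this thread) is a PDF that contains only scanned images and therefore has no text layer to extract.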

Karissa

I have changed the process and now I get a result. Many thanks.