Patents mining
pfb
New Altair Community Member
Hello,
I'm a real newbie and am posting here to ask for help.
In the context of a patent set analysis, I got an extraction (csv/xlsx) of a list of patents in a semi-structured format: the rows are patents and the columns are attributes (e.g. patent title, abstract, novelty, etc.).
Given the large size of the patent set (>6500 hits), I would like to automate the patent analysis as follows:
1- identify topics (keywords) for each patent
2- cluster patents based on these keywords
3- display clusters with their respective weights
I assume that 1 and 2 can be done with RapidMiner, while 3 could be done with Gephi. But this is only an assumption, as I am a real beginner here: I have never used RapidMiner.
Therefore, any indication on feasibility/guidance on how to start would be really appreciated.
Thank you,
Peter
Answers
-
Hi Peter,
Indeed, there are a couple of projects in which RapidMiner is the key tool for analysing patent data. Using the Text Mining extension, documents can be tokenized and clustered based on word vectors. It doesn't matter whether your documents/patents are spread over a file system or already stored in an Excel sheet or a database.
In particular, TF-IDF transformation and n-grams are used to segment patents effectively.
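To make the idea concrete outside of RapidMiner, here is a minimal plain-Python sketch of the same pipeline: tokenize, add n-grams, weight with TF-IDF, then group by cosine similarity. It is not the RapidMiner process itself. The function names, the sample abstracts, the 0.1 threshold, and the greedy grouping step (a simple stand-in for a real clustering algorithm such as k-means) are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text, max_n=2):
    """Lowercase, split into word tokens, and append n-grams up to max_n."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    terms = list(words)
    for n in range(2, max_n + 1):
        terms += ["_".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return terms


def tfidf(docs, max_n=2):
    """Return one L2-normalised {term: weight} dict per document."""
    counts = [Counter(tokenize(d, max_n)) for d in docs]
    doc_freq = Counter(term for c in counts for term in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1
        vec = {t: (f / total) * math.log(n_docs / doc_freq[t])
               for t, f in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def top_terms(vec, k=3):
    """The k highest-weighted terms, usable as rough keywords for a patent."""
    ranked = sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


def greedy_clusters(vectors, threshold=0.1):
    """Assign each document to the first cluster whose seed is similar enough.

    Illustrative simplification only, not k-means or hierarchical clustering.
    """
    clusters = []  # each cluster is a list of document indices; [0] is the seed
    for i, vec in enumerate(vectors):
        for members in clusters:
            if cosine(vectors[members[0]], vec) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Made-up abstracts standing in for one text column of the csv/xlsx export.
abstracts = [
    "Lithium battery electrode with silicon anode coating",
    "Silicon anode coating process for a lithium battery cell",
    "Wind turbine blade with a vibration damping structure",
]
vectors = tfidf(abstracts)
clusters = greedy_clusters(vectors)
for members in clusters:
    print(members, [top_terms(vectors[i]) for i in members])
```

With real data you would read the abstract column from the export instead of the hard-coded list, and the cluster sizes could then serve as the weights to display in a tool such as Gephi.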
We offer a training course on this on 21-22 May 2014 in Dortmund.
- Frank0