Open an existing project

User: "Decrypter"
New Altair Community Member
Updated by Jocelyn
Hello,

I am a newbie, and I am trying to open an existing project. I have run into the following problems:
The project has a lot of resources, and I am unable to open them (you can see that in the picture).
The second problem is that I can't open the .md files, which contain the data for the project.

As for the files with the .properties extension, I don't know how or where to use them.

For example, clustering.properties contains the following script:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "">
<properties>
<comment>Properties of repository entry Clustering</comment>
<entry key="owner">zhaohengrui</entry>
</properties>

I searched a lot for something similar but didn't find anything. Apologies if this has been asked before and I didn't see it.

Thank you




    User: "Marco_Barradas"
    Altair Employee
    Accepted Answer
    @Decrypter It seems to me that you need to create a repository out of that folder.

    Follow the steps shown in the images.





    Give it a name, click on the folder icon, and browse to your file folder.

    That will load your whole file structure into RM so you can work with it.

    User: "Marco_Barradas"
    Altair Employee
    Accepted Answer
    @Decrypter

    It seems that the stored objects in the folder are broken.

    The good news is that you can rebuild the data sets from the Excel files provided in the DataSet folder.

    You just need to use a Read Excel operator and a Nominal to Text operator to execute everything else.

    I'm uploading an example of what you will need to do, but everything is described in the Report of Automated Job.pdf file.

    User: "Marco_Barradas"
    Altair Employee
    Accepted Answer
    Hi @Decrypter

    The issue you are having is due to the number of attributes the Process Documents operator is producing (more than 1.5k). Clustering on that many attributes takes a lot of memory and time.

    You need to do 2 things to fix that issue:
    1. In the PDF they mention that they applied a dictionary filter with a provided list of words. You can find those filter words in JobStopwrods.txt.
    2. I suggest you prune the output of Process Documents using the prune by ranking method.
    There are a couple of steps mentioned in their document that are not carried out in their process, so you'll need to fix a couple of things.

    You'll have a better understanding of what is happening if you take the Text Mining course at our academy. It's free!
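As a rough illustration outside RapidMiner, here is a plain Python sketch of why the two fixes above shrink the attribute count. It is not how the Process Documents operator is implemented: the tokenizer, the sample documents, the stopword set, and the `keep_top` cut-off are all made up for the example, and the rank-based cut is a simplification that keeps only the terms appearing in the most documents.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real tokenizer."""
    return text.lower().split()

def build_vocabulary(documents, stopwords, keep_top):
    """Drop stopwords, then keep the keep_top terms with the highest document frequency."""
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document (document frequency).
        terms = set(tokenize(doc)) - stopwords
        doc_freq.update(terms)
    # Rank by document frequency; ties are broken alphabetically so the result is deterministic.
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:keep_top]]

def vectorize(documents, vocabulary):
    """Turn each document into a row of term counts, one column per vocabulary term."""
    rows = []
    for doc in documents:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocabulary])
    return rows

# Hypothetical job postings and stopwords, for illustration only.
documents = [
    "senior java developer wanted",
    "java developer with sql skills",
    "nurse wanted for night shift",
]
stopwords = {"wanted", "with", "for"}

vocabulary = build_vocabulary(documents, stopwords, keep_top=3)
matrix = vectorize(documents, vocabulary)
```

Without the stopword filter and the cut-off, every distinct word becomes its own column, which is how a real data set ends up with well over a thousand attributes.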
    https://academy.rapidminer.com/learn/course/text-and-web-mining-with-rapidminer/text-and-web-mining/lets-get-started

    Attached you'll find the second version of the process to get you started.

    Have a great weekend.

    User: "Marco_Barradas"
    Altair Employee
    Accepted Answer
    Hi @Decrypter,

    Please check the changes I made to the process I shared before.

    The error you are getting is related to the type of the columns you are passing to Process Documents. In my process I read the Excel file and then apply a Nominal to Text operator before I use the Process Documents operator.

    That operator tells RapidMiner that the two columns should be treated as text, and that will remove the error you are seeing.

    For the second comment in your post, about the fine-tuning (changing cluster_1 and the others to some other text), you'll need to use a Map operator. In that one you can provide a list of words that will replace the values of the clustering output with whatever text you would like to use.

    If you have questions about how any operator works, go to the help provided for each operator. You can even see some examples if you go to the lower part of the help text.
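To make the idea of the mapping concrete, here is a minimal Python sketch of a value replacement list. It only mirrors the concept, not the Map operator itself; the cluster values and the replacement words ("IT", "Healthcare") are hypothetical, so use the names that make sense for your clusters.

```python
def map_values(values, replacements):
    """Replace each value found in the replacements table; leave other values unchanged."""
    return [replacements.get(value, value) for value in values]

# Hypothetical clustering output column and replacement list.
cluster_column = ["cluster_0", "cluster_1", "cluster_0", "cluster_2"]
replacements = {"cluster_0": "IT", "cluster_1": "Healthcare"}

renamed = map_values(cluster_column, replacements)
```

In this sketch any value that is missing from the list, such as cluster_2, passes through unchanged, so it is worth checking that every cluster has an entry.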


    User: "Marco_Barradas"
    Altair Employee
    Accepted Answer
    Hi @Decrypter
    The process files under the folder 1 Clustering are not consecutive steps.
    They are multiple analyses run on the same data set, Unlabelled Job Posting Dataset. You can create the same data set (DS) by running a process with the two operators I used in the process I shared, plus a Store operator.

    Point that Store operator at the folder 1 Clustering. By doing that you'll be able to run the other processes without any error.
    Please check the process I shared before for other adjustments you'll need to make before you run the Process Documents from Data operator. If you don't adjust them, the process may use up all your memory.

    You are getting closer.

    User: "Marco_Barradas"
    Altair Employee
    Accepted Answer
    Hi @Decrypter

    The files you downloaded do not seem to be final versions.

    For the first error, the issue is with how the output of the Clustering operator is connected. The ports need to be connected in another way.

    Check the Help for that operator.

    For the second error, you'll need to have a label (the column you want to predict). Again, the process that is shown here is wrong.
    It needs to work with the output of the Clustering process:
    /1 Clustering/Labelled Job Posting Dataset (K-Means)
    In the PDF they mention that they are going to create a model to predict the type of job offer. That would be the label.

    In the text they mention that you need to convert each cluster to a word.
    For that you can add a Map operator with the list of cluster values and the word that should replace each one.

    I will stop my help at this point, since with these examples you have enough answers to adapt all the other processes that you'll open throughout the folders.

    I strongly recommend going to https://academy.rapidminer.com/
    for more in-depth videos on how to achieve the multiple tasks your project needs.

    Your process should look like the image below.
     

    Enjoy the weekend.

    User: "Marco_Barradas"
    Altair Employee
    Accepted Answer
    @Decrypter
    You need to store the wordlist output from the Clustering process with a Store operator as a DS.
    Then you'll need to take that DS and connect it to the wor input port of Process Documents (it tells the operator which columns you want to keep). Remember that the data you score (ResumeData) should have the same number of attributes, with the same names and types, for any model that you would like to apply.
    You will also need to set the label attribute as a label with the Set Role operator.
    Check that all the processes are pointing to the folder in which the data is stored on your computer. The process I have shared should help you understand the changes you need to make to the processes that are stored in your RapidMiner repository.
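The reason for reusing the stored wordlist can be shown with a small Python sketch. This is only an illustration of the principle, not RapidMiner code: the wordlist, the resume texts, and the function name are invented for the example. New documents are counted only against the words seen during training, so the scoring data ends up with the same columns in the same order.

```python
from collections import Counter

def vectorize_with_wordlist(documents, wordlist):
    """Count only the words in the stored wordlist, so every row has the same columns."""
    rows = []
    for doc in documents:
        counts = Counter(doc.lower().split())
        # Words that are not in the wordlist are ignored; missing words count as 0.
        rows.append({word: counts[word] for word in wordlist})
    return rows

# Hypothetical wordlist saved from the training (clustering) process.
training_wordlist = ["developer", "java", "nurse"]

# Hypothetical new documents to score.
resume_texts = ["Java developer and Python developer", "registered nurse"]

resume_rows = vectorize_with_wordlist(resume_texts, training_wordlist)
```

A word such as "python" that never appeared in training produces no new column here, which is what keeps the attribute names in line with what the model expects.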