-
Mining Tweets from Twitter posted between Jan 1 and Dec 31 2022, and Jan 1 to Dec 31 2019
Hi Community. Please can I ask you for some help and guidance on the above topic. I have looked through the archive discussions and I cannot find anything related to this topic so far. I am researching Twitter threads between Jan 1 - Dec 31 2019, and Jan 1 - Dec 31 2022, for discussions on a number of keywords that are…
-
Balanced classes in an unbalanced dataset with multiple classes
Hi all, I am new on this platform and I am struggling with balancing the classes. When I create a model for my binary dataset I can use the sample operator or the SMOTE upsampling operator to balance my classes. When I run a model with three (or more) classes the sample or SMOTE upsampling does not make my classes…
-
Sampling (Balancing) and Cross validation
Hey everyone, I want to train a decision tree model and I already use a cross validation operator for training my model. However, I also need to balance my data since I have two classes, one of which is represented much less often. I am now concerned about how to use the sampling operator. I know how to use it to balance my data, i…
-
Creating equally sized clusters that are representative for the population
Hi all, I have a set of data (population) with individuals that have signed up to be a part of a group. When they signed up they gave some background information, leaving me with 5 variables that I am mostly focusing on. What I want to do is create 4 equally sized groups that are as representative of the whole population…
-
Newbie - expected performance output - after using the sample operator
Hi, sorry for the beginner's question... I have a data set with 30,000 lines. The target variable is imbalanced: total false: 24000 / total true: 6000. So I have used the operator "sample" to balance it (1000 each). At the end the performance classification operator gives the confusion matrix with only 2000 results (…
-
Please HELP: how to use a CSV table of Tweets instead of document for sentiment analysis
So, I checked the sentiment analysis example. To predict sentiment, it takes only a document. What if I have a CSV table with the Twitter user as a column, and the tweet text as another column? I want to run the prediction model on all of them, and get sentiment for each user (and their tweet). When I try to do that, and…
-
Selecting samples for attributes whose values contribute the most
I have an attribute, job, which is a label and has 15 different values. Out of 1000 samples, 7 values contribute 950 samples and the remaining 8 values contribute 50 samples. I want to use only the 950 samples (i.e. 7 values only) and ignore the rest. How do I select the values of the label which contribute the most to…
-
Sample: balance data: what should I put as *class* when sample size per class?
Hello everyone! I have a problem with the balance data option in the Sample operator. When I have to set the sample size per class, what should I put for class? I have one variable that I have set to 'label' and that is the variable I want to balance the data on. The error when I try for example class='label' or the name of the class (in…
-
Cannot change k-value for the sample kmean with plot.
Hello community. I need to do k-means clustering using RapidMiner and produce an elbow method graph. I've tested the k-means clustering with plot sample. However, when I try to change the k value from 13 to 5 for the k-means operator and run, the k value does not change; instead it turns back to 13 and produces 13 clusters. Can…
-
Automatic sampling_type in Split Operator
Hello all, Just one quick question if anyone has any idea on this. For the Automatic sampling_type in Split Operator, it is said that it will use stratified sampling if the label is nominal, shuffled sampling otherwise. What if the label is polynominal? Will stratified sampling be used? Because I have imbalanced classes…
-
What is the minimum sample of SVM?
Hi, I use Auto Model to conduct SVM regression. I wonder, apart from 100, which is the minimum sample needed for RapidMiner, is there any way to justify the ideal number of samples for an SVM to provide reliable results? Thank you!
-
In RapidMiner Sample 04 S
I trained this sample model and used the Store operator to store the model. It is OK. Afterwards, I deleted the close column in the 04 S&P500 original data to use as new data for prediction. I use Retrieve to load the model and use the amended S&P500 data. It fails to run. What is the problem? In the subprocess and Normalize, I amended to "all" in 'select…
-
Why does Naive Bayes return a confidence either 0 or 1 for every sample?
I'm just guessing but is this telling me that there is some attribute the algorithm is keying on and discarding everything else? Is there a way to take the results and look at the predictions + the other attributes together in a correlation matrix to see if that is the case? I can't picture that with NB. Seems more of an…
-
Prediction Model + Result Analysis
Dear all, I have data (24 columns and 5100 rows) that contains the following attributes [Dengue Fever Data (district name, gender, nationality, week and year the case was recorded), Air Quality Data (temperature, humidity, rainfall, and other)], for the period from 2010 to 2018. I would like to create a prediction model that…
-
Sampling or filtering transactions down to last X per customer
I have a large exampleset with every single transaction completed by each customer over a long time frame. I want to filter this down to the last X transactions that each customer did during their lifetime (e.g. up to the last 5 - if they did fewer than this then at least the ones they did). I really don't want to have to…
-
Number of samples in deep learning problem
Hello. I am solving a deep learning prediction problem and when I try to train on more than 100k samples the neural network only picks up around 20% of the training sample set. Does anyone know why? Thanks
-
W- sample cart
What is the 1-SE rule for making pruning decisions, and the heuristic method for binary splits?
-
W-sample cart and W-j48 parameters
Hello everyone. I am using CART and C4.5 in my RapidMiner Studio for my thesis. As RapidMiner does not support CART and C4.5, I use W-samplecart and W-j48, but I don't know about the algorithm parameters. Can anyone please help me out with this, and how do I specify gain ratio and information criteria for these…
-
Requesting sample model for clustering classification
My task is to cluster the data first and classify each cluster in parallel (as in the following diagram). I would appreciate it if anyone could suggest how to implement this model effectively in RapidMiner. Dataset --> Clustering (n cluster) -> classification (parallel processing to each cluster) -> combine classification…
-
Background information Sample datasets in RapidMiner Studio
Hi All, Where can I find the background information on the available datasets in the Sample folder in RapidMiner Studio? I'd like to know what the data is about before using the data. Thank you. Regards, Danny
-
Newbie ALERT!
I'm new to RapidMiner and just familiarizing myself with how to use it to mine tweets and Facebook posts for research I'm currently undertaking. Is there any guide or link that could aid my task, please?
-
Problem running Keras Sample - boston_housing_prices_regression
I am trying to set up a GPU enabled notebook for RapidMiner Studio 9.4.001 deep learning with Keras/Tensorflow-gpu 2.0. Under RM Settings/Preferences, both the Keras and Python Scripting tests were successful. The Python Scripting installed modules are listed in the attached file. I have Python 3.7.4 installed. To verify…
-
RapidMiner cannot show decision tree for sample size above 100.
Hi everyone, my data set consists of 1150 entities with 49 variables. I am using RapidMiner 9.4 free version. RapidMiner shows the decision tree and works smoothly with a sample size of 100, but when I increase my sample size above 100, the software doesn't show the complete tree and only shows one variable with the class. I hope you…
-
Improving Test & Out of Sample Perf with Opt Selection and Auto Feature Generation
Hi All (@IngoRM, @yyhuang, @varunm1, @hughesfleming68, @tftemme), apologies for cc’ing everyone, but I really need some help! I have a data set which started with 15 attributes and the two calculations which were needed to create the labels in Excel (the label has three distinct values, but for my purposes, I’m only…
-
How do you set a fixed number of random samples and run this many times?
Problem: I am using sample stratified, BUT the results are the same each time. It is not picking random samples each time out of my total dataset when using the "sample stratified" operator. Is there a way to set the exact sample size when using the cross validation operator? I have two datasets that I am merging.…