How to handle categorical data in unsupervised anomaly detection?
I have run into a problem in anomaly detection. Many unsupervised methods rely on a distance measured between instances, but my dataset contains categorical features. I see three options. First, I could drop the categorical features, but I think they carry useful information. Second, I could convert the categories to integers with sklearn's LabelEncoder, but I don't think the resulting integers correspond to a meaningful distance. Third, I could encode them with sklearn's OneHotEncoder, but that increases the number of dimensions, which I think affects clustering.
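To make the concern about the second and third options concrete, here is a minimal pure-Python sketch. It does not use sklearn itself; it mimics integer (label) encoding and one-hot encoding by hand. The feature values (`red`, `green`, `blue`) and the helper names are made up for illustration, and I am assuming integer codes are assigned in sorted order of the category names.

```python
import math

# Hypothetical categorical feature with three unordered values.
colors = ["red", "green", "blue"]
categories = sorted(colors)  # assumed code order: blue, green, red

# Integer (label) encoding: each category becomes its index in sorted order.
label_map = {c: i for i, c in enumerate(categories)}

def one_hot(value):
    """Return a 0/1 vector with one position per category."""
    return [1.0 if value == c else 0.0 for c in categories]

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# All unordered pairs of distinct categories.
pairs = [(a, b) for a in categories for b in categories if a < b]

# Distance between categories under each encoding.
label_dist = {(a, b): abs(label_map[a] - label_map[b]) for a, b in pairs}
onehot_dist = {(a, b): euclidean(one_hot(a), one_hot(b)) for a, b in pairs}

for pair in pairs:
    print(pair, "label:", label_dist[pair], "one-hot:", round(onehot_dist[pair], 3))

# One-hot adds one dimension per category.
print("one-hot width:", len(one_hot("red")))
```

With integer codes, `blue` and `red` end up twice as far apart as `blue` and `green`, even though the categories have no natural order. With one-hot vectors every pair of distinct categories is equally far apart, but the feature now takes up as many dimensions as it has categories.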