This document discusses a modified k-means algorithm for identifying the optimal number of clusters in categorical sequence data. Traditional k-means requires the number of clusters to be fixed in advance, and a poor choice can degrade the quality of the resulting partition. The proposed Robust K-means for Sequences algorithm aims to predict the optimal number of clusters by removing noise clusters. It relies on cluster validation to assess clustering quality, which is difficult for categorical sequence data because similarity between sequences is hard to define. The algorithm combines a partition-based clustering method and a cluster validity index within a model selection process to determine the best number of clusters for a set of categorical sequences.
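The model selection idea can be illustrated with a minimal Python sketch. It is not the Robust K-means for Sequences algorithm itself: the text does not specify the similarity measure, the partitioning method, or the validity index, so the sketch assumes edit distance, a simple k-medoids procedure, and the silhouette index as stand-ins, and it omits the noise-cluster removal step. All function names (`edit_distance`, `k_medoids`, `silhouette`, `select_k`) are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of categorical symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def assign(dist, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda c: dist[i][medoids[c]])
            for i in range(len(dist))]


def k_medoids(dist, k, max_iter=50):
    """Partition points into k clusters from a precomputed distance matrix."""
    n = len(dist)
    # Assumption: deterministic farthest-first initialization.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n),
                           key=lambda i: min(dist[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(dist, medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:  # keep the old medoid if a cluster empties
                updated.append(medoids[c])
                continue
            updated.append(min(members,
                               key=lambda m: sum(dist[m][j] for j in members)))
        if updated == medoids:
            break
        medoids = updated
    return assign(dist, medoids)


def silhouette(dist, labels):
    """Mean silhouette width; singleton clusters contribute 0."""
    clusters = {}
    for i, c in enumerate(labels):
        clusters.setdefault(c, []).append(i)
    if len(clusters) < 2:
        return -1.0
    total = 0.0
    for i, c in enumerate(labels):
        own = clusters[c]
        if len(own) == 1:
            continue
        a = sum(dist[i][j] for j in own) / (len(own) - 1)
        b = min(sum(dist[i][j] for j in other) / len(other)
                for d, other in clusters.items() if d != c)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


def select_k(sequences, k_max):
    """Run the clustering for k = 2..k_max and keep the best-scoring k."""
    dist = [[edit_distance(s, t) for t in sequences] for s in sequences]
    scores = {k: silhouette(dist, k_medoids(dist, k))
              for k in range(2, k_max + 1)}
    return max(scores, key=scores.get), scores


# Toy data: two groups of sequences over disjoint alphabets.
sequences = ["AAAB", "AAAA", "AABA", "CCCD", "CCCC", "CCDC"]
best_k, scores = select_k(sequences, k_max=4)
print(best_k)  # 2
```

On this toy data the validity index peaks at two clusters, matching the two visibly distinct groups. Real categorical sequence sets would likely need a more careful similarity measure, which is exactly the difficulty the text points to.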