Abstract:
This paper presents a semi-supervised clustering technique based on incremental and decremental affinity propagation (ID-AP), which incorporates labeled exemplars into the AP algorithm. It also presents a new method for actively selecting informative constraints to improve clustering performance. Both the clustering method and the active learning method scale to large data sets and can handle very high-dimensional data. We examine the active learning problem of choosing must-link and cannot-link constraints for semi-supervised clustering. The proposed active learning approach incrementally expands a set of neighborhoods by selecting informative points and querying their relationship to the existing neighborhoods. Following the classic uncertainty-based principle, we present a novel approach for computing the uncertainty associated with each data point. We further introduce a selection criterion that trades off the uncertainty of each data point against the expected number of queries (the cost) required to resolve that uncertainty. This allows us to select the queries with the maximum information rate. We empirically evaluate the proposed method on eight benchmark data sets against a number of competing methods. The results demonstrate that ID-AP captures and takes full advantage of the intrinsic relationship between labeled and unlabeled data, and that it achieves consistent and substantial improvements over its competitors.
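The abstract does not give the exact uncertainty or cost formulas. The following is a minimal sketch of an "uncertainty per expected query" selection rule under three assumptions that are not taken from the paper. First, each candidate point has a probability distribution over the current neighborhoods. Second, uncertainty is measured as Shannon entropy. Third, cost is the expected number of pairwise queries when neighborhoods are queried in decreasing order of probability. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a point's neighborhood-membership distribution.

    Assumption: entropy is used as the uncertainty measure (illustrative only).
    """
    return -sum(p * math.log(p) for p in probs if p > 0)


def expected_queries(probs: Sequence[float]) -> float:
    """Expected number of queries needed to find the point's neighborhood.

    Assumption: neighborhoods are queried one at a time in decreasing order of
    probability, so the k-th query (1-based) succeeds with probability p_(k).
    """
    ordered = sorted(probs, reverse=True)
    return sum(k * p for k, p in enumerate(ordered, start=1))


def information_rate(probs: Sequence[float]) -> float:
    """Uncertainty divided by expected cost; 0 if the cost is degenerate."""
    cost = expected_queries(probs)
    return entropy(probs) / cost if cost > 0 else 0.0


def select_query_point(candidates: Dict[str, List[float]]) -> str:
    """Return the hypothetical candidate id with the highest information rate."""
    return max(candidates, key=lambda pid: information_rate(candidates[pid]))


if __name__ == "__main__":
    # Toy membership distributions over the current neighborhoods (made-up values).
    candidates = {
        "a": [0.9, 0.1],       # nearly certain, cheap to resolve
        "b": [0.5, 0.5],       # uncertain between two neighborhoods
        "c": [0.4, 0.3, 0.3],  # most uncertain, but costlier to resolve
    }
    print(select_query_point(candidates))
```

With these toy values, point "c" has the highest ratio (about 1.09 nats over 1.9 expected queries). Its higher uncertainty outweighs its higher query cost, which is the kind of trade-off the selection criterion is meant to capture.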