Research on Hadoop-based massive short text clustering algorithm
31 July 2019
Qiang Zhao, Yuliang Shi, Zepeng Qing
Proceedings Volume 11198, Fourth International Workshop on Pattern Recognition; 111980A (2019) https://doi.org/10.1117/12.2540380
Event: Fourth International Workshop on Pattern Recognition, 2019, Nanjing, China
Abstract
Many clustering algorithms work well on small data sets of fewer than 200 data objects. However, a large database may contain millions of objects, and clustering on such a large data set may lead to biased results. As data volumes and availability continue to grow, so does the need for large-scale data analytics. Among the most commonly used clustering algorithms, k-means has proved to be one of the most popular choices, providing acceptable results in a reasonable amount of time. In this paper, we present an improved k-means algorithm that selects better initial centroids, and we implement this modified algorithm on the Hadoop platform. Experiments show that the improved k-means algorithm converges faster than classic k-means and has a lower average execution time than traditional k-means.
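The abstract does not specify how the better initial centroids are chosen or how the Hadoop jobs are structured. As a rough illustration only, the Python sketch below substitutes k-means++ seeding, a well-known initialization technique that is not necessarily the authors' method, and simulates each k-means iteration as one MapReduce pass in plain Python: `map_phase` emits each point keyed by its nearest centroid, and `reduce_phase` averages the points per key. These function names are hypothetical and are not part of the Hadoop API. The sketch also assumes the short texts have already been converted to numeric vectors (for example by TF-IDF), which is outside its scope.

```python
import random
from collections import defaultdict

# Assumption: each document is already a numeric feature vector.
Point = tuple[float, ...]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_init(points: list[Point], k: int, rng: random.Random) -> list[Point]:
    """k-means++ seeding (an assumed stand-in for the paper's initialization):
    the first centroid is chosen uniformly at random, each later one is sampled
    with probability proportional to its squared distance from the nearest
    centroid chosen so far."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [rng.choice(points)]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a chosen centroid; pick any point.
            centroids.append(rng.choice(points))
        else:
            centroids.append(rng.choices(points, weights=d2, k=1)[0])
    return centroids


def map_phase(points, centroids):
    """Hypothetical mapper (simulated): emit (nearest centroid index, (point, 1))."""
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))
        yield nearest, (p, 1)


def reduce_phase(pairs, old_centroids):
    """Hypothetical shuffle + reducer (simulated): sum points and counts per key,
    then emit the mean. An empty cluster keeps its previous centroid."""
    sums: dict[int, list[float]] = {}
    counts: defaultdict[int, int] = defaultdict(int)
    for idx, (p, n) in pairs:
        if idx in sums:
            sums[idx] = [s + x for s, x in zip(sums[idx], p)]
        else:
            sums[idx] = list(p)
        counts[idx] += n
    return [
        tuple(s / counts[i] for s in sums[i]) if i in sums else old_centroids[i]
        for i in range(len(old_centroids))
    ]


def kmeans(points, k, max_iter=100, tol=1e-12, seed=0):
    """Driver: one simulated MapReduce pass per Lloyd iteration, stopping when
    the largest squared centroid shift is at most tol."""
    rng = random.Random(seed)
    centroids = kmeanspp_init(points, k, rng)
    for it in range(1, max_iter + 1):
        new = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(sq_dist(a, b) for a, b in zip(centroids, new))
        centroids = new
        if shift <= tol:
            return centroids, it
    return centroids, max_iter


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, iterations = kmeans(pts, 2)
    print(sorted(centroids), iterations)
```

Running the file clusters six 2-D points into two well-separated groups, and the centroids settle at approximately (1/3, 1/3) and (31/3, 31/3). On a real Hadoop cluster the mapper and reducer would run as distributed tasks, and a combiner could pre-aggregate partial sums on each node to reduce shuffle traffic.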
© (2019) COPYRIGHT Society of Photo-Optical Instrumentation Engineers (SPIE). Downloading of the abstract is permitted for personal use only.
Qiang Zhao, Yuliang Shi, and Zepeng Qing "Research on Hadoop-based massive short text clustering algorithm", Proc. SPIE 11198, Fourth International Workshop on Pattern Recognition, 111980A (31 July 2019); https://doi.org/10.1117/12.2540380
CITATIONS
Cited by 2 scholarly publications.
KEYWORDS
Data centers
Computing systems
Data modeling
Data processing
Analytics
Computer programming
Data mining
