This document outlines clustering algorithms for large datasets. It discusses k-means clustering and extensions like k-means++ that improve initialization. It also covers spectral relaxation methods that reformulate k-means as a trace maximization problem to address local minima. Additionally, it proposes landmark-based clustering algorithms for biological sequences that select landmarks in one pass and assign sequences to the nearest landmark using hashing to search for neighbors. The document provides analysis of the time and space complexity of these algorithms as well as assumptions about separability and cluster size.