Inspired by active learning, we have developed a computational method that selects minimal gene sets capable of reliably identifying cell types and transcriptional states in large single-cell RNA-sequencing (scRNA-seq) data sets. Because the method, an active support vector machine (ActiveSVM), focuses computational resources on poorly classified cells, it scales to data sets with over one million cells.
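To make the idea of "focusing on poorly classified cells" concrete, here is a minimal Python sketch of a greedy, active-learning-style gene selection loop. It is an illustration under stated assumptions, not the ActiveSVM implementation. The labels are assumed binary (+1/-1), the SVM is a plain hinge-loss linear model fit by subgradient descent, and the gene-scoring rule (the magnitude of the hinge-loss gradient for a new gene's weight, evaluated only over cells whose margin is below 1) is a grafting-style heuristic chosen here for illustration; the published method's selection criterion, multiclass handling, and cell subsampling may differ. The function names `train_linear_svm`, `margins`, and `select_genes` are hypothetical.

```python
import random


def train_linear_svm(X, y, features, epochs=200, lr=0.05, lam=0.01):
    """Fit a linear SVM (hinge loss + L2 penalty) on the chosen gene columns
    by full-batch subgradient descent. Hypothetical helper for illustration."""
    n = len(X)
    w = [0.0] * len(features)
    b = 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the L2 penalty
        gb = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xi[f] for wj, f in zip(w, features)) + b
            if yi * score < 1:  # hinge term is active for this cell
                for k, f in enumerate(features):
                    gw[k] -= yi * xi[f] / n
                gb -= yi / n
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def margins(X, y, features, w, b):
    """Signed functional margin y_i * f(x_i) for every cell."""
    return [yi * (sum(wj * xi[f] for wj, f in zip(w, features)) + b)
            for xi, yi in zip(X, y)]


def select_genes(X, y, n_genes):
    """Greedily add genes, scoring candidates only on poorly classified cells
    (margin < 1). Assumed heuristic, not the paper's exact criterion."""
    selected = []
    candidates = set(range(len(X[0])))
    for _ in range(n_genes):
        w, b = train_linear_svm(X, y, selected)
        m = margins(X, y, selected, w, b)
        hard = [i for i, mi in enumerate(m) if mi < 1]
        if not hard or not candidates:
            break  # every cell is confidently classified, or no genes left
        # Score each unused gene by |sum of y_i * x_ij| over the hard cells only,
        # i.e. the hinge-loss gradient for that gene's weight at zero.
        best = max(sorted(candidates),
                   key=lambda g: abs(sum(y[i] * X[i][g] for i in hard)))
        selected.append(best)
        candidates.remove(best)
    return selected


# Synthetic toy data: 60 cells, 5 genes; only gene 2 carries label information.
random.seed(0)
n_cells, n_total_genes = 60, 5
y = [1 if i < n_cells // 2 else -1 for i in range(n_cells)]
X = [[random.gauss(0, 1) for _ in range(n_total_genes)] for _ in range(n_cells)]
for i in range(n_cells):
    X[i][2] = 2 * y[i] + random.gauss(0, 0.5)

chosen = select_genes(X, y, n_genes=3)
print(chosen)
```

On this synthetic data the loop should pick gene 2 first, since it is the only gene correlated with the labels. Any later picks are driven solely by the few cells still inside the margin, which is the sense in which the sketch concentrates work on poorly classified cells rather than on the whole data set.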