FastStep: Scalable Boolean Matrix Decomposition

Miguel Araújo, Pedro Ribeiro and Christos Faloutsos

2016

Abstract

Matrix Decomposition methods are applied to a wide range of tasks, such as data denoising, dimensionality reduction, co-clustering and community detection. However, in the presence of boolean inputs, common methods either do not scale or do not provide a boolean reconstruction, which results in high reconstruction error and low interpretability of the decomposition. We propose a novel step decomposition of boolean matrices in non-negative factors with boolean reconstruction. By formulating the problem using threshold operators and through suitable relaxation of this problem, we provide a scalable algorithm that can be applied to boolean matrices with millions of non-zero entries. We show that our method achieves significantly lower reconstruction error when compared to standard state of the art algorithms. We also show that the decomposition keeps its interpretability by analyzing communities in a flights dataset (where the matrix is interpreted as a graph in which nodes are airports) and in a movie-ratings dataset with 10 million non-zeros.

Digital Object Identifier (DOI)

doi 10.1007/978-3-319-31753-3_37

Publication in PDF format

pdf

Software

software FastStep

Journal/Conference/Book

20th Pacific-Asia Conference on Knowledge Discovery and Data Mining

Reference (text)

Miguel Araújo, Pedro Ribeiro and Christos Faloutsos . FastStep: Scalable Boolean Matrix Decomposition. Proceedings of the 20th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pp. 461-473, Springer, Auckland, New Zealand, April, 2016.

Bibtex

@inproceedings{ribeiro-PAKDD2016,
  author = {Miguel Araújo and  Pedro Ribeiro and Christos Faloutsos },
  title = {FastStep: Scalable Boolean Matrix Decomposition},
  doi = {10.1007/978-3-319-31753-3_37},
  booktitle = {20th Pacific-Asia Conference on Knowledge Discovery and Data Mining},
  pages = {461-473},
  publisher = {Springer},
  month = {April},
  year = {2016}
}