Topic Modeling and GMM on Manifold (Network)


Introduction

Dyadic data refers to a domain in which two sets of objects, row objects and column objects, are characterized by a matrix of numerical values describing their mutual relationships. Such data arise in many real-world applications, such as social network analysis and information retrieval. A common example is the term-document co-occurrence matrix. To discover the underlying (hidden) structure in dyadic data, topic modeling techniques are usually applied to learn a probabilistic interpretation of the row and column objects. Two of the most popular approaches for this purpose are Probabilistic Latent Semantic Analysis (PLSA), also known as Probabilistic Latent Semantic Indexing (PLSI), and Latent Dirichlet Allocation (LDA).
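To make the term-document example concrete, here is a minimal sketch of building such a co-occurrence matrix from raw text. The function name `term_document_matrix`, the whitespace tokenization, and the toy documents are assumptions made for illustration; they are not taken from the released codes or data sets.

```python
from collections import Counter

def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] counts term i in document j."""
    # One bag-of-words counter per document (naive lowercase/whitespace split).
    counts = [Counter(doc.lower().split()) for doc in docs]
    # Row objects: the sorted vocabulary. Column objects: the documents.
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for c in counts] for term in vocab]
    return vocab, matrix

# Hypothetical toy corpus.
docs = [
    "graph laplacian graph",
    "topic model",
    "topic graph",
]
vocab, matrix = term_document_matrix(docs)

for term, row in zip(vocab, matrix):
    print(f"{term:10s} {row}")
```

Each row describes one term and each column one document, so every entry relates a row object to a column object. A topic model such as PLSA or LDA would take a matrix of this kind as input.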

Recent studies have shown that naturally occurring data, such as texts and images, cannot possibly "fill up" the ambient Euclidean space; rather, they must concentrate around lower-dimensional structures. The goal of this work is to extract this kind of low-dimensional structure and use it to regularize the learning of probability distributions.
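One generic way to picture this kind of regularization is a nearest-neighbor graph over the data points together with a penalty that grows when neighboring points are assigned very different probability distributions. The sketch below is an illustration under stated assumptions, not the formulation used in any of the papers listed on this page: the 0/1 edge weights, the squared Euclidean difference, and the names `knn_graph` and `smoothness_penalty` are all hypothetical choices.

```python
import math

def knn_graph(points, k):
    """Symmetric 0/1 k-nearest-neighbor adjacency matrix (Euclidean distance)."""
    n = len(points)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # Rank every other point by its distance to point i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            adj[i][j] = adj[j][i] = 1  # symmetrize the neighbor relation
    return adj

def smoothness_penalty(adj, dists):
    """Sum over undirected edges of w_ij * ||p_i - p_j||^2."""
    n = len(adj)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # visit each undirected edge once
            if adj[i][j]:
                diff = sum((a - b) ** 2 for a, b in zip(dists[i], dists[j]))
                total += adj[i][j] * diff
    return total

# Hypothetical data: two tight pairs of points far apart from each other.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
adj = knn_graph(points, k=1)

# Per-point distributions over two latent components (e.g. topics or clusters).
smooth = [[0.9, 0.1], [0.9, 0.1], [0.2, 0.8], [0.2, 0.8]]
rough = [[0.9, 0.1], [0.1, 0.9], [0.2, 0.8], [0.8, 0.2]]

print(smoothness_penalty(adj, smooth))
print(smoothness_penalty(adj, rough))
```

The assignment in which neighbors share a distribution incurs a smaller penalty than the one in which neighbors disagree. Adding a weighted term of this sort to a model's fitting objective is one way to favor distributions that vary smoothly along the graph.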


Codes


Data

You can download the processed text data sets here.
If you find these algorithms useful, we would appreciate it very much if you could cite the following works:

Papers

  1. Deng Cai, Xuanhui Wang and Xiaofei He, "Probabilistic Dyadic Data Analysis with Local and Global Consistency", 26th International Conference on Machine Learning (ICML'09), June, 2009.

  2. Qiaozhu Mei, Deng Cai, Duo Zhang, and ChengXiang Zhai, "Topic Modeling with Network Regularization", Proceedings of the 17th International World Wide Web Conference (WWW'08), pages 101-110, 2008.

  3. Deng Cai, Qiaozhu Mei, Jiawei Han, ChengXiang Zhai, "Modeling Hidden Topics on Document Manifold", Proc. 2008 ACM Conf. on Information and Knowledge Management (CIKM'08), Napa Valley, CA, Oct. 2008.

  4. Jialu Liu, Deng Cai, Xiaofei He, "Gaussian Mixture Model with Local Consistency", Twenty-Fourth AAAI Conference on Artificial Intelligence (AAAI'10), 2010.

  5. Xiaofei He, Deng Cai, Yuanlong Shao, Hujun Bao, and Jiawei Han, "Laplacian Regularized Gaussian Mixture Model for Data Clustering", IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 9, pp. 1406-1418, 2011.

