Topic Modeling and GMM on Manifold (Network)
Introduction
Dyadic data refers to a domain in which two sets of objects, row and
column objects, are characterized by a matrix of numerical values
that describe their mutual relationships. Such data arises in many
real-world applications, such as social network analysis and
information retrieval; a common example is the
term-document co-occurrence matrix. To discover the
underlying (hidden) structure in dyadic data, topic modeling
techniques are usually applied to learn a probabilistic
interpretation of the row and column objects. Two of the most
popular approaches for this purpose are Probabilistic Latent
Semantic Analysis (PLSA, also known as PLSI) and Latent Dirichlet
Allocation (LDA).
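As a purely illustrative sketch (textbook EM, not the released LapPLSI code), the basic PLSA model can be fit on a toy term-document count matrix as follows; the corpus and topic number are made-up assumptions:

```python
import numpy as np

def plsa(X, K, n_iter=50, seed=0):
    """Fit plain PLSA by EM on a document-by-word count matrix X.
    Returns p(z|d) and p(w|z). Generic sketch, no manifold regularizer."""
    rng = np.random.default_rng(seed)
    D, W = X.shape
    p_z_d = rng.random((D, K)); p_z_d /= p_z_d.sum(1, keepdims=True)  # p(z|d)
    p_w_z = rng.random((K, W)); p_w_z /= p_w_z.sum(1, keepdims=True)  # p(w|z)
    for _ in range(n_iter):
        # E-step: posterior p(z|d,w) proportional to p(z|d) p(w|z)
        post = p_z_d[:, :, None] * p_w_z[None, :, :]        # D x K x W
        post /= post.sum(1, keepdims=True) + 1e-12
        # M-step: re-estimate both distributions from expected counts
        weighted = X[:, None, :] * post                     # D x K x W
        p_w_z = weighted.sum(0)
        p_w_z /= p_w_z.sum(1, keepdims=True) + 1e-12
        p_z_d = weighted.sum(2)
        p_z_d /= p_z_d.sum(1, keepdims=True) + 1e-12
    return p_z_d, p_w_z

# Toy corpus with two clearly separated "topics" (words 0-2 vs. words 3-5)
X = np.array([[5, 4, 3, 0, 0, 0],
              [4, 5, 2, 0, 0, 0],
              [0, 0, 0, 3, 5, 4],
              [0, 0, 0, 4, 4, 5]], dtype=float)
p_z_d, p_w_z = plsa(X, K=2)
print(np.round(p_z_d, 2))  # each document concentrates on one topic
```

The manifold-regularized variants below keep this likelihood but add a penalty that forces p(z|d) to vary smoothly between neighboring documents.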
Recent studies have shown that naturally occurring data, such as
text and images, cannot possibly "fill up" the ambient Euclidean
space; rather, it concentrates around lower-dimensional
structures. The goal of this work is to extract this kind of
low-dimensional structure and use it to regularize the learning of
probability distributions.
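The low-dimensional structure is typically captured by a nearest-neighbor graph over the data points, whose Laplacian penalizes functions that change quickly between neighbors. The following is a generic sketch (the k-NN construction and 0/1 weights are illustrative choices, not the released implementation):

```python
import numpy as np

def knn_laplacian(X, k=2):
    """Build a symmetric k-nearest-neighbor 0/1 affinity matrix W and the
    unnormalized graph Laplacian L = D - W. Illustrative sketch only."""
    n = X.shape[0]
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(dist[i])[1:k + 1]   # skip self at position 0
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                    # symmetrize the graph
    L = np.diag(W.sum(1)) - W
    return W, L

# Smoothness penalty: f^T L f = 0.5 * sum_ij W_ij (f_i - f_j)^2,
# which is small when f varies slowly over neighboring points.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
W, L = knn_laplacian(X, k=2)
f_smooth = np.array([1, 1, 1, 0, 0, 0.0])   # constant within each cluster
f_rough  = np.array([1, 0, 1, 0, 1, 0.0])   # ignores the graph
print(f_smooth @ L @ f_smooth, f_rough @ L @ f_rough)
```

Adding a term of the form lambda * f^T L f to a likelihood objective is the shared idea behind the Laplacian-regularized models listed below.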
Codes
- LapPLSI: Laplacian regularized PLSI (PLSA)
- Examples here
- Deng Cai, Qiaozhu Mei, Jiawei Han, ChengXiang Zhai, "Modeling Hidden Topics on Document Manifold", CIKM 2008. (pdf)
Bibtex source
- Qiaozhu Mei, Deng Cai, Duo Zhang, ChengXiang Zhai, "Topic Modeling with Network Regularization", WWW 2008.
Bibtex source
- LTM: Locally-consistent Topic Modeling (The efficiency of the code has been significantly improved; if you are using LTM, please update to the latest code. 2012/1/16)
- Examples here
- Deng Cai, Xuanhui Wang and Xiaofei He, "Probabilistic Dyadic Data Analysis with Local and Global Consistency", ICML 2009. (pdf)
Bibtex source
- LapGMM: Laplacian Regularized Gaussian Mixture Model. (coming soon!)
- Examples here (coming soon!)
- Xiaofei He, Deng Cai, Yuanlong Shao, Hujun Bao, and Jiawei Han, "Laplacian Regularized Gaussian Mixture Model for Data Clustering", IEEE TKDE 2011.
Bibtex source
- LCGMM: Locally Consistent Gaussian Mixture Model.
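For reference, fitting a plain Gaussian mixture by EM looks as follows; LapGMM and LCGMM extend this objective with a Laplacian (local-consistency) penalty over a neighborhood graph, which this generic sketch omits. The spherical covariances and deterministic initialization are simplifying assumptions:

```python
import numpy as np

def gmm_em(X, K, n_iter=60):
    """Fit a spherical Gaussian mixture by plain EM. Generic sketch only;
    the released LapGMM/LCGMM codes additionally smooth the posteriors
    over a nearest-neighbor graph, which is not done here."""
    n, d = X.shape
    mu = X[np.linspace(0, n - 1, K).astype(int)].copy()  # deterministic init
    var = np.full(K, X.var())                            # spherical variances
    pi = np.full(K, 1.0 / K)                             # mixing weights
    for _ in range(n_iter):
        # E-step: responsibilities r[i,k] proportional to pi_k N(x_i | mu_k, var_k I)
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(2)
        logp = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - sq / (2 * var)
        r = np.exp(logp - logp.max(1, keepdims=True))
        r /= r.sum(1, keepdims=True)
        # M-step: update weights, means, and variances from soft counts
        Nk = r.sum(0)
        pi = Nk / n
        mu = (r.T @ X) / Nk[:, None]
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(2)
        var = (r * sq).sum(0) / (d * Nk) + 1e-6
    return pi, mu, var, r

# Two well-separated synthetic clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (40, 2)), rng.normal(4, 0.3, (40, 2))])
pi, mu, var, r = gmm_em(X, K=2)
print(np.round(mu, 1))  # recovered cluster means
```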
Data
You can download the processed text data sets here.
If you find these algorithms useful, we would appreciate it very much if you cite the following works:
Papers
- Deng Cai, Xuanhui Wang and Xiaofei He, "Probabilistic Dyadic Data Analysis with Local and Global Consistency", 26th International Conference on Machine Learning (ICML'09), June, 2009.
Bibtex source
- Qiaozhu Mei, Deng Cai, Duo Zhang, and ChengXiang Zhai. "Topic Modeling with Network Regularization", Proceedings of the 17th International World Wide Web Conference (WWW' 08), pages 101-110, 2008.
Bibtex source
- Deng Cai, Qiaozhu Mei, Jiawei Han, ChengXiang Zhai, "Modeling Hidden Topics on Document Manifold", Proc. 2008 ACM Conf. on Information and Knowledge Management (CIKM'08), Napa Valley, CA, Oct. 2008.
Bibtex source
- Jialu Liu, Deng Cai, Xiaofei He, "Gaussian Mixture Model with Local Consistency", Twenty-Fourth Conference on Artificial Intelligence (AAAI'10).
Bibtex source
- Xiaofei He, Deng Cai, Yuanlong Shao, Hujun Bao, and Jiawei Han, "Laplacian Regularized Gaussian Mixture Model for Data Clustering", IEEE Transactions on Knowledge and Data Engineering, Vol. 23, No. 9, pp. 1406-1418, 2011.
Bibtex source