My main research interests lie in the field of information processing, including data mining, information retrieval, machine learning, and pattern recognition. I'd like to help people get what they need more easily: let computers do more for us with less help from us, learn from experience, adapt effortlessly, and discover new knowledge. We need computers that reduce information overload by extracting the important patterns from masses of data, and computers that understand what we need. This poses many deep and fascinating scientific problems: How can a computer decide autonomously which representation is best for the target knowledge? How can it tell genuine regularities from chance occurrences? How can pre-existing knowledge be exploited? How can learned results be made understandable to us?
My research addresses these and related questions. Research topics that I'm working on, or have recently worked on, include:
- Learning on Graphs (Manifolds)
A very natural assumption in machine learning is local consistency: neighboring points are likely to share similar properties. The neighborhood structure of the data can easily be modeled with a graph.
Building on this assumption, we have been developing many locality-aware (graph-based) algorithms for unsupervised and semi-supervised learning.
Learning with Local Consistency
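To make the local consistency assumption concrete, here is a minimal sketch (not one of our published algorithms): it builds a symmetric k-nearest-neighbor graph and spreads a few known labels over it with the standard label spreading iteration F ← αSF + (1 − α)Y, where S = D^{-1/2} W D^{-1/2}. The 0/1 edge weights, `k`, `alpha`, and the function names are illustrative assumptions.

```python
import math


def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph with 0/1 edge weights (an assumed weighting)."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for _, j in dists[:k]:
            W[i][j] = W[j][i] = 1.0  # keep the edge if either endpoint selects the other
    return W


def label_spreading(W, labels, alpha=0.9, iters=200):
    """Propagate labels over graph W; labels[i] is a class id, or None if unlabeled."""
    n = len(W)
    classes = sorted({c for c in labels if c is not None})
    deg = [sum(row) for row in W]  # every node has degree >= 1 when k >= 1 and n >= 2
    # Symmetrically normalised affinity S = D^{-1/2} W D^{-1/2}.
    S = [[W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0 for j in range(n)]
         for i in range(n)]
    # Y holds the initial one-hot labels; unlabeled rows stay all zero.
    Y = [[1.0 if labels[i] == c else 0.0 for c in classes] for i in range(n)]
    F = [row[:] for row in Y]
    for _ in range(iters):
        # F <- alpha * S F + (1 - alpha) * Y: neighbours pull their scores toward each other.
        F = [[alpha * sum(S[i][j] * F[j][c] for j in range(n)) + (1 - alpha) * Y[i][c]
              for c in range(len(classes))] for i in range(n)]
    # Each point takes the class with the highest propagated score.
    return [classes[max(range(len(classes)), key=lambda c: F[i][c])] for i in range(n)]


points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (5.0, 0.0), (5.1, 0.0), (5.2, 0.0)]
labels = [0, None, None, 1, None, None]
print(label_spreading(knn_graph(points, k=2), labels))  # prints [0, 0, 0, 1, 1, 1]
```

With only one labeled point per group, every unlabeled point inherits the label of the neighbors it is connected to, which is exactly what local consistency predicts.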
- Mining the Web
- VIsion-based Page Segmentation (VIPS)
Unlike a document in traditional retrieval, a web page as a whole is not a good unit of information to search, because it often contains multiple topics and much irrelevant content from the navigation, decoration, and interaction parts of the page. I am working on a fundamental problem: how to extract the semantic structure of a web page based on its visual perception.
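The sketch below is not VIPS. It uses the much simpler, classic recursive X-Y cut to illustrate one kind of visual cue that page segmentation can exploit: blank bands between rendered elements act as separators between blocks. The box format `(x0, y0, x1, y1)` in pixels, the `min_gap` threshold, and all function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) of a rendered element, in pixels


def split_on_gaps(boxes: List[Box], axis: int, min_gap: int) -> List[List[Box]]:
    """Group boxes into runs separated by a blank band of at least min_gap pixels.

    axis=1 looks for horizontal separators (gaps along y), axis=0 for vertical ones.
    """
    lo, hi = axis, axis + 2
    boxes = sorted(boxes, key=lambda b: b[lo])
    groups, current, reach = [], [boxes[0]], boxes[0][hi]
    for b in boxes[1:]:
        if b[lo] - reach >= min_gap:  # nothing covers this band: treat it as a separator
            groups.append(current)
            current = []
        current.append(b)
        reach = max(reach, b[hi])
    groups.append(current)
    return groups


def xy_cut(boxes: List[Box], min_gap: int = 20, horizontal_first: bool = True) -> List[List[Box]]:
    """Recursively partition boxes into blocks, alternating separator directions."""
    if not boxes:
        return []
    for axis in ((1, 0) if horizontal_first else (0, 1)):
        groups = split_on_gaps(boxes, axis, min_gap)
        if len(groups) > 1:
            # After a horizontal cut, try vertical separators first in each part, and vice versa.
            return [blk for g in groups for blk in xy_cut(g, min_gap, axis == 0)]
    return [boxes]  # no separator left: this is a leaf block


page = [(0, 0, 800, 50),       # header bar
        (0, 100, 380, 600),    # left column
        (420, 100, 800, 600)]  # right column
for block in xy_cut(page):
    print(block)
```

Here the header is separated from the body by a horizontal blank band, and the two columns by a vertical one, giving three blocks. Real pages need far richer cues (DOM structure, fonts, background colors), which a purely geometric cut ignores.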
- Block Level Link Analysis
By extracting page-to-block and block-to-page relationships from link structure and page layout analysis, we can construct a semantic graph over the WWW in which each node represents exactly one semantic topic. On top of this graph we developed two ranking algorithms: block-level PageRank and block-level HITS.
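A minimal sketch of block-level PageRank, assuming the following construction: `Z` is a page-to-block matrix where `Z[p][b]` is the importance of block b within page p, `X` is a block-to-page matrix where `X[b][q] = 1/s_b` if block b links to page q (s_b being the number of pages b links to), and ordinary PageRank is run on the page graph W_P = ZX. The matrix names, the damping factor 0.85, and the uniform treatment of pages without out-links are assumptions for illustration.

```python
def matmul(A, B):
    """Plain-Python matrix product of list-of-lists matrices."""
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def block_level_pagerank(Z, X, d=0.85, iters=100):
    """PageRank on the page graph W_P = Z X built from page-to-block and block-to-page links.

    Z[p][b]: importance of block b inside page p (0 if b is not in p).
    X[b][q]: 1/s_b if block b links to page q, where s_b is the number of pages b links to.
    """
    W = matmul(Z, X)
    n = len(W)
    # Row-normalise into a transition matrix; pages with no out-links jump uniformly.
    P = []
    for row in W:
        s = sum(row)
        P.append([w / s for w in row] if s > 0 else [1.0 / n] * n)
    r = [1.0 / n] * n
    for _ in range(iters):
        # Standard PageRank update with damping factor d.
        r = [(1 - d) / n + d * sum(r[i] * P[i][j] for i in range(n)) for j in range(n)]
    return r


Z = [[0.8, 0.2, 0.0, 0.0],  # page 0 = blocks 0 and 1 (block 0 dominates the layout)
     [0.0, 0.0, 1.0, 0.0],  # page 1 = block 2
     [0.0, 0.0, 0.0, 1.0]]  # page 2 = block 3
X = [[0.0, 0.5, 0.5],       # block 0 links to pages 1 and 2
     [0.0, 0.0, 1.0],       # block 1 links to page 2
     [0.0, 0.0, 1.0],       # block 2 links to page 2
     [1.0, 0.0, 0.0]]       # block 3 links to page 0
print([round(x, 3) for x in block_level_pagerank(Z, X)])
```

The point of weighting by `Z` is that a link from a large, central block counts for more than a link from a small navigation block on the same page, whereas page-level PageRank treats all out-links of a page alike.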
- Web Image Searching and Clustering
With VIsion-based Page Segmentation, a web page can be partitioned into blocks, each containing semantically coherent information, so the textual and link information of an image can be accurately extracted from the block that contains it.
This textual information is used to represent the image.
A large image graph can be obtained from block-level link analysis. This method is less sensitive to noisy links than previous methods such as PicASHOW, so the image graph can, to some extent, reflect the semantic relationships between images. Using spectral techniques, the image graph can be partitioned into clusters, which are used to enhance the search results.
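The two steps above can be sketched as follows, under stated assumptions. `image_graph` forms an image-to-image graph W_I = Yᵀ W_B Y from a block graph W_B and a hypothetical block-to-image matrix `Y` (`Y[b][m] = 1/s_b` if image m lies in block b, where s_b is the number of images in b), then symmetrizes it. `spectral_bipartition` uses one standard spectral technique, a normalized-cut style split: it finds the second eigenvector v of D^{-1/2} W D^{-1/2} by deflated power iteration and splits the nodes by the sign of D^{-1/2} v. This is not necessarily the spectral method used in our work; applying it recursively yields more than two clusters.

```python
import math
import random


def image_graph(Y, WB):
    """Image graph W_I = Y^T W_B Y, symmetrised.

    Y[b][m]: 1/s_b if image m lies in block b (s_b = number of images in b), else 0.
    WB[a][b]: weight of the block-to-block edge a -> b.
    """
    nb, ni = len(Y), len(Y[0])
    WI = [[sum(Y[a][i] * WB[a][b] * Y[b][j] for a in range(nb) for b in range(nb))
           for j in range(ni)] for i in range(ni)]
    # Average with the transpose so the graph is undirected.
    return [[(WI[i][j] + WI[j][i]) / 2 for j in range(ni)] for i in range(ni)]


def spectral_bipartition(W, iters=500, seed=0):
    """Split an undirected graph in two using the sign of its second normalised eigenvector."""
    n = len(W)
    deg = [sum(row) for row in W]  # assumes no isolated nodes
    inv = [1 / math.sqrt(x) for x in deg]
    M = [[inv[i] * W[i][j] * inv[j] for j in range(n)] for i in range(n)]  # D^-1/2 W D^-1/2
    norm = math.sqrt(sum(deg))
    top = [math.sqrt(x) / norm for x in deg]  # trivial eigenvector of M (eigenvalue 1)
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(iters):
        # Multiply by (M + I) / 2, whose eigenvalues lie in [0, 1].
        v = [(sum(M[i][j] * v[j] for j in range(n)) + v[i]) / 2 for i in range(n)]
        # Deflate: remove the component along the trivial eigenvector.
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]
        s = math.sqrt(sum(a * a for a in v))
        v = [a / s for a in v]
    # Cluster membership from the sign of D^-1/2 v.
    return [0 if v[i] * inv[i] < 0 else 1 for i in range(n)]


# Two triangles of images joined by one weak edge (image 2 -- image 3).
W = [[0, 1, 1, 0, 0, 0],
     [1, 0, 1, 0, 0, 0],
     [1, 1, 0, 0.1, 0, 0],
     [0, 0, 0.1, 0, 1, 1],
     [0, 0, 0, 1, 0, 1],
     [0, 0, 0, 1, 1, 0]]
print(spectral_bipartition(W))
```

Images 0 to 2 and images 3 to 5 end up on opposite sides of the split; which side is labeled 0 depends on the sign the power iteration settles on.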