Talk 1: Subgraph Matching: Past and Present
Graph data are a key part of Big Data and are widely used for modelling complex structured data across a broad spectrum of applications. Over the last decade, tremendous research effort has been devoted to many fundamental problems in managing and analysing graph data. In this talk, I will focus on one such fundamental problem, subgraph matching. I will cover solutions for a single machine as well as distributed solutions.
Talk 2: An Introduction of Model-based Text Clustering
Text clustering is an important technique in data mining and machine learning. It is widely used in event discovery and tracking, document summarization, search result clustering, and other tasks. Although there has been much research on text clustering, many challenging problems remain to be solved: (1) How should the number of clusters be set? Is it possible to discover the number of clusters automatically from the data? (2) How can the sparsity of short texts be handled? (3) How can abnormal documents in a dataset be discovered automatically? (4) How can the concept drift problem in stream text clustering be handled? In this talk, Dr. Jianhua Yin will share his work on text clustering and the stories behind these papers from his time as a PhD student at Tsinghua University, hoping to inspire younger students who are interested in scientific research.