Zhang, Ming

Professor

Research Interests: Machine learning, text mining

Office Phone: 86-10-6276 5825

Email: mzhang_cs@pku.edu.cn

Zhang, Ming is a professor in the Department of Computer Science and Technology, School of EECS. She obtained her B.Sc., M.Sc., and Ph.D. from Peking University in 1988, 1991, and 2005, respectively. Her research interests include Machine Learning, Natural Language Processing, Text Mining, Social Network Analysis, Computer Education, and Health Informatics.

Dr. Zhang has published more than 100 research papers, most of them in top-tier conferences and journals such as ICML, SIGKDD, WWW, AAAI, IJCAI, ACL, and TKDE. She has served on the Technical Program Committees of various international conferences, including AAAI, SIGCSE, and APWeb, and as Chair of the SIGCSE China Symposium at TURC 2017. She received the ICML Best Paper Award (2014).

Dr. Zhang has undertaken more than ten research projects, including NSFC and 863 Program projects. Her research achievements are summarized as follows:

1) Embedding very large information networks into low-dimensional vector spaces: We propose a novel network embedding method called “LINE,” which is suitable for arbitrary types of information networks: undirected, directed, and weighted. The method optimizes a carefully designed objective function that preserves both the local and global network structures. An edge-sampling algorithm is proposed that addresses the limitation of the classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference.
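The edge-sampling idea in item 1 can be illustrated with a minimal sketch. It assumes that edges are drawn with probability proportional to their weights using the alias method, so that each sampled edge can be treated as an unweighted edge in a gradient update. The function names `build_alias_table` and `sample_edge` and the toy edge list are hypothetical, chosen for illustration only; this is not the LINE implementation itself.

```python
import random


def build_alias_table(weights):
    """Build alias-method tables so an index can be drawn in O(1) time
    with probability proportional to its weight."""
    n = len(weights)
    total = float(sum(weights))
    # Scale weights so that the average entry equals 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = [0] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        # The large entry donates the mass needed to fill slot s.
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    # Remaining entries are (numerically) exactly 1.
    for i in small + large:
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def sample_edge(prob, alias, rng):
    """Return the index of one edge, drawn proportionally to its weight."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


# Hypothetical toy network: (source, target, weight).
edges = [("a", "b", 1.0), ("a", "c", 2.0), ("b", "c", 3.0), ("c", "d", 4.0)]
prob, alias = build_alias_table([w for _, _, w in edges])
rng = random.Random(0)
source, target, _ = edges[sample_edge(prob, alias, rng)]
# A gradient step would now treat (source, target) as a binary edge.
```

Because the edge weight enters only through the sampling probability, the size of each gradient step does not scale with the weight, which is the limitation of plain stochastic gradient descent that the summary refers to.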

2) Visualizing large-scale and high-dimensional data in a low-dimensional (typically 2D or 3D) space: We propose LargeVis, a technique that first constructs an accurately approximated K-nearest neighbor graph from the data and then lays out the graph in the low-dimensional space. Compared to t-SNE, LargeVis significantly reduces the computational cost of the graph construction step and employs a principled probabilistic model for the visualization step, the objective of which can be effectively optimized through asynchronous stochastic gradient descent with linear time complexity.

3) Incorporating World Knowledge into Text Mining via Heterogeneous Information Networks: We provide examples of using world knowledge for domain-dependent document clustering and document similarity. We provide three ways to specify the world knowledge to domains by resolving the ambiguity of the entities and their types, and we represent the data with world knowledge as a heterogeneous information network. Incorporating world knowledge as indirect supervision yields results that significantly outperform state-of-the-art text mining algorithms.

4) SPOC experiment on Data Structures and Algorithms: We leverage both online and offline data to analyze the efforts and improvements of students, including comparing test scores between different classes, collecting student information from questionnaires, and exploring the online learning behaviors of students. We also discuss and explain our findings in light of our teaching experience.