Since the birth of software systems, automating the various tasks of software engineering has been one of the most important problems facing researchers. The popularity of the Internet has created a new opportunity to automate software engineering: a large amount of software engineering data is now available online, enabling data-oriented and knowledge-based intelligent software engineering. Working in this direction, researchers at Peking University have in recent years proposed many methods and systems to support a more automated software engineering environment.

A basic approach to automating software development is to reuse components that other developers have written. A system for storing, retrieving, and querying reusable components is therefore fundamental to automating software development. Researchers at Peking University have solved several key problems in building such a system, including automatic discovery of reusable software components, automatic evaluation of the quality of reusable components, better retrieval of reusable components, and interoperation among component repositories. Building on these methods, researchers at Peking University worked with researchers at the National University of Defense Technology to develop Trustie, a repository for component reuse. The Trustie system incorporates the aforementioned techniques and is integrated with the Eclipse IDE.

Besides reusable components, resources on the Internet also provide knowledge about various aspects of software development, and such knowledge is essential to automating many software engineering tasks. To support the management of this knowledge, researchers at Peking University, in collaboration with researchers at the Academy of Mathematics and System Sciences and other institutions, proposed the new concept of “knowware” to guide the organization and management of knowledge. Based on knowware, the researchers further proposed a set of methods to construct a knowware repository, including methods to retrieve knowware from various resources, merge knowware for better organization, and evaluate the quality of knowware. Given such a repository, the researchers proposed methods for using the knowware, including algorithms for browsing and synthesizing knowledge and for knowware-based problem solving.

Building on the reusable component repository and the knowware repository, researchers at Peking University have proposed methods to automate various tasks in software development, testing, and debugging. At the development stage, methods were proposed to locate the code relevant to a given task, such as the code that needs internationalization or the code that implements a certain feature. At the testing stage, methods were designed to optimize a test suite. Unlike existing methods, which optimize a test suite based on coverage, these methods optimize it based on fault-detection capability. At the debugging stage, methods were designed to automatically locate and repair faults. Unlike existing techniques, which often produce incorrect patches, these techniques produce far fewer incorrect patches by drawing on knowledge from various sources.
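To make the idea of optimizing a test suite by fault-detection capability more concrete, the following is a minimal sketch and not the researchers' actual algorithm. It assumes that, for each test, a set of faults it detects is already known (for example, from seeded faults or mutants); the function name `reduce_by_fault_detection` and the `detects` mapping are hypothetical names introduced for illustration. The sketch uses a simple greedy selection, which is a generic heuristic chosen here only for brevity.

```python
from typing import Dict, List, Set


def reduce_by_fault_detection(detects: Dict[str, Set[str]]) -> List[str]:
    """Greedily select tests until every detectable fault is covered.

    `detects` maps a test name to the set of faults that test reveals.
    This mapping is an assumed input (e.g. obtained from seeded faults);
    the source text does not specify how fault detection is measured.
    """
    # Faults that at least one test in the suite can detect.
    remaining: Set[str] = set()
    for faults in detects.values():
        remaining |= faults

    selected: List[str] = []
    while remaining:
        # Pick the test that reveals the most not-yet-covered faults.
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(detects), key=lambda t: len(detects[t] & remaining))
        selected.append(best)
        remaining -= detects[best]
    return selected


if __name__ == "__main__":
    suite = {
        "t1": {"f1", "f2"},
        "t2": {"f2"},
        "t3": {"f3"},
        "t4": {"f1", "f3"},
    }
    print(reduce_by_fault_detection(suite))
```

In this toy suite, the selected tests together detect every fault that the full suite detects, while tests that add no new detected fault are left out. A coverage-based optimizer would use the same loop with covered statements in place of detected faults, which is where the two approaches differ.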

These research results have led to 2 books and more than 100 publications in top software engineering, programming languages, and artificial intelligence conferences and journals, including ICSE, FSE, ASE, IEEE TSE, ACM TOSEM, POPL, AAAI, and ACL. Three papers received the ACM SIGSOFT Distinguished Paper Award.

The Trustie system has been applied at large corporations such as Digital China, Neusoft, and State Grid Corporation of China, which report improvements of around 60% in the rate of software reuse. Trustie is also deployed as a public service in 10 software parks in Beijing, Shanghai, Guangzhou, and other cities, serving more than 1000 software companies. Furthermore, OW2, the third largest organization for open source middleware, has incorporated Trustie into its infrastructure to enhance its analysis and evaluation of software components.

The knowware techniques have been applied in more than 20 different fields to extract, manage, and query knowledge, and to support knowledge-based recommendation, detection, and monitoring. For example, TBCNN, a machine learning technique for analyzing source code recently proposed by researchers at Peking University, has been implemented by source{d} (http://sourced.tech/) as part of their product for analyzing source code.
