Projects

Dissertation: Entity Retrieval

2010.1-
  • Proposing the Two-layer Retrieval and Extraction Probability Model.
  • Decomposing the black box of Entity Retrieval into two parts: document retrieval and entity extraction.
  • Demonstrating the decomposition process theoretically using a probability model.
  • Bringing state-of-the-art techniques in document retrieval and entity extraction into entity retrieval.
  • Evaluating the entity retrieval system in two layers, which in turn improves the overall system performance.
  • Building a demo entity retrieval system for museum archive search, using Indri for document indexing, a servlet to handle front-end queries, and a front-end interface.
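
The decomposition into document retrieval and entity extraction can be sketched as a marginalization over documents. The notation here (query q, entity e, document d, collection D) is assumed for illustration and is not taken from the dissertation itself:

```latex
P(e \mid q) = \sum_{d \in D} P(e \mid d, q)\, P(d \mid q)
```

Under this reading, P(d | q) corresponds to the document retrieval layer and P(e | d, q) to the entity extraction layer, so each factor can be estimated and evaluated separately.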

CDC Center of Excellence in Public Health Informatics

2010.1 to 2011.6
  • Working on medical-domain entity extraction (Topaz) and disease detection, focusing on shigellosis and influenza.
  • Designing and implementing ARR, a tool for evaluating medical entity extraction results, to facilitate monitoring of the extraction output.
  • Maintaining and debugging the GATE-version Topaz system; updating the rules used in the system; conducting experiments and evaluating performance.
  • Maintaining and debugging the UIMA-version Topaz system.
  • Studying different features, such as part-of-speech (POS) tags, for medical entity extraction.

UACF project

2009.9 to 2009.12
  • Studying users' collaborative search behavior on the Web, especially on the CiteULike website.
  • Mining the CiteULike dataset, including user information (user id), document information (title, abstract, author, keywords, url, etc.), group information (which users belong to each group), and library information (which books are in a given user's library).
  • Updating a Firefox plug-in for user behavior tracking, including user login/logout, browsing, search, bookmarking, etc.
  • Collecting front-end user data and storing it in the back-end MySQL database for the user behavior study.

A Study of Relation Annotation in Business Environments

2008.7 to 2009.1

Topic Tracker in Continuous Information SenseMaking (SAP intern)

2008.5 to 2009.8
  • Working on the Topic Tracker for Continuous Sensemaking, which focuses on free-text document analysis.
  • Studying and implementing entity and relation extraction from a knowledge base (Freebase) and from free text.
  • Classifying and disambiguating search results according to the context.
  • Estimating the importance of topics at each stage according to the detected entities and relations on the ontology network; detecting topics continuously by summarizing the topics at each stage.
  • Writing mid-term and final reports.

Geography Query Parsing (CLEF Geographical Information Retrieval Track)

2008.1 to 2008.5
  • Geographic named entity identification. Extracting geographic named entities with their corresponding information from Wikipedia.
  • Geographic named entity disambiguation. Designing components to disambiguate locations using the estimated distance between two locations and geographic context information.
  • Query type identification. Classifying the query type by identifying the nearest-neighbor nodes in semantic knowledge networks.

Toward Classifying Exploratory Search Results with Wikipedia (Prelim Exam)

2007.10 to 2008.2
  • Classifying exploratory search results to support people performing Request for Information (RFI) search tasks.
  • Identifying the potential concepts in search queries using Wikipedia entries, and evaluating the effectiveness of the approach.
  • Demonstrating the feasibility of using Wikipedia articles as training data for the classification task, and deriving rules for collecting training data for this task.

GALE Project

2007.1-2007.12
  • Passage-level precision and recall evaluation.
  • Continuing and completing the passage-level precision and recall evaluation tool component. The tool is Java-based; it takes snippets as input and outputs passage-level precision.
  • Implementing the algorithm for combining two annotators' evaluations into the ground truth.
  • More information is available on the GALE Project website.
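
As a rough illustration of the passage-level metric, here is a minimal sketch. It is written in Python rather than the Java of the original tool, and the function name, the passage ids, and the representation of passages as sets of ids are all hypothetical choices for this example, not details of the GALE tool:

```python
def passage_precision_recall(retrieved, relevant):
    """Return (precision, recall) over passage ids.

    `retrieved` holds the ids of passages returned by the system;
    `relevant` holds the ids judged relevant in the ground truth.
    Both inputs are hypothetical stand-ins for the tool's snippet data.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # passages that are both returned and relevant
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    p, r = passage_precision_recall(["p1", "p2", "p3", "p4"], ["p2", "p4", "p7"])
    print(f"precision={p:.2f} recall={r:.2f}")
```

With four returned passages of which two are relevant, and three relevant passages in total, precision is 2/4 and recall is 2/3.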

Design and Implementation of Tools for Object-Oriented Design Flaw Detection (Master's Thesis)

2005.9-2006.5
  • Studied the detection of object-oriented (OO) design flaws, using metrics to quantify the quality of OO software.
  • Reviewed metrics for detecting OO design flaws and proposed a metric framework for OO design flaws based on the Goal-Driven Measurement Process.
  • Implemented a Java-based tool set that provides users with basic object-oriented statistics for Java source code and predicts potential design flaws according to the proposed framework.