Dissertation: Entity Retrieval | 2010.1- |
- Proposing the Two-layer Retrieval and Extraction Probability Model.
- Decomposing the black box of Entity Retrieval into two parts: document retrieval
and entity extraction.
- Theoretically justifying the decomposition process using a probability model.
- Bringing state-of-the-art techniques in document retrieval and entity extraction
into entity retrieval.
- Evaluating the entity retrieval system in two layers, which in turn improves the
overall system performance.
- Building a demo entity retrieval system for museum archive search, consisting of
Indri document indexing, a servlet that handles the front-end queries, and the
front-end interface.
| |
| 2010.1 to 2011.6 |
- Working on medical-domain entity extraction (the Topaz project) and disease
detection, focusing on shigellosis and influenza.
- Designing and implementing ARR, an evaluation tool for medical entity extraction
results, to facilitate monitoring of the extraction output.
- Maintaining and debugging the GATE-version Topaz system; updating the rules
used in the system; conducting experiments and evaluating performance.
- Maintaining and debugging the UIMA-version Topaz system.
- Studying different features, such as part-of-speech (POS) tags, for medical entity extraction.
|
| 2009.9 to 2009.12 |
- Studying users' collaborative search behavior on the Web, especially on the
Citeulike website.
- Mining the Citeulike dataset, including user information (user ID), document
information (title, abstract, author, keywords, URL, etc.), group information
(which users belong to a given group), and library information (which books
are in each user's library).
- Updating a Firefox plug-in for tracking user behavior, including login/logout,
browsing, search, bookmarking, etc.
- Collecting front-end user data and storing it in the back-end MySQL
database for the user behavior study.
| |
A Study of Relation Annotation in Business Environments | 2008.7 to 2009.1 |
|
|
Topic Tracker in Continuous Information SenseMaking (SAP intern) | 2008.5 to 2009.8 |
- Working on the Topic Tracker in Continuous Sensemaking project, which focuses on
free-text document analysis.
- Studying and implementing entity and relation extraction from a knowledge
base (Freebase) and from free text.
- Classifying and disambiguating search results according to the context.
- Estimating the importance of topics at each stage according to the detected
entities and relations on the ontology network; detecting the topics continuously
by summarizing the topics at each stage.
- Writing mid-term and final reports.
| |
Geography Query Parsing (CLEF Geographical Information Retrieval Track) | 2008.1 to 2008.5 |
- Geographic named-entity identification. Extracting geographic named entities with their corresponding information from Wikipedia.
- Geographic named-entity disambiguation. Designing components that disambiguate locations using the estimated distance between two locations and
geographic context information.
- Query type identification. Classifying the query type by identifying the nearest-neighbor nodes in semantic knowledge networks.
| |
Toward Classify Exploratory Search Results with Wikipedia (Prelim Exam) | 2007.10 to 2008.2 |
- Classifying exploratory search results to support people performing Request For Information (RFI) search tasks.
- Identifying the potential concepts in search queries using Wikipedia entries, and evaluating the effectiveness of the approach.
- Demonstrating the feasibility of using Wikipedia articles as training data for the classification task, and deriving rules for
collecting training data sets for this task.
| |
GALE Project | 2007.1-2007.12 |
- Passage-level precision and recall evaluation.
- Continuing and completing the passage-level precision and recall evaluation tool component. The tool is Java-based; it takes snippets as input
and outputs passage-level precision.
- Implementing the algorithm that combines two annotators' evaluations into the ground truth.
- For more information, see our GALE Project website.
| |
Design and Implementation Tools for Object-Oriented Design Flaws Detection (Master Degree Thesis) | 2005.9-2006.5 |
- Studied the detection of object-oriented (OO) design flaws, using metrics to quantify the quality of OO software.
- Reviewed metrics for detecting OO design flaws and proposed a metric framework for OO design flaws based on the Goal-Driven Measurement
Process.
- Implemented a Java-based tool set that provides basic object-oriented statistics for Java code and predicts potential design flaws
for users according to the proposed framework.
| |