Indian Journal of Medical Informatics, Vol 3, No 1 (2008)

Data-based Query Reformulation to Bolster Scenario-Specific Retrieval of Biomedical Documents

Indian Journal of Medical Informatics. 2008; 3(1): 2

http://ijmi.org

Article

Y. J. Choi

Department of Image Engineering,

the Graduate School of Advanced Imaging Science, Multimedia & Film.

Chung Ang University, 221 Huksuk-dong, Dongjak-ku 156-756,

Seoul, Korea

Email: yongjc0602@yahoo.com,

Phone: 82 2 812 5717

Abstract

In this paper, I propose a data-based query expansion technique to support scenario-specific retrieval in the medical domain. The technique uses the UMLS (Unified Medical Language System) as a data source to append to the original query additional terms that are specifically relevant to the query's scenario, thus improving upon traditional query expansion approaches. I compare this technique with a cross-search method that refers only to an encyclopedia and whose expansion terms are not necessarily scenario specific. The study on clinical notes shows that the data-based technique, which yields scenario-based expansion, outperformed the cross-search and automatic query expansion methods on average in all categories of scenarios.

Keywords:

Automatic query expansion; Biomedical document retrieval; Cross-search query expansion; Data-based query expansion

1. Introduction

A number of approaches to query expansion have been studied for decades as an effective way to mitigate the query-document mismatch problem. The basic idea behind all of these techniques is to supplement the original query with additional terms related to the original query topic so that the modified query has a better chance of matching relevant documents.

In clinical practice, doctors are often interested in answers relevant to certain scenarios that correspond to common medical tasks such as "diagnosis", "treatment" and "symptom". As a result, the queries they pose are frequently scenario specific, such as "liver cancer, diagnosis". Studies show that 60% of doctors' queries center on a limited number of such medical scenarios [1].

Retrieving documents that are specifically related to the query's scenario is referred to as scenario-based retrieval. Scenario terms in queries are typically general, as in "liver cancer, treatment", while full-text medical documents often discuss the same topic using much more specialized terms such as "chemoembolization". Such general scenario terms fail to match the specialized terms in relevant documents, resulting in poor retrieval performance.

The fundamental challenge is that scenario terms in the query are too general to match specialized terms in relevant documents, such as "chemoembolization", which is one of the treatment options for liver cancer. It is nevertheless often desirable to retrieve only documents pertaining to a given medical scenario, where a scenario is typically defined as a frequently recurring medical task. For example, a doctor examining a potential liver cancer patient can pose the query "liver cancer, diagnosis" to find the latest diagnostic techniques for the disease. In this case, "diagnosis" is the medical task that marks the scenario of the query. Scenario-based retrieval is not adequately addressed by traditional text retrieval systems such as SMART, which suffer from the fundamental problem of query-document mismatch when handling scenario-based queries [2].

There has been research on query expansion to mitigate the query-document mismatch problem [3-6]. Those techniques also have difficulty handling scenario-based queries. In principle, query expansion techniques append to the original query specialized terms that have a statistical co-occurrence relationship with the original query terms in medical literature. Even if adding such specialized terms makes the expanded query a better match for relevant documents, the expansion is not scenario based. For example, in handling the query "liver cancer, treatment", existing query expansion techniques can add not only terms such as "chemoembolization" that are relevant to the treatment scenario but also irrelevant terms like "alcohol", simply because these terms co-occur with "liver cancer" in medical literature.

Adding non-scenario-based terms leads to the retrieval of documents that are irrelevant to the original query's scenario, diverging from the goal of scenario-based retrieval. Moreover, expanding just synonyms without considering the scenario-based data embedded in the original query is not sufficient in dealing with such queries. For example, previous methods will exclude "chemoembolization" from the expansion list for query "liver cancer, treatment" simply because "chemoembolization" is not a synonym of any original query concept.

To meet doctors' need to find biomedical documents through scenario-based retrieval, I propose two specially developed techniques, data-based query expansion and cross-search query expansion, which allow more convenient searching of biomedical documents. Conventional methods of information retrieval are not suitable for this task [7]. For this design, I have applied the associative access method (ASSA), which is based on constructing a bit-attribute matrix by transforming free text into a set of trigrams [8-9]. This enables approximate matching of pieces of distorted text and fuzzily formulated queries, which matters because biomedical documents contain many spelling mistakes.
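As a rough illustration of why trigram decomposition tolerates spelling variation, the following Python sketch compares two strings by their character trigram sets. It is not the ASSA implementation: the bit-attribute matrix is not modeled, and the function names, the padding scheme and the use of the Dice coefficient are assumptions made for this example only.

```python
def trigrams(text: str) -> set[str]:
    """Return the set of character trigrams of a lower-cased, space-padded string."""
    # Padding (an assumption of this sketch) lets word boundaries form trigrams too.
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def dice_similarity(a: str, b: str) -> float:
    """Dice coefficient over the two trigram sets (1.0 for identical strings)."""
    ta, tb = trigrams(a), trigrams(b)
    return 2 * len(ta & tb) / (len(ta) + len(tb))


if __name__ == "__main__":
    # A spelling variant still shares most trigrams with the intended term.
    print(dice_similarity("chemoembolization", "chemoembolisation"))
    # An unrelated term shares none.
    print(dice_similarity("chemoembolization", "alcohol"))
```

A single substituted character changes only the three trigrams that contain it, so a misspelled term can still be matched by thresholding the similarity score.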

2. Information retrieval techniques in the medical domain

Malet [10] defined a set of document content description tags, or metadata encodings, that can be used to promote disciplined search access to Internet medical documents. This medical core metadata set can be used to assign metadata to medical documents to facilitate document retrieval by Internet search engines. The work proposes a standard metadata schema for health and medicine resources and introduces a metadata syntax and semantics, compatible with HTML, that allows Web medical authors to tag their documents for more effective retrieval.

Wilczynski [11] developed search filters to help end users increase the success of their searches. An analytic survey was conducted, comparing hand searches of journals with retrievals from MEDLINE for candidate search terms and combinations. The results showed that empirically derived search strategies combining indexing terms and textwords can achieve high sensitivity and specificity for retrieving sound prognostic studies from MEDLINE.

Leroy [12] described the development and testing of the Medical Concept Mapper as an aid that provides synonyms and semantically related concepts to improve searching. The system takes a medical query in any form and provides synonyms and important related (but not synonymous) concepts for the terms in the query. Two user studies found that the Medical Concept Mapper can automatically double the number of useful search terms extracted from queries. It also suggests related terms with high precision. This can be helpful for users interested in using alternative terms in their search.

Mendonça [13] studied the possibility of using the co-occurrence of MeSH terms in MEDLINE citations to automate the construction of a knowledge base of interrelated concepts, as part of an effort to support searching of online medical literature according to individual needs. The study shows that sixty percent of the semantic pairs generated by the process were judged to be relevant to the proposed retrieval task.

Query expansion techniques can be based on some form of knowledge structure. Such expansion is independent of the search process, and the additional query terms are derived by traversing a semantic network built according to the knowledge structure. It is therefore important to study the impact of a knowledge source on query expansion.

Aronson [14] proposed to use MetaMap, a program that maps medical free text to UMLS concepts, to identify the UMLS Metathesaurus concepts mentioned in the original query. The approach then expands synonyms of the original query concepts under the guidance of the UMLS. The research also suggested the use of retrieval feedback for enhancing the original text of users' queries. The experiments show that query expansion based on MetaMap compares favorably with retrieval feedback.

Hersh [15] proposed to expand the parent and child concepts of the original query concepts, based on the concept hierarchy defined in the UMLS Metathesaurus. The study assessed query expansion using thesaurus relationships and definitions in the UMLS Metathesaurus for improving search performance. The queries from a MEDLINE test collection were expanded using synonym, hierarchical and related term data as well as term definitions from the UMLS Metathesaurus. Thesaurus-based query expansion generally caused a decline in retrieval performance but improved it in specific instances.

Voorhees [16] examined the utility of lexical query expansion in the large, diverse TREC collection. Concepts are represented by WordNet synonym sets and are expanded by following the typed links included in WordNet. Experimental results showed that this query expansion technique makes little difference in retrieval effectiveness if the original queries are relatively complete descriptions of the data being sought, even when the concepts to be expanded are selected by hand.

Salton [17] found that expansion by synonyms improved performance but that expansion by broader or narrower terms selected from a hierarchical thesaurus was too inconsistent to be generally useful. The same work also evaluated automatic indexing methods and derived design criteria for modern data systems.

Wang [18] has suggested the use of a thesaurus containing binary relations as a resource for query enhancement and found that a variety of lexical-semantic relations improved retrieval performance. Although this relational thesaurus cannot be mechanically created, it could presumably be used in many contexts. However, each of these conclusions was drawn from experiments on very small collections using single-domain thesauri.

Chen [19] found that query expansion using domain-specific thesauri yielded very good results: the thesaurus served as an excellent "memory-jogging" device and supported learning and serendipitous browsing. This research reported an algorithmic approach to the automatic generation of thesauri for electronic community systems. The techniques used included term filtering, automatic indexing, and cluster analysis. The system was useful in suggesting relevant concepts for the researchers' queries and helped improve concept recall. By contrast, building thesauri manually requires a great deal of labor and time from linguists or domain experts.

3. Proposed query expansion techniques

3.1 Describing Cross-Search query expansion concepts

Cross-search in this paper is defined as a search method that finds related data by searching for additional vocabulary [20]. The vocabulary is expanded by collecting relevant words from the context of the original query in another knowledge base.

An encyclopedia is a knowledge base arranged in pairs of a word and its description, and a description is accessible through its key word or through any word in the description. Cross-search on biomedical documents can be performed against any encyclopedia of choice.

When users search for documents, they type keywords into the search engine, and the cross-search algorithm then works to find documents.

Figure 3.1 shows the cross-search mechanism for searching documents.

Here, the symbol In denotes a key word. Each key word has an associated description that consists of description words Tn1, Tn2, …

Two operations can be performed against an encyclopedia.

RetrieveDescriptionWordsOf(In): The encyclopedia is searched for a key word In to retrieve those associated description words that are also key words, {Tn1, Tn2, …}. In this way only significant description words are extracted.

RetrieveKeyWordsOf(Tmk): The encyclopedia is searched for a description word Tmk to retrieve the associated key words, {Kp, Kq, …}.

Therefore there are two different kinds of cross-searches that can be performed to expand the related vocabulary set, S through the encyclopedia.

To prepare for cross-search by description words, a set D is initialized with each word of the user's query. RetrieveDescriptionWordsOf() is then performed for each element of D, and the results are added to D.

To prepare for cross-search by key words, a set K is initialized with each word of the user's query. RetrieveKeyWordsOf() is then performed for each element of K, and the results are added to K.

The related vocabulary set S is built as the union of D and K:

S = D ∪ K.

Then the related vocabulary set, S, is searched for in the biomedical documents again with the search engine. The biomedical documents are displayed with the related vocabulary highlighted. Thus they can be skimmed through by looking at the sections with highlights. As a result, doctors can find useful data in the biomedical documents even though they don't know exact words or phrases.

Once the related vocabulary set S is built, it can be fed back into the beginning of cross-search as the input set in place of the user's query. If the wanted data is not found in one iteration, more related vocabulary can be found in the next. As more iterations of cross-search are performed, more data related to the initial user query is found.
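The two encyclopedia operations, the union S = D ∪ K and the iterative feedback described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the encyclopedia is a small in-memory dictionary with made-up entries, query terms are treated as whole concepts rather than single words, and the names `ENCYCLOPEDIA` and `cross_search` are hypothetical.

```python
# Toy encyclopedia (hypothetical entries): key word -> description words.
ENCYCLOPEDIA = {
    "liver cancer": ["alcohol", "chemoembolization", "hepatitis"],
    "chemoembolization": ["liver cancer", "catheter"],
    "hepatitis": ["liver", "interferon"],
    "interferon": ["hepatitis", "antiviral"],
    "alcohol": ["liver"],
}


def retrieve_description_words_of(key_word):
    """Description words of key_word that are themselves key words."""
    return {t for t in ENCYCLOPEDIA.get(key_word, []) if t in ENCYCLOPEDIA}


def retrieve_key_words_of(description_word):
    """Key words whose description contains description_word."""
    return {k for k, words in ENCYCLOPEDIA.items() if description_word in words}


def cross_search(terms, iterations=1):
    """Build S = D ∪ K, feeding S back in as the input set on each iteration."""
    s = set(terms)
    for _ in range(iterations):
        d = set(s)  # D starts from the current input set
        k = set(s)  # K starts from the current input set
        for term in s:
            d |= retrieve_description_words_of(term)
            k |= retrieve_key_words_of(term)
        s = d | k
    return s


if __name__ == "__main__":
    print(sorted(cross_search(["liver cancer"])))
    print(sorted(cross_search(["liver cancer"], iterations=2)))
```

With this toy data, the second iteration adds "interferon", which is reachable only through "hepatitis"; this mirrors how repeated cross-searches widen the related vocabulary.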

Because doctors often pose queries of this form to search biomedical documents, a scenario-based query is taken to consist of two parts: a key concept Ck (e.g., "liver cancer") and several scenario concepts Cs (e.g., "treatment," "diagnosis," etc.). Given an original query such as "liver cancer, diagnosis", cross-search uses the medical encyclopedia to generate candidate expansion concepts that co-occur with the key concept Ck, e.g., "alcohol," "chemoembolization," etc., for Ck = "liver cancer."

3.2 Describing data-based query expansion concepts

Cross-search expansion concepts that are related to the original query have been described above. Only a subset of these candidate concepts is relevant to the original query's scenario. I have therefore also developed a special query expansion technique, called data-based query expansion, to help doctors find relevant biomedical documents with scenario-specific queries. The method automatically takes advantage of the knowledge structures in the UMLS and its semantic network to identify concepts that are specifically related to the scenarios in the original query.

UMLS (Unified Medical Language System) has been developed by the National Library of Medicine (NLM) to facilitate the development of intelligent systems that understand the meaning of the language of biomedicine and health. It is a long-term NLM project intended to enable new data technologies to take advantage of controlled medical vocabularies [21-23], and in this study it serves as the domain-based data source for retrieving free text in biomedical documents. The knowledge it contains, including its definitions, concepts and structure, is used in a variety of applications. One possible use is mapping user queries to relevant retrieved data [24]. It is also used purely as a knowledge base for other medical tools: Carenini and Moore [25] extracted the knowledge contained in the relations of the Semantic Network and used it for their patient education system, and others use its concepts for Web searches [26].

The UMLS consists of four components: the Metathesaurus, the Semantic Network, the Specialist Lexicon, and the Knowledge Sources Server. Only the Metathesaurus and the semantic network were used in this study.

The Metathesaurus is a large, multi-purpose and multilingual vocabulary database that contains data about biomedical and health-related concepts, their various names and the relationships among them. It has more than 800,000 medical concepts, and each group of concepts in the Metathesaurus belongs to a semantic type in the UMLS semantic network. The Metathesaurus is organized by concept or meaning. It links alternative names and views of the same concept together and identifies useful relationships between different concepts. Its goal is to link terminology and underlying concepts. It combines the vocabulary of more than 50 different sources, including MeSH terminology, into one consistent set. This is one of the main advantages of the Metathesaurus, since retrieval performance is higher when terminology is not strictly limited to MeSH terms [27].

The semantic network provides a consistent categorization of all concepts represented in the UMLS Metathesaurus and provides a set of useful relationships between these concepts. All data of specific concepts is found in the Metathesaurus.

To develop the idea in full detail, the data structure used in this study is introduced first, and the data-based method is then described. Figure 3.2 depicts the components of a data-based query expansion and retrieval framework.

Given an original query such as "liver cancer, diagnosis", the data-based query expansion, whose scope is marked by the rectangle, derives the scenario-based expansion concepts with the aid of domain knowledge such as the UMLS, in addition to the cross-search expansion technique marked by the 3D box.

The basic idea of the data-based method is the following. A scenario-based query consists of two parts: a key concept Ck (e.g., "liver cancer") and several scenario concepts Cs's (e.g., "treatment," "diagnosis," etc.) because doctors often pose queries like this to search biomedical documents.

Using cross-search expansion alone, we can only obtain candidate expansion concepts that co-occur with the key concept Ck, e.g., "alcohol," "chemoembolization," etc., for Ck = "liver cancer." In data-based expansion, we can explore a domain-based data source to identify possible relationships between each candidate expansion concept and Ck. For example, the data source may indicate that "alcohol" is a "risk factor" for "liver cancer," whereas "chemoembolization" is a "treatment" method for this disease. Among these identified relationships, certain relationships are desirable because they match the scenarios of the original query. Thus, the data-based method keeps only the candidate concepts that have a desirable relationship with Ck. Since such concepts should be specifically relevant to the original query's scenarios, appending them should lead to scenario-based expansion.

In the UMLS, each group of concepts belongs to a semantic type. For instance, "liver cancer" and other disease concepts belong to one semantic type called "disease." The semantic network is modeled as an entity-relation diagram in which each semantic type is an entity and semantic types are associated via relationships (Figure 3.3). For example, the semantic types "therapeutic or preventive procedure," "medical device" and "pharmacologic substance" have a "treats" relationship with the semantic type "disease or syndrome."

Given this structure, here are the procedures to identify the data-based expansion concepts.

First, a key concept (the name of a disease) K and a scenario term such as "treatment", "diagnosis", "symptom" are chosen together.

The key concept K is identified in the query and mapped to the semantic type it belongs to (e.g., from "liver cancer" to "disease") by referring to the UMLS. Starting from K's semantic type, I follow the relationships indicated by the scenario concepts S (e.g., following "treats" if S is "treatment") and reach a set of relevant semantic types (e.g., "medical device," "therapeutic or preventive procedure" and "pharmacologic substance").

Among all the candidate expansion concepts derived by cross-search expansion, those that belong to these relevant semantic types are selected as data-based expansion concepts and appended to the original query.

For illustration purposes, for the sample query "liver cancer, treatment," we first use the cross-search expansion technique to derive candidate expansion concepts, and then identify the scenario-specific expansion concepts using the procedure described above.
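The selection procedure above can be sketched in Python as a filter over the cross-search candidates. The sketch makes several assumptions: the concept-to-type assignments and the "diagnoses" entry are toy data written for this example rather than values read from the UMLS, and the names `SEMANTIC_TYPE`, `SEMANTIC_NETWORK`, `SCENARIO_TO_RELATION` and `data_based_expansion` are hypothetical.

```python
# Toy concept -> semantic type assignments (illustrative, not actual UMLS data).
SEMANTIC_TYPE = {
    "liver cancer": "disease or syndrome",
    "chemoembolization": "therapeutic or preventive procedure",
    "radiofrequency ablation": "therapeutic or preventive procedure",
    "ultrasonography": "diagnostic procedure",
    "alcohol": "risk factor",  # toy type standing in for a non-treatment concept
}

# (relationship, target semantic type) -> semantic types holding that relationship.
SEMANTIC_NETWORK = {
    ("treats", "disease or syndrome"): {
        "therapeutic or preventive procedure",
        "medical device",
        "pharmacologic substance",
    },
    # Assumed entry for the diagnosis scenario.
    ("diagnoses", "disease or syndrome"): {"diagnostic procedure"},
}

# Scenario concept -> relationship to follow in the semantic network.
SCENARIO_TO_RELATION = {"treatment": "treats", "diagnosis": "diagnoses"}


def data_based_expansion(key_concept, scenario, candidates):
    """Keep only the candidates whose semantic type matches the query scenario."""
    key_type = SEMANTIC_TYPE[key_concept]
    relation = SCENARIO_TO_RELATION[scenario]
    relevant_types = SEMANTIC_NETWORK.get((relation, key_type), set())
    return [c for c in candidates if SEMANTIC_TYPE.get(c) in relevant_types]


if __name__ == "__main__":
    candidates = ["alcohol", "chemoembolization", "ultrasonography",
                  "radiofrequency ablation"]
    print(data_based_expansion("liver cancer", "treatment", candidates))
    print(data_based_expansion("liver cancer", "diagnosis", candidates))
```

In this toy run, "alcohol" is dropped for the treatment scenario even though it co-occurs with "liver cancer", which is the behavior that distinguishes data-based expansion from plain co-occurrence expansion.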

3.3. The example of expansion concepts for each method

The top-13 most heavily weighted automatic expansion concepts are listed in Table 3.1 (a). The pseudo-relevance feedback method is used for automatic query expansion [28]. The bold concepts in Table 3.1 (a) are the ones identified as scenario specific, corresponding to the concepts in the bold circles of Figure 3.3. These scenario-specific concepts, together with other top-weighted scenario-specific concepts, are shown in Table 3.1 (b) for the cross-search technique. Concepts further down the lists of Table 3.1 (b) and (c) (e.g., "Radiation therapy") do not appear in Table 3.1 (a) simply because they have relatively smaller weights and only the top-13 statistically related concepts are shown in Table 3.1 (a). Scenario concepts further down the list of Table 3.1 (c) (e.g., "Radiofrequency ablation") do not appear in Table 3.1 (a) or (b).

The following observations are made from these results. By comparing Table 3.1 (a), (b) with Table 3.1 (c), we can clearly see that data-based expansion identifies expansion concepts that are much more relevant to the original query's scenario ("treatment") compared to automatic expansion and cross-search expansion.

Here is a real example. A user first types a scenario-specific query such as +"liver cancer" +treatment into the ASSA search engine, which returns 20 results for the query. In cross-search, the user may find useful terms by referring to the medical encyclopedia included in the list of results and add these new key terms to the search (see Figure 3.4). As a result, more scenario-specific documents can be found in the next search. In data-based query expansion, the user may find useful terms by referring to the UMLS semantic network, which is also included in the list of results, and search again after adding those scenario-specific terms (see Figure 3.5). This leads to the retrieval of relevant, scenario-specific documents.

4. Evaluation of the techniques

Information retrieval is one of the focal research fields in information science. This paper specifically examines how retrieval performance can be improved by data-based and cross-search expansion techniques in searching biomedical documents.

In this section, I explain the methodology and strategy used to evaluate the performance of each method. The data sources and the search engine used in the experiment are also described.

4.1. Experimental Setup

The experiment in this paper applies ASSA to the retrieval of biomedical documents. ASSA (Associative Access Method) is a fuzzy search engine that can quickly find data on a PC, in databases, on file servers or on the Internet.

The experiment is based on the OHSUMED test collection, which has been used in medical information retrieval research. The OHSUMED test collection is a set of 348,566 references from MEDLINE, the online medical information database, consisting of titles and/or abstracts from 270 medical journals over a five-year period (1987-1991). The available fields are title, abstract, MeSH indexing terms, author, source, and publication type.

It was built as part of a study assessing the use of MEDLINE by physicians in a clinical setting. Novice physicians using MEDLINE generated 106 queries.

The test collection is a subset of the MEDLINE database, which is a bibliographic database of important, peer-reviewed medical literature maintained by the National Library of Medicine (NLM). There are currently over seven million references in MEDLINE dating back to 1966, with about 250,000 added yearly. While the majority of references are to journal articles, there is also a small number of references to letters to the editor, conference proceedings, and other reports. Each reference also contains human-assigned subject headings from the 17,000-term Medical Subject Headings (MeSH) vocabulary.

The query set consists of 12 queries. Each query is short and expresses a single data request. The key concept is the name of a disease, and the scenario concept is one of diagnosis, treatment, or symptom. All queries are in the form ("key concept", "scenario concept").

Every reference in the test database retrieved for a given query was judged for relevance by doctors who were clinically active and were current fellows in general medicine or medical informatics. Relevance was judged on a three-point scale: definitely relevant (the article provided highly relevant data for doctors faced with the recorded patient data and data need), possibly relevant (the article might provide useful data to the doctor), and not relevant (the article did not provide any relevant data for this data need). For evaluation, all documents judged either possibly or definitely relevant were considered relevant.

4.2. Evaluation criteria

An extensive experimental evaluation of the data-based method has been performed by comparing it with the cross-search expansion method and with automatic query expansion. The main purpose of this study is to see whether the data-based method improves on the cross-search expansion and automatic query expansion methods when handling scenario-based queries.

Results were evaluated in two ways. One was the 11-point average precision. The other was the recall and precision rates shown in a precision-recall curve. The numbers of expansion terms tested were 5, 10, 15 and 20. The 11-point average precision (at recall levels of 0.0, 0.1, ..., 1.0) was used as a composite measure of retrieval performance. The average precision of the three methods was computed on each of the 12 queries, and the results were averaged over the 12 queries.
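For reference, the 11-point average precision for a single query can be computed as in the following Python sketch. It assumes the common interpolated definition, in which the precision at each recall level is the maximum precision observed at any recall greater than or equal to that level; the paper does not state which interpolation rule was used, and the function name is hypothetical.

```python
def eleven_point_average_precision(ranked_doc_ids, relevant_ids):
    """Interpolated average precision at recall levels 0.0, 0.1, ..., 1.0."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0

    # Record (recall, precision) each time a relevant document is retrieved.
    points = []
    hits = 0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))

    # Interpolated precision at a level: best precision at any recall >= level.
    levels = [i / 10 for i in range(11)]
    interpolated = [
        max((p for r, p in points if r >= level), default=0.0)
        for level in levels
    ]
    return sum(interpolated) / len(levels)


if __name__ == "__main__":
    ranking = ["d1", "d2", "d3", "d4"]
    print(eleven_point_average_precision(ranking, {"d1", "d3"}))
```

In the usage example, the relevant documents appear at ranks 1 and 3, so the interpolated precision is 1.0 for recall levels up to 0.5 and 2/3 for the remaining five levels.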

The three methods were first compared under different expansion sizes, and then the performance of each method was studied under different query scenarios.

The query expansion techniques may perform differently for different query scenarios, so the way in which each expansion technique performs in different scenarios was studied. To do this, the 12 queries were grouped according to scenario: diagnosis, treatment, and symptom.

For each group of queries, the performance of data-based expansion, cross-search expansion and automatic query expansion was compared under the same settings.

The queries that mention each scenario are listed in Table 4.1.
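The per-scenario comparison amounts to averaging each method's per-query scores within a scenario group. A minimal Python sketch follows; the function name, the query identifiers and the scores in the usage example are made up for illustration and are not results from this study.

```python
from collections import defaultdict
from statistics import mean


def mean_by_scenario(scores, scenario_of):
    """Average per-query scores within each scenario group.

    scores: query id -> 11-point average precision for one method
    scenario_of: query id -> scenario name (e.g. "treatment")
    """
    grouped = defaultdict(list)
    for query_id, score in scores.items():
        grouped[scenario_of[query_id]].append(score)
    return {scenario: mean(values) for scenario, values in grouped.items()}


if __name__ == "__main__":
    # Hypothetical scores for one expansion method.
    scores = {"q1": 0.25, "q2": 0.75, "q3": 0.5}
    scenario_of = {"q1": "treatment", "q2": "treatment", "q3": "diagnosis"}
    print(mean_by_scenario(scores, scenario_of))
```

Running the same function once per method gives one value per method and scenario, which can then be compared side by side.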

4.3. Experimental results and comparative analysis of effectiveness

In this section, the performance results of the data-based expansion technique compared to that of the cross-search expansion and automatic query expansion technique are presented.

The recall-precision graph, which is the most commonly used tool for comparing search techniques, was created using the 11 cutoff values from the recall-level precision averages. Figures 4.1 to 4.4 show the effectiveness of each method as a total of 11 recall-precision value pairs.

The average precision was computed for the three query expansion methods on each of the 12 queries, and Figure 4.1 shows the result. Here "n" is the expansion size, and "n=All" means appending all expansion terms.

The performance of data-based expansion and cross-search increases as n increases and reaches its peak when n=All. The data-based expansion method has more useful data for medical information retrieval than the cross-search method, which refers only to the medical encyclopedia. The performance of the automatic query expansion method, however, degrades after n=15. The automatic query expansion method does not seem to distinguish expansion terms that are medically specific from those that are not. For example, in handling the query "liver cancer", it can add not only terms such as "chemoembolization", which is one of the treatment options and is relevant to the original query, but also irrelevant terms like "stomach", simply because the term co-occurs with "liver cancer" in medical literature. As a result, as more terms are appended to the original query, the negative impact of those irrelevant terms begins to accumulate, and after a certain point the performance drops. The data-based query expansion method, on the other hand, appends only medically specific terms, and consequently its performance keeps increasing as more useful terms are appended.

I averaged the performance of the three expansion techniques within each group of queries; the results are shown in Figure 4.5. Each bar shows the performance of one expansion technique averaged over the queries in a group, under the same settings. All values are given in terms of 11-point average precision.

The results show that the data-based technique can create scenario-based query expansion and produces improvements over cross-search and automatic query expansion technique when handling scenario-based queries.

The results also suggest that data-based expansion performs differently for queries with different scenarios. This may happen because the knowledge structures defined for these scenarios exhibit different characteristics. The data-based technique produces larger improvements in the "treatment" scenario than in the "diagnosis" or "symptom" scenario. A possible explanation lies in the different data structures for these three scenarios. For example, the "treatment" scenario has more relevant semantic types than the "diagnosis" or "symptom" scenario. Consequently, data-based query expansion yields more scenario-specific expansion concepts in the treatment scenario than the cross-search or automatic method.

5. Conclusion

This paper proposes a data-based query expansion method to improve retrieval performance when handling the scenario-based queries that doctors often pose.

Previous studies have not tried to take advantage of a domain-based data source to reformulate the query expansion results and provide scenario-based expansion. This research focuses on one type of medical query, namely scenario-based queries, which have been shown to be predominant among medical users' search requests.

A data-based query expansion method is presented to improve the retrieval performance for such queries. The method automatically takes advantage of the data structures in the UMLS and its semantic network to identify concepts that are specifically related to the scenarios in the original query. Adding such identified concepts to the original query results in scenario-based expansion and improves the search performance. In dealing with scenario-specific queries, it is often too narrow to expand just the synonyms without considering the scenario data embedded in the original query. The data-based query expansion technique, however, explores the scenario data in the original query, relates that data to certain data structures in the UMLS semantic network, and uses the identified data structure to guide the selection of scenario-specific concepts. The resulting expansion can have a much broader scope than just synonyms. The performance results show that the data-based expansion technique outperformed the cross-search and automatic query expansion techniques in all medical scenarios.

The experiments reported in this paper compared the retrieval performance of automatic query expansion, cross-search expansion, and data-based query expansion. The results suggest that data-based query expansion provides more consistent increases in retrieval effectiveness. I conclude that data-based query expansion using the UMLS is an effective method of enhancing retrieval effectiveness when handling medical scenario queries.

6. Acknowledgements

This research was supported by the ITRC (Information Technology Research Center, MIC) program and Seoul R&BD program, Korea. This work was also supported by the second phase of the Brain Korea 21 Program in 2008.

7. Competing interests

The author declares no competing interests.

References

1. Ely JW, Osheroff JA, Ebell MH, Bergus GR, Levy BT, Chambliss ML, Evans ER. Analysis of questions asked by family doctors regarding patient care. BMJ. 1999; 319: 211-220

2. Efthimiadis EN. Query expansion, American Society for Data Retrieval by Data Today, Inc. Annual Review of Data Science and Technology, 1996; 31: 121-187

3. Qiu Y. Frei HP. Concept-based query expansion. In Proceedings of ACM SIGIR '93, 1993; 160-169

4. Jing Y, Croft WB. An association thesaurus for data retrieval. In Proceedings of RIAO '94, 1994; 146-160

5. Xu J, Croft WB. Query expansion using local and global document analysis. In Proceedings of ACM SIGIR '96, 1996; 4-11

6. Mitra M, Singhal A, Buckley C. Improving automatic query expansion. In Proceedings of ACM SIGIR '98, 1998; 206-214

7. Frakes WB, Baeza-Yates R. (eds.). Data Retrieval - Data Structures & Algorithms. New Jersey: Prentice Hall PTR, Saddle River, 1993

8. Lapir GM. Use of Associative Access Method for Data Retrieval Systems, Proceedings of the 23rd Annual Pittsburgh Conference on Modeling and Simulation, vol. 23, part 2, 1992; 951-958

9. Berkovich S, El-Qawasameh E, Lapir GM, Mack M, Zincke C., Organization of Near Matching in Bit Attribute Matrix Applied to Associative Access Methods in Data Retrieval, 16th IASTED International Conference on Applied Informatics, IASTED, 1998 62-64.

10. Malet G, Munoz F, Appleyard R, Hersh W, A Model for Enhancing Internet Medical Document Retrieval with "Medical Core Metadata" Journal of the American Medical Informatics Association. 1999; 6: 163-172

11. Wilczynski NL, Haynes RB. Developing optimal search strategies for detecting clinically sound prognostic studies in MEDLINE: An analytic survey, BMC Medicine, 2004

12. Leroy G, Tolle KM, Chen H. Customizable and Ontology-Enhanced Medical Information Retrieval Interfaces Methods of Information in Medicine, forthcoming, 2000

13. Mendonça EA., Cimino JJ. Building a knowledge base to support a digital library. In Patel, V., Haux, R, Rogers, R. Proceedings of the Tenth World Conference on Medical Informatics, MEDINFO, 2001; 2001 221-225.

14. Aronson AR, Rindflesch TC. Query Expansion Using the UMLS Metathesaurus. In: AMIA Annual Fall Symposium. 1997: 485-489.

15. Hersh WH, Price S, Donohoe L. Assessing thesaurus-based query expansion using the UMLS Metathesaurus. In Proceedings of AMIA Annual Symp 2000

16. Voorhees EM. Query expansion using lexical-semantic relations. In Proceedings of ACM SIGIR '94, 1994; 61-69

17. Salton G, Lesk ME. Computer evaluation of indexing and text processing. In Salton G. (ed), The SMART Retrieval System: Experiments in Automatic Document Processing. New Jersey: Prentice-Hall, 1971; 143-180

18. Wang YC, Vandendorpe J, Evens M. Relational thesauri in information retrieval Journal of the American Society for Information Science, 1985; 36: 15-27

19. Chen H, Schatz B, Yim T, Fye D. Automatic thesaurus generation for an electronic community system. Journal of the American Society for Information Science. 1995; 46: 175-193

20. Choi Y, Byun J, Berkovich S. Cross-search technique and its visualization of peer-to-peer distributed clinical documents. (ICIT) International Conference on Information Technology, Turkey, 2004: 49-53

21. Lindberg DA, Humphreys BL, McCray AT. The Unified Medical Language System Methods Inf Med. 1993; 32: 281-91

22. McCray AT, Nelson SJ. The representation of meaning in the UMLS, Methods Inf Med; 34(1-2): 1995 193-201

23. Humphreys BL, Lindberg DA. The UMLS project: making the conceptual connection between users and the information they need, Bull Med Libr Assoc. 1993; 81: 170-77.

24. McCray AT, Aronson AR, Browne AC, Rindflesch TC, Razi A, Srinivasan S. UMLS knowledge for biomedical language processing. Bull Med Libr Assoc. 1993; 81: 184-194.

25. Carenini G, Moore JD. Using the UMLS Semantic Network as a Basis for Constructing a Terminological Knowledge Base: A Preliminary Report In: Seventeenth Annual Symposium on Computer Applications in Medical Care; 1994.

26. Suarez HH, Hao X, Chang IF. Searching for information on the internet using the UMLS and Medical World Search. In: AMIA Annual Fall Symposium; 1997 824-828.

27. Rindflesch TC., Aronson AR. Ambiguity resolution while mapping free text to the UMLS Metathesaurus. In: The American Medical Informatics Society Annual Symposium on Computer Applications in Medical Care; 1994 240-244.

28. Buckley C, Salton G, Allan J, Singhal A. Automatic query expansion using SMART: TREC-3. In Proceedings of the Third Text Retrieval Conference (TREC-3), 1994; 69-80


Paper received on 06/05/2008; accepted on 18/07/2008

Correspondence:

Y. J. Choi

Department of Image Engineering,

the Graduate School of Advanced Imaging Science, Multimedia & Film.

Chung Ang University, 221 Huksuk-dong, Dongjak-ku 156-756,

Seoul, Korea

Email: yongjc0602@yahoo.com,

Phone: 82 2 812 5717

This Open Access article is available at: http://ijmi.org/index.php/ijmi/article/view/y08i1a12

© 2008 Author(s); licensee Indian Journal of Medical Informatics under Creative Commons Attribution-No Derivative Works 3.0 License.
