It is convenient to refer to an ambiguous word along with all of its
individual senses as an ambiguity case. Further, we call each
textual occurrence of the ambiguity an instance. In the UMLS
Metathesaurus, a large number of ambiguity cases are represented by separate
concepts, each of which refers to one of the individual senses.
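The terminology above can be sketched as a small data model. This is only an illustration, not the format of either test collection: the class names `Sense`, `Instance`, and `AmbiguityCase`, the example word, and the identifiers `SENSE_1` and `SENSE_2` are hypothetical placeholders rather than real Metathesaurus concept identifiers.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Sense:
    """One individual sense, standing in for a separate concept."""
    concept_id: str  # placeholder identifier, not a real Metathesaurus ID
    label: str


@dataclass
class Instance:
    """One textual occurrence of the ambiguous word."""
    text: str
    sense: Optional[Sense] = None  # None when no listed sense applies


@dataclass
class AmbiguityCase:
    """An ambiguous word together with all of its individual senses."""
    word: str
    senses: List[Sense]
    instances: List[Instance] = field(default_factory=list)

    def add_instance(self, text: str, sense: Optional[Sense] = None) -> Instance:
        # Reject a sense that does not belong to this ambiguity case.
        if sense is not None and sense not in self.senses:
            raise ValueError(f"unknown sense for {self.word!r}: {sense.label}")
        instance = Instance(text, sense)
        self.instances.append(instance)
        return instance


# Hypothetical example: one ambiguity case with two senses and two instances.
common_cold = Sense("SENSE_1", "common cold (illness)")
low_temperature = Sense("SENSE_2", "cold temperature")
cold = AmbiguityCase("cold", [common_cold, low_temperature])
cold.add_instance("The patient presented with a cold and mild fever.", common_cold)
cold.add_instance("Samples were stored in the cold overnight.", low_temperature)
```

Here `cold` is a single ambiguity case, and each call to `add_instance` records one instance labeled with the sense it carries.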
To support research on the automatic resolution of word sense ambiguity using natural language processing techniques, we manually constructed our original WSD Test Collection in 1999. Working with several colleagues, we have since updated that collection in several ways to increase its usability.
More recently, we have created a second, automatically constructed WSD test collection called the MSH WSD Test Collection that is larger and has broader semantic coverage than the original test collection. See the table below for further information about both collections.
The MSH WSD Test Collection was constructed using a method that automatically extracts instances of ambiguous terms from MEDLINE, with no manual curation of the extracted instances. The method relies on two resources: the UMLS Metathesaurus and the manual MeSH indexing of MEDLINE. The resulting data set contains both biomedical terms and abbreviations.
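The exact extraction rule is not given here, so the following is only a minimal sketch of how manual MeSH indexing could stand in for hand labeling. It assumes that each sense of an ambiguous term maps to one MeSH heading and that a citation is kept only when exactly one of those headings appears in its indexing. The function name `label_by_mesh`, the sense keys, and the toy data are illustrative assumptions, not the published method.

```python
from typing import Dict, Optional, Set


def label_by_mesh(sense_to_heading: Dict[str, str],
                  citation_headings: Set[str]) -> Optional[str]:
    """Return the single sense whose heading indexes the citation, else None.

    Assumption: a citation indexed with none of the candidate headings, or
    with more than one, is treated as unusable and skipped.
    """
    matches = [sense for sense, heading in sense_to_heading.items()
               if heading in citation_headings]
    return matches[0] if len(matches) == 1 else None


# Toy data: two senses of "cold", each tied to one illustrative heading.
senses = {"sense_illness": "Common Cold", "sense_temperature": "Cold Temperature"}

labeled = label_by_mesh(senses, {"Common Cold", "Humans"})
skipped_both = label_by_mesh(senses, {"Common Cold", "Cold Temperature"})
skipped_none = label_by_mesh(senses, {"Humans"})
```

Under these assumptions, `labeled` is `"sense_illness"`, while the other two citations are skipped because their indexing does not single out one sense.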
Please note: The MSH WSD Test Collection is more current and contains a much larger and richer set of ambiguities than our original WSD Test Collection.
The Original WSD Test Collection was constructed from citations in the 1998 MEDLINE Baseline, with the ambiguities resolved by hand. Evaluators were asked to examine instances of an ambiguous word and determine the intended sense by selecting the Metathesaurus concept (if any) that best represents its meaning.
To access either of the WSD Test Collections, you must have an activated UMLS Terminology Services (UTS) account. For more information, please visit our Help about UTS accounts Web page. The account is free; the only requirement is filing a brief annual report on your use of the UMLS.
Users are also responsible for complying with the UMLS Metathesaurus License Agreement, which requires them to respect the copyrights of the constituent vocabularies.