Nov 19, 2013

Understanding the UMLS Metathesaurus at a high level.

TUI, CUI, LUI and SUI look like badly constructed names of sushi dishes, agreed. However, when dealing with the UMLS Metathesaurus, these terms are extremely important to know. It's funny, but people in the "medical terminology community" really do talk in terms of "What SUI (sooey) is it?", "What is the CUI (cooey) of the SUI (sooey)?", and so on. To those of us in the pure NLP, IR and text mining community, these SUIs and CUIs sound nothing like precision, recall or F1 scores.

Well really, the SUIs and the CUIs are nothing but names of fields in the UMLS Metathesaurus. Note that the UMLS Metathesaurus is NOT an ontology; it is a collection of vocabularies and ontologies such as SNOMED, ICD9, ICD10, RxNorm, MeSH, LOINC and so on. What each of these vocabularies represents is a whole other story. Essentially, what the Metathesaurus tries to do is group synonymous concepts into one representative bucket, and that is where all the TUIs and SUIs come into play.

Let's take an example. Let us say, you are looking for the concept "Chest Pain". If you type this in the UMLS Metathesaurus online browser, you would see something like the following:

On the left you see the search for "Chest Pain" and on the right you see the results when "C0008031" is clicked. You see the concept, semantic types, definitions, atoms, etc.

Concepts have CUIs
So, a concept is a fundamental unit of meaning in the Metathesaurus. Concepts can be expressed in many different ways by many different people across different vocabularies and terminologies. In the UMLS Metathesaurus, a concept is an encapsulation of all these different expressions that have the same meaning. In the example above, the concept is "Chest Pain" and its concept ID, called the CUI, is C0008031. This encapsulates all other expressions with the same meaning, which is where atoms come in.

Atoms have AUIs
Each concept consists of a set of atoms, where each atom represents a single meaning within a source. Think of atoms as bank account numbers. Person "Kavita Ganesan" may have accounts in three different banks: Bank of America, Citibank and American Express. With each of these banks, for each account, person "Kavita Ganesan" would have a unique account number (routing + account). Similarly, each unique entry from a particular vocabulary (e.g. SNOMED, ICD10, etc.) would have a unique atom ID, the AUI. In the example above, the entries under Atoms are the entries from different vocabularies that fit into the "Chest Pain" bucket. The AUI is the primary key of the concepts table (MRCONSO).

String Unique Identifier - SUI
Each string regardless of its source vocabulary in its very raw form will have its own SUI. Strings that differ in any way, e.g., by upper or lower case, will have different SUIs. For example, the term "chest pain" and "Chest Pain" will have different SUIs but could have the same CUI.

Lexical Unique Identifier - LUI
This is the unique identifier of a term in the Metathesaurus. Terms are different from strings. Strings that are lexical variants of one another are grouped together under one LUI. For example, the strings 'Pain', 'pain', and 'PAIN' all have different SUIs, but share the same LUI. LUIs are optional in the Metathesaurus.
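To make the nesting of these identifiers concrete, here is a minimal Python sketch that groups a few atom rows into a CUI -> LUI -> SUI -> atoms hierarchy. It assumes the pipe-delimited column order of the MRCONSO.RRF file from the UMLS distribution. The sample rows are entirely made up for illustration: only the CUI C0008031 comes from the example above, while every LUI, SUI, AUI, source name (SRC_A, etc.) and code is an invented placeholder.

```python
from collections import defaultdict

# Column order assumed for MRCONSO.RRF (pipe-delimited, one row per atom).
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]

# Invented sample rows. Only C0008031 is a real identifier taken from the post;
# all LUIs, SUIs, AUIs, sources and codes below are placeholders.
SAMPLE_ROWS = [
    "C0008031|ENG|P|L0000001|PF|S0000001|Y|A0000001||||SRC_A|PT|X1|Chest Pain|0|N||",
    "C0008031|ENG|P|L0000001|VC|S0000002|Y|A0000002||||SRC_B|PT|X2|chest pain|0|N||",
    "C0008031|ENG|S|L0000002|PF|S0000003|Y|A0000003||||SRC_A|SY|X3|Thoracic pain|0|N||",
    "C0008031|ENG|P|L0000001|PF|S0000001|N|A0000004||||SRC_C|PT|X4|Chest Pain|0|N||",
]


def parse_row(line):
    """Split one pipe-delimited row into a dict keyed by column name."""
    fields = line.rstrip("\n").split("|")
    # zip() stops at the shorter input, which drops the empty field
    # produced by the trailing pipe at the end of each row.
    return dict(zip(MRCONSO_COLUMNS, fields))


def build_hierarchy(lines):
    """Group atoms as hierarchy[CUI][LUI][SUI] -> list of (AUI, source, string)."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for line in lines:
        row = parse_row(line)
        atom = (row["AUI"], row["SAB"], row["STR"])
        tree[row["CUI"]][row["LUI"]][row["SUI"]].append(atom)
    return tree


hierarchy = build_hierarchy(SAMPLE_ROWS)
for cui, terms in hierarchy.items():
    print(cui)
    for lui, strings in terms.items():
        print("  " + lui)
        for sui, atoms in strings.items():
            print("    " + sui)
            for aui, source, text in atoms:
                print("      {} [{}] {}".format(aui, source, text))
```

In this toy data, "Chest Pain" and "chest pain" get different SUIs but sit under the same LUI, and the same string coming from two sources yields two atoms (two AUIs) under a single SUI.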

Semantic Types have TUIs
Each concept is assigned at least one semantic type, which is one of the broad categories like "Clinical Drug" or "Disease or Syndrome" described in the UMLS Semantic Network. Each semantic type has its own identifier, the TUI.
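As a rough illustration of how semantic types hang off concepts, the sketch below indexes a few (CUI, TUI, name) assignments and looks up concepts by TUI. The CUIs are invented placeholders, and the helper names are hypothetical. T047 and T200 are, as far as I know, the TUIs for "Disease or Syndrome" and "Clinical Drug", but treat them as assumptions and verify against the Semantic Network before relying on them.

```python
from collections import defaultdict

# (CUI, TUI, semantic type name). The CUIs are invented placeholders;
# the TUI values are assumptions to be checked against the Semantic Network.
SAMPLE_ASSIGNMENTS = [
    ("C9000001", "T047", "Disease or Syndrome"),
    ("C9000002", "T200", "Clinical Drug"),
    ("C9000003", "T047", "Disease or Syndrome"),
]


def index_semantic_types(assignments):
    """Map each CUI to the list of (TUI, name) pairs assigned to it."""
    by_cui = defaultdict(list)
    for cui, tui, name in assignments:
        by_cui[cui].append((tui, name))
    return dict(by_cui)


def concepts_with_type(index, tui):
    """Return the sorted CUIs that carry the given TUI."""
    return sorted(
        cui for cui, types in index.items()
        if any(assigned == tui for assigned, _ in types)
    )


semantic_index = index_semantic_types(SAMPLE_ASSIGNMENTS)
print(semantic_index["C9000001"])
print(concepts_with_type(semantic_index, "T047"))
```

Keeping a list per CUI, rather than a single value, reflects the "at least one" wording: a concept may carry more than one semantic type.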

So I hope this demystifies some of the fuzziness in the UMLS Metathesaurus. At least for me, understanding these core concepts helped me understand the contents of the Metathesaurus. For more info you can check:
