Feb 13, 2015

What is the Notion of NLP in Computer Science vs. Biomedical Informatics vs. Information Systems (MIS)?

Computer Science: In computer science research, when we talk about NLP methods we are referring to specific methods that use some linguistics properties or structural properties of text often coupled with statistics to solve targetted language and text mining problems. The NLP parts are those "customized methods" that we use to solve specific problems. Thus, the notion of NLP here is rather broad as any novel approach that does not just re-use existing tools could be considered NLP or Computational Linguistics.

Biomedical Informatics: In the pure biomedical informatics world however, I often notice that the idea of what NLP is, is quite narrow. People tend to refer to usage of tools such as NegEx Detection, POS Taggers, Parsers and Stemming as NLP. While technically, the underlying technology in these tools can involve lightweight to heavy NLP, the applications of these tools in specific problems are not necessarily "NLP". It is more of applications of NLP tools or text-processing using NLP tools. To be technically correct, only when these NLP tools are used in combination with additional learning methods or customized text mining algorithms, this would probably qualify as actual NLP, solving a specific problem.

Information Systems (MIS): The concept of what qualifies as NLP (and text mining) in the information systems (MIS) world tends to be a lot more shallow and ill-defined than biomedical informatics or computer science where even counting word frequencies seems to be considered a form of NLP. In fact, even getting a binary classifier to work on text is considered fairly involved NLP. This may seem slightly exaggerated, but when you start reviewing papers from some of the MIS departments through TOIS or TKDE you will start to understand how ill-defined NLP can really be. 

The point here is that there are different flavors of NLP both in research and in practice and the notion of NLP varies from department to department. So when you are publishing obvious methods to conferences where the audience primarily consists of core computer scientists, you will get dinged really hard for the lack of originality in your proposed methods. At the same time, if you submit a really effective algorithm that does not use existing NLP tools to a biomedical informatics journal, again you may get dinged but this time for not using existing tools or as a friend of mine once told me "the reviewer had  philosophical issues with my submission". Also, be warned that if a physician comes and tells you hey, I am also doing NLP (heard quite a few of those), don't be surprised that what he or she means is that he is running some NLP tools on some patient data. And if someone from the IS department of a business school talks to you about offering a text mining course, it usually means they are teaching you how to use R or SaS to do some text processing and querying :D, not to mention they will also refer to this as "Big Data".