Text Mining, IR and NLP References

These are some Text Mining, IR and NLP related reference materials that would be useful to anyone who is doing research and development in the area of Text Data Mining, Retrieval and Analysis. I have found many of these resources particularly useful in getting me started. Please note that this page is periodically updated.

Opinion Analysis

Survey: Opinion Mining and Sentiment Analysis [ pdf ] 
This is a fairly complete survey that covers some of the core techniques and approaches used in Opinion Mining (prior to 2008). Note that the techniques covered are the earlier ones that do not necessarily involve summarization of Tweets or short texts. In particular, it does not cover the newer body of work focusing on textual summarization of opinions.
(By Bo Pang and Lillian Lee - 2008)

Opinion Mining Tutorial [ pdf ] 
This is a nice and easy-to-follow set of slides on Opinion Mining. The main focus in these slides is the use of heuristics / data mining based approaches to  opinion mining. It does not really cover some of the more recent probabilistic / learning based approaches, but it gives a fairly good introduction to Opinion Mining. (By Bing Liu)

Survey: Opinion Mining and Summarization  [pdf
This survey zooms into recent research in the area of opinion summarization, which is related to generating effective summaries of opinions so that users can get a quick understanding of the underlying sentiments. Since there are various formats of summaries, the survey breaks down the approaches into the commonly studied aspect-based summariztion and non-aspect based ones (which includes visualization, contrastive summarization and text summarization of opinions).  (By Kim et al - 2011)

Interesting tasks within Opinion Mining and Sentiment Analysis  [link
A one page summary of the various tasks within opinion mining and related areas. (By Kavita Ganesan)

Automatic Text Summarization

A Survey on Automatic Text Summarization [pdf
This is a well written survey about text summarization. The focus of this survey is mainly on techniques in extractive summarization. The authors talk about techniques used in single document summarization, multi-document summarization and also include a nice section on evaluation methods. There is one section that talks about sentence compression which can be considered a form of abstractive summarization.
(Dipanjan Das and André F. T. Martins. 2007)

Text Summarization an Overview [pdf 
This article contains nice descriptions of the various text summarization methods used. The explanation is fairly intuitive. It also attempts to classify the summarization methods in several ways (e.g. abstractive vs extractive) - which is very useful. (Elena Lloret, 2008)

Query Based Text Summarization Tutoria[view
This is a nice deck of slides summarizing the area of text summarization. Easy to follow and has a lot of useful information.
(Mariana Damova)

Abstractive Summarization

More info here
 

Information Retrieval

Stanford IR/NLP Book  [ read online ] [ pdf ] 
A very good reference point for IR/NLP tasks. I would recommend this to anyone who is getting in to the IR field. The concepts are well explained and easy to understand. (By Christopher D. Manning, Prabhakar Raghavan & Hinrich Schütze)

Information Retrieval - a Survey  [ pdf ] 
A good survey on classic IR approaches. Topics include VSM, Bayesian, Term Weighting..etc  (By Ed Greengrass 2000)

Statistical Language Models  [ pdf ] 
This book systematically reviews the large body of literature on applying statistical language models to information retrieval with an emphasis on the underlying principles, empirically effective language models, and language models developed for non-traditional retrieval tasks. Recommended to those who want to get in-depth knowledge on language model based retrieval approaches. (By ChengXiang Zhai 2008)

Search User Interfaces  [ read online ] [ book ] 
This book talks about the design of search user interfaces, how to evaluate search interfaces, effective methods of presentation and other useful tips that relates to users of a search system. This book is ideal for those who are interested in designing, studying or improving upon search systems from the user's perspective. (By Marti Hearst 2009)

Recommended Reading for IR Research Students  [ pdf ]
This paper highlights some of the very core papers that should be read if you are in the IR field. This is a good place to start if you need a refresher or are a new student.

Faceted search
 [ pdf ]
(Daniel Tunkelang)