Nov 24, 2014

API for Word Counts and N-Gram Counts

N-grams are essentially sets of co-occurring words within a given window. They have many uses in NLP and text mining, from summarization to feature extraction in supervised machine learning tasks. Here is a basic tutorial on what n-grams are. In this article, we will focus on a Web API for generating n-grams, which is available on Mashape: https://www.mashape.com/rxnlp/text-mining-and-nlp/.
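
To make the "window" idea concrete, here is a minimal Python sketch that slides a window of size n over a list of words. The function name `ngrams` and the whitespace tokenization are assumptions made for illustration; this is not the API's code.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    # A window of size n fits len(tokens) - n + 1 times; if the text is
    # shorter than n, the range is empty and no n-grams are produced.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Assumption: words are separated by whitespace.
tokens = "I love rainy days".split()
print(ngrams(tokens, 2))
# [('I', 'love'), ('love', 'rainy'), ('rainy', 'days')]
```

With n = 1 the same function returns single words, which is why word counts are just a special case of n-gram counts.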

These are the input parameters:

  • text for n-gram generation
  • case-sensitive: true/false
  • n-gram size

Below is a sample output from the API for the text "I love rainy days. How I wish it was raining ! How I wish it was snowing !". The output lists each n-gram with its count, sorted by count in descending order. Note that the API also does basic sentence splitting, so that n-grams are computed within sentences. Also note that this tool is language-neutral, so you can generate n-grams for multiple languages. For your sentences to be split correctly, you would probably need to make sure that there is punctuation between every two consecutive sentences.
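
The behaviour described above can be approximated locally with a short Python sketch. It is only an illustration under stated assumptions, not the API's actual implementation: the function name `count_ngrams`, the whitespace tokenization, and the rule that a sentence ends at ".", "!" or "?" are all assumptions.

```python
import re
from collections import Counter


def count_ngrams(text, n, case_sensitive=False):
    """Count n-grams per sentence; return (n-gram, count) pairs, highest count first."""
    if not case_sensitive:
        text = text.lower()
    counts = Counter()
    # Assumption: a sentence ends at '.', '!' or '?'. Without such
    # punctuation, two sentences are treated as one.
    for sentence in re.split(r"[.!?]+", text):
        tokens = sentence.split()  # assumption: whitespace tokenization
        # n-grams are built inside one sentence only, never across a boundary.
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts.most_common()


text = "I love rainy days. How I wish it was raining ! How I wish it was snowing !"
for gram, count in count_ngrams(text, n=2):
    print(gram, count)
```

In this sketch the bigrams "how i", "i wish", "wish it" and "it was" each get a count of 2 and every other bigram a count of 1. Because of the sentence split, no bigram such as "days how" is produced. A punctuation-based splitter this simple will also break on abbreviations like "e.g.", so treat it as a rough stand-in only.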