Java Language – 213 – NLP Libraries in Java

Natural Language Processing (NLP) – NLP Libraries in Java

Natural Language Processing (NLP) is a subfield of artificial intelligence concerned with how computers process and interpret human language. Java has several mature NLP libraries that cover common tasks such as tokenization, tagging, and entity recognition. In this article, we’ll look at some popular NLP libraries in Java, describe what each one offers, and give a code example for each.

1. Introduction to NLP Libraries in Java

Java is a common choice for NLP work because it is portable, has a large developer community, and has mature libraries. These libraries help developers extract meaning from text through tasks such as sentiment analysis and named entity recognition.

2. Stanford NLP Library

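The Stanford NLP library is a widely used open-source NLP library written in Java. It provides tools for various NLP tasks, including part-of-speech tagging, named entity recognition, and dependency parsing. Named entity recognition relies on a pre-trained classifier file that is downloaded separately; the 3-class English model used here labels tokens as PERSON, ORGANIZATION, or LOCATION. Below is a Java code example using the Stanford NLP library for named entity recognition: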


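import edu.stanford.nlp.ie.crf.CRFClassifier;
import edu.stanford.nlp.ling.CoreLabel;

public class StanfordNERExample {
    public static void main(String[] args) {
        // Path to the serialized 3-class English model (downloaded separately).
        String serializedClassifier = "english.all.3class.distsim.crf.ser.gz";
        // The classifier operates on CoreLabel tokens; loading failures surface as unchecked exceptions.
        CRFClassifier<CoreLabel> classifier = CRFClassifier.getClassifierNoExceptions(serializedClassifier);
        String text = "Barack Obama was born in Hawaii.";
        // Prints the input text with an entity label attached to each token.
        System.out.println(classifier.classifyToString(text));
    }
}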
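
The labeled string is often post-processed into a list of entities. The following is a minimal sketch of that step, assuming the classifier output uses a slash-tag layout (token/LABEL pairs separated by spaces, with O marking non-entities). The class name SlashTagEntities and the sample string are illustrative and hand-written; they are not part of the Stanford library and not captured program output.

```java
import java.util.ArrayList;
import java.util.List;

// Groups consecutive tokens that share a non-"O" label into entity mentions.
// Assumes input of the form token/LABEL separated by whitespace.
final class SlashTagEntities {

    static List<String> extract(String tagged) {
        List<String> entities = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentLabel = "O";

        for (String item : tagged.trim().split("\\s+")) {
            int slash = item.lastIndexOf('/');
            if (slash < 0) {
                continue; // skip anything that is not token/LABEL
            }
            String token = item.substring(0, slash);
            String label = item.substring(slash + 1);

            if (label.equals(currentLabel) && !label.equals("O")) {
                current.append(' ').append(token); // same entity continues
            } else {
                flush(entities, current, currentLabel); // close the previous entity, if any
                currentLabel = label;
                if (!label.equals("O")) {
                    current.append(token); // a new entity starts here
                }
            }
        }
        flush(entities, current, currentLabel);
        return entities;
    }

    private static void flush(List<String> out, StringBuilder text, String label) {
        if (text.length() > 0) {
            out.add(label + ": " + text);
            text.setLength(0);
        }
    }

    public static void main(String[] args) {
        // Hand-written sample in the assumed format.
        String sample = "Barack/PERSON Obama/PERSON was/O born/O in/O Hawaii/LOCATION ./O";
        extract(sample).forEach(System.out::println);
    }
}
```

One limitation of this sketch: two different entities of the same type that appear next to each other are merged into a single mention, because the slash-tag layout carries no boundary marker between them.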

3. Apache OpenNLP: Tokenization

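Apache OpenNLP is another popular open-source NLP library for Java, maintained by the Apache Software Foundation. It provides machine-learning-based tools for tokenization, sentence detection, part-of-speech tagging, and more. Each tool loads a pre-trained model file, such as en-token.bin in the code that follows, which is downloaded separately. Here’s an example of tokenization using the OpenNLP library: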


import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.tokenize.Tokenizer;
import opennlp.tools.tokenize.TokenizerME;
import opennlp.tools.tokenize.TokenizerModel;

public class OpenNLPTokenizerExample {
    public static void main(String[] args) {
        // try-with-resources closes the model stream even if loading fails.
        try (InputStream modelIn = new FileInputStream("en-token.bin")) {
            TokenizerModel model = new TokenizerModel(modelIn);
            Tokenizer tokenizer = new TokenizerME(model);
            String text = "Natural language processing is amazing.";
            String[] tokens = tokenizer.tokenize(text);
            // Print one token per line.
            for (String token : tokens) {
                System.out.println(token);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
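
To get a feel for what tokenization and sentence detection produce before downloading any model files, the sketch below uses the JDK’s rule-based java.text.BreakIterator as a baseline. This is not OpenNLP and not a substitute for its trained models; the class name BreakIteratorBaseline is hypothetical and chosen only for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Rule-based sentence and word splitting with the JDK only (no model files).
final class BreakIteratorBaseline {

    static List<String> sentences(String text) {
        return segments(BreakIterator.getSentenceInstance(Locale.ENGLISH), text);
    }

    static List<String> words(String text) {
        return segments(BreakIterator.getWordInstance(Locale.ENGLISH), text);
    }

    // Walks the boundaries reported by the iterator and keeps the
    // non-blank pieces between them (whitespace-only pieces are dropped).
    private static List<String> segments(BreakIterator boundaries, String text) {
        boundaries.setText(text);
        List<String> result = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            String piece = text.substring(start, end).trim();
            if (!piece.isEmpty()) {
                result.add(piece);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Java is portable. Models are loaded from files.";
        sentences(text).forEach(System.out::println);
        words("Natural language processing is amazing.").forEach(System.out::println);
    }
}
```

Fixed rules of this kind tend to break on abbreviations such as “Dr.”, which is one reason libraries like OpenNLP use trained models for these tasks.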

4. Apache OpenNLP: Part-of-Speech Tagging

This section continues with Apache OpenNLP, the same library used for tokenization in the previous section. Part-of-speech tagging assigns a grammatical category (noun, verb, adjective, and so on) to each token, so the tagger takes an already tokenized sentence as input and returns one tag per token. Below is a code example of part-of-speech tagging using Apache OpenNLP:


import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;

public class ApacheOpenNLPPOSTagger {
    public static void main(String[] args) throws IOException {
        // try-with-resources closes the model stream once the model is loaded.
        try (InputStream modelIn = new FileInputStream("en-pos-maxent.bin")) {
            POSModel model = new POSModel(modelIn);
            POSTaggerME tagger = new POSTaggerME(model);
            // The tagger expects a sentence that has already been split into tokens.
            String[] sentence = new String[]{"Natural", "language", "processing", "is", "powerful"};
            String[] tags = tagger.tag(sentence);
            // tags[i] is the tag assigned to sentence[i].
            for (int i = 0; i < sentence.length; i++) {
                System.out.println(sentence[i] + "/" + tags[i]);
            }
        }
    }
}
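
Because the tagger returns one tag per token, the two arrays can be processed in parallel. As a rough sketch of a typical follow-up step, the code below keeps only the tokens tagged as nouns. It assumes Penn Treebank style tags, where noun tags start with NN; other models may use a different tag set. The NounFilter class and the sample tags are hand-written for illustration and are not actual tagger output.

```java
import java.util.ArrayList;
import java.util.List;

// Selects tokens whose tag marks them as nouns (Penn Treebank style: NN, NNS, NNP, NNPS).
final class NounFilter {

    static List<String> nouns(String[] tokens, String[] tags) {
        if (tokens.length != tags.length) {
            throw new IllegalArgumentException("tokens and tags must have the same length");
        }
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tags[i].startsWith("NN")) {
                result.add(tokens[i]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] tokens = {"Natural", "language", "processing", "is", "powerful"};
        // Hand-written example tags; a real tagger may assign different ones.
        String[] tags = {"JJ", "NN", "NN", "VBZ", "JJ"};
        System.out.println(nouns(tokens, tags));
    }
}
```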

5. Apache Lucene

Apache Lucene is primarily a full-text search library, but its analysis package can also be used for basic NLP preprocessing such as tokenization, lowercasing, stop-word removal, and stemming. Here’s an example of tokenization using Lucene’s StandardAnalyzer:


import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class LuceneTokenizerExample {
    public static void main(String[] args) {
        String text = "Natural language processing with Lucene is efficient.";
        // Recent Lucene releases use the no-argument constructor; the Version parameter is no longer accepted.
        try (StandardAnalyzer analyzer = new StandardAnalyzer();
             TokenStream stream = analyzer.tokenStream("text", new StringReader(text))) {
            CharTermAttribute termAtt = stream.addAttribute(CharTermAttribute.class);
            stream.reset(); // must be called before the first incrementToken()
            while (stream.incrementToken()) {
                System.out.println(termAtt.toString());
            }
            stream.end(); // finalizes end-of-stream state
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
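
Conceptually, an analyzer is a chain: a tokenizer followed by filters that normalize or drop tokens. The sketch below imitates such a chain with plain JDK code to make the steps visible. It is not Lucene’s API, and both the AnalysisChainSketch class and its tiny stop-word list are hypothetical stand-ins; Lucene’s own tokenization rules and defaults differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java imitation of an analysis chain: tokenize, lowercase, drop stop words.
final class AnalysisChainSketch {

    // Illustrative stop-word list only; not the list any Lucene analyzer uses.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "is", "the", "with"));

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Step 1: tokenize on runs of characters that are neither letters nor digits.
        for (String token : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) {
                continue; // a leading separator produces one empty token
            }
            // Step 2: normalize case.
            String term = token.toLowerCase(Locale.ROOT);
            // Step 3: drop stop words.
            if (!STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("Natural language processing with Lucene is efficient."));
    }
}
```

In Lucene itself, each of these steps is a separate, swappable component, which is what makes the analysis package reusable outside of search.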

6. Conclusion

Java offers a wide array of NLP libraries that empower developers to work on various text processing and analysis tasks. Whether you need to extract entities, classify text, or perform linguistic analysis, these libraries provide the tools and functionalities to support your NLP projects.