Apache OpenNLP

Apache OpenNLP is a natural language processing engine based on supervised machine learning. It has officially supported Japanese since version 1.9.0.

It supports a wide variety of natural language processing tasks, including the following main functions.

Named Entity Extraction

Text written in a natural language such as Japanese contains proper nouns: names of people, places, organizations, and so on. “Named Entity Extraction” is a technique that extracts these proper nouns together with their attributes (proper noun types). Combining this technique with an application such as a search engine, for example, can help improve search precision.

A wide variety of proper noun types can be used, from general ones such as names of people to more specific ones such as names of diseases, dishes, and events, depending on the application.
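
As a rough illustration of what "proper nouns with attributes" can look like as data, the following Java sketch returns typed token spans. It is not OpenNLP code: the class NamedEntitySketch, the Entity type, and the GAZETTEER lookup table are hypothetical names made up for this example, and a fixed dictionary stands in for the trained model that a supervised engine would use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class NamedEntitySketch {
    /** One extracted entity: tokens[start, end) plus its type label. */
    static final class Entity {
        final int start, end;
        final String type, text;
        Entity(int start, int end, String type, String text) {
            this.start = start; this.end = end; this.type = type; this.text = text;
        }
    }

    // Hypothetical lookup table, for illustration only. A trained model
    // would predict these labels from context instead of a fixed list.
    static final Map<String, String> GAZETTEER = Map.of(
        "Tokyo", "location",
        "Apache Software Foundation", "organization",
        "influenza", "disease");

    static final int MAX_PHRASE_TOKENS = 3;

    static List<Entity> extract(String[] tokens) {
        List<Entity> entities = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            int matchedLength = 0;
            // Prefer the longest known phrase starting at token i.
            for (int len = Math.min(MAX_PHRASE_TOKENS, tokens.length - i); len >= 1; len--) {
                String phrase = String.join(" ", Arrays.copyOfRange(tokens, i, i + len));
                String type = GAZETTEER.get(phrase);
                if (type != null) {
                    entities.add(new Entity(i, i + len, type, phrase));
                    matchedLength = len;
                    break;
                }
            }
            // Skip past the match, or advance one token if nothing matched.
            i += Math.max(matchedLength, 1);
        }
        return entities;
    }

    public static void main(String[] args) {
        String[] tokens = "The Apache Software Foundation met in Tokyo".split(" ");
        for (Entity e : extract(tokens)) {
            System.out.println(e.type + ": " + e.text);
        }
    }
}
```

Run against the sample sentence, the sketch reports Apache Software Foundation as an organization and Tokyo as a location. A search application could index these types as separate fields, which is one way the precision gain mentioned above might be realized.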

Document Classification

This function automatically assigns classification labels to articles and documents written in natural language, such as newspaper articles. For example, a site where users can post documents could have those documents automatically labeled as “sport”, “performing art”, “politics”, or “economy”.

This technique is not limited to classifying text documents; it can also be applied to automated credit administration and spam mail detection.
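
To make the idea of learning labels from examples concrete, here is a minimal Java sketch of a supervised text classifier. It uses multinomial naive Bayes with add-one smoothing as a stand-in technique; this is an assumption chosen for brevity, not a statement about which algorithm OpenNLP applies. The class name DocumentClassifierSketch and its methods are hypothetical, and the whitespace tokenizer is a simplification that would not work for Japanese text, which needs a proper tokenizer.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class DocumentClassifierSketch {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // label -> word -> count
    private final Map<String, Integer> tokenTotals = new HashMap<>();             // label -> total tokens
    private final Map<String, Integer> docCounts = new HashMap<>();               // label -> documents seen
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one labeled training document. */
    void train(String label, String text) {
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            counts.merge(token, 1, Integer::sum);
            tokenTotals.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
    }

    /** Returns the label with the highest log-probability, or null if untrained. */
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            // Start from the log prior of the label.
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = tokenTotals.getOrDefault(label, 0);
            for (String token : tokenize(text)) {
                // Add-one smoothing keeps unseen words from zeroing the score.
                int c = counts.getOrDefault(token, 0);
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    // Simplification: lowercase and split on whitespace.
    private static String[] tokenize(String text) {
        return text.toLowerCase().trim().split("\\s+");
    }

    public static void main(String[] args) {
        DocumentClassifierSketch c = new DocumentClassifierSketch();
        c.train("sport", "the team won the match");
        c.train("sport", "the player scored a goal");
        c.train("economy", "the market fell as stocks dropped");
        c.train("economy", "the bank raised interest rates");
        System.out.println(c.classify("the team scored"));
    }
}
```

The same shape carries over to the other uses mentioned above: a spam filter, for instance, would be trained with two labels instead of topic names.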

Language Identification

This function automatically identifies the language in which a piece of natural language text is written. It supports 103 languages, including Japanese and English as well as German, French, Russian, Arabic, Chinese, and Korean.
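
One simple way to picture language identification is to compare the character trigrams of the input against sample text for each language. The Java sketch below does exactly that and nothing more. It is an illustrative simplification, not OpenNLP's detector; the class LanguageIdSketch, its methods, and the three one-sentence samples are all assumptions made for this example. A real system would be trained on far more text.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class LanguageIdSketch {
    // language code -> character trigrams seen in that language's sample text
    private final Map<String, Set<String>> profiles = new LinkedHashMap<>();

    /** Collects the distinct 3-character substrings of the lowercased text. */
    static Set<String> trigrams(String text) {
        String s = text.toLowerCase();
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + 3 <= s.length(); i++) {
            grams.add(s.substring(i, i + 3));
        }
        return grams;
    }

    void addSample(String language, String text) {
        profiles.computeIfAbsent(language, k -> new HashSet<>()).addAll(trigrams(text));
    }

    /** Returns the language whose profile shares the most trigrams with the input, or null if none match. */
    String identify(String text) {
        Set<String> input = trigrams(text);
        String best = null;
        int bestHits = 0;
        for (Map.Entry<String, Set<String>> profile : profiles.entrySet()) {
            int hits = 0;
            for (String gram : input) {
                if (profile.getValue().contains(gram)) {
                    hits++;
                }
            }
            // Strict comparison: on a tie, the language added first wins.
            if (hits > bestHits) {
                bestHits = hits;
                best = profile.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        LanguageIdSketch id = new LanguageIdSketch();
        id.addSample("eng", "the quick brown fox jumps over the lazy dog");
        id.addSample("deu", "der schnelle braune fuchs sieht den faulen hund");
        id.addSample("fra", "le renard brun rapide saute par-dessus le chien paresseux");
        System.out.println(id.identify("den faulen hund"));
    }
}
```

Because the features are characters rather than words, this kind of approach needs no word segmentation, which is convenient for languages such as Japanese that are written without spaces.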


Japanese topics




RONDHUIT's Apache Software Foundation committers will explain the fundamentals of information retrieval, natural language processing, and the benefits they bring to users.