Apache OpenNLP

Apache OpenNLP is a natural language processing engine based on supervised machine learning. It has officially supported Japanese since version 1.9.0.

It supports a wide variety of natural language processing tasks. Its main functions are described below.

Named Entity Extraction

Text written in a natural language such as Japanese contains proper nouns: names of people, places, organizations, and so on. “Named Entity Extraction” is a technique that extracts these proper nouns together with their attributes (the proper noun types). Combining this technique with an application such as a search engine, for example, can help improve search precision.

You can define a wide variety of proper noun types to suit the application, from general ones such as names of people to more specific ones such as names of diseases, dishes, and events.
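To make the idea of "proper nouns with types" concrete, here is a minimal sketch in Python. It is not OpenNLP's API and it does not use a trained model: it swaps in a simple dictionary (gazetteer) lookup so that the shape of the output, typed spans over a token sequence, is easy to see. The `EntitySpan` type, the `GAZETTEER` entries, and the `find_entities` function are all hypothetical names chosen for illustration, and the input is assumed to be already tokenized.

```python
from typing import NamedTuple


class EntitySpan(NamedTuple):
    start: int  # index of the first token (inclusive)
    end: int    # index just past the last token (exclusive)
    type: str   # proper noun type, e.g. "person"


# Hypothetical gazetteer: token sequences mapped to a proper noun type.
# A real system would learn this from annotated training data instead.
GAZETTEER = {
    ("Taro", "Yamada"): "person",
    ("Tokyo",): "location",
    ("influenza",): "disease",
}


def find_entities(tokens, gazetteer=GAZETTEER):
    """Return typed spans for every gazetteer phrase found in tokens."""
    if not gazetteer:
        return []
    max_len = max(len(phrase) for phrase in gazetteer)
    spans = []
    i = 0
    while i < len(tokens):
        match = None
        # Prefer the longest phrase that starts at position i.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(tokens[i:i + n]))
            if label is not None:
                match = EntitySpan(i, i + n, label)
                break
        if match is not None:
            spans.append(match)
            i = match.end  # continue after the matched phrase
        else:
            i += 1
    return spans


tokens = "Taro Yamada caught influenza in Tokyo".split()
for span in find_entities(tokens):
    print(" ".join(tokens[span.start:span.end]), "->", span.type)
# Taro Yamada -> person
# influenza -> disease
# Tokyo -> location
```

A search application could index these typed spans as separate fields, which is one way the precision gain mentioned above might be realized. Japanese text would first need a tokenizer, since it is not whitespace-delimited.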

Document Classification

This function automatically assigns a classification label to articles and documents written in natural language, such as newspaper articles. As an application example, a site that accepts document submissions could have those documents automatically labeled as “sport”, “performing art”, “politics”, or “economy”.

This technique is not limited to classifying text documents; it can also be applied to automated credit assessment and spam email detection.
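The labeling idea can be sketched with a tiny supervised classifier. The example below is a toy multinomial naive Bayes model with add-one smoothing written in plain Python; it is an illustration of supervised document classification in general, not OpenNLP's implementation or API. The function names `train` and `classify` and the six training sentences are assumptions made up for this sketch, and the text is assumed to be whitespace-tokenized.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled training data: (label, text) pairs.
TRAINING_DOCS = [
    ("sport", "the team won the match with a late goal"),
    ("sport", "the striker scored a goal in the final match"),
    ("politics", "parliament passed the election reform bill"),
    ("politics", "the minister announced a new election"),
    ("economy", "the central bank raised interest rates"),
    ("economy", "stock market rates fell as the bank warned of inflation"),
]


def train(labeled_docs):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for label, text in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the label with the highest log-probability for text."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(doc_counts):  # sorted for deterministic ties
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)  # prior
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                # Add-one (Laplace) smoothing avoids log(0).
                score += math.log(
                    (counts[word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAINING_DOCS)
print(classify(model, "the bank changed interest rates"))  # economy
print(classify(model, "the striker scored a late goal"))   # sport
```

The same pattern carries over to the spam decision mentioned above: the labels become, for instance, "spam" and "not spam", and the training documents become labeled emails.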

Language Identification

This function automatically identifies the language in which a natural language text is written. It supports 103 languages, including Japanese and English as well as German, French, Russian, Arabic, Chinese, and Korean.
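One common way to approach language identification is to compare the character n-grams of a text against per-language profiles. The sketch below illustrates that idea in plain Python with character trigrams and a simple overlap score. It is a toy under stated assumptions, not OpenNLP's model or API: the sample sentences, the three-letter language codes, and the names `char_trigrams`, `build_profiles`, and `identify` are all made up for illustration, and a real detector would be trained on far more text.

```python
# Hypothetical one-sentence "training" samples per language code.
SAMPLES = {
    "eng": "the quick brown fox jumps over the lazy dog and this is "
           "the language that we are writing in",
    "deu": "der schnelle braune fuchs springt über den faulen hund und "
           "das ist die sprache in der wir schreiben",
    "fra": "le renard brun rapide saute par dessus le chien paresseux et "
           "ceci est la langue dans laquelle nous écrivons",
    "jpn": "これは日本語で書かれた文章です",
}


def char_trigrams(text):
    """Lowercase, normalize whitespace, pad, and split into trigrams."""
    padded = " " + " ".join(text.lower().split()) + " "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def build_profiles(samples):
    """Map each language code to the set of trigrams seen in its sample."""
    return {lang: set(char_trigrams(text)) for lang, text in samples.items()}


def identify(text, profiles):
    """Return (language, score): score is the share of known trigrams."""
    grams = char_trigrams(text)
    if not grams:
        return None, 0.0
    scores = {
        lang: sum(g in profile for g in grams) / len(grams)
        for lang, profile in profiles.items()
    }
    best = max(sorted(scores), key=scores.get)  # sorted for stable ties
    return best, scores[best]


profiles = build_profiles(SAMPLES)
print(identify("this is the dog", profiles)[0])   # eng
print(identify("das ist der hund", profiles)[0])  # deu
print(identify("日本語の文章", profiles)[0])        # jpn
```

Character n-grams need no tokenizer, which is why the same code handles Japanese and space-delimited languages alike.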

KandaSearch

KandaSearch is a cloud-based enterprise search engine service.
It can be freely customized through its open API.

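  • Semantic search

    The search engine understands text and images the way a person does and searches accordingly.

  • Crawler

    A web crawler is available to collect the documents to be searched.

  • Easy-to-use UI and a rich library

    In addition to the search and dictionary UIs, predefined technical term dictionaries, synonym dictionaries, and plugins are provided.

  • Low-code, low-cost deployment

    After tuning usability in the search UI, you can automatically generate a web application.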

Seminars

From the key points for companies selecting a search engine
to a hands-on deployment demo that you can try for yourself!