The novel technology, invented by Prof. Mark Last, Dr. Marina Litvak, and Dr. Menahem Friedman at the Department of Software and Information Systems Engineering of Ben-Gurion University, provides language-independent summaries of texts. It is based on a genetic algorithm that ranks document sentences using statistical sentence features, which can be calculated for sentences in any language, and then extracts the top-ranking sentences into a summary. The method, called MUSE (Multilingual Sentence Extractor), was tested on nine languages: English, Hebrew, Arabic, Persian, Russian, Chinese, German, French, and Spanish. Its summarization quality was evaluated on four of them (English, Hebrew, Arabic, and Persian), showing a high level of similarity to human-generated summaries. Experimental results show that after an initial training of the algorithms on an annotated corpus of summarized documents, where each document is accompanied by several human-generated summaries, the software does not need to be retrained on a summarization corpus in each new language, and the same sentence-ranking model can be used across several languages.
“Extractive summarization, which selects a subset of the most relevant sentences from a source text, via ranking them by a relevance score and selecting the top-ranking sentences into a summary, is invaluable for being able to quickly summarize large quantities of text in a language-independent manner. This ability is crucial for search engines as well as other end-users, such as researchers, libraries and the media,” says Prof. Mark Last.
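The ranking-and-extraction step described above can be illustrated with a minimal Python sketch. It is not the MUSE implementation: the three features (sentence position, relative length, and term-frequency coverage), the function names, and the hand-picked weights are all assumptions chosen for illustration. In the method described here, the ranking model is obtained by training a genetic algorithm on an annotated corpus rather than set by hand. The simple `\w+` tokenizer is also an assumption and would not segment languages written without spaces, such as Chinese.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace (illustrative only).
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_features(sentences):
    # Compute three simple statistical features per sentence.
    # None of them relies on a dictionary, stemmer, or other language-specific resource.
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    doc_freq = Counter(t for toks in tokens for t in toks)
    max_freq = max(doc_freq.values(), default=1)
    max_len = max((len(toks) for toks in tokens), default=1) or 1
    n = len(sentences)
    feats = []
    for i, toks in enumerate(tokens):
        position = 1.0 - i / n          # earlier sentences score higher
        length = len(toks) / max_len    # length relative to the longest sentence
        if toks:
            # average document frequency of the sentence's words, scaled to [0, 1]
            coverage = sum(doc_freq[t] for t in toks) / (len(toks) * max_freq)
        else:
            coverage = 0.0
        feats.append((position, length, coverage))
    return feats


def summarize(text, weights=(0.3, 0.2, 0.5), k=2):
    # Score each sentence as a weighted sum of its features (placeholder weights),
    # keep the k highest-scoring sentences, and return them in document order.
    sentences = split_sentences(text)
    feats = sentence_features(sentences)
    scores = [sum(w * f for w, f in zip(weights, fs)) for fs in feats]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    sample = (
        "Cats chase mice. Dogs bark. "
        "Cats and dogs and mice share the house with cats."
    )
    for sentence in summarize(sample, k=2):
        print(sentence)
```

Because the features are computed only from counts and positions, the same weight vector can in principle be applied to a document in another language without modification, which is the property the experiments above report.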
With the huge increase in online textual data, and with ever less time available to evaluate the vast amount of published text, the need arises for an automated method for extracting a summary from a text file, such as an article or an interview, for further processing. Most available solutions are language-dependent and require training the algorithms on large volumes of text. Now BGN Technologies, the technology transfer company of Ben-Gurion University of the Negev, introduces a novel, automated and language-independent tool for summarizing text. The method is applicable to summary extraction from articles, magazines and databases, both within the media itself and by users of such media, including libraries, academic research engines and general search engines.
Zafrir Levy, Senior VP Business Development of BGN Technologies, said, “This tool will be a valuable addition to our ability to benefit from the vast amounts of text available online. After filing a patent to protect the technology, we are currently looking for potential partners for further development and commercialization of this promising invention.”
SOURCE BGN Technologies