Wikitag
Tomaž Šolc
A common use of Wikipedia in blogs and semi-professional web publishing is to explain terms in a published text that the reader may not be familiar with. This is usually done in the form of in-text hyperlinks to the relevant Wikipedia pages. Building on existing research, we have created a system that automatically adds such explanatory links to a plain-text article. Combined with structured data extracted from the linked Wikipedia articles, the system can also provide links to other websites about the subject, as well as semantic information that can be used in further processing.
In this presentation we will talk about the research that resulted in Wikitag, a system that currently runs as part of the Zemanta service. We will give an overview of the algorithm, describe its basic building blocks and discuss the primary problems we encountered: how to get link candidates, automatically disambiguate terms, estimate link desirability and select only the most appropriate links for the final result. We will also briefly mention the methods used to reduce the noise inherent in publicly editable text, offer some suggestions on how Wikimedia could make its wikis more suitable for computer processing, and outline some of our plans for further research on this topic.
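To make the four problems named above more concrete, here is a minimal Python sketch of how such a pipeline could be wired together. It is an illustration only and not Wikitag's actual implementation: the data tables, the function names and the scoring rules are all assumptions. Disambiguation simply picks the most frequent link target, and link desirability is approximated by the fraction of a phrase's occurrences that are linked, a heuristic commonly used in automatic linking systems.

```python
from dataclasses import dataclass

# Hypothetical toy statistics, standing in for data mined from a wiki dump.
# phrase -> {target page: number of times the phrase is used as a link to it}
ANCHOR_COUNTS = {
    "python": {"Python (programming language)": 80, "Pythonidae": 20},
    "wikipedia": {"Wikipedia": 100},
    "the": {"The": 1},
}
# phrase -> number of times the phrase occurs at all, linked or not
PHRASE_COUNTS = {"python": 200, "wikipedia": 125, "the": 100000}


@dataclass
class Link:
    phrase: str
    target: str
    score: float


def find_candidates(text):
    """Step 1: collect known phrases (single words here), in order, without repeats."""
    seen = []
    for word in text.split():
        word = word.strip(".,;:!?").lower()
        if word in ANCHOR_COUNTS and word not in seen:
            seen.append(word)
    return seen


def disambiguate(phrase):
    """Step 2: pick the target this phrase most often links to (an assumed rule)."""
    targets = ANCHOR_COUNTS[phrase]
    return max(targets, key=targets.get)


def desirability(phrase):
    """Step 3: fraction of the phrase's occurrences that are links (an assumed rule)."""
    return sum(ANCHOR_COUNTS[phrase].values()) / PHRASE_COUNTS[phrase]


def select_links(text, threshold=0.1, max_links=5):
    """Step 4: keep only the highest-scoring links above a threshold."""
    links = [Link(p, disambiguate(p), desirability(p)) for p in find_candidates(text)]
    links = [link for link in links if link.score >= threshold]
    links.sort(key=lambda link: link.score, reverse=True)
    return links[:max_links]


if __name__ == "__main__":
    for link in select_links("The Python article on Wikipedia."):
        print(link)
```

In this toy setup a word such as "the" is recognized as a candidate but dropped in the final step because it is almost never linked. A real system would also need multi-word phrases and context-aware disambiguation, which this sketch leaves out.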

