Friday, November 14, 2008

FrameNet

http://www.icsi.berkeley.edu/news/2007/framenet.html

Featured Research: FrameNet


The FrameNet project is one of the longest-running projects at ICSI. Led by Professor Charles Fillmore and Dr. Collin Baker, FrameNet researchers are creating "an online lexical resource for English, based on frame semantics and supported by corpus evidence." The frame semantics theory underlying the project originated with Fillmore, who developed it at UC Berkeley before his work at ICSI.

Frame semantic theory categorizes words and ideas according to the frames they evoke. Some frames are quite simple. The Placing frame, for example, involves an object, the location where it goes, and a word indicating that the object is being put in its place, such as put, lay, shelve, or file.


In the sample sentence below, the words highlighted in black are frame-evoking words.
  • thought evokes the Awareness/Cognition frame,
  • might evokes the Likelihood frame, and
  • die evokes the Death frame.
The color-highlighted words are frame elements.
  • In the Awareness/Cognition frame, for example, the person who is thinking is "I", and the thought is "that I might die".
  • In the Likelihood frame, "I die" is the event that might happen.
  • In the Death frame, "I" is the person who may die.
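One way to picture this kind of annotation is as a small data structure: each frame-evoking word (the target) points to its frame and to the words filling each frame element. The sketch below is illustrative only. It assumes the sentence is "I thought that I might die" (reconstructed from the elements listed above), and the element labels ("thinker", "possible event", and so on) are hypothetical placeholders, not FrameNet's official frame element names. Elements are stored as token indices so that a discontinuous element such as "I ... die" can be represented.

```python
from dataclasses import dataclass, field


@dataclass
class FrameAnnotation:
    """One frame-evoking word (the target) and the frame elements it takes."""
    target: int                                  # token index of the frame-evoking word
    frame: str                                   # name of the evoked frame
    elements: dict[str, list[int]] = field(default_factory=dict)  # element label -> token indices


# Assumed sentence, reconstructed from the frame elements described in the text.
tokens = ["I", "thought", "that", "I", "might", "die"]

# Element labels are hypothetical placeholders, not official FrameNet names.
annotations = [
    FrameAnnotation(1, "Awareness/Cognition", {"thinker": [0], "thought": [2, 3, 4, 5]}),
    FrameAnnotation(4, "Likelihood", {"possible event": [3, 5]}),  # discontinuous: "I ... die"
    FrameAnnotation(5, "Death", {"one who may die": [3]}),
]


def render(tokens: list[str], ann: FrameAnnotation) -> str:
    """Describe one annotation as readable text, one frame element per line."""
    lines = [f"{tokens[ann.target]!r} evokes {ann.frame}"]
    for label, idxs in ann.elements.items():
        lines.append(f"  {label}: {' '.join(tokens[i] for i in idxs)}")
    return "\n".join(lines)


for ann in annotations:
    print(render(tokens, ann))
```

Because every annotation refers back to the same token list, a single word ("I" at index 3) can fill elements in several frames at once, which is exactly the overlap the sentence illustrates.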

The mapped image below uses the same sentence to show in more detail how the frame-evoking words relate to their frame elements.



FrameNet annotators strive to document "the range of semantic and syntactic combinatory possibilities (valences) of each word in each of its senses, through computer-assisted annotation of example sentences".

These fully annotated examples are displayed automatically and are being used in a variety of artificial intelligence and natural language processing (NLP) applications.
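The idea of a word's valences can be illustrated by tallying, for each lexical unit, which combinations of frame elements and phrase types appear in its annotated examples. This is a toy sketch under stated assumptions, not FrameNet's own software: the example sentences, the element labels ("agent", "object", "location"), and the phrase-type tags are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical annotated examples for two Placing-frame words:
# (lexical unit, [(frame element, phrase type), ...]) in sentence order.
examples = [
    ("put.v", [("agent", "NP"), ("object", "NP"), ("location", "PP")]),    # She put the book on the shelf.
    ("put.v", [("agent", "NP"), ("object", "NP"), ("location", "PP")]),    # He put the cup in the sink.
    ("put.v", [("agent", "NP"), ("object", "NP"), ("location", "AVP")]),   # They put it there.
    ("shelve.v", [("agent", "NP"), ("object", "NP")]),                     # We shelved the books.
]


def valence_patterns(examples):
    """Count each distinct (element, phrase type) combination per lexical unit."""
    table = defaultdict(Counter)
    for lexical_unit, realization in examples:
        table[lexical_unit][tuple(realization)] += 1
    return table


for lexical_unit, patterns in valence_patterns(examples).items():
    for pattern, count in patterns.most_common():
        print(lexical_unit, count, " + ".join(f"{fe}:{pt}" for fe, pt in pattern))
```

A table like this shows at a glance that, in these made-up examples, put realizes its location element either as a prepositional phrase or as an adverb, while shelve can leave the location unexpressed.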

For NLP tasks that require extracting semantic information, FrameNet's semantic mapping gives a computer a way to recover meaning from a string of words.

Currently, the FrameNet database contains over 10,000 lexical units (word senses), of which more than 6,100 are fully annotated. More than 825 semantic frames are represented and exemplified in over 140,000 sentences.

The data is available through the FrameNet web site and is already being used by researchers around the world, including NLP researchers at ICSI. Srini Narayanan, head of the AI Group, used FrameNet to help detect semantic information in AQUAINT, an ongoing question-answering project, and a new effort by Adam Janin of the Speech Group and Michael Ellsworth of the AI Group will focus on paraphrasing, using FrameNet data as a source of semantic information. Last year, Thomas Schmidt, then a visiting German postdoc, created a multilingual dictionary of soccer terms, called Kicktionary, using a FrameNet-style semantic analysis of each term. (See www.kicktionary.de for more information.)

A significant improvement to FrameNet is the development of tools to automate much of the annotation process. This is essential for the widespread use of FrameNet data in NLP research, because it will let NLP researchers quickly annotate the texts used in their own projects. FrameNet developers are creating software that will annotate semantic frame information, and are also collaborating with scientists working on practical applications for FrameNet data.
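A first step any automatic annotator has to take is finding the frame-evoking words (targets) in raw text. The minimal sketch below does this by plain lexicon lookup; it is an illustration, not a description of FrameNet's actual tools. The LEXICON is hypothetical and contains only the words and frames mentioned in this article.

```python
import re

# Hypothetical mini-lexicon: lowercase word form -> frames it may evoke.
# Entries are limited to words and frames mentioned in this article.
LEXICON = {
    "put": ["Placing"],
    "lay": ["Placing"],
    "shelve": ["Placing"],
    "file": ["Placing"],
    "thought": ["Awareness/Cognition"],
    "might": ["Likelihood"],
    "die": ["Death"],
}


def find_targets(sentence, lexicon=LEXICON):
    """Return (token index, token, candidate frames) for each token found in the lexicon."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    return [
        (i, tok, lexicon[tok.lower()])
        for i, tok in enumerate(tokens)
        if tok.lower() in lexicon
    ]


print(find_targets("I thought that I might die."))
# [(1, 'thought', ['Awareness/Cognition']), (4, 'might', ['Likelihood']), (5, 'die', ['Death'])]
```

Exact word-form lookup is the weakest possible design: it misses inflected forms such as "shelved", and it cannot tell "lay" in "She lay down" (a form of "lie") from Placing "lay", or the noun "might" from the modal. Real annotation software has to add lemmatization and choose among the frames an ambiguous word can evoke, which is why automating this step is a substantial research effort.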

One such collaboration is with researchers led by Nancy Ide at Vassar College, who are developing a large corpus of American English called the American National Corpus. The corpus covers a wide variety of language use, both spoken and written, from sermons to sitcoms. The FrameNet team is carrying out a FrameNet-style analysis of part of this corpus to provide semantic information for NLP research that uses the corpus. Another collaboration is with a team led by Christiane Fellbaum at Princeton University. Fellbaum's team developed WordNet, an online dictionary that provides less detailed information than FrameNet but covers many more words. The NSF-funded collaboration between FrameNet and WordNet will explore theoretical issues involved in aligning the two resources.

......
......

In recent years, FrameNet projects in several other languages have begun. ICSI regularly hosts visiting scientists working to create FrameNet databases in their native languages, which to date include Spanish, Japanese, and German.