Distant Supervision: Mike Mintz, Steven Bills, Rion Snow and Dan Jurafsky

In the research paper "Distant supervision for relation extraction without labeled data", the authors Mike Mintz, Steven Bills, Rion Snow and Dan Jurafsky investigate an alternative paradigm,
called distant supervision, for relation extraction. This approach combines the advantages of supervised information extraction and unsupervised information extraction to achieve greater precision.
In addition, they analyze feature performance to better understand the roles of lexical and syntactic features. Some of the key observations from this research are:
1) A combination of syntactic and lexical features offers a substantial improvement in relation extraction precision over either of these feature sets on its own.
The paper claims that the entity Steven Spielberg could be either a director or a CEO, and thus the features are inconclusive. If Steven Spielberg were instead tagged as person/director rather than just person, this confusion could be avoided.
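
The effect of a finer-grained tag can be illustrated with a minimal sketch. It assumes a feature is formed by joining the two entity tags with the words between the mentions; the function name, the tag names, and the example words are hypothetical and not taken from the paper.

```python
def conjoined_feature(tag1, between, tag2):
    """Join the two entity tags and the words between the mentions into one feature string."""
    return "|".join([tag1, between, tag2])

# Hypothetical words found between the two entity mentions.
between = "'s"

# With the coarse tag, a director mention and a CEO mention give the same feature.
coarse_director = conjoined_feature("PERSON", between, "MISC")
coarse_ceo = conjoined_feature("PERSON", between, "MISC")

# With finer-grained tags (an assumption of this sketch), the features differ.
fine_director = conjoined_feature("PERSON/DIRECTOR", between, "MISC")
fine_ceo = conjoined_feature("PERSON/CEO", between, "MISC")

print(coarse_director == coarse_ceo)  # the coarse features collide
print(fine_director == fine_ceo)      # the fine-grained features do not
```

Whether a tagger can reliably produce such fine-grained tags is a separate question that this sketch does not address.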

To construct the classifier, negative training data is needed. For this, the authors build a feature vector for an "unrelated" relation during the training phase by randomly selecting entity pairs that do not appear in any Freebase relation and extracting features for them. Care must be taken when randomly selecting the unrelated pairs, as a skewed distribution might decrease precision.
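
The random selection of unrelated pairs described above might look roughly like the following sketch. It is not the authors' implementation: the function name, the toy entity list, and the toy set of related pairs standing in for Freebase are all assumptions, and a pair is treated as related if it appears in either order.

```python
import random
from itertools import combinations

def sample_negative_pairs(entities, related_pairs, k, seed=0):
    """Randomly pick up to k entity pairs that appear in no known relation."""
    # Treat pairs as unordered: (a, b) is related if either order is known.
    related = {frozenset(p) for p in related_pairs}
    candidates = [p for p in combinations(entities, 2)
                  if frozenset(p) not in related]
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return rng.sample(candidates, min(k, len(candidates)))

# Toy stand-in for knowledge-base relation instances (hypothetical data).
entities = ["Edwin Hubble", "Marshfield", "Steven Spielberg", "Missouri"]
related_pairs = [("Edwin Hubble", "Marshfield"), ("Marshfield", "Missouri")]

negatives = sample_negative_pairs(entities, related_pairs, k=3)
for pair in negatives:
    print(pair)
```

Sampling uniformly from all unrelated pairs is only one choice; a different sampling scheme would change the distribution of negative examples, which is the concern raised in the paragraph above.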

Consider the statements "Astronomer Edwin Hubble was born in Marshfield, Missouri" and "Astronomer Edwin Hubble took birth in Marshfield, Missouri". Both sentences convey exactly the same thing. Similarly, consider the statements "The critic wrote a scathing review" and "A scathing review was written by the critic". One statement is in the active voice and the other is in the passive voice.
Even though the sentences in each pair convey the same meaning, extracting relations from them requires conjoining a different set of features for each sentence. This is a computationally expensive process.
If the correspondence between such sentences could instead be identified beforehand, the number of computations could be reduced by almost half.
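
One way to picture this idea is to map equivalent phrasings to a single canonical form before the features are built, so that both sentences produce the same feature. The sketch below assumes a small hand-written paraphrase table; the table, the function names, and the tag names are hypothetical. It covers only the lexical variant in the first example and does not handle the active/passive case, which would need syntactic normalization.

```python
# Hypothetical paraphrase table: surface phrase -> canonical phrase.
PARAPHRASES = {
    "took birth in": "was born in",
}

def canonical_middle(middle):
    """Normalize the words between two entity mentions to a canonical phrase."""
    middle = middle.strip().lower()
    return PARAPHRASES.get(middle, middle)

def feature_key(tag1, middle, tag2):
    """Build a feature string from the entity tags and the normalized middle words."""
    return f"{tag1}|{canonical_middle(middle)}|{tag2}"

k1 = feature_key("PERSON", "was born in", "LOCATION")
k2 = feature_key("PERSON", "took birth in", "LOCATION")
print(k1 == k2)  # both sentences now share one feature
```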

In the research paper \Answer Extraction as
