Publications:
Michael Färber, Alexander Thiemann, and Adam Jatowt
To Cite, or Not to Cite? Detecting Citation Contexts in Text
Abstract: Recommending citations for scientific texts and other texts such as news articles has recently attracted a considerable amount of attention. However, existing approaches to citation recommendation typically do not explicitly consider whether a given context (e.g., a sentence) for which citations are to be recommended actually "deserves" citations. Determining the "cite-worthiness" of each potential citation context as a step before the actual citation recommendation is beneficial, as (1) it reduces the number of costly recommendation computations to a minimum, and (2) it more closely approximates human citing behavior, since neither too many nor too few recommendations are provided to the user. In this paper, we present a method based on a convolutional recurrent neural network for classifying potential citation contexts. Our experiments show that we significantly outperform the baseline solution and reduce the number of citation recommendations to about one tenth.
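The cite-worthiness step described above can be framed as binary sentence classification. Below is a minimal sketch in Python; the keyword heuristic is a hypothetical stand-in for the paper's convolutional recurrent neural network, used here only to illustrate the task's input/output shape:

```python
# Toy sketch of the cite-worthiness classification task.
# The cue-word heuristic below is a hypothetical stand-in for the
# paper's convolutional recurrent neural network (CRNN) classifier.

CUE_WORDS = {"shown", "proposed", "reported", "according", "studies"}

def is_cite_worthy(sentence: str) -> bool:
    """Return True if the sentence likely deserves a citation.
    Heuristic stand-in: flag sentences containing reporting cues."""
    tokens = {t.strip(".,").lower() for t in sentence.split()}
    return bool(tokens & CUE_WORDS)

sentences = [
    "Recent studies have shown a strong effect.",
    "We now describe our experimental setup.",
]
worthy = [s for s in sentences if is_cite_worthy(s)]
```

Only the sentences flagged as cite-worthy would then be passed on to the (comparatively expensive) citation recommendation step.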
Used data sets (arXiv CS, Scholarly Dataset 2, ACL-ARC):
Source code:
https://github.com/agrafix/grabcite-net
https://github.com/agrafix/grabcite
For citing:
Michael Färber, Alexander Thiemann, and Adam Jatowt. "To Cite, or Not to Cite? Detecting Citation Contexts in Text". In: Proceedings of the 40th European Conference on Information Retrieval, ECIR'18, 2018.
@inproceedings{FaerberECIR2018shortpaper,
  author    = {Michael F{\"{a}}rber and Alexander Thiemann and Adam Jatowt},
  title     = "{To Cite, or Not to Cite? Detecting Citation Contexts in Text}",
  booktitle = "{Proceedings of the 40th European Conference on Information Retrieval}",
  series    = "{{ECIR} 2018}",
  year      = {2018}
}
Michael Färber, Alexander Thiemann, and Adam Jatowt
CITEWERTs: A System Combining Cite-Worthiness with Citation Recommendation
Abstract: Due to the vast number of publications appearing in the various scientific disciplines, there is a need for automatically recommending citations for text segments of scientific documents. Surprisingly, only a few demonstrations of citation-based recommender systems have been proposed so far. Moreover, existing solutions either do not consider the raw textual context, or they recommend citations only for predefined citation contexts or for whole documents. In contrast, we propose a novel two-step architecture: first, given some input text, our system determines for each potential citation context, which is typically one sentence long, whether it is actually "cite-worthy." If so, our system then recommends citations for that context. Given this architecture, our demonstration shows how we can guide the user to only those sentences that deserve citations and how recommended citations can be presented for single sentences. In this way, we reduce the user's need to review too many sentences and recommendations.
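The two-step architecture above can be sketched as a filter-then-recommend pipeline. In this sketch the classifier and the recommender are hypothetical stubs, not the system's actual components; the point is that the costly recommendation step runs only on sentences that pass the cite-worthiness filter:

```python
from typing import Callable, Dict, List

def recommend_pipeline(
    sentences: List[str],
    is_cite_worthy: Callable[[str], bool],
    recommend: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Step 1: filter sentences by cite-worthiness.
    Step 2: run the costly recommender only on the survivors."""
    return {s: recommend(s) for s in sentences if is_cite_worthy(s)}

# Hypothetical stand-ins for the trained classifier and recommender:
classifier = lambda s: "shown" in s.lower()
recommender = lambda s: ["arXiv:1234.5678"]  # placeholder candidate list

out = recommend_pipeline(
    ["It has been shown that X holds.", "We proceed as follows."],
    classifier,
    recommender,
)
```

With this split, a user interface only needs to surface recommendations for the sentences present in `out`, which is what keeps the number of reviewed sentences small.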
Online Demo:
Manual assessments of citations in the DRI corpus:
Source code:
https://github.com/agrafix/grabcite-net
https://github.com/agrafix/grabcite
For citing:
Michael Färber, Alexander Thiemann, and Adam Jatowt. "CITEWERTs: A System Combining Cite-Worthiness with Citation Recommendation". In: Proceedings of the 40th European Conference on Information Retrieval, ECIR'18, 2018.
@inproceedings{FaerberECIR2018demopaper,
  author    = {Michael F{\"{a}}rber and Alexander Thiemann and Adam Jatowt},
  title     = "{CITEWERTs: A System Combining Cite-Worthiness with Citation Recommendation}",
  booktitle = "{Proceedings of the 40th European Conference on Information Retrieval}",
  series    = "{{ECIR} 2018}",
  year      = {2018}
}
Michael Färber, Alexander Thiemann, and Adam Jatowt
A High-Quality Gold Standard for Citation-based Tasks
Abstract: Analyzing and recommending citations within their specific citation contexts has recently received much attention due to the growing number of available publications. Although data sets such as CiteSeerX have been created for evaluating approaches to such tasks, those data sets exhibit striking defects. This is understandable given that information extraction, entity linking, and entity resolution all need to be performed when creating them. In this paper, we propose a new evaluation data set for citation-dependent tasks based on arXiv.org publications. Our data set is characterized by almost zero noise in the extracted content and by the fact that all citations are linked to their correct publications. Besides the pure content, which is available on a sentence basis, cited publications are annotated directly in the text via global identifiers. As far as possible, referenced publications are further linked to DBLP. Our data set consists of over 15 million sentences and is freely available for research purposes. It can be used for training and testing citation-based tasks, such as recommending citations, determining the functions or importance of citations, and summarizing documents based on their citations.
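Since the data set annotates cited publications directly in the text via global identifiers, a consumer typically separates each sentence into its plain text and the list of cited identifiers. The sketch below illustrates this; the inline marker syntax (`<DOI:...>`) is an assumption made for the example, not the data set's actual annotation format:

```python
import re

# Illustrative only: the "<DOI:...>" inline marker format assumed
# here is hypothetical; consult the data set's documentation for
# its actual annotation syntax.
MARKER = re.compile(r"<DOI:([^>]+)>")

def split_sentence(annotated: str):
    """Return (plain sentence, list of cited global identifiers)."""
    ids = MARKER.findall(annotated)
    plain = MARKER.sub("", annotated)
    return " ".join(plain.split()), ids

text = "Neural models improve ranking <DOI:10.1000/example> in many settings."
sentence, cited = split_sentence(text)
```

Separating text and identifiers this way yields exactly the (context, cited publication) pairs needed to train or evaluate a citation recommender.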
Data:
For citing:
Michael Färber, Alexander Thiemann, and Adam Jatowt. "A High-Quality Gold Standard for Citation-based Tasks". In:
Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC'18, 2018.
@inproceedings{FaerberLREC2018,
  author    = {Michael F{\"{a}}rber and Alexander Thiemann and Adam Jatowt},
  title     = "{A High-Quality Gold Standard for Citation-based Tasks}",
  booktitle = "{Proceedings of the 11th International Conference on Language Resources and Evaluation}",
  series    = "{{LREC} 2018}",
  year      = {2018}
}