Publications:



Michael Färber, Alexander Thiemann, and Adam Jatowt
To Cite, or Not to Cite? Detecting Citation Contexts in Text

Abstract: Recommending citations for scientific texts and other texts such as news articles has recently attracted a considerable amount of attention. However, typically, the existing approaches for citation recommendation do not explicitly incorporate the question of whether a given context (e.g., a sentence), for which citations are to be recommended, actually "deserves" citations. Determining the "cite-worthiness" of each potential citation context as a step before the actual citation recommendation is beneficial, as (1) it can reduce the number of costly recommendation computations to a minimum, and (2) it can more closely approximate human citing behavior, since neither too many nor too few recommendations are provided to the user. In this paper, we present a method based on a convolutional recurrent neural network for classifying potential citation contexts. Our experiments show that we can significantly outperform the baseline solution and reduce the number of citation recommendations to about 1/10.

Used Data sets (arXiv CS, Scholarly Dataset 2, ACL-ARC):

Source code:

    https://github.com/agrafix/grabcite-net
    https://github.com/agrafix/grabcite


For citing:
Michael Färber, Alexander Thiemann, and Adam Jatowt. "To Cite, or Not to Cite? Detecting Citation Contexts in Text". In: Proceedings of the 40th European Conference on Information Retrieval, ECIR'18, 2018.

@inproceedings{FaerberECIR2018shortpaper,
 author     = {Michael F{\"{a}}rber and Alexander Thiemann and Adam Jatowt},
 title      = "{To Cite, or Not to Cite? Detecting Citation Contexts in Text}",
 booktitle  = "{Proceedings of the 40th European Conference on Information Retrieval}",
 series     = "{{ECIR} 2018}",
 year       = {2018}
}




Michael Färber, Alexander Thiemann, and Adam Jatowt
CITEWERTs: A System Combining Cite-Worthiness with Citation Recommendation

Abstract: Due to the vast amount of publications appearing in the various scientific disciplines, there is a need for automatically recommending citations for text segments of scientific documents. Surprisingly, only a few demonstrations of citation-based recommender systems have been proposed so far. Moreover, existing solutions either do not consider the raw textual context, or they recommend citations only for predefined citation contexts or for whole documents. In contrast to them, we propose a novel two-step architecture: First, given some input text, our system determines for each potential citation context, which is typically a sentence long, whether it is actually "cite-worthy." When this is the case, our system then recommends citations for that context. Given this architecture, in our demonstration we show how we can guide the user to only those sentences that deserve citations and how to present recommended citations for single sentences. In this way, we reduce the user's need to review too many sentences and recommendations.
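The two-step idea from the abstract can be sketched as a small pipeline. Note that this is only an illustrative sketch: the `is_cite_worthy` heuristic and the keyword-lookup recommender below are hypothetical placeholders, not the trained neural classifier or the recommendation model used in the paper.

```python
# Illustrative sketch of the two-step architecture: step 1 filters
# sentences for cite-worthiness, step 2 recommends citations only
# for the sentences that pass the filter.

def is_cite_worthy(sentence: str) -> bool:
    """Toy stand-in for step 1: flag sentences that look like claims.
    The real system uses a trained (convolutional recurrent) classifier."""
    cue_words = {"shown", "proposed", "reported", "demonstrated", "studies"}
    return any(w in sentence.lower() for w in cue_words)

def recommend_citations(sentence: str, index: dict) -> list:
    """Toy stand-in for step 2: naive keyword lookup in a small index
    mapping keywords to candidate publications."""
    hits = []
    for key, papers in index.items():
        if key in sentence.lower():
            hits.extend(papers)
    return hits

def pipeline(sentences, index):
    """Run the recommender (step 2) only on sentences that pass
    the cite-worthiness filter (step 1)."""
    results = {}
    for s in sentences:
        if is_cite_worthy(s):
            results[s] = recommend_citations(s, index)
    return results
```

Because step 1 discards most sentences up front, the comparatively expensive recommendation step runs only on a small fraction of the input, which is the cost-saving benefit both abstracts describe.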

Online Demo:

Manual assessments of citations in the DRI corpus:

Source code:

    https://github.com/agrafix/grabcite-net
    https://github.com/agrafix/grabcite
 

For citing:
Michael Färber, Alexander Thiemann, and Adam Jatowt. "CITEWERTs: A System Combining Cite-Worthiness with Citation Recommendation". In: Proceedings of the 40th European Conference on Information Retrieval, ECIR'18, 2018.

@inproceedings{FaerberECIR2018demopaper,
 author     = {Michael F{\"{a}}rber and Alexander Thiemann and Adam Jatowt},
 title      = "{CITEWERTs: A System Combining Cite-Worthiness with Citation Recommendation}",
 booktitle  = "{Proceedings of the 40th European Conference on Information Retrieval}",
 series     = "{{ECIR} 2018}",
 year       = {2018}
}






Michael Färber, Alexander Thiemann, and Adam Jatowt
A High-Quality Gold Standard for Citation-based Tasks

Abstract: Analyzing and recommending citations with their specific citation contexts have recently received much attention due to the growing number of available publications. Although data sets such as CiteSeerX have been created for evaluating approaches for such tasks, those data sets exhibit striking defects. This is understandable if one considers that information extraction, entity linking, and entity resolution all need to be performed. In this paper, we propose a new evaluation data set for citation-dependent tasks based on arXiv.org publications. Our data set is characterized by the fact that it exhibits almost zero noise in the extracted content and that all citations are linked to their correct publications. Besides the pure content, which is available on a sentence basis, cited publications are annotated directly in the text via global identifiers. As far as possible, referenced publications are further linked to DBLP. Our data set consists of over 15M sentences and is freely available for research purposes. It can be used for training and testing citation-based tasks, such as recommending citations, determining the functions or importance of citations, and summarizing documents based on their citations.

Data:

 

For citing:
Michael Färber, Alexander Thiemann, and Adam Jatowt. "A High-Quality Gold Standard for Citation-based Tasks". In: Proceedings of the 11th International Conference on Language Resources and Evaluation, LREC'18, 2018.

@inproceedings{FaerberLREC2018,
 author     = {Michael F{\"{a}}rber and Alexander Thiemann and Adam Jatowt},
 title      = "{A High-Quality Gold Standard for Citation-based Tasks}",
 booktitle  = "{Proceedings of the 11th International Conference on Language Resources and Evaluation}",
 series     = "{{LREC} 2018}",
 year       = {2018}
}




By Michael Färber, Alexander Thiemann, and Adam Jatowt, 2018