Deep Bayes Factor Scoring for Authorship Verification
Publication Date
9-2020
Conference/Sponsorship/Institution
Notebook for PAN 2020 at the Conference and Labs of the Evaluation Forum (CLEF), Thessaloniki, Greece
Description
The PAN 2020 authorship verification (AV) challenge focuses on a cross-topic/closed-set AV task over a collection of fanfiction texts. Fanfiction is a fan-written extension of a storyline, in which a so-called fandom topic describes the principal subject of the document. The data provided in the PAN 2020 AV task is challenging because it includes authors whose texts span multiple, different fandom topics. In this work, we present a hierarchical fusion of two well-known approaches into a single end-to-end learning procedure. At the bottom, a deep metric learning framework aims to learn a pseudo-metric, mapping each document of variable length onto a fixed-size feature vector. At the top, a probabilistic layer performs Bayes factor scoring in the learned metric space. We also provide text preprocessing strategies to deal with the cross-topic issue.
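The two-stage idea in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation. The learned encoder is replaced by a hypothetical stand-in (`embed`, which mean-pools fixed pseudo-random token vectors), and the probabilistic layer is assumed to be a simple Gaussian two-covariance model with a between-author covariance `B` and a within-author covariance `W`. All function names, dimensions, and parameter values are assumptions chosen for illustration.

```python
import zlib
import numpy as np


def embed(tokens, dim=8):
    """Hypothetical stand-in for a learned encoder.

    Maps a token list of any length to one fixed-size vector by
    mean-pooling deterministic pseudo-random token vectors.
    """
    if not tokens:
        raise ValueError("document must contain at least one token")
    vecs = [
        np.random.default_rng(zlib.crc32(t.encode("utf-8"))).standard_normal(dim)
        for t in tokens
    ]
    return np.mean(vecs, axis=0)


def log_gaussian(x, mean, cov):
    """Log-density of a multivariate Gaussian at x."""
    d = x - mean
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(x) * np.log(2.0 * np.pi) + logdet + quad)


def log_bayes_factor(y1, y2, mu, B, W):
    """Log Bayes factor of 'same author' versus 'different authors'.

    Assumed model: y = x + e, with author variable x ~ N(mu, B) and
    noise e ~ N(0, W). Under the same-author hypothesis both vectors
    share x, so they are correlated through B; under the
    different-author hypothesis they are independent.
    """
    T = B + W
    Z = np.zeros_like(B)
    y = np.concatenate([y1, y2])
    m = np.concatenate([mu, mu])
    cov_same = np.block([[T, B], [B, T]])
    cov_diff = np.block([[T, Z], [Z, T]])
    return log_gaussian(y, m, cov_same) - log_gaussian(y, m, cov_diff)


def same_author_probability(log_bf, prior_same=0.5):
    """Turn a log Bayes factor into a posterior probability."""
    log_odds = log_bf + np.log(prior_same) - np.log1p(-prior_same)
    return 1.0 / (1.0 + np.exp(-log_odds))


# Illustrative parameters; in an end-to-end system they would be learned.
dim = 8
mu, B, W = np.zeros(dim), np.eye(dim), 0.1 * np.eye(dim)

doc_a = "the ship left the harbour at dawn".split()
doc_b = "at dawn the crew watched the harbour fade".split()
score = log_bayes_factor(embed(doc_a, dim), embed(doc_b, dim), mu, B, W)
print(same_author_probability(score))
```

A positive log Bayes factor favours the same-author hypothesis and a negative one favours different authors. With untrained stand-in embeddings the printed probability carries no meaning; the sketch only shows how a fixed-size representation feeds the scoring layer.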
Type
Article
Department
Electrical Engineering
Comments
The implementation of the algorithm described in the paper won the 2020 PAN Authorship Verification Challenge. The authors' team, named “boenninghoff20”, placed first in both the large-dataset and the small-dataset challenge against 10 international research groups. The competition results are published at:
https://pan.webis.de/clef20/pan20-web/author-identification.html
Link to published version
https://arxiv.org/abs/2008.10105
Recommended Citation
“Deep Bayes Factor Scoring for Authorship Verification,” by B. Boenninghoff, J. Rupp, R. M. Nickel, and D. Kolossa, Notebook for PAN 2020 at the Conference and Labs of the Evaluation Forum (CLEF), Thessaloniki, Greece, September 22-25, 2020.