Deep Bayes Factor Scoring for Authorship Verification
Notebook for PAN 2020 at the Conference and Labs of the Evaluation Forum (CLEF), Thessaloniki, Greece
The PAN 2020 authorship verification (AV) challenge focuses on a cross-topic/closed-set AV task over a collection of fanfiction texts. Fanfiction is a fan-written extension of a storyline in which a so-called fandom topic describes the principal subject of the document. The data provided in the PAN 2020 AV task is challenging because it includes authors whose texts span multiple, different fandom topics. In this work, we present a hierarchical fusion of two well-known approaches into a single end-to-end learning procedure: a deep metric learning framework at the bottom aims to learn a pseudo-metric that maps a document of variable length onto a fixed-size feature vector. At the top, we incorporate a probabilistic layer to perform Bayes factor scoring in the learned metric space. We also provide text preprocessing strategies to deal with the cross-topic issue.
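To make the idea of Bayes factor scoring in a learned metric space more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a two-covariance Gaussian model over fixed-size document embeddings, which is one common way to obtain a Bayes factor for a "same author" versus "different authors" decision. The function names and the parameters `mu`, `B` (between-author covariance), `W` (within-author covariance), and `prior_same` are hypothetical and chosen only for illustration; the embeddings `y1` and `y2` stand in for the output of the metric learning stage.

```python
import numpy as np


def log_gauss(z, mean, cov):
    """Log density of a multivariate Gaussian evaluated at z."""
    d = z - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(z) * np.log(2.0 * np.pi) + logdet + maha)


def log_bayes_factor(y1, y2, mu, B, W):
    """Log Bayes factor for 'same author' vs. 'different authors'.

    Assumed model (illustrative, not taken from the paper):
    each embedding is y = mu + x + eps, with author variable x ~ N(0, B)
    and within-author noise eps ~ N(0, W).
    """
    z = np.concatenate([y1, y2])
    m = np.concatenate([mu, mu])
    T = B + W  # total covariance of a single embedding
    zero = np.zeros_like(B)
    # Same author: both embeddings share x, so they are correlated through B.
    cov_same = np.block([[T, B], [B, T]])
    # Different authors: the two embeddings are independent.
    cov_diff = np.block([[T, zero], [zero, T]])
    return log_gauss(z, m, cov_same) - log_gauss(z, m, cov_diff)


def same_author_probability(log_bf, prior_same=0.5):
    """Turn a log Bayes factor into a posterior probability of 'same author'."""
    log_odds = log_bf + np.log(prior_same) - np.log1p(-prior_same)
    return 1.0 / (1.0 + np.exp(-log_odds))


if __name__ == "__main__":
    dim = 2
    mu = np.zeros(dim)
    B = 4.0 * np.eye(dim)   # authors differ strongly from one another
    W = 0.1 * np.eye(dim)   # texts by one author stay close together
    close = log_bayes_factor(np.array([2.0, 2.0]), np.array([2.0, 2.0]), mu, B, W)
    far = log_bayes_factor(np.array([2.0, 2.0]), np.array([-2.0, -2.0]), mu, B, W)
    print(same_author_probability(close), same_author_probability(far))
```

In this sketch, a positive log Bayes factor favors the same-author hypothesis and a negative one favors different authors. In an end-to-end system the covariance parameters would be learned jointly with the embedding network rather than fixed by hand as they are here.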
The implementation of the algorithm described in the paper won the 2020 PAN Authorship Verification Challenge. The authors' team, named “boenninghoff20”, placed first in both the large-dataset and the small-dataset challenges against 10 international research groups. The competition results are published in:
“Deep Bayes Factor Scoring for Authorship Verification,” by B. Boenninghoff, J. Rupp, R. M. Nickel, and D. Kolossa, Notebook for PAN 2020 at the Conference and Labs of the Evaluation Forum (CLEF), Thessaloniki, Greece, September 22-25, 2020.