Tarek Saier


I am a computer scientist and software developer with a passion for languages,
interested in NLP, digital libraries, and web technologies.

I enjoy projects that involve both technical aspects requiring analytical thinking
and communication facets such as inter-cultural and cross-functional communication.


2019/08 – present

Research Associate, Karlsruhe Institute of Technology, Karlsruhe, Germany
Research in the area of Data Mining and Information Extraction. / Thesis supervision. / Lecture “Knowledge Discovery”. / Administration of research group’s IT infrastructure.

Major projects
PhD research - Data mining and information extraction from scientific publications.
➟ result: 2 journal articles, 5 conference papers (all first author, peer reviewed)
➟ skills: NLP/ML (sklearn, Hugging Face libs, LLM deployment/use, etc.), scientific writing, public speaking
unarXive - 2M document NLP corpus generated from LaTeX papers.
➟ result: 150+ GitHub stars, 50+ citations, widespread use
➟ skills: Python, LaTeX, data curation and dissemination, work on HPC systems
ChemKB - A community-driven platform for curating and linking chemical research findings.
➟ result: successful platform creation, pilot application in the area of photocatalytic CO2 conversion
➟ skills: Semantic MediaWiki, knowledge modeling, cross-functional collaboration
IT administration - Administration of compute servers, ML technology stack, wikis, etc.
➟ result: new GPU server, wiki migration, continuous adaptation to changes in external systems
➟ skills: Linux system administration, Docker, problem solving, communication
2022/08 – 2022/10
Visiting Researcher, University of Tokyo, Tokyo, Japan
Research on scholarly information extraction at the Miyao Group for NLP and Computational Linguistics.
Project outcome published at ECIR 2024 and in the Special Interest Group on Maths Linguistics.
2022/07 – 2022/08
Visiting Researcher, National University of Singapore, Singapore
Research on scholarly information extraction at the Web Information Retrieval / NLP Group.
Project outcome published at JCDL 2023.
2017/10 – 2018/07

Research Intern, National Institute of Informatics, Tokyo, Japan
Research in the field of Digital Humanities, supervised by Prof. Dr. Asanobu Kitamoto.

Major project
IIIF Curation Platform (backend) - User-driven web platform for the curation of digitized cultural heritage; used in research, lectures (University of Tokyo), and museum exhibits (The Museum of Art, Kochi).
➟ result: conference paper co-authorship, academic award by the Japan Society for Digital Archives
➟ skills: Flask, API development, Docker, IIIF, Web standards, Japanese
2017/04 – 2017/08
Python Developer, geOps e.K., Freiburg, Germany
Part-time software development in the field of geoinformatics.
2012/09 – 2013/03
IT Security Intern, Schutzwerk GmbH, Ulm, Germany
Developing tools for and supporting IT security audits.


2019/08 – present

PhD, Karlsruhe Institute of Technology, Karlsruhe, Germany
Web Science group of the Institute for Applied Informatics and Formal Description Methods. Supervised by interim professor Dr. Michael Färber and Prof. Dr. York Sure-Vetter.

“Data Mining and Information Extraction Methods for Large-Scale High Quality Representations of Scientific Publications” (preliminary)
2014/09 – 2019/05

Master of Science, Albert Ludwig University of Freiburg, Freiburg, Germany
Graduate studies in computer science with a focus on information systems. / Elective lectures in psychology.

“Semantic Approaches to Citation Recommendation”, Grade 1.0
Final grade
2015/09 – 2016/09

Study abroad, Matsuyama University, Matsuyama, Japan
Graduate course lectures in intercultural communication and linguistics. / Japanese language classes for exchange students. / Supervision of a seminar group in “Game development and programming.”

Final grade
2010/09 – 2014/08

Bachelor of Science, Hochschule Furtwangen, Furtwangen, Germany
Undergraduate studies in computer networking with a focus on IT security. / Additional lectures in Japanese, English and psychology.

“Mobile Application Security Audit: Android”, Grade 1.0
Final grade

Awards and Scholarships

2022 Academic Award (Systems · Platforms) - Japan Society for Digital Archives
For the IIIF Curation Platform project.
2022/08 – 2022/10
Research Travel Grant - Karlsruhe House of Young Scientists
For a research stay at the University of Tokyo.
2022/07 – 2022/08
Networking Grant - Karlsruhe House of Young Scientists
For a research stay at the National University of Singapore.
2022/04 – 2023/07
Software Campus Grant - European Institute of Innovation and Technology, ICT Labs
100,000 € grant by the German Federal Ministry of Education and Research for leading a research project.
2015/09 – 2016/08
Study abroad stipend - Matsuyama University
Scholarship of 30,000 Yen/month.


[⧳] journal article / [⧲] conference proceedings / [○] workshop proceedings
2024/03 [⧲]
Tarek Saier, Mayumi Ohta, Takuto Asakura, Michael Färber: HyperPIE: Hyperparameter Information Extraction from Scientific Publications. In: Proceedings of the 46th European Conference on Information Retrieval (ECIR ’24), in press.
2023/06 [⧲]
Tarek Saier, Johan Krause, Michael Färber: unarXive 2022: All arXiv Publications Pre-Processed for NLP, Including Structured Full-Text and Citation Network. In: Proceedings of the 23rd ACM/IEEE Joint Conference on Digital Libraries (JCDL ’23), pp. 66–70. 
2023/06 [⧲]
Tarek Saier, Youxiang Dong, Michael Färber: CoCon: A Data Set on Combined Contextualized Research Artifact Use. In: Proceedings of the 23rd ACM/IEEE Joint Conference on Digital Libraries (JCDL ’23), pp. 47–50. 
2022/06 [○]
Tarek Saier, Meng Luan, Michael Färber: A Blocking-Based Approach to Enhance Large-Scale Reference Linking. In: Proceedings of the Workshop on Understanding Literature References in Academic Full Text (ULITE) at JCDL ’22, pp. 16–25. 
2022/06 [⧲]
Chifumi Nishioka, Michael Färber, Tarek Saier: The Influence of Author Affiliations on Preprint Citation Count. In: Proceedings of the 22nd ACM/IEEE Joint Conference on Digital Libraries (JCDL ’22), pp. 1–8. 
2022/04 [○]
Michael Färber, Christoph Braun, Nicholas Popovic, Tarek Saier, Kristian Noullet: Which Publications’ Metadata Are in Which Bibliographic Databases? A System for Exploration. In: Proceedings of the 12th International Workshop on Bibliometric-enhanced Information Retrieval (BIR@ECIR 2022), pp. 39–44. 
2022/02 [○]
Igor Shapiro, Tarek Saier, Michael Färber: Sequence Labeling for Citation Field Extraction from Cyrillic Script References. In: Proceedings of the AAAI Workshop on Scientific Document Understanding (SDU@AAAI’22), pp. 69–79. 
2021/12 [⧳]
Tarek Saier, Michael Färber, Tornike Tsereteli: Cross-Lingual Citations in English Papers: A Large-Scale Analysis of Prevalence, Usage, and Impact. In: International Journal on Digital Libraries 23, pp. 179–195. 
2021/06 [○]
Johan Krause, Igor Shapiro, Tarek Saier, Michael Färber: Bootstrapping Multilingual Metadata Extraction: A Showcase in Cyrillic. In: Proceedings of the Second Workshop on Scholarly Document Processing. NAACL 2021, pp. 66–72. 
2020/11 [⧲]
Tarek Saier, Michael Färber: A Large-Scale Analysis of Cross-lingual Citations in English Papers. In: Digital Libraries at Times of Massive Societal Transition. ICADL 2020. Lecture Notes in Computer Science, vol 12504. 
2020/04 [⧲]
Tarek Saier, Michael Färber: Semantic Modelling of Citation Contexts for Context-Aware Citation Recommendation. In: Advances in Information Retrieval. ECIR 2020. Lecture Notes in Computer Science, vol 12035. 
2020/03 [⧳]
Tarek Saier, Michael Färber: unarXive: A Large Scholarly Data Set with Publications’ Full-Text, Annotated In-Text Citations, and Links to Metadata. In: Scientometrics 125, pp. 3085–3108. 
2019/04 [○]
Tarek Saier, Michael Färber: Bibliometric-Enhanced arXiv: A Data Set for Paper-Based and Citation-Based Tasks. In: Proceedings of the 8th International Workshop on Bibliometric-enhanced Information Retrieval (BIR@ECIR 2019), pp. 14–26. 
2018/12 [⧲]
Asanobu Kitamoto, Jun Homma, Tarek Saier: IIIF Curation Platform: Next Generation IIIF Open Platform Supporting User-Driven Image Sharing. In: IPSJ SIG Computers and the Humanities Symposium 2018, pp. 327–334. (in Japanese)

Invited Talks

How the Karlsruhe Institute of Technology deals with the corona crisis. 19th Cyber symposium for the exchange concerning the current state of distance learning at universities, National Institute of Informatics, Japan. (in Japanese)

Community Involvement

2019/09 – present
Anki - Japanese Pitch Accent
Development and maintenance of a free, open source learning software add-on.
➟ result: Over 20,000 downloads
➟ skills: Python, Git, SVG, Japanese
2019/11 – present
Cultural exchange meetup
Initiation and organization of a monthly cultural exchange meetup for Japanese people and people interested in Japan.
➟ result: Over 50 participants, meetup website is the top Google result for “カールスルーエ 日本” (“Karlsruhe Japan”)
➟ skills: Community management, intercultural communication, Japanese, Jekyll



Full professional working proficiency

IELTS Academic: 8.0

Professional working proficiency

Kanji Kentei level 4
Japanese-Language Proficiency Test N1