Presentations
Is training deep neural embeddings worth the effort? A preliminary investigation of different representation methods for semantic similarity tasks in Buddhist Chinese and related languages of the Buddhist tradition
June 2025
Online workshop "Navigating Indra’s Net: Digital Approaches to Text Reuse-based Inter-textuality in Pre-Modern East Asian Texts" at the Hanmun Lab, Ruhr-Universität Bochum
This presentation is part of an online workshop on digital approaches to intertextuality in pre-modern East Asian texts. The talk will provide a preliminary investigation of different representation methods for semantic similarity tasks in Buddhist Chinese and related languages of the Buddhist tradition.
From Sthiramati to Dharmamitra: Developing Digital Tools for a New Age of Philological Buddhist Studies
June 2025
DH International Workshop at Keio University, Tokyo, Japan
This presentation was part of a workshop at Keio University, co-organized by Kakenhi Special Promotion Research "Compilation of the Reiwa Daizokyo as a Digital Research Infrastructure - Presentation of a Research Infrastructure Construction Model for Next-Generation Humanities (JP25H00001)" and the Research Infrastructure Hub, Research and Development Project for the DX of Humanities and Social Sciences.
The workshop featured two lectures and a hands-on session by Sebastian Nehrdich:
1. "From Sthiramati to Dharmamitra: Developing Digital Tools for a New Age of Philological Buddhist Studies"
2. "Practical Application of the Various MITRA Tools for Philological Research"
The event explored the latest developments in the Dharmamitra project, which applies the extensive computing resources of the UC Berkeley AI Research Lab to the machine translation of Buddhist scriptures. In addition to a technical overview, the workshop also delved into the career path of Sebastian Nehrdich, from his beginnings as a Buddhist studies scholar to his current work in applied research, offering insights for early-career researchers in the humanities.
Machine Translation for Asian Studies
March 2025
Annual Conference of the Association of Asian Studies, Columbus, Ohio
With the advent of large language models, machine translation (MT) has become a widely used, but little understood, tool for research, language learning, and communication. GPT, Claude, and many other model series now allow researchers to access literature in different languages, and even to translate primary texts composed in classical languages with few resources available. But how to evaluate the translation output of such machines? How to decide which model is the best for my own research purposes and how to tweak it? How will MT impact language learning, which is fundamental for Asian Studies?
In the first part of this session, we will give an overview of the MT landscape for Asian Studies. Participants will learn how to use online interfaces and APIs to access language models for their own research needs. We will discuss different types of prompts and user-defined parameters such as temperature or token length. We will demonstrate that results can differ radically according to parameterization and prompting, both within the same model and between models.
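As a rough illustration of why the temperature parameter matters, the following minimal sketch shows how dividing a model's raw scores by a temperature reshapes the probabilities from which the next token is sampled. The function name and the three scores are hypothetical and chosen only for illustration; they are not taken from any particular model or from the workshop materials.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate renderings of one word.
logits = [2.0, 1.0, 0.5]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"temperature={t}: {[round(p, 3) for p in probs]}")
```

At a low temperature nearly all probability goes to the highest-scoring candidate, so repeated runs tend to agree; at a higher temperature the alternatives become more likely, which is one reason the same prompt can yield noticeably different translations.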
In the second part, we will present Dharmamitra (dharmamitra.org), an open-source model trained on low-resource languages relevant for Asian Studies. Using examples from Sanskrit, Pali, Tibetan, and Classical Chinese, we will show how even difficult, ancient languages are slowly becoming tractable for MT and what caveats to consider when using such language models to translate them.
This is a hands-on workshop with exercises. Participants will have to bring their computers. Programming experience is not needed.
Organizers
- Marcus Bingenheimer, Temple University
- Sebastian Nehrdich, University of California, Berkeley
Chair
- Marcus Bingenheimer, Temple University
Presenters
- Marcus Bingenheimer, Temple University
- Sebastian Nehrdich, University of California, Berkeley
- Xiang Wei, Temple University
MITRA Search: Building Information Retrieval Systems for Classical Asian Languages in the Age of AI
March 2025
CEAL (Council on East Asian Libraries) Technology Forum, Columbus, Ohio
Recent advances in artificial intelligence and natural language processing have revolutionized information retrieval and question-answering systems. This talk introduces MITRA Search, a specialized search platform designed for exploring Buddhist literature preserved across Classical Asian languages including Chinese, Tibetan, Sanskrit, and Pāli. The system leverages multilingual approximate search capabilities to enable scholars to identify parallel passages and conduct comparative analyses across different writing systems and translations. We demonstrate how large language models integrated into the Dharmamitra project enhance user interaction with search results, facilitating dynamic exploration of these classical texts. This innovation addresses the long-standing challenge of cross-linguistic textual research in Buddhist studies and offers new possibilities for digital humanities scholarship.
MITRA Search: Exploring Buddhist Literature Preserved in Classical Asian Languages with Multilingual Approximate Search
December 2024
Ito International Research Center, Thanksgiving Hall, Tokyo, Japan
Part of the International Symposium "Buddhist Studies and Digital Humanities: 100 Years of the Taishō Tripiṭaka and 30 Years of SAT"
Session: Machine Translation and Buddhist Studies (15:30-16:30)
This talk will present MITRA Search, a system for exploring Buddhist literature preserved in classical Asian languages through multilingual approximate search capabilities. The presentation will demonstrate how this technology enables scholars to search across Buddhist texts in different languages and writing systems, facilitating comparative textual research and discovery of parallel passages.
Dharmamitra: Developing a Toolkit for Philological Work on Premodern Asian Low-Resource Languages
November 2024
Workshop: Case studies from current research projects - Conversations on Digital Scholarly Editing, Śivadharma Project Headquarters, Palazzo Giusso, L'Orientale University of Naples, Naples, Italy
This talk was presented as part of the workshop "Case studies from current research projects - Conversations on Digital Scholarly Editing" organized by Martina Dello Buono and Florinda De Simini at L'Orientale University of Naples.
Dharmamitra
November 2024
Online - via Zoom, Heidelberg, Germany
Sanskrit presents unique challenges for digital processing due to the language's rich morphological complexity and the absence of word boundaries in written texts. While recent advances in Natural Language Processing have revolutionized the study of modern languages and made applications such as machine translation and reliable search engines possible, Sanskrit has so far lagged behind in these developments. In this talk, I will present Dharmamitra's Sanskrit-specific capabilities, particularly our new language model that achieves state-of-the-art accuracy in fundamental Sanskrit processing tasks such as word segmentation, lemmatization, and morphological analysis. I will demonstrate how these technical advances translate into practical tools for Sanskrit scholars – from assisting in basic text analysis to enabling sophisticated corpus-wide semantic search and machine translation. The talk will showcase examples of how our system can provide detailed grammatical explanations and annotated translations, and facilitate textual research via semantic search even across language boundaries. These tools are designed to serve both beginning Sanskrit students and advanced scholars conducting specialized research. I will also demonstrate how Dharmamitra's capabilities can be used as building blocks for Sanskrit digitization and annotation projects.
MITRA: Beyond Just Machine Translation for Premodern Asian Low Resource Languages
October 2024
Johns Hopkins University, Center for Language and Speech Processing, Baltimore, MD, USA
Recent years saw the rise of multilingual language models that achieve high levels of performance for a large number of tasks, with some of them handling hundreds of languages at once. Premodern languages are usually underrepresented in such models, leading to poor performance in downstream applications. The Dharmamitra project aims to develop a diverse set of language models to address these shortcomings for the classical Asian low-resource languages Sanskrit, Tibetan, Classical Chinese, and Pali. These models provide solutions for low-level NLP tasks such as word segmentation and morpho-syntactic tagging, as well as high-level tasks including semantic search, machine translation, and general chatbot interaction. The talk will address the individual challenges and unique characteristics of the data involved, and the strategies deployed to address these. It will also demonstrate how these different tools can be combined in an application that goes beyond simple sentence-to-sentence machine translation, providing detailed grammatical explanations and corpus-wide search to support both early-stage language learners and experienced researchers with specific demands.
Dharmamitra Search: Leveraging Multilingual Language Models for Search and Detection of Textual Reuse across Diverse Text Collections
October 2024
AI and the Future of Buddhist Studies Conference, Numata Center for Buddhist Studies, UC Berkeley, Berkeley, CA, USA
MITRA: Developing Language Models for Machine Translation and Search in Buddhist Source Languages
August 2024
PNC 2024 Annual Conference and Joint Meetings, Seoul, Korea
Translation and search are among the fundamental problems when researching the textual source material of Buddhist traditions. MITRA has successfully developed machine translation models to ease access to this material. When it comes to search, the Dharmamitra project approaches this problem by using semantic embeddings that enable search for related passages in different languages, regardless of whether the answer to the query is found in a text preserved in Pāli, Sanskrit, Tibetan, or Chinese. In addition to providing researchers with this powerful search system, Dharmamitra also provides a system for the automatic detection of similar text passages within the same language and across different languages. In my talk, I will demonstrate how these tools are designed and how researchers can access them and integrate them into their workflow.
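The general idea behind embedding-based cross-lingual search can be sketched in a few lines. This is a toy illustration only, not the Dharmamitra implementation: the three-dimensional vectors, the passage labels, and the `search` helper are all made up for the example, whereas a real system would obtain high-dimensional vectors from a multilingual language model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: passages with similar meaning are assumed to
# lie close together, whatever language they are written in.
index = {
    "pali_passage": [0.9, 0.1, 0.0],
    "sanskrit_passage": [0.8, 0.2, 0.1],
    "tibetan_passage": [0.1, 0.9, 0.2],
    "chinese_passage": [0.0, 0.2, 0.9],
}

def search(query_vec, index, top_k=2):
    """Return the top_k passages ranked by similarity to the query vector."""
    scored = [(cosine(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:top_k]]

# A made-up query vector standing in for an embedded search phrase.
query = [1.0, 0.1, 0.0]
results = search(query, index)
for name, score in results:
    print(name, round(score, 3))
```

Because ranking depends only on vector similarity, the language of the query and the language of the matching passage need not coincide, which is what makes retrieval across Pāli, Sanskrit, Tibetan, and Chinese possible in principle.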
Massive Multilingual Machine Translation and Search for Buddhist Languages: The Mitra Project
April 2024
National Taiwan University (NTU), Taipei, Taiwan
Dharmamitra: Enabling Massive Multilingual Machine Translation for Ancient Languages of the Buddhist Tradition
March 2024
National University of Singapore, Singapore
Machine Translation and LLM-Powered Grammatical Explanation for Sanskrit
February 2024
International Sanskrit Computational Linguistics Conference, Auroville, Puducherry, India (Online presentation)
MITRA: Developing Natural Language Processing Tools for the Languages of Buddhist Literature
June 2023
Hong Kong
Developing Machine Translation for ancient Buddhist texts in canonical languages
June 2023
Seoul, Korea
Creating a Shared Semantic Vector Space for Buddhist Languages
April 2023
Vienna, Austria
Multilingual Semantic Mining for Text Alignment and Parallel Corpus Building for Buddhist Languages
January 2023
Universität Hamburg, international symposium 'Perspectives of Digital Humanities in the Field of Buddhist Studies', Hamburg, Germany