Dharmamitra: Open Tools for Translation and Digital Philology of Ancient Asian Languages
Accelerating research on Classical Asian languages with modern deep‑learning methods.

About Dharmamitra
Dharmamitra is a meta‑platform that bundles state‑of‑the‑art NLP, OCR, information‑retrieval, and intertextuality‑exploration components for anyone working with the ancient Asian languages Sanskrit, Pāli, Classical Chinese, and Tibetan. All code in this organisation is released under permissive licenses, and we provide large datasets that are either in the public domain or released under Creative Commons licenses.
News
- August 2025: We will present "MITA: New Research Tools for a Paradigm Shift in the Philological Study of Buddhist Texts Based on Machine Translation Technology" at the IABS conference in Leipzig. Please join our panel with Marcus Bingenheimer on Tuesday, August 12!
- June 2025: We presented "Is training deep neural embeddings worth the effort? A preliminary investigation of different representation methods for semantic similarity tasks in Buddhist Chinese and related languages of the Buddhist tradition" at the "Navigating Indra’s Net: Digital Approaches to Text Reuse-based Inter-textuality in Pre-Modern East Asian Texts" online workshop at the Hanmun Lab, Ruhr-Universität Bochum.
- June 2025: We presented "From Sthiramati to Dharmamitra: Developing Digital Tools for a New Age of Philological Buddhist Studies" at the DH International Workshop at Keio University, Tokyo.
- March 2025: We conducted a hands-on workshop on "Machine Translation for Asian Studies" at the Annual Conference of the Association for Asian Studies in Columbus, Ohio.
- March 2025: We presented "MITRA Search: Building Information Retrieval Systems for Classical Asian Languages in the Age of AI" at the CEAL Technology Forum in Columbus, Ohio.
- 2025: Our paper "MITRA‑zh‑eval: Using a Buddhist Chinese Language Evaluation Dataset to Assess Machine Translation and Evaluation Metrics" has been published in the Proceedings of the 5th International Conference on NLP for Digital Humanities (details).
- December 2024: We presented "MITRA Search: Exploring Buddhist Literature Preserved in Classical Asian Languages with Multilingual Approximate Search" at the International Symposium "Buddhist Studies and Digital Humanities" in Tokyo, Japan.
- November 2024: We gave an online presentation on "Dharmamitra" for an audience in Heidelberg, Germany (recording).
- November 2024: We presented "Dharmamitra: Developing a Toolkit for Philological Work on Premodern Asian Low-Resource Languages" at a workshop at L'Orientale University of Naples, Italy.
- October 2024: We presented "MITRA: Beyond Just Machine Translation for Premodern Asian Low Resource Languages" at Johns Hopkins University, Baltimore, MD.
- October 2024: We presented "Dharmamitra Search: Leveraging Multilingual Language Models for Search and Detection of Textual Reuse across Diverse Text Collections" at the AI and the Future of Buddhist Studies Conference at UC Berkeley.
- October 2024: Our paper "One Model is All You Need: ByT5-Sanskrit, a Unified Model for Sanskrit NLP Tasks" has been published in the Findings of the Association for Computational Linguistics: EMNLP 2024 (details).
- August 2024: We presented "MITRA: Developing Language Models for Machine Translation and Search in Buddhist Source Languages" at the PNC 2024 Annual Conference in Seoul, Korea.
- 2024: Our paper "Breakthroughs in Tibetan NLP & Digital Humanities" has been published in the Revue d’Études Tibétaines (details).
- April 2024: We presented "Massive Multilingual Machine Translation and Search for Buddhist Languages: The Mitra Project" at National Taiwan University (NTU), Taipei, Taiwan.
- March 2024: We presented "Dharmamitra: Enabling Massive Multilingual Machine Translation for Ancient Languages of the Buddhist Tradition" at the National University of Singapore.
- February 2024: We gave an online presentation on "Machine Translation and LLM-Powered Grammatical Explanation for Sanskrit" at the International Sanskrit Computational Linguistics Conference in Auroville, India.
- 2023: Our paper "Observations on the Intertextuality of Selected Abhidharma Texts Preserved in Chinese Translation" has been published in the journal Religions (details).
- 2023: Our paper "MITRA‑zh: An efficient, open machine translation solution for Buddhist Chinese" has been published in the Proceedings of the Joint 3rd International Conference on NLP for Digital Humanities and 8th IWCLUL (details).
- June 2023: We presented "MITRA: Developing Natural Language Processing Tools for the Languages of Buddhist Literature" in Hong Kong.
- June 2023: We presented "Developing Machine Translation for ancient Buddhist texts in canonical languages" in Seoul, Korea.
- April 2023: We presented "Creating a Shared Semantic Vector Space for Buddhist Languages" in Vienna, Austria.