Tibetan Studies and Digital Tibetan by Kurt Keutzer
ACM Transactions on Asian and Low-Resource Language Information Processing, 2021
Over the past decade, through a mixture of optical character recognition and manual input, there is now a growing corpus of Tibetan literature available as e-texts in Unicode format. With the creation of such a corpus, the techniques of text analytics that have been applied in the analysis of English and other modern languages may now be applied to Tibetan. In this work, we narrow our focus to examine a modest portion of that literature, the Mind-section portion of the literature of the Tibetan tradition of the Great Perfection. Here, we will use the lens of text analytics tools based on machine learning techniques to investigate a number of questions of interest to scholars of this and related traditions of the Great Perfection. It has been necessary for us to participate in all portions of this process: corpora identification and text edition selection, rendering the text as e-texts in Unicode using both Optical Character Recognition and manual entry, data cleaning and transformation, implementation of software for text analysis, and interpretation of results. For this reason, we hope this study can serve as a model for other low-resource languages that are just beginning to approach the problem of providing text analytics for their language.
Revue d’Etudes Tibétaines, 2012
In sGa ston's list of the Southern Treasures discovered by gShen chen Klu dga', a series of texts referred to as the Facets of Mind, Nine Minor Texts on Mind are mentioned. The Bon tradition has acknowledged from that time to the present day that these are seminal texts in the literature of Bon. Furthermore, these texts would eventually be classified as the exemplary works of the Mind Section of Bon Dzogchen. Nevertheless, the precise content of these texts has been unclear to modern scholars, both Tibetan and Western, working outside of Tibet. With the publication in 1999 of Mongyal Lhase's Edition of the Bon Kangyur, as well as with other subsequent publications, we are now in a better position to identify and understand these works. The aim of this paper is to clearly identify the titles of these texts, to identify the various editions in which they are available, and to begin to understand how they work together with tantric elements to form a holistic system of training.
The use of advanced computational methods for the analysis of large corpora of electronic texts is becoming increasingly popular in humanities and social science research. Unfortunately, Tibetan Studies has lacked such a repository of electronic, searchable texts. The automated recognition of printed texts, known as Optical Character Recognition (OCR), offers a solution to this problem; however, until recently, robust OCR systems for the Tibetan language have not been available. In this paper, we introduce one new system, called Namsel, which uses OCR to support the production, review, and distribution of searchable Tibetan texts at a large scale. Namsel tackles a number of challenges unique to the recognition of complex scripts such as Tibetan uchen and has been able to achieve high accuracy rates on a wide range of machine-printed works. In this paper, we discuss the details of Tibetan OCR, how Namsel works, and the problems it is able to solve. We also discuss the collaborative work between Namsel and its partner libraries aimed at building a comprehensive database of historical and modern Tibetan works—a database that consists of more than one million pages of texts spanning over a thousand years of literary production.
Proceedings of the 2011 Joint Workshop on Multilingual OCR and Analytics for Noisy Unstructured Text Data - MOCR_AND '11, 2011
Recognition of Tibetan wood block print is a difficult problem that has many challenging steps. We propose a two-stage framework involving image preprocessing, which consists of noise removal and baseline detection, and simultaneous character segmentation and recognition with the aid of a generalized hidden Markov model (also known as gHMM). For the latter stage, we train a gHMM and run the generalized Viterbi algorithm on our image to decode observations. There are two major motivations for using gHMM. First, it incorporates a language model into our recognition system, which in turn enforces grammar and disambiguates classification errors caused by printing errors and image noise. Second, gHMM solves the segmentation challenge. Simply put, gHMM is an HMM whose emission model allows multiple consecutive observations to be mapped to the same state. For the features of our emission model, we apply the line and circle Hough transforms to stroke detection, and use class-specific scaling for feature weighting. With gHMM, we find KMQDF to be the most effective distance metric for discriminating character classes. The accuracy of our system is 91.29%.
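The decoding idea in this abstract—an HMM whose states may absorb a variable-length run of consecutive observations, solving segmentation and recognition jointly—can be sketched with a small dynamic program. This is an illustrative reconstruction, not the paper's implementation: the state set, transition table, and segment-emission function below are hypothetical stand-ins for the paper's language model and KMQDF-based emission model.

```python
def generalized_viterbi(obs, states, log_trans, log_emit_seg, max_seg=3):
    """Decode a best state sequence where each state may emit a run
    ("segment") of up to max_seg consecutive observations.

    obs          : list of observations (e.g. glyph feature vectors)
    states       : list of state ids
    log_trans    : dict (s_prev, s) -> log transition score;
                   s_prev None marks the start of the sequence
    log_emit_seg : function (state, segment) -> log score of that state
                   emitting the whole segment
    """
    T = len(obs)
    # best[t][s] = (log score of best path covering obs[:t] ending in s,
    #               backpointer (t_prev, s_prev))
    best = [dict() for _ in range(T + 1)]
    best[0] = {None: (0.0, None)}  # virtual start state
    for t in range(1, T + 1):
        for s in states:
            for L in range(1, min(max_seg, t) + 1):
                seg = obs[t - L:t]  # candidate segment absorbed by s
                for s_prev, (lp, _) in best[t - L].items():
                    cand = (lp
                            + log_trans.get((s_prev, s), float("-inf"))
                            + log_emit_seg(s, seg))
                    if s not in best[t] or cand > best[t][s][0]:
                        best[t][s] = (cand, (t - L, s_prev))
    # Backtrack from the best final state.
    s = max(best[T], key=lambda k: best[T][k][0])
    path, t = [], T
    while t > 0:
        path.append(s)
        t, s = best[t][s][1]
    return path[::-1]
```

Compared with ordinary Viterbi, the extra inner loop over segment lengths `L` is what lets one character class claim several image columns at once; the language model enters through `log_trans`, exactly as the abstract describes.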
Deep Learning/Deep Neural Nets by Kurt Keutzer
2011 IEEE International Parallel & Distributed Processing Symposium, 2011
2013 IEEE International Conference on Image Processing, 2013
Transformer-based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language Processing tasks. However, their memory footprint, inference latency, and power consumption are prohibitive for many edge processors, and it has been a challenge to deploy these models for edge applications and devices that have resource constraints. While quantization can be a viable solution to this, previous work on quantizing Transformer-based models uses floating-point arithmetic during inference, thus limiting model deployment on many edge processors. In this work, we propose a novel integer-only quantization scheme for Transformer-based models that quantizes the entire inference process. In particular, we demonstrate how to approximate nonlinear operations in Transformer architectures, e.g., GELU, Softmax, and Layer Normalization, with lightweight integer computations. We use those approximations in our method, I-BERT, with an end-to-end integer-only inference, and...
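The core trick behind approximating nonlinearities with integer computations can be illustrated with a second-order polynomial evaluated entirely on quantized integers. The sketch below shows the general pattern—folding the input scale into the polynomial's constants so the integer part needs only adds and multiplies—under assumed coefficients; the actual I-BERT approximations of GELU, Softmax, and Layer Normalization are more involved.

```python
import math

def int_poly2(q, S, a, b, c):
    """Integer-only evaluation of a*(x + b)**2 + c for x = S * q.

    q : quantized integer input, with real value x = S * q
    S : input quantization scale (a float, known ahead of time)
    a, b, c : polynomial coefficients (floats, known ahead of time)

    Returns (q_out, S_out) such that a*(x + b)**2 + c ~= S_out * q_out.
    The constants q_b and q_c are precomputed from S offline, so the
    per-element arithmetic on q uses only integers.
    """
    q_b = int(math.floor(b / S))            # b expressed in input units
    q_c = int(math.floor(c / (a * S * S)))  # c expressed in output units
    q_out = (q + q_b) ** 2 + q_c            # integer-only computation
    S_out = a * S * S                       # output scale, folded offline
    return q_out, S_out
```

The design point is that `S`, `a`, `b`, and `c` are compile-time constants, so `q_b`, `q_c`, and `S_out` can be baked into the model; at inference time only the integer line executes, which is exactly what makes deployment on integer-only edge processors possible.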
Efficient Machine Learning by Kurt Keutzer
... Theory: L1-SPIRiT reconstruction requires solving a non-linear constrained optimization problem: minimize ... Using our OpenMP calibration and our CUDA POCS solver results in 97-second ... Our solvers are scalable to larger image sizes, more channels, and to larger processing ...
Journal of Logic and Computation, 2014
IEEE Transactions on Medical Imaging, 2000
IEEE Signal Processing Magazine, 2000
2011 IEEE Workshop on Applications of Computer Vision (WACV), 2011
... In Workshop on Generative-Model Based Vision, CVPR, 2004. [10] P. Felzenszwalb, D. McAllester, and D. Ramanan. A discriminatively trained, multiscale, deformable part model. In CVPR, 2008. [11] V. Ferrari, M. Marin-Jimenez, and A. Zisserman. ...
2009 IEEE 12th International Conference on Computer Vision, 2009
Image contour detection is fundamental to many image analysis applications, including image segmentation, object recognition and classification. However, highly accurate image contour detection algorithms are also very computationally intensive, ...
ing and 2,300 manually curated and corrected bitext pairs for the evaluation of machine translation models for this language. We train a number of sequence-to-sequence models and compare their translation performance against commercial models. We also provide limited case studies in which we examine the performance of different machine translation models on a selection of Buddhist Chinese passages.