Original Text Information
Abstract
English
This paper investigates various kinds of sub-word lexica within a Uyghur large-vocabulary continuous speech recognition (LVCSR) system. Experimental results show that it is inefficient to model directly with word units or with small units such as morphemes or syllables. An optimal sub-word unit set lying between word and morpheme units is observed to fit the ASR system better. To select the best unit set, we investigate several unit segmentation and concatenation approaches and their ASR performance. For segmentation, we investigate a supervised approach that splits words into their smallest functional units, the linguistic morphemes, and an unsupervised approach that extracts pseudo-morphemes (statistical morphemes). In the supervised model, a learning algorithm is trained on a manually prepared training corpus, and morpho-phonetic changes are analyzed. In the unsupervised model, the Morfessor tool is used to extract pseudo-morphemes from a raw text corpus. For concatenation, several approaches based on linguistic morphemes are investigated. The first is a data-driven approach that concatenates morpheme sequences according to measures such as co-occurrence frequency or mutual probability. The second is a model-based approach that merges units using global statistical criteria; in this study, the Morfessor program is modified into a concatenation program by controlling the segmentation points. The third is a two-layer-lexica based concatenation approach that extracts an optimal sub-word unit set by aligning and comparing the ASR results of the word and morpheme lexical layers. This method uses both speech and text, produced the best results in terms of WER and lexicon size, and proved to be very stable. The optimal lexicon, obtained entirely with an HMM-based acoustic model, outperformed all baseline lexica. When all these lexica were combined directly with a deep neural network (DNN) based acoustic model, without changing the speech and text training corpora or the language models, the optimal lexicon not only drastically improved ASR accuracy but also outperformed the other units, which demonstrates the generality of the two-layer-lexica based approach.
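To make the data-driven concatenation idea concrete, the following is a minimal sketch that merges adjacent morphemes when their co-occurrence count reaches a threshold. It is not the paper's implementation: the function names, the greedy left-to-right merging rule, the `min_count` parameter, and the placeholder morpheme strings (`r1`, `s1`, ...) are all assumptions made for illustration, and the abstract does not specify how the co-occurrence measure is actually applied.

```python
from collections import Counter


def count_adjacent_pairs(segmented_words):
    """Count how often each adjacent morpheme pair occurs inside a word."""
    counts = Counter()
    for morphemes in segmented_words:
        for left, right in zip(morphemes, morphemes[1:]):
            counts[(left, right)] += 1
    return counts


def concatenate_frequent_pairs(segmented_words, min_count):
    """Greedily merge adjacent morphemes whose pair count is >= min_count.

    Assumption: a single left-to-right pass per word; a merged unit is not
    merged again with its neighbour.
    """
    counts = count_adjacent_pairs(segmented_words)
    merged_words = []
    for morphemes in segmented_words:
        units = []
        i = 0
        while i < len(morphemes):
            has_next = i + 1 < len(morphemes)
            if has_next and counts[(morphemes[i], morphemes[i + 1])] >= min_count:
                # Frequent pair: emit one concatenated sub-word unit.
                units.append(morphemes[i] + morphemes[i + 1])
                i += 2
            else:
                # Rare pair (or last morpheme): keep the morpheme as its own unit.
                units.append(morphemes[i])
                i += 1
        merged_words.append(units)
    return merged_words


# Hypothetical toy corpus: each word is a list of placeholder morphemes.
toy_corpus = [
    ["r1", "s1", "s2"],
    ["r1", "s1"],
    ["r2", "s1", "s2"],
    ["r2", "s2"],
]
merged = concatenate_frequent_pairs(toy_corpus, min_count=2)
```

With this rule, a higher `min_count` keeps the lexicon closer to the morpheme level and a lower one moves it toward the word level, which is the range of unit sizes the abstract describes.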
Table of Contents
1. Introduction
2. Morpheme Segmentation Approaches
2.1. Supervised Morpheme Segmentation
2.2. Unsupervised Morpheme Segmentation
3. Morpheme Concatenation Approaches
3.1. Data-driven morpheme concatenation approaches
3.2. A statistical model based morpheme concatenation approach
3.3. Two-layer-lexica based morpheme concatenation approaches
4. ASR results for segmented and concatenated lexica
4.1. Acoustic model construction
4.2. Lexical model construction
4.3. ASR results on segmented lexica
4.4. ASR results on concatenated lexica
4.5. DNN based ASR results
5. Conclusions
Acknowledgements
References