Abstract:
A new method for automatically extracting pseudo-syllables from continuous speech is explored and applied to automatic language identification (ALI). The overall procedure has three phases: feature extraction, modeling, and identification testing. To extract pseudo-syllables automatically, a consonant segment is combined with a vowel segment to form a pseudo-syllable, called a CV-syllable. An algorithm for automatic CV-syllable extraction is proposed, and a feature vector is extracted from each CV-syllable. A Gaussian mixture model (GMM) and a language model (LM) are used to build the language identification system. Experiments on Mandarin and six minority languages show that the proposed method identifies languages effectively, trains quickly, and is robust to noise.
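
As a rough illustration of the consonant-plus-vowel grouping described above, the following minimal Python sketch merges labeled segments into CV-syllables. It is not the algorithm proposed in the paper. It assumes that the speech has already been segmented and labeled as consonant ("C"), vowel ("V"), or silence/other ("S"), that a consonant run is attached to the vowel that immediately follows it, and that a vowel with no preceding consonant forms a syllable on its own. The names `Segment`, `CVSyllable`, and `extract_cv_syllables` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    label: str    # "C" = consonant, "V" = vowel, anything else = silence/other (assumed labels)
    start: float  # start time in seconds
    end: float    # end time in seconds


@dataclass
class CVSyllable:
    start: float     # start of the consonant part (or of the vowel if there is none)
    boundary: float  # consonant/vowel boundary; equals start for a vowel-only syllable
    end: float       # end of the vowel part


def extract_cv_syllables(segments: List[Segment]) -> List[CVSyllable]:
    """Group each vowel with the consonant run directly before it (assumed rule)."""
    syllables: List[CVSyllable] = []
    pending: Optional[Segment] = None  # consonant run waiting for a vowel
    for seg in segments:
        if seg.label == "C":
            # Extend the current consonant run, or start a new one.
            if pending is None:
                pending = Segment("C", seg.start, seg.end)
            else:
                pending = Segment("C", pending.start, seg.end)
        elif seg.label == "V":
            if pending is not None:
                syllables.append(CVSyllable(pending.start, seg.start, seg.end))
            else:
                syllables.append(CVSyllable(seg.start, seg.start, seg.end))
            pending = None
        else:
            # Silence or other material breaks the pairing; a dangling consonant is dropped.
            pending = None
    return syllables


# Illustrative input only; the time stamps are made up.
example = [
    Segment("S", 0.00, 0.10),
    Segment("C", 0.10, 0.18),
    Segment("V", 0.18, 0.30),
    Segment("V", 0.30, 0.42),
    Segment("C", 0.42, 0.50),
    Segment("S", 0.50, 0.60),
    Segment("C", 0.60, 0.65),
    Segment("C", 0.65, 0.70),
    Segment("V", 0.70, 0.90),
]
cv_syllables = extract_cv_syllables(example)
```

Under these assumptions, each resulting time span would be the unit from which one feature vector is computed before GMM and LM modeling; how the paper handles consonant clusters or isolated vowels is not stated in the abstract.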