Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Computer Sciences

Browse all Theses and Dissertations

2013

Computer Science

Full-Text Articles in Computer Engineering

A Large Scale Distributed Syntactic, Semantic And Lexical Language Model For Machine Translation, Ming Tan Jan 2013

The n-gram model is the most widely used language model (LM) in statistical machine translation systems, owing to its simplicity and scalability. However, it encodes only the local lexical relations between adjacent words and ignores the rich syntactic and semantic structure of natural language. Attempting to increase the order of an n-gram to capture longer-range dependencies immediately runs into the curse of dimensionality. Although previous studies have tried increasing the n-gram order on large corpora, they did not see obvious improvement beyond 6-grams. Meanwhile, other LMs, such as syntactic language models and …
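
To make the curse of dimensionality mentioned in the abstract concrete, the following is a minimal sketch and is not taken from the thesis. It assumes a toy corpus invented for illustration and a hypothetical helper named `ngrams`. For each order n, it compares the number of distinct n-grams actually observed with the number of possible n-grams, which grows as the vocabulary size raised to the power n.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous window of n tokens as a tuple (hypothetical helper)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Toy corpus, invented purely for illustration.
tokens = "the cat sat on the mat and the dog sat on the rug".split()
vocab_size = len(set(tokens))

for n in range(1, 5):
    counts = Counter(ngrams(tokens, n))
    possible = vocab_size ** n  # size of the full n-gram space
    # The share of the n-gram space that is observed shrinks rapidly as n grows.
    print(f"n={n}: observed distinct={len(counts)}, possible={possible}")
```

Even on this tiny example, the observed n-grams cover a rapidly shrinking fraction of the possible ones as n increases, which is the sparsity problem that raising the n-gram order runs into.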