Patent application number | Description | Published |
20130301920 | METHOD FOR PROCESSING OPTICAL CHARACTER RECOGNIZER OUTPUT - A method, a system, and a computer program product for processing the output of an OCR are disclosed. The system receives a first character sequence from the OCR. A first set of characters from the first character sequence is converted to a corresponding second set of characters to generate a second character sequence based on a look-up table and language scores. | 11-14-2013 |
20140156565 | METHODS AND SYSTEMS FOR PREDICTING LEARNING CURVE FOR STATISTICAL MACHINE TRANSLATION SYSTEM - The disclosed embodiments relate to a system and method for predicting the learning curve of a statistical machine translation (SMT) system. A set of anchor points is selected. The set of anchor points corresponds to a size of a corpus. Thereafter, a gold curve or a benchmark curve is fitted based on the set of anchor points to determine the BLEU score. Based on the BLEU score and a set of parameters associated with the first set of anchor points, a confidence score is computed. | 06-05-2014 |
20140207439 | MACHINE TRANSLATION-DRIVEN AUTHORING SYSTEM AND METHOD - An authoring method includes generating an authoring interface configured for assisting a user to author a text string in a source language for translation to a target string in a target language. Initial source text entered by the user is received through the authoring interface. Source phrases are selected that each include at least one token of the initial source text as a prefix and at least one other token as a suffix. The source phrase selection is based on a translatability score and optionally on fluency and semantic relatedness scores. A set of candidate phrases is proposed for display on the authoring interface, each of the candidate phrases being the suffix of a respective one of the selected source phrases. The user may select one of the candidate phrases, which is appended to the source text following its corresponding prefix, or may enter alternative text. The process may be repeated until the user is satisfied with the source text, and the SMT model can then be used for its translation. | 07-24-2014 |
20140214397 | SAMPLING AND OPTIMIZATION IN PHRASE-BASED MACHINE TRANSLATION USING AN ENRICHED LANGUAGE MODEL REPRESENTATION - Rejection sampling is performed to acquire at least one target language translation for a source language string s in accordance with a phrase-based statistical translation model p(x)=p(t, a|s) where t is a candidate translation, a is a candidate alignment comprising a biphrase sequence generating the candidate translation t, and x is a sequence representing the candidate alignment a. The rejection sampling uses a proposal distribution comprising a weighted finite state automaton (WFSA) q | 07-31-2014 |
20140358519 | CONFIDENCE-DRIVEN REWRITING OF SOURCE TEXTS FOR IMPROVED TRANSLATION - A method for rewriting source text includes receiving source text including a source text string in a first natural language. The source text string is translated with a machine translation system to generate a first target text string in a second natural language. A translation confidence for the source text string is computed, based on the first target text string. At least one alternative text string is generated, where possible, in the first natural language by automatically rewriting the source string. Each alternative string is translated to generate a second target text string in the second natural language. A translation confidence is computed for the alternative text string based on the second target string. Based on the computed translation confidences, one of the alternative text strings may be selected as a candidate replacement for the source text string and may be proposed to a user on a graphical user interface. | 12-04-2014 |
20150293908 | ESTIMATION OF PARAMETERS FOR MACHINE TRANSLATION WITHOUT IN-DOMAIN PARALLEL DATA - A system and method for estimating parameters for features of a translation scoring function for scoring candidate translations in a target domain are provided. Given a source language corpus for a target domain, a similarity measure is computed between the source corpus and a target domain multi-model, which may be a phrase table derived from phrase tables of comparative domains, weighted as a function of similarity with the source corpus. The parameters of the log-linear function for these comparative domains are known. A mapping function is learned between the similarity measure and the parameters of the scoring function for the comparative domains. Given the mapping function and the target corpus similarity measure, the parameters of the translation scoring function for the target domain are estimated. For parameters where a mapping function with a threshold correlation is not found, another method for obtaining the target domain parameter can be used. | 10-15-2015 |
20150293910 | RETRIEVAL OF DOMAIN RELEVANT PHRASE TABLES - A method for generating a phrase table for a target domain includes receiving a source corpus for a target domain and, for each of a set of comparative domain phrase tables, computing a measure of similarity between the source corpus and the comparative domain phrase table. Based on the computed similarity measures, a subset of the comparative domain phrase tables, and/or weights for combining them, may be identified, and a phrase table is generated for the target domain based on at least that subset of phrase tables. | 10-15-2015 |
20150347397 | METHODS AND SYSTEMS FOR ENRICHING STATISTICAL MACHINE TRANSLATION MODELS - Methods and systems for enriching translation models are disclosed. A first strength metric associated with a phrase in a first translation model is determined. A second strength metric associated with the phrase is received from at least one second translation model. The first translation model is enriched based on one or more translations of the phrase received from the at least one second translation model. The one or more translations are received based on a comparison between the first strength metric and the second strength metric. | 12-03-2015 |
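
The last entry above (20150347397) describes enriching one translation model with translations from another, gated by a comparison of per-phrase strength metrics. The abstract does not define the metric or the comparison rule, so the following is only a minimal sketch under stated assumptions: a model is a plain mapping from phrase to translation probabilities, the hypothetical `strength` function takes the highest translation probability for a phrase, and translations are pulled in only when the second model's strength is strictly greater. None of these names or choices come from the application itself.

```python
from typing import Dict

# Assumed representation: phrase -> {translation: probability}.
Model = Dict[str, Dict[str, float]]


def strength(model: Model, phrase: str) -> float:
    """Hypothetical strength metric: the highest translation probability
    the model assigns to the phrase, or 0.0 if the phrase is unknown."""
    return max(model.get(phrase, {}).values(), default=0.0)


def enrich(first: Model, second: Model) -> Model:
    """Return a copy of `first` extended with translations from `second`
    for every phrase where `second` has the strictly greater strength."""
    enriched = {phrase: dict(trans) for phrase, trans in first.items()}
    for phrase, translations in second.items():
        if strength(second, phrase) > strength(first, phrase):
            entry = enriched.setdefault(phrase, {})
            for translation, prob in translations.items():
                # Keep scores already in the first model; add only new translations.
                entry.setdefault(translation, prob)
    return enriched


first = {"bank": {"banque": 0.4, "rive": 0.3}, "cat": {"chat": 0.9}}
second = {
    "bank": {"banque": 0.8, "berge": 0.1},
    "cat": {"minou": 0.2},
    "dog": {"chien": 0.7},
}
enriched = enrich(first, second)
```

In this sketch the added scores are copied over unchanged, so the probabilities for an enriched phrase no longer sum to one; a real system would presumably renormalize or rescore them.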