Patent application number | Description | Published |
20080300857 | METHOD FOR ALIGNING SENTENCES AT THE WORD LEVEL ENFORCING SELECTIVE CONTIGUITY CONSTRAINTS - An alignment method includes, for a source sentence in a source language, identifying whether the sentence includes at least one candidate term comprising a contiguous subsequence of words of the source sentence. A target sentence in a target language is aligned with the source sentence. This includes developing a probabilistic model which models conditional probability distributions for alignments between words of the source sentence and words of the target sentence and generating an optimal alignment based on the probabilistic model, including, where the source sentence includes the at least one candidate term, enforcing a contiguity constraint which requires that all the words of the target sentence which are aligned with an identified candidate term form a contiguous subsequence of the target sentence. | 12-04-2008 |
20090175545 | METHOD FOR COMPUTING SIMILARITY BETWEEN TEXT SPANS USING FACTORED WORD SEQUENCE KERNELS - A computer implemented method and an apparatus for comparing spans of text are disclosed. The method includes computing a similarity measure between a first sequence of symbols representing a first text span and a second sequence of symbols representing a second text span as a function of the occurrences of optionally noncontiguous subsequences of symbols shared by the two sequences of symbols. Each of the symbols comprises at least one consecutive word and is defined according to a set of linguistic factors. Pairs of symbols in the first and second sequences that form a shared subsequence of symbols are each matched according to at least one of the factors. | 07-09-2009 |
20100145673 | CROSS LANGUAGE TOOL FOR QUESTION ANSWERING - A cross-language question answering system includes a server which hosts a plurality of community question answering (CQA) websites for different countries. The websites can generate a graphical user interface on an associated user terminal. A machine translation system translates a user question from a first language into a second language. The system may alert the user to similar questions posted on other CQA websites in other languages. The system may post the translated question on another CQA website and notify the user of answers to the translated question that are posted on the website by other users. The system may include memory which stores a plurality of archives, each including questions and answers posted on a corresponding one of the CQA websites. The system may allow a user to enter a query in the user's language and receive responses to the queries retrieved from the archives of other CQA websites. | 06-10-2010 |
20100268527 | BI-PHRASE FILTERING FOR STATISTICAL MACHINE TRANSLATION - A computer-implemented system and a method for pruning a library of bi-phrases, suitable for use in a machine translation system are provided. The method includes partitioning a bi-phrase library into a set of sub-libraries. The sub-libraries may be of different complexity such that, when pruning bi-phrases from the plurality of sub-libraries is based on a common noise threshold, a complexity of bi-phrases is taken into account in pruning the bi-phrases. | 10-21-2010 |
20110022380 | PHRASE-BASED STATISTICAL MACHINE TRANSLATION AS A GENERALIZED TRAVELING SALESMAN PROBLEM - Systems and methods are described that facilitate phrase-based statistical machine translation (SMT) incorporating bigram (or higher n-gram) language models by modeling bi-phrases as nodes in a graph. Additionally, construction of a translation is modeled as a “tour” amongst the nodes of the graph, such that a translation solution is generated by treating the graph as a generalized traveling salesman problem (GTSP) and solving for an optimal tour. The overall cost of a tour is computed by adding the costs associated with the edges traversed during the tour. Thus, the described systems and methods map the SMT problem directly into a GTSP problem, which itself can be directly converted into a TSP problem. | 01-27-2011 |
20110178791 | STATISTICAL MACHINE TRANSLATION SYSTEM AND METHOD FOR TRANSLATION OF TEXT INTO LANGUAGES WHICH PRODUCE CLOSED COMPOUND WORDS - A translation system and method for translating source text from a first language to target text in a second language are disclosed. A library of bi-phrases is accessed to retrieve bi-phrases which each match a part of the source text. Each of the bi-phrases includes respective text fragments from the first and second language. Words of some (or all) of the bi-phrases are tagged with restricted part of speech (RPOS) tags. At least one of the RPOS tags is configured for identifying a word from the second language as being one which also forms a part of a closed compound word in the library. At least one target hypothesis is generated from the bi-phrases, which includes text fragments in the second language. The target hypothesis or hypotheses are evaluated, based at least in part on combinations of the restricted part of speech tags. Based on the evaluation, one of the at least one target hypothesis is output as the optimal hypothesis for forming the translation. | 07-21-2011 |
20110282643 | STATISTICAL MACHINE TRANSLATION EMPLOYING EFFICIENT PARAMETER TRAINING - A statistical machine translation (SMT) system employs a conditional translation probability conditioned on the source language content. A model parameters optimization engine is configured to optimize values of parameters of the conditional translation probability using a translation pool comprising candidate aligned translations for source language sentences having reference translations. The model parameters optimization engine adds candidate aligned translations to the translation pool by sampling available candidate aligned translations in accordance with the conditional translation probability. | 11-17-2011 |
20110288852 | DYNAMIC BI-PHRASES FOR STATISTICAL MACHINE TRANSLATION - A system and a method for phrase-based translation are disclosed. The method includes receiving source language text to be translated into target language text. One or more dynamic bi-phrases are generated, based on the source text and the application of one or more rules, which may be based on user descriptions. A dynamic feature value is associated with each of the dynamic bi-phrases. For a sentence of the source text, static bi-phrases are retrieved from a bi-phrase table, each of the static bi-phrases being associated with one or more values of static features. Any of the dynamic bi-phrases which each cover at least one word of the source text are also retrieved, which together form a set of active bi-phrases. Translation hypotheses are generated using active bi-phrases from the set and scored with a translation scoring model which takes into account the static and dynamic feature values of the bi-phrases used in the respective hypothesis. A translation, based on the hypothesis scores, is then output. | 11-24-2011 |
20110307245 | WORD ALIGNMENT METHOD AND SYSTEM FOR IMPROVED VOCABULARY COVERAGE IN STATISTICAL MACHINE TRANSLATION - A system and method for generating word alignments from pairs of aligned text strings are provided. A corpus of text strings provides pairs of text strings, primarily sentences, in source and target languages. A first alignment between a text string pair creates links therebetween. Each link links a single token of the first text string to a single token of the second text string. A second alignment also creates links between the text string pair. In some cases, these links may correspond to bi-phrases. A modified first alignment is generated by selectively modifying links in the first alignment which include a word which is infrequent in the corpus, based on links generated in the second alignment. This results in removing at least some of the links for the infrequent words, allowing more compact and better quality bi-phrases, with higher vocabulary coverage, to be extracted for use in a machine translation system. | 12-15-2011 |
20120101804 | MACHINE TRANSLATION USING OVERLAPPING BIPHRASE ALIGNMENTS AND SAMPLING - A system and method for machine translation are disclosed. Source sentences are received. For each source sentence, a target sentence comprising target words is generated. A plurality of translation neighbors of the target sentence is generated. Phrase alignments are computed between the source sentence and each translation neighbor. Translation neighbors are scored with a translation scoring model, based on the phrase alignments. Translation neighbors are ranked, based on the scores. In training the model, parameters of the model are updated based on an external ranking of the ranked translation neighbors. The generating of translation neighbors, scoring, ranking, and, in the case of training, updating the parameters, are iterated with one of the translation neighbors as the target sentence. In the case of decoding, one of the translation neighbors is output as a translation. The system and method may be at least partially implemented with a computer processor. | 04-26-2012 |
20120278060 | METHOD AND SYSTEM FOR CONFIDENCE-WEIGHTED LEARNING OF FACTORED DISCRIMINATIVE LANGUAGE MODELS - A system and method for building a language model for a translation system are provided. The method includes providing a first relative ranking of first and second translations in a target language of a same source string in a source language, determining a second relative ranking of the first and second translations using weights of a language model, the language model including a weight for each of a set of n-gram features, and comparing the first and second relative rankings to determine whether they are in agreement. The method further includes, when the rankings are not in agreement, updating one or more of the weights in the language model as a function of a measure of confidence in the weight, the confidence being a function of previous observations of the n-gram feature in the method. | 11-01-2012 |
20130030787 | SYSTEM AND METHOD FOR PRODUCTIVE GENERATION OF COMPOUND WORDS IN STATISTICAL MACHINE TRANSLATION - A method and a system for making merging decisions for a translation are disclosed which are suited to use where the target language is a productive compounding one. The method includes outputting decisions on merging of pairs of words in a translated text string with a merging system. The merging system can include a set of stored heuristics and/or a merging model. In the case of heuristics, these can include a heuristic by which two consecutive words in the string are considered for merging if the first word of the two consecutive words is recognized as a compound modifier and their observed frequency f | 01-31-2013 |
20130042108 | PRIVATE ACCESS TO HASH TABLES - A server and a client mutually exclusively execute server-side and client-side commutative cryptographic processes and server-side and client-side commutative permutation processes. The server has access to a hash table, while the client does not. The server and client perform a method including: encrypting and reordering the hash table using the server; communicating the encrypted and reordered hash table to the client; further encrypting and further reordering the hash table using the client; communicating the further encrypted and further reordered hash table back to the server; and partially decrypting and partially undoing the reordering using the server to generate a double-blind hash table. To read an entry, the client hashes and permutes an index key and communicates it to the server, which retrieves an item from the double-blind hash table using the hashed and permuted index key and sends it back to the client, which decrypts the retrieved item. | 02-14-2013 |
20130301920 | METHOD FOR PROCESSING OPTICAL CHARACTER RECOGNIZER OUTPUT - A method, a system, and a computer program product for processing the output of an OCR are disclosed. The system receives a first character sequence from the OCR. A first set of characters from the first character sequence is converted to a corresponding second set of characters to generate a second character sequence based on a look-up table and language scores. | 11-14-2013 |
20140056428 | METHODS AND SYSTEMS FOR SECURELY ACCESSING TRANSLATION RESOURCE MANAGER - The disclosed embodiments relate to systems and methods for securely accessing a phrase table. One or more records in the phrase table are encrypted using a first set of keys. The first set of keys is encrypted using a second key. A decoder module is compiled based on the second key. Thereafter, the one or more encrypted records and/or the decoder module are transmitted to a first computing device at the client side. The first set of encrypted keys is transmitted to a second computing device. The first computing device transmits a request to the second computing device to send an encrypted key. The decoder module decrypts the encrypted key to generate a key. The first computing device uses the key to decrypt one or more encrypted records. | 02-27-2014 |
20140058718 | CROWDSOURCING TRANSLATION SERVICES - A method, system, and computer program product for translating a text file are disclosed. A text file in a source language is received and text snippets from the text file are extracted. The text snippets are distributed to a first set of remote workers for translation. The translated text snippets are validated by a second set of remote workers and the validated text snippets are used to generate a translated text file. | 02-27-2014 |
20140058879 | ONLINE MARKETPLACE FOR TRANSLATION SERVICES - A method, system, and computer program product for implementing an online marketplace for translation services are disclosed. A plurality of requirements is received from a client and sent to one or more service providers. Further, service quotations from the one or more service providers are received. Based on the plurality of requirements and the service quotations, an estimate of quality of service is generated. Lastly, the service quotations and the estimate of quality of service are sent to the client. | 02-27-2014 |
20140156565 | METHODS AND SYSTEMS FOR PREDICTING LEARNING CURVE FOR STATISTICAL MACHINE TRANSLATION SYSTEM - The disclosed embodiments relate to a system and method for predicting the learning curve of an SMT system. A set of anchor points is selected. The set of anchor points corresponds to a size of a corpus. Thereafter, a gold curve or a benchmark curve is fitted based on the set of anchor points to determine the BLEU score. Based on the BLEU score and a set of parameters associated with the first set of anchor points, a confidence score is computed. | 06-05-2014 |
20140200878 | MULTI-DOMAIN MACHINE TRANSLATION MODEL ADAPTATION - A method adapted to multiple corpora includes training a statistical machine translation model which outputs a score for a candidate translation, in a target language, of a text string in a source language. The training includes learning a weight for each of a set of lexical coverage features that are aggregated in the statistical machine translation model. The lexical coverage features include a lexical coverage feature for each of a plurality of parallel corpora. Each of the lexical coverage features represents a relative number of words of the text string for which the respective parallel corpus contributed a biphrase to the candidate translation. The method may also include learning a weight for each of a plurality of language model features, the language model features comprising one language model feature for each of the domains. | 07-17-2014 |
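The lexical coverage feature described in the last entry (20140200878) can be illustrated with a minimal sketch. It is not the patented implementation: the biphrase representation (a tuple of covered source positions plus a corpus name), the function names `lexical_coverage` and `score`, the corpus names, and the hand-picked weights are all assumptions made for illustration. The abstract says only that the features are "aggregated" and that their weights are learned in training; the weighted sum below is one assumed form of that aggregation.

```python
from collections import Counter

def lexical_coverage(source_words, biphrases):
    """Return, per corpus, the fraction of source words covered by
    biphrases that the corpus contributed to a candidate translation.

    biphrases: list of (covered_source_positions, corpus_name) pairs.
    This representation is hypothetical, not taken from the patent.
    """
    n = len(source_words)
    if n == 0:
        return {}
    covered = Counter()
    for positions, corpus in biphrases:
        covered[corpus] += len(positions)
    return {corpus: count / n for corpus, count in covered.items()}

def score(features, weights):
    """Aggregate feature values as a weighted sum (assumed form)."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())

# A five-word source sentence translated with three biphrases,
# one from a "medical" corpus and two from a "legal" corpus.
source = ["the", "patient", "signed", "the", "contract"]
used = [((0, 1), "medical"), ((2,), "legal"), ((3, 4), "legal")]

coverage = lexical_coverage(source, used)
# Weights would be learned in training; these values are made up.
weights = {"medical": 0.5, "legal": 2.0}
total = score(coverage, weights)
```

Here the medical corpus covers two of the five source words and the legal corpus covers three, so a candidate that draws more heavily on the higher-weighted corpus receives a higher score.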