Patent application number | Description | Published |
20120089898 | Identifying Language Translations For Source Documents using Links - Technology is described for identifying language translations for source documents. The method includes finding source documents that contain links to target documents, where the link anchors have language-indicating text. A first tuple set can be generated for paired source documents and target documents, with an expected target language for each target document. The first tuple set can be annotated with primary languages for the source documents and target documents to form a second tuple set in which the primary languages of the source documents and target documents are different. Further, a third tuple set can be generated from the second tuple set using a count of the number of times source documents and target documents occur in the first tuple set. Tuples can be removed from the third tuple set where a count ratio between source document count and target document count is less than a reference ratio. | 04-12-2012 |
20120203540 | LANGUAGE SEGMENTATION OF MULTILINGUAL TEXTS - The claimed subject matter provides a system and/or method for segmenting a multi-language text. An exemplary method comprises determining an initial probability distribution for sentences in the multi-language text, the initial probability distribution indicating the likelihood of each sentence being in each of a set of languages. A probability of language transitions across sentences may be learned based on the initial probability distribution. Additionally, a highest probability language sequence of sentences in the multi-language text may be determined based on a combination of the probability of language transitions and the prior probability distribution provided by an initial model. | 08-09-2012 |
20130103695 | MACHINE TRANSLATION DETECTION IN WEB-SCRAPED PARALLEL CORPORA - Various technologies described herein pertain to detecting machine translated content. Documents in a document pair are mutual lingual translations of each other. Further, document level features of the documents in the document pair can be identified. The document level features can correlate with translation quality between the documents in the document pair. Moreover, statistical classification can be used to detect whether the document pair is generated through machine translation based at least in part upon the document level features. Further, a first document can be a machine translation of a second document in the document pair or a disparate document when generated through machine translation. | 04-25-2013 |
20140067365 | LANGUAGE SEGMENTATION OF MULTILINGUAL TEXTS - The claimed subject matter provides a system and/or method for segmenting a multi-language text. An exemplary method comprises determining an initial probability distribution for sentences in the multi-language text, the initial probability distribution indicating the likelihood of each sentence being in each of a set of languages. A probability of language transitions across sentences may be learned based on the initial probability distribution. Additionally, a highest probability language sequence of sentences in the multi-language text may be determined based on a combination of the probability of language transitions and the prior probability distribution provided by an initial model. Further, web documents are annotated at a sentence-by-sentence level such that each sentence of a web document is labeled in a given language according to the highest probability language determined. | 03-06-2014 |
20140172407 | LANGUAGE PROCESSING RESOURCES FOR AUTOMATED MOBILE LANGUAGE TRANSLATION - Automated language translation often involves language translation resources of significant size (e.g., 50-gigabyte phrase tables) and significant computational power exceeding the capabilities of many mobile devices. Remotely accessible servers capable of near-realtime, automated translation may be inaccessible or prohibitively costly while traveling abroad. Presented herein are adaptations of language translation techniques for offline mobile devices involving reducing the size and raising the efficiency of the language modeling resources. A word index may be provided that stores respective string representations of the words of a language, and maps respective words to a location (e.g., address or offset) of respective word representations within the word index. Language translation resources (e.g., phrase tables) may then specify logical relationships using the word index addresses of the involved words, rather than the string equivalents. This technique significantly condenses the language resources and provides faster, bidirectional access to the word representations of the language. | 06-19-2014 |
20140173200 | NON-BLOCKING CACHING TECHNIQUE - The described implementations relate to processing of electronic data. One implementation is manifested as a system that can include a cache module and at least one processing device configured to execute the cache module. The cache module can be configured to store data items in slots of a cache structure, receive a request for an individual data item that maps to an individual slot of the cache structure, and, when the individual slot of the cache structure is not available, return without further processing the request. For example, the request can be received from a calling application or thread that can proceed without blocking irrespective of whether the request is fulfilled by the cache module. | 06-19-2014 |
20150347399 | In-Call Translation - Call audio of a call between a source user speaking a source language and a target user speaking a target language is received from a remote source user device of the source user via a communication network of a communication system, the call audio comprising speech of the source user in the source language. An automatic speech recognition procedure is performed on the call audio. A translation of the source user's speech is generated in the target language using the results of the speech recognition procedure. A translated synthetic speech audio version of the source user's speech is mixed with the source user's call audio and/or with translated audio of the target user's speech in the source language. The mixed audio signal is transmitted to a remote target user device of the target user via the communication network for outputting to at least the target user during the call. | 12-03-2015 |
20150350451 | In-Call Translation - The disclosure pertains to a communication system for effecting a voice or video call between at least a source user speaking a source language and a target user speaking a target language. A translation procedure is performed on call audio of the call to generate an audio translation of the source user's speech in the target language for outputting to the target user. A notification is outputted to the target user to notify the target user of a change in the behaviour of the translation procedure, the change relating to the generation of the translation. | 12-03-2015 |
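
The two LANGUAGE SEGMENTATION OF MULTILINGUAL TEXTS entries above (20120203540 and 20140067365) combine per-sentence language probabilities with language-transition probabilities to find the most probable language sequence. The sketch below shows one common way to do that, a Viterbi-style decode. It is an illustration only, not the method claimed in the applications: the function name `best_language_sequence` and the `stay_prob` parameter are hypothetical, the transition probability is a fixed assumption rather than one learned from the data, and the prior over the first sentence's language is taken to be uniform.

```python
import math


def best_language_sequence(sentence_probs, stay_prob=0.9):
    """Return the most probable language label for each sentence.

    sentence_probs: list of dicts mapping language -> probability that the
        sentence is in that language (the "initial distribution").
    stay_prob: ASSUMED probability that consecutive sentences share a
        language; the abstracts describe learning this from the data.
    """
    languages = sorted(sentence_probs[0])
    # Spread the remaining probability mass evenly over the other languages.
    switch_prob = (1.0 - stay_prob) / max(len(languages) - 1, 1)

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    def transition(prev, cur):
        return log(stay_prob if prev == cur else switch_prob)

    # score[lang] = best log-probability of any path ending in lang.
    score = {lang: log(sentence_probs[0][lang]) for lang in languages}
    backpointers = []

    for probs in sentence_probs[1:]:
        new_score, pointers = {}, {}
        for lang in languages:
            best_prev = max(
                languages, key=lambda prev: score[prev] + transition(prev, lang)
            )
            new_score[lang] = (
                score[best_prev] + transition(best_prev, lang) + log(probs[lang])
            )
            pointers[lang] = best_prev
        backpointers.append(pointers)
        score = new_score

    # Walk the backpointers from the best final language to the start.
    path = [max(languages, key=lambda lang: score[lang])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

With a high `stay_prob`, a sentence whose own scores only weakly favour another language is pulled toward the language of its neighbours, which is the smoothing effect that combining the two probability sources is meant to give. Labelling each sentence by its own highest score would instead switch language at every such sentence.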