Patent application number | Description | Published |
20090003712 | Video Collage Presentation - A method, a computer-readable storage medium, and a user interface describe techniques for creating a video collage synthesized from video content: selecting representative images from the video content, extracting and resizing regions of interest (ROI) from the representative images, and arranging the regions of interest on a canvas without seams while preserving the temporal structure of the video content. The described method, storage medium, and user interface enhance the user's experience of browsing a compact video collage. | 01-01-2009 |
20090006368 | Automatic Video Recommendation - Automatic video recommendation is described. The recommendation does not require an existing user profile. Source videos are directly compared to a user-selected video to determine relevance, which is then used as a basis for video recommendation. The comparison is performed with respect to a weighted feature set including at least one content-based feature, such as a visual feature, an aural feature, or a content-derived textual feature. A multimodal implementation, using multimodal features (e.g., visual, aural, and textual) extracted from the videos, yields more reliable relevance ranking. One embodiment uses an indirect textual feature generated by automatic text categorization based on a predefined category hierarchy. Another embodiment uses self-learning based on user click-through history to improve relevance ranking. | 01-01-2009 |
20090076882 | MULTI-MODAL RELEVANCY MATCHING - This document describes techniques capable of associating relevant entities, such as advertisements, with insertion points within a media file. These techniques calculate a global relevancy between entities and the media file. These techniques may also calculate a local relevancy between the entities and one or more insertion points within the media file. Both global and local relevancies may employ textual and non-textual information. With use of the calculated global and local relevancies, the techniques associate one or more entities with each of the one or more insertion points in the media file. These techniques thus enable associating the most relevant entity with each insertion point. Therefore, when a user consumes the media file, the user may also consume the most relevant entity at each insertion point in the media file. | 03-19-2009 |
20090079871 | ADVERTISEMENT INSERTION POINTS DETECTION FOR ONLINE VIDEO ADVERTISING - Systems and methods for determining insertion points in a first video stream are described. The insertion points are configured for inserting at least one second video into the first video. In accordance with one embodiment, a method for determining the insertion points includes parsing the first video into a plurality of shots. The plurality of shots includes one or more shot boundaries. The method then determines one or more insertion points by balancing a discontinuity metric and an attractiveness metric of each shot boundary. | 03-26-2009 |
20090171787 | Impressionative Multimedia Advertising - A method for online advertising makes an impressionative presentation of an advertisement to a viewer. The impressionative presentation is an impressionized version of an original online source medium, such as a photo. The method associates advertisements with the source medium based, at least in part, on calculated ad relevance, and determines one or more viewer interactive points on the original source medium. The method then presents to the viewer an ad-augmented medium including an impressionized version of the source medium, which can change its form of impression in response to an interactive act conducted by the viewer. The ad-augmented medium may include the associated advertisement content or direct the viewer's attention thereto. | 07-02-2009 |
20090274434 | VIDEO CONCEPT DETECTION USING MULTI-LAYER MULTI-INSTANCE LEARNING - Visual concepts contained within a video clip are classified based upon a set of target concepts. The clip is segmented into shots and a multi-layer multi-instance (MLMI) structured metadata representation of each shot is constructed. A set of pre-generated trained models of the target concepts is validated using a set of training shots. An MLMI kernel is recursively generated which models the MLMI structured metadata representation of each shot by comparing prescribed pairs of shots. The MLMI kernel is subsequently utilized to generate a learned objective decision function which learns a classifier for determining if a particular shot (that is not in the set of training shots) contains instances of the target concepts. A regularization framework can also be utilized in conjunction with the MLMI kernel to generate modified learned objective decision functions. The regularization framework introduces explicit constraints which serve to maximize the precision of the classifier. | 11-05-2009 |
20090290802 | CONCURRENT MULTIPLE-INSTANCE LEARNING FOR IMAGE CATEGORIZATION - The concurrent multiple-instance learning technique described here encodes the inter-dependency between instances (e.g., regions in an image) in order to predict a label for a future instance, and, if desired, the label for an image determined from the labels of these instances. The technique, in one embodiment, uses a concurrent tensor to model the semantic linkage between instances in a set of images. Based on the concurrent tensor, rank-1 supersymmetric non-negative tensor factorization (SNTF) can be applied to estimate the probability of each instance being relevant to a target category. In one embodiment, the technique formulates the label prediction process in a regularization framework, which avoids overfitting and significantly improves a learning machine's generalization capability, similar to that in SVMs. The technique, in one embodiment, uses a Reproducing Kernel Hilbert Space (RKHS) to extend predicted labels to the whole feature space based on the generalized representer theorem. | 11-26-2009 |
20090310854 | Multi-Label Multi-Instance Learning for Image Classification - Described is a technology by which an image is classified (e.g., grouped and/or labeled), based on multi-label multi-instance data learning-based classification according to semantic labels and regions. An image is processed in an integrated framework into multi-label multi-instance data, including region and image labels. The framework determines local association data based on each region of an image. Other multi-label multi-instance data is based on relationships between region labels of the image, relationships between image labels of the image, and relationships between the region and image labels. These data are combined to classify the image. Training is also described. | 12-17-2009 |
20090313294 | AUTOMATIC IMAGE ANNOTATION USING SEMANTIC DISTANCE LEARNING - Images are automatically annotated using semantic distance learning. Training images are manually annotated and partitioned into semantic clusters. Semantic distance functions (SDFs) are learned for the clusters. The SDF for each cluster is used to compute semantic distance scores between a new image and each image in the cluster. The scores for each cluster are used to generate a ranking list which ranks each image in the cluster according to its semantic distance from the new image. An association probability is estimated for each cluster which specifies the probability of the new image being semantically associated with the cluster. Cluster-specific probabilistic annotations for the new image are generated from the manual annotations for the images in each cluster. The association probabilities and cluster-specific probabilistic annotations for all the clusters are used to generate final annotations for the new image. | 12-17-2009 |
20090319883 | Automatic Video Annotation through Search and Mining - Described is a technology in which a new video is automatically annotated based on terms mined from the text associated with similar videos. In a search phase, searching by one or more various search modalities (e.g., text, concept and/or video) finds a set of videos that are similar to a new video. Text associated with the new video and with the set of videos is obtained, such as by automatic speech recognition that generates transcripts. A mining mechanism combines the associated text of the similar videos with that of the new video to find the terms that annotate the new video. For example, the mining mechanism creates a new term frequency vector by combining term frequency vectors for the set of similar videos with a term frequency vector for the new video, and provides the mined terms by fitting a Zipf curve to the new term frequency vector. | 12-24-2009 |
20100149419 | MULTI-VIDEO SYNTHESIS - Embodiments that provide multi-video synthesis are disclosed. In accordance with one embodiment, multi-video synthesis includes breaking a main video into a plurality of main frames and breaking a supplementary video into a plurality of supplementary frames. The multi-video synthesis also includes assigning one or more supplementary frames to each of a plurality of states of a Hidden Markov Model (HMM), where each of the plurality of states corresponds to one or more main frames. The multi-video synthesis further includes determining optimal frames in the plurality of main frames for insertion of the plurality of supplementary frames based on the plurality of states and visual properties. The optimal frames include optimal insertion positions. The multi-video synthesis additionally includes inserting the plurality of supplementary frames into the optimal insertion positions to form a synthesized video. | 06-17-2010 |
20100153219 | IN-TEXT EMBEDDED ADVERTISING - Computer program products, devices, and methods for generating in-text embedded advertising are described. Embedded advertising is “hidden” or embedded into a message by matching an advertisement to the message and identifying a place in the message to insert the advertisement. For textual messages, statistical analysis of individual sentences is performed to determine where it would be most natural to insert an advertisement. Statistical rules of grammar derived from a language model may be used to choose a natural and grammatical place in the sentence for inserting the advertisement. Insertion of the advertisement creates a modified sentence without degrading the meaning of the original sentence, while including the advertisement as part of the new sentence. | 06-17-2010 |
20100205202 | Visual and Textual Query Suggestion - Techniques described herein enable better understanding of the intent of a user that submits a particular search query. These techniques receive a search request for images associated with a particular query. In response, the techniques determine images that are associated with the query, as well as other keywords that are associated with these images. The techniques then cluster, for each set of images associated with one of these keywords, the set of images into multiple groups. The techniques then rank the images and determine a representative image of each cluster. Finally, the techniques suggest, to the user that submitted the query, refining the search based on user selection of a keyword and a representative image. Thus, the techniques better understand the user's intent by allowing the user to refine the search based on another keyword and based on an image on which the user wishes to focus the search. | 08-12-2010 |
20110075992 | INTELLIGENT OVERLAY FOR VIDEO ADVERTISING - Video advertising overlay technique embodiments are presented that generally detect a set of spatio-temporal nonintrusive positions within a series of consecutive video frames in shots of a digital video and then overlay contextually relevant ads on these positions. In one general embodiment, this is accomplished by decomposing the video into a series of shots, and then identifying a video advertisement for each of a selected set of the shots. The identified video advertisement is one that is determined to be the most relevant to the content of the shot. An overlay area is also identified in each of the shots, where the selected overlay area is the least intrusive among a plurality of prescribed areas to a viewer of the video. The video advertisements identified for the shots are then respectively scheduled to be overlaid in the identified overlay area of a shot, whenever the shot is played. | 03-31-2011 |
20110196859 | Visual Search Reranking - An initial ranked list of a first plurality of visual documents is obtained from a first source in response to a query, and a second plurality of visual documents relevant to the query is gathered from a plurality of second sources. Visual patterns identified from the second plurality of visual documents are compared with the first visual documents for reranking the first visual documents. | 08-11-2011 |
20110264700 | ENRICHING ONLINE VIDEOS BY CONTENT DETECTION, SEARCHING, AND INFORMATION AGGREGATION - Many internet users consume content through online videos. For example, users may view movies, television shows, music videos, and/or homemade videos. It may be advantageous to provide additional information to users consuming the online videos. Unfortunately, many current techniques may be unable to provide additional information relevant to the online videos from outside sources. Accordingly, one or more systems and/or techniques for determining a set of additional information relevant to an online video are disclosed herein. In particular, visual, textual, audio, and/or other features may be extracted from an online video (e.g., original content of the online video and/or embedded advertisements). Using the extracted features, additional information (e.g., images, advertisements, etc.) may be determined based upon matching the extracted features with content of a database. The additional information may be presented to a user consuming the online video. | 10-27-2011 |
20110267544 | NEAR-LOSSLESS VIDEO SUMMARIZATION - Described is perceptually near-lossless video summarization for use in maintaining video summaries, which operates to substantially reconstruct an original video in a generally perceptually near-lossless manner. A video stream is summarized with little information loss by using a relatively very small piece of summary metadata. The summary metadata comprises an image set of synthesized mosaics and representative keyframes, audio data, and the metadata about video structure and motion. In one implementation, the metadata is computed and maintained (e.g., as a file) to summarize a relatively large video sequence, by segmenting a video shot into subshots, and selecting keyframes and mosaics based upon motion data corresponding to those subshots. The motion data is maintained as a semantic description associated with the image set. To reconstruct the video, the metadata is processed, including simulating motion using the image set and the semantic description, which recovers the audiovisual content without any significant information loss. | 11-03-2011 |
20110288929 | Enhancing Photo Browsing through Music and Advertising - Techniques for recommending music and advertising to enhance a user's experience while photo browsing are described. In some instances, songs and ads are ranked for relevance to at least one photo from a photo album. The songs, ads and photo(s) from the photo album are then mapped to a style and mood ontology to obtain vector-based representations. The vector-based representations can include real valued terms, each term associated with a human condition defined by the ontology. A re-ranking process generates a relevancy term for each song and each ad indicating relevancy to the photo album. The relevancy terms can be calculated by summing weighted terms from the ranking and the mapping. Recommended music and ads may then be provided to a user, as the user browses a series of photos obtained from the photo album. The ads may be seamlessly embedded into the music in a nonintrusive manner. | 11-24-2011 |
20110289015 | MOBILE DEVICE RECOMMENDATIONS - Users may browse web pages, interact with a plethora of applications, search for new content, and perform a wide variety of other tasks using a mobile device. Unfortunately, useful content may be difficult for a user to locate because of the large amount of content available (e.g. hundreds of thousands of applications within an application store). Accordingly, one or more systems and/or techniques for determining recommendations are disclosed herein. In particular, user input (e.g., text, numbers, etc.) and/or a user profile (e.g., contextual information relating to a user) may be used to determine a user intent. Recommendations may be determined based upon the user intent. For example, a user may input “I am hungry” using a mobile phone having a GPS location of Downtown and a noon timestamp. Using this information, an application allowing the user to make lunch reservations at local restaurants may be provided as a recommendation. | 11-24-2011 |
20120095825 | Incentive Selection of Region-of-Interest and Advertisements for Image Advertising - Techniques for image selection and region of interest analysis are described herein. A pair of two or more users is configured, and an image is displayed to the pair. The image can be a still image (i.e., a picture) or a moving image (i.e., video). In some instances, a plurality of advertisements is suggested for possible association with the image. Input is received from both users in the pair, indicating a positive or a negative association between each advertisement and the image. When the pair positively rates an advertisement, the advertisement is associated with the image. A plurality of regions of interest within the image may be suggested. In response, positive or negative input is received from the pair indicating whether each of the plurality of regions of interest is appropriately suggested for placement of an advertisement. | 04-19-2012 |
20120109754 | SPONSORED MULTI-MEDIA BLOGGING - The sponsored multi-media blogging technique is an advertising-driven service on a computing device, such as a mobile phone, that makes the multi-media micro-blog or blog an effective carrier for advertising. The data collected while employing the sponsored multi-media blogging technique is used for user intent mining and increasing advertisement relevance for mobile advertising projects. The benefits to the sponsored multi-media blogging technique's users are a natural interface for composing multi-media micro-blogs/blogs and instant experience sharing, while the benefit to advertisers is the promoted brand impression from the contextual advertising in rich media micro-blogs/blogs. | 05-03-2012 |
20120110432 | Tool for Automated Online Blog Generation - Techniques for the design and operation of a blogging tool for automated blog creation and automated upload to a server are described herein. A content capturing process may obtain a plurality of images, including still images or video, as well as audio capture of voices and other sound, according to direction of a user operating an image-capture device. One or more of the images may be annotated with metadata or with text, which may be derived from verbal content provided by the user. A template may be selected in either an automated or user-controlled manner. The images and other content may be assembled into the template to form a blog entry. The blog entry may be uploaded to a server or otherwise shared. In one example, the uploading may be in response to a single user command, obtained by operation of a physical user interface or from verbal user input. | 05-03-2012 |
20120263433 | Detecting Key Roles and Their Relationships from Video - Tools and techniques for acquiring key roles and their relationships from a video independent of metadata, such as cast lists and scripts, are described herein. These techniques include discovering key roles and their relationships by treating a video (e.g., a movie, television program, music video, or personal video) as a community. For instance, a video is segmented into a hierarchical structure that includes levels for scenes, shots, and key frames. In some implementations, the techniques include performing face detection and grouping on the detected key frames. In some implementations, the techniques include exploiting the key roles and their correlations in this video to discover a community. The discovered community provides for a wide variety of applications, including the automatic generation of visual summaries or video posters including acquired key roles. | 10-18-2012 |
20120294520 | GESTURE-BASED VISUAL SEARCH - A user may perform an image search on an object shown in an image. The user may use a mobile device to display an image. In response to displaying the image, the client device may send the image to a visual search system for image segmentation. Upon receiving a segmented image from the visual search system, the client device may display the segmented image to the user who may select one or more segments including an object of interest to instantiate a search. The visual search system may formulate a search query based on the one or more selected segments and perform a search using the search query. The visual search system may then return search results to the client device for display to the user. | 11-22-2012 |
20120295640 | User Behavior Model for Contextual Personalized Recommendation - A user behavior model provides personalized recommendations based in part on time and location, particularly to users of mobile devices. Entity types are ranked according to relevance to the user. Example entity types are restaurant, hotel, etc. The relevance may be based on reference to a large-scale database containing queries from other users. Additionally, entities within each entity type may be ranked based on relevance to the user and the time and location context. A user interface may display a ranked list of entity types, such as restaurant, hotel, etc., wherein each entity type is represented by the highest-ranked entity within the entity type. Thus, the user interface may display a highest-ranked restaurant, a highest-ranked hotel, etc. Upon user selection of one such entity type, the user interface is replaced with a second user interface, for example showing a ranked hierarchy of restaurants, headed by the highest-ranked restaurant. | 11-22-2012 |
20120297038 | Recommendations for Social Network Based on Low-Rank Matrix Recovery - Techniques describe analyzing users and groups of a social network to identify user interests and providing recommendations for a user based on the user's identified interests. A content-awareness application obtains a collection of images and tags associated with the images belonging to members in the social network. The content-awareness application decomposes the members into a representative matrix to identify users and groups in order to calculate a similarity matrix between the users and their images based on a visual content of the images and a textual content of the tags. The content-awareness application further constructs a graph Laplacian over the users and the groups to align with the representative matrix based at least in part on the similarity matrix and further provides recommendations of groups for a user to join in the social network based at least in part on the graph Laplacian identifying the user's interests. | 11-22-2012 |
20130179257 | In-Text Embedded Advertising - Computer program products, devices, and methods for generating in-text embedded advertising are described. Embedded advertising is “hidden” or embedded into a message by matching an advertisement to the message and identifying a place in the message to insert the advertisement. For textual messages, statistical analysis of individual sentences is performed to determine where it would be most natural to insert an advertisement. Statistical rules of grammar derived from a language model may be used to choose a natural and grammatical place in the sentence for inserting the advertisement. Insertion of the advertisement creates a modified sentence without degrading the meaning of the original sentence, while including the advertisement as part of the new sentence. | 07-11-2013 |
20140003714 | GESTURE-BASED VISUAL SEARCH | 01-02-2014 |
20140075393 | Gesture-Based Search Queries - An image-based text extraction and searching system extracts an image selected by a user's gesture input, along with the associated image data and proximate textual data, in response to the image selection. Extracted image data and textual data can be utilized to perform or enhance a computerized search. The system can determine one or more database search terms based on the textual data and generate at least a first search query proposal related to the image data and the textual data. | 03-13-2014 |
20140244614 | Cross-Domain Topic Space - Some examples include receiving a microblog entry from a social stream domain. Further, some implementations include determining, based on a topic space associated with the social stream domain and a media domain, a topic that is associated with the microblog entry. Some implementations include determining, based on the topic space, one or more media items that are associated with the topic. | 08-28-2014 |
20140250120 | Interactive Multi-Modal Image Search - A facility for visual search on a mobile device takes advantage of multi-modal and multi-touch input on the mobile device. By extracting lexical entities from a spoken search query and matching the lexical entities to image tags, the facility provides candidate images for each entity. Selected ones of the candidate images are used to construct a composite visual query image on a query canvas. The relative size and position of the selected candidate images in the composite visual query image, which need not be an existing image, contribute to a definition of a context of the composite visual query image being submitted for context-aware visual search. | 09-04-2014 |
20140289228 | USER BEHAVIOR MODEL FOR CONTEXTUAL PERSONALIZED RECOMMENDATION - A user behavior model provides personalized recommendations based in part on time and location, particularly to users of mobile devices. Entity types are ranked according to relevance to the user. Example entity types are restaurant, hotel, etc. The relevance may be based on reference to a large-scale database containing queries from other users. Additionally, entities within each entity type may be ranked based on relevance to the user and the time and location context. A user interface may display a ranked list of entity types, such as restaurant, hotel, etc., wherein each entity type is represented by the highest-ranked entity within the entity type. Thus, the user interface may display a highest-ranked restaurant, a highest-ranked hotel, etc. Upon user selection of one such entity type, the user interface is replaced with a second user interface, for example showing a ranked hierarchy of restaurants, headed by the highest-ranked restaurant. | 09-25-2014 |
20140354768 | Socialized Mobile Photography - A system, method, or computer-readable storage device enables mobile devices to capture high-quality photos by using both the rich context available from mobile devices and crowd-sourced social media on the Web. To support flexible and adaptive application of photography principles to different content and contexts, composition rules and exposure principles are learned from community-contributed images. Leveraging a mobile device user's scene context and social context, the proposed socialized mobile photography system is able to suggest an optimal view enclosure to achieve appealing composition. Because complex scene content and a number of shooting-related contexts affect exposure parameters, exposure learning is applied to suggest appropriate camera parameters. | 12-04-2014 |
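Several of the filings above (e.g., 20090006368) rank candidate items by a weighted combination of per-modality similarities between a query item and each candidate. A minimal sketch of that ranking scheme, assuming hypothetical precomputed visual, aural, and textual feature vectors and cosine similarity as the per-modality measure (the patents do not specify these particulars):

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def relevance(query, candidate, weights):
    # Weighted sum of per-modality similarities (visual, aural, textual).
    return sum(weights[m] * cosine(query[m], candidate[m]) for m in weights)

# Hypothetical per-modality feature vectors for a query video and candidates.
query = {"visual": [1.0, 0.0], "aural": [0.5, 0.5], "textual": [0.0, 1.0]}
cands = {
    "clipA": {"visual": [0.9, 0.1], "aural": [0.5, 0.5], "textual": [0.1, 0.9]},
    "clipB": {"visual": [0.0, 1.0], "aural": [1.0, 0.0], "textual": [1.0, 0.0]},
}
weights = {"visual": 0.5, "aural": 0.2, "textual": 0.3}
ranked = sorted(cands, key=lambda c: relevance(query, cands[c], weights),
                reverse=True)
```

The feature weights here are illustrative; 20090006368 further describes learning such weights from user click-through history.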
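Filing 20090319883 mines annotation terms by combining the term-frequency vector of a new video's transcript with the vectors of similar videos' transcripts. A minimal sketch under assumed inputs, with a simple top-k frequency cutoff standing in for the Zipf-curve fitting the abstract describes:

```python
from collections import Counter

def term_frequencies(text):
    # Term-frequency vector of a transcript (lowercased whitespace tokens).
    return Counter(text.lower().split())

def mine_annotations(new_transcript, similar_transcripts, k=3):
    # Combine the new video's TF vector with those of the similar videos,
    # then keep the k most frequent terms as candidate annotations.
    combined = term_frequencies(new_transcript)
    for transcript in similar_transcripts:
        combined.update(term_frequencies(transcript))
    return [term for term, _ in combined.most_common(k)]

# Hypothetical transcripts for a new video and two similar videos.
tags = mine_annotations(
    "soccer match highlights",
    ["soccer goal replay", "soccer penalty goal"],
    k=2,
)
```

In practice the transcripts would come from automatic speech recognition, and stop-word filtering and the Zipf fit would replace the fixed cutoff.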