Patent application number | Description | Published |
20080240237 | REAL-TIME FACE DETECTION - An apparatus, a method, and a computer-readable medium having instructions encoded thereon that when executed cause a method to be carried out. The method includes dividing at least a portion of a picture of a video stream into parts of blocks, and processing the parts in parallel by a plurality of interconnected processors. The processing of a respective part by its respective processor includes edge detection and color segmentation to determine block-level edge features including block-level color-segmented edge features. Each processor also performs coding functions on its respective part of the picture. The method also includes block-level processing using the block-level edge features to determine which blocks in the picture are likely to be part of a face, the block-level processing being at the granularity of at least a block. | 10-02-2008 |
20080240571 | REAL-TIME FACE DETECTION USING TEMPORAL DIFFERENCES - An apparatus, a method, and a computer-readable medium having instructions encoded thereon that when executed cause a method to be carried out. The method includes dividing at least a portion of a picture of a video stream into parts of blocks, and processing the parts in parallel by a plurality of interconnected processors. The processing of a respective part by its respective processor includes determining block-level temporal difference features. Each processor also performs coding functions on its respective part of the picture. The method also includes block-level processing using the block-level temporal difference features to determine which blocks in the picture are likely to be part of a face, the block-level processing being at the granularity of at least a block. In one version, the processing in each processor includes edge detection and color segmentation to determine block-level edge features including block-level color-segmented edge features that are then used in the block-level processing. | 10-02-2008 |
20090016440 | POSITION CODING FOR CONTEXT-BASED ADAPTIVE VARIABLE LENGTH CODING - Particular embodiments include a method, an apparatus, and logic embodied in a tangible computer-readable medium that when executed carries out a method of encoding an ordered sequence of quantized transform coefficients of a block of image data. One embodiment is a context adaptive variable length coding method that includes position coding the positions of zero-valued and non-zero-valued coefficients by a mixed method that encodes either the run length of zeroes preceding a non-zero-valued coefficient or the run length of non-zero-valued coefficients preceding a zero-valued coefficient. Another includes position coding that uses a variable length code for two parameters respectively indicating the number of zero-valued coefficient positions and non-zero-valued coefficient positions still to be coded. | 01-15-2009 |
20090086815 | CONTEXT ADAPTIVE POSITION AND AMPLITUDE CODING OF COEFFICIENTS FOR VIDEO COMPRESSION - A coding method, apparatus, and medium with software encoded thereon to implement a coding method. The coding method includes encoding the position of non-zero-valued coefficients in an ordered series of quantized transform coefficients of a block of image data, including encoding events using variable length coding using a plurality of variable length code mappings that each maps events to codewords, the position encoding including switching between the code mappings based on the context. The coding method further includes encoding amplitudes of the non-zero-valued coefficients using variable dimensional amplitude coding in the reverse order of the original ordering of the series. | 04-02-2009 |
20090087109 | REDUCED CODE TABLE SIZE IN JOINT AMPLITUDE AND POSITION CODING OF COEFFICIENTS FOR VIDEO COMPRESSION - A coding method, apparatus, and medium with software encoded thereon to implement a coding method. The coding method includes jointly encoding joint events that each are defined by a cluster of consecutive non-zero-valued coefficients, each joint event defined by three parameters: the number of zero-valued coefficients preceding the cluster, the number of non-zero-valued coefficients in the cluster, and an indication of which trailing coefficients up to a maximum number of M trailing coefficients have amplitude greater than 1, with the coding using a 3-dimensional joint VLC table. The method further includes encoding the amplitude of the non-zero-valued trailing coefficients that have amplitude greater than 1, and encoding the amplitude of any remaining non-zero-valued coefficients in the clusters that have more than M non-zero-valued coefficients. | 04-02-2009 |
20090087113 | VARIABLE LENGTH CODING OF COEFFICIENT CLUSTERS FOR IMAGE AND VIDEO COMPRESSION - A coding method, apparatus, and medium with software encoded thereon to implement a coding method. The coding method includes encoding a cluster of consecutive non-zero-valued coefficients, the encoding of a cluster including jointly encoding joint events that each are defined by at least two parameters: the number of zero-valued coefficients preceding the cluster, and the number of non-zero-valued coefficients in the cluster. The encoding of the cluster also includes encoding a parameter indicative of the number of amplitude-1 trailing non-zero-valued coefficients in the cluster, in one version with the parameter indicative of the number of trailing amplitude-1 coefficients part of the joint events such that the coding is according to a 3-dimensional joint variable length coding table. The method further includes encoding the amplitudes of the non-zero-valued coefficients that are not encoded by the joint encoding, e.g., encoding the amplitudes of the coefficients other than the trailing amplitude-1 coefficients. | 04-02-2009 |
20090154820 | CONTEXT ADAPTIVE HYBRID VARIABLE LENGTH CODING - A coding method for an ordered series of quantized transform coefficients of a block of image data, including a context adaptive position coding process to encode the position of clusters of non-zero-valued coefficients, e.g., a multidimensional position coder that uses one of a plurality of code mappings selected according to at least one criterion including at least one context-based criterion, and an amplitude encoding process to encode any amplitudes remaining to be coded, the amplitude coding using one or a plurality of amplitude code mappings selected according to at least one criterion, including a context-based criterion. A context-based selection criterion means a criterion that during encoding is known or derivable from one or more previously encoded items of information. Also a coding apparatus, a decoding apparatus, a computer readable medium configured with instructions that when executed implement a coding method, and another medium for a decoding method. | 06-18-2009 |
20090238278 | VIDEO COMPRESSION USING SEARCH TECHNIQUES OF LONG-TERM REFERENCE MEMORY - Particular embodiments generally relate to video compression. In one embodiment, a store of reference frames is provided in memory. The reference frames may be classified based on a plurality of classifiers. The classifiers may correspond to features that are found in the reference frame. A frame to encode is then received. The frame is analyzed to determine features found in the frame. As macroblocks in the frame are encoded, a macroblock is analyzed to determine which feature may be included in the macroblock. The feature is used to determine a classifier, which is used to determine a subset of the reference frames. The subset is then searched to determine a reference frame for the macroblock. | 09-24-2009 |
20090256901 | Pop-Up PIP for People Not in Picture - A system and method for alerting participants in a videoconference that one or more participants are improperly framed by the videoconference camera is provided. An embodiment comprises a temporary self-view picture-in-picture image appearing when the number of faces detected by the videoconference camera changes. A face detection algorithm is used to determine when the number of faces being detected by the videoconference camera has changed. The self-view picture-in-picture image displays, for a duration of time, a representation of the image being captured by the videoconference camera, allowing participants who are not properly framed by the videoconference camera to adjust their position so that their faces are captured by the videoconference camera. | 10-15-2009 |
20090324023 | Combined Face Detection and Background Registration - Techniques are provided to analyze video frames of a video signal in order to distinguish regions containing a face (and body torso) from regions that contain a relatively static background. The region containing the face is referred to as a foreground region. A current video frame is divided into a plurality of elements and the foreground regions and background regions are detected. The background regions of a subsequent video frame are detected/registered using the foreground regions of the current video frame. The foreground regions of the subsequent video frame are determined using the background regions of the current video frame as a temporal reference. | 12-31-2009 |
20100061225 | NETWORK-ADAPTIVE PREEMPTIVE REPAIR IN REAL-TIME VIDEO - A network-adaptive error recovery method for real-time video transmission based on sending repair frames preemptively with a frequency that is based on observed run-length of good frames and round trip time. | 03-11-2010 |
20100119164 | Embedded Image Quality Stamps - In an image generation and rendering system, a quality stamp indicative of image fidelity is embedded in image data units resulting from image data compression/encoding. At decoding, the image quality stamp is captured and when the decoded image is rendered, a fidelity indicator is displayed along with the image. | 05-13-2010 |
20100125768 | ERROR RESILIENCE IN VIDEO COMMUNICATION BY RETRANSMISSION OF PACKETS OF DESIGNATED REFERENCE FRAMES - Techniques are provided for video communication between multiple devices. Each of a plurality of video packets is designated as being part of a required reference frame that is subsequently to be used for a repair process. A stream of video packets that includes the packets for the required reference frame is transmitted from a source device over a communication medium for reception by a plurality of destination devices. A determination is made that at least one of the plurality of destination devices did not receive at least one packet of the required reference frame, and the at least one packet is retransmitted to the at least one of the plurality of destination devices. When the retransmitted packet is received at the at least one destination device, it is decoded and stored without using it for generating a picture for display at the time that the at least one packet is received. | 05-20-2010 |
20100202670 | CONTEXT AWARE, MULTIPLE TARGET IMAGE RECOGNITION - In one embodiment, an apparatus may receive at least one image in which multiple targets are represented. The apparatus may assign possible identities to the targets based on probabilities associated with the identities. The apparatus may base a probability of a target being one of the identities, at least in part, on an identity-specific context and on a conditional probability that the target is the identity given that each one of at least two other of the targets is another respective one of the identities. The identity-specific context may be information that relates to a determined identity. The apparatus may identify the targets based on the identities and on the probabilities associated with the identities. | 08-12-2010 |
20100208078 | HORIZONTAL GAZE ESTIMATION FOR VIDEO CONFERENCING - Techniques are provided to determine the horizontal gaze of a person from a video signal generated from viewing the person with at least one video camera. From the video signal, a head region of the person is detected and tracked. The dimensions and location of a sub-region within the head region is also detected and tracked from the video signal. An estimate of the horizontal gaze of the person is computed from a relative position of the sub-region within the head region. | 08-19-2010 |
20100246680 | REFERENCE PICTURE PREDICTION FOR VIDEO CODING - A video coder includes a forward coder and a reconstruction module determining a motion compensated predicted picture from one or more previously decoded pictures in a multi-picture store. The reconstruction module includes a reference picture predictor that uses only previously decoded pictures to determine one or more predicted reference pictures. The predicted reference picture(s) are used for motion compensated prediction. The reference picture predictor may include optical flow analysis that uses a current decoded picture and that may use one or more previously decoded pictures together with affine motion analysis and image warping to determine at least a portion of at least one of the reference pictures. | 09-30-2010 |
20110080946 | LOCALLY VARIABLE QUANTIZATION AND HYBRID VARIABLE LENGTH CODING FOR IMAGE AND VIDEO COMPRESSION - A coding method, apparatus, and storage media with instructions to carry out a method. The method operates on an ordered series of transform coefficients of a block of image data, and for a fixed quantization method, and includes quantizing and encoding the ordered series to form a coded bitstream. The quantizing and encoding uses one or more variable length code (VLC) mappings. The quantizing includes quantizing to have amplitude-1 at least one coefficient that would be quantized by the fixed quantization method to have zero amplitude, quantizing to have zero amplitude at least one coefficient that would be quantized by the fixed quantization method to have amplitude-1, and using the fixed quantization method to quantize any coefficient that is quantized by the fixed quantization method not to have zero amplitude, amplitude-1, or amplitude-2. | 04-07-2011 |
20110228096 | SYSTEM AND METHOD FOR ENHANCING VIDEO IMAGES IN A CONFERENCING ENVIRONMENT - A method is provided in one example and includes receiving image data for a field of view associated with a display. The image data is used to generate a plurality of red green blue (RGB) frames. The method also includes emitting infrared energy onto the field of view in order to generate a plurality of infrared frames, the plurality of RGB frames and the plurality of infrared frames are generated by a single camera. The plurality of RGB frames can be combined with the plurality of infrared frames in order to generate a video data stream. In a more particular embodiment, the emitting of the infrared energy is synchronized with the camera such that the infrared energy is emitted onto the field of view at one half of an existing frame rate of the camera. | 09-22-2011 |
20110285825 | Implementing Selective Image Enhancement - A method that includes capturing depth information associated with a first field of view of a depth camera. The depth information is represented by a first plurality of depth pixels. The method also includes capturing color information associated with a second field of view of a video camera that substantially overlaps with the first field of view of the depth camera. The color information is represented by a second plurality of color pixels. The method further includes enhancing color information represented by at least one color pixel of the second plurality of color pixels to generate an enhanced image. The enhanced image adjusts an exposure characteristic of the color information captured by the video camera. The at least one color pixel is enhanced based on depth information represented by at least one corresponding depth pixel of the first plurality of depth pixels. | 11-24-2011 |
20120057636 | SYSTEM AND METHOD FOR SKIP CODING DURING VIDEO CONFERENCING IN A NETWORK ENVIRONMENT - A method is provided in one example and includes receiving an input video, and identifying values of pixels from noise associated with a current video image within the input video. The method also includes creating a skip-reference video image associated with the identified pixel values, and comparing a portion of the current video image to the skip-reference video image. The method also includes determining a macroblock associated with the current video image to be skipped before an encoding operation occurs. | 03-08-2012 |
20120127259 | SYSTEM AND METHOD FOR PROVIDING ENHANCED VIDEO PROCESSING IN A NETWORK ENVIRONMENT - A method is provided in one example and includes receiving a video input from a camera element; using change detection statistics to identify background image data; using the background image data as a temporal reference to determine foreground image data of a particular video frame within the video input; using a selected foreground image for a background registration of a subsequent video frame; and providing at least a portion of the subsequent video frame to a next destination. | 05-24-2012 |
20120189222 | POSITION CODING FOR CONTEXT-BASED ADAPTIVE VARIABLE LENGTH CODING - Particular embodiments include a method, an apparatus, and logic embodied in a tangible computer-readable medium that when executed carries out a method of encoding an ordered sequence of quantized transform coefficients of a block of image data. One embodiment is a context adaptive variable length coding method that includes position coding the positions of zero-valued and non-zero-valued coefficients by a mixed method that encodes either the run length of zeroes preceding a non-zero-valued coefficient or the run length of non-zero-valued coefficients preceding a zero-valued coefficient. Another includes position coding that uses a variable length code for two parameters respectively indicating the number of zero-valued coefficient positions and non-zero-valued coefficient positions still to be coded. | 07-26-2012 |
20120257839 | CONTEXT ADAPTIVE HYBRID VARIABLE LENGTH CODING - A coding method for an ordered series of quantized transform coefficients of a block of image data, including a context adaptive position coding process to encode the position of clusters of non-zero-valued coefficients, e.g., a multidimensional position coder that uses one of a plurality of code mappings selected according to at least one criterion including at least one context-based criterion, and an amplitude encoding process to encode any amplitudes remaining to be coded, the amplitude coding using one or a plurality of amplitude code mappings selected according to at least one criterion, including a context-based criterion. A context-based selection criterion means a criterion that during encoding is known or derivable from one or more previously encoded items of information. Also a coding apparatus, a decoding apparatus, a computer readable medium configured with instructions that when executed implement a coding method, and another medium for a decoding method. | 10-11-2012 |
20120287302 | SYSTEM AND METHOD FOR VIDEO CODING IN A DYNAMIC ENVIRONMENT - A method is provided in one example embodiment and includes receiving a camera dynamic parameter; determining a reference transform parameter based on the camera dynamic parameter; applying the reference transform parameter to generate a video image; and encoding the reference transform parameter in a bitstream for transmission with the video image. In other more specific instances, the method may include decoding a particular video image; decoding a particular reference transform parameter; and applying the particular reference transform parameter to the particular video image. The entropy-decoded data can undergo inverse quantization and transformation such that reference transformed data is combined with the entropy-decoded data. Additionally, the entropy-decoded data can be subjected to filtering before decoded video images are rendered on a display. | 11-15-2012 |
20120328202 | METHOD AND APPARATUS FOR ENROLLING A USER IN A TELEPRESENCE SYSTEM USING A FACE-RECOGNITION-BASED IDENTIFICATION SYSTEM - In one embodiment, a method includes obtaining a first image of a party that is stored in a first structure in response to an instruction to enroll the party in a system, and using information associated with the first image to identify a second image stored in a second structure. The second image has a relatively high likelihood of depicting the party. Finally, the method includes enrolling the party in the system, wherein enrolling the party in the system includes associating the second image with the party. | 12-27-2012 |
20130002898 | ENCODER-SUPERVISED IMAGING FOR VIDEO CAMERAS - A controller controls a camera that produces a sequence of images and that has output coupled to a video encoder. The camera has an operating condition including a field of view and lighting, and one or more imaging parameters. The video encoder encodes images from the camera into codewords. The controller receives one or more encoding properties from the video encoder, and causes adjusting one or more of the imaging parameters based on at least one of the received encoding properties, such that the camera produces additional images of the sequence of images for the video encoder using the adjusted one or more imaging parameters. | 01-03-2013 |
20130010860 | CONTEXT ADAPTIVE POSITION AND AMPLITUDE CODING OF COEFFICIENTS FOR VIDEO COMPRESSION - A coding method, apparatus, and medium with software encoded thereon to implement a coding method. The coding method includes encoding the position of non-zero-valued coefficients in an ordered series of quantized transform coefficients of a block of image data, including encoding events using variable length coding using a plurality of variable length code mappings that each maps events to codewords, the position encoding including switching between the code mappings based on the context. The coding method further includes encoding amplitudes of the non-zero-valued coefficients using variable dimensional amplitude coding in the reverse order of the original ordering of the series. | 01-10-2013 |
20130136185 | REFERENCE PICTURE PREDICTION FOR VIDEO CODING - A video coder includes a forward coder and a reconstruction module determining a motion compensated predicted picture from one or more previously decoded pictures in a multi-picture store. The reconstruction module includes a reference picture predictor that uses only previously decoded pictures to determine one or more predicted reference pictures. The predicted reference picture(s) are used for motion compensated prediction. The reference picture predictor may include optical flow analysis that uses a current decoded picture and that may use one or more previously decoded pictures together with affine motion analysis and image warping to determine at least a portion of at least one of the reference pictures. | 05-30-2013 |
20130156332 | SYSTEM AND METHOD FOR DEPTH-GUIDED IMAGE FILTERING IN A VIDEO CONFERENCE ENVIRONMENT - A method is provided in one example embodiment that includes receiving a plurality of depth values corresponding to pixels of an image; and filtering the image as a function of a plurality of variations in the depth values between adjacent pixels of a window associated with the image. In more detailed embodiments, the method may include encoding the image into a bit stream for transmission over a network. The filtering can account for a bit rate associated with the encoding of the image. | 06-20-2013 |
20130342636 | Image-Based Real-Time Gesture Recognition - Techniques are provided for image-based real-time gesture recognition. Video data of a person is obtained. Pixels are classified in the video stream at a given time instance during a time period as a foreground or a background pixel. A data entry is generated comprising data indicating foreground history values for each of a plurality of time instances of the video stream and data indicating a time period value. When the classifying indicates that a first pixel is a foreground pixel, the data structure associated with the first pixel is evaluated to determine whether or not to update a foreground history value associated with the first pixel at the given time instance. A motion gradient vector is generated for the video stream based on the foreground history value associated with the first pixel and foreground history values associated with other pixels. | 12-26-2013 |
20140063177 | Generating and Rendering Synthesized Views with Multiple Video Streams in Telepresence Video Conference Sessions - Techniques are provided for establishing a videoconference session between participants at different endpoints, where each endpoint includes at least one computing device and one or more displays. A plurality of video streams is received at an endpoint, and each video stream is classified as at least one of a people view and a data view. The classified views are analyzed to determine one or more regions of interest for each of the classified views, where at least one region of interest has a size smaller than a size of the classified view. Synthesized views of at least some of the video streams are generated, wherein the synthesized views include at least one view including a region of interest, and views including the synthesized views are rendered at one or more displays of an endpoint device. | 03-06-2014 |
20140085398 | REAL-TIME AUTOMATIC SCENE RELIGHTING IN VIDEO CONFERENCE SESSIONS - Video frames are captured at one or more cameras during a video conference session, where each video frame includes a digital image with a plurality of pixels. Depth values associated with each pixel are determined in at least one video frame, where each depth value represents a distance of a portion of the digital image represented by at least one corresponding pixel from the one or more cameras that capture the at least one video frame. Luminance values of pixels are adjusted within captured video frames based upon the depth values determined for the pixels so as to achieve relighting of the video frames as the video frames are displayed during the video conference session. | 03-27-2014 |
20140160239 | SYSTEM AND METHOD FOR DEPTH-GUIDED FILTERING IN A VIDEO CONFERENCE ENVIRONMENT - A method is provided in one example embodiment that includes generating a depth map that corresponds to a video image and filtering the depth map with the video image to create a filtered depth map. The video image can be filtered with the filtered depth map to create an image. In one example implementation, the video image is filtered using extended depth-guided filtering that is incorporated into a video encoding-decoding loop. | 06-12-2014 |
20140169453 | Context Adaptive Position and Amplitude Coding of Coefficients for Video - A coding method, apparatus, and medium with software encoded thereon to implement a coding method. The coding method includes encoding the position of non-zero-valued coefficients in an ordered series of quantized transform coefficients of a block of image data, including encoding events using variable length coding using a plurality of variable length code mappings that each maps events to codewords, the position encoding including switching between the code mappings based on the context. The coding method further includes encoding amplitudes of the non-zero-valued coefficients using variable dimensional amplitude coding in the reverse order of the original ordering of the series. | 06-19-2014 |
20140253667 | UTILIZING A SMART CAMERA SYSTEM FOR IMMERSIVE TELEPRESENCE - Video content is received at a computing device, the video content comprising camera views provided by video cameras that are aligned to capture images of participants within a defined space. Each camera view is at a first resolution, and the video cameras are aligned such that a field of view (FOV) for each camera overlaps a portion of the FOV of at least one other adjacent camera. Positions of participants depicted within the video content are detected, where at least one participant is captured by overlapping FOVs of two adjacent camera views. A target view is generated from the camera views, the target view having a second resolution that is lower than the first resolution, and the target view includes a view of the at least one participant captured within the overlapping FOVs of two adjacent camera views. The target view is displayed at a display device. | 09-11-2014 |
20140254688 | Perceptual Quality Of Content In Video Collaboration - Techniques are provided for receiving and decoding a sequence of video frames at a computing device, and analyzing a current video frame N to determine whether to skip or render the current video frame N for display by the computing device. The analyzing includes generating color histograms of the current video frame N and one or more previous video frames, determining a difference value representing a difference between the current video frame N and a previous video frame N−K, where K>0, the difference value being based upon the generated color histograms, in response to the difference value not exceeding a threshold value, rendering the current video frame N or a recently rendered video frame N−K using the current video frame, and in response to the difference value exceeding the threshold value, skipping the current video frame N from being rendered. | 09-11-2014 |
20140292999 | ANNOTATING A PRESENTATION IN A TELEPRESENCE MEETING - A processing system can include an encoder to encode a real-time transmission of a presentation. A memory buffer can copy and store images of the presentation and convert the images into snapshot images. A transmitter can transmit the snapshot images to an external annotation device, and a receiver can receive annotation data of an annotation performed on the snapshot images at the external annotation device. The annotation can be encoded, in accordance with the annotation data, into the real-time transmission of the presentation to display the real-time transmission with the annotation. | 10-02-2014 |
20150042748 | Generating and Rendering Synthesized Views with Multiple Video Streams in Telepresence Video Conference Sessions - Techniques are provided for establishing a videoconference session between participants at different endpoints, where each endpoint includes at least one computing device and one or more displays. A plurality of video streams is received at an endpoint, and each video stream is classified as at least one of a people view and a data view. The classified views are analyzed to determine one or more regions of interest for each of the classified views, where at least one region of interest has a size smaller than a size of the classified view. Synthesized views of at least some of the video streams are generated, wherein the synthesized views include at least one view including a region of interest, and views including the synthesized views are rendered at one or more displays of an endpoint device. | 02-12-2015 |
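The mixed run-length position coding described in applications 20090016440 and 20120189222 can be sketched in outline. This is a minimal illustration only: the event definitions below are assumptions inferred from the abstract, and a real context-adaptive coder would map each event to a variable-length codeword from a code table (and code the amplitudes separately) rather than emit raw run lengths.

```python
def position_events(coeffs, mode="zero-runs"):
    """Split an ordered series of quantized transform coefficients
    into position events (illustrative sketch, not the patented tables).

    mode="zero-runs":    emit the run length of zeros preceding each
                         non-zero-valued coefficient.
    mode="nonzero-runs": emit the run length of non-zero-valued
                         coefficients preceding each zero-valued one.
    """
    events = []
    run = 0
    for c in coeffs:
        if mode == "zero-runs":
            if c == 0:
                run += 1          # extend the run of zeros
            else:
                events.append(run)  # non-zero coefficient ends the run
                run = 0
        else:
            if c != 0:
                run += 1          # extend the run of non-zeros
            else:
                events.append(run)  # zero coefficient ends the run
                run = 0
    return events
```

A coder choosing between the two modes per context would pick whichever event stream yields shorter codewords for the current coefficient statistics.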
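The cluster-based joint events of applications 20090087109 and 20090087113 can likewise be sketched. The three parameters per event follow the abstracts (zeros preceding the cluster, cluster length, trailing amplitude-1 count capped at M); the function name, the cap default, and the representation of the third parameter as a simple count are assumptions for illustration, not the patented 3-dimensional VLC table.

```python
def cluster_events(coeffs, max_trailing=3):
    """Return one (zeros_before, cluster_len, trailing_ones) tuple per
    cluster of consecutive non-zero-valued coefficients (sketch only)."""
    events, zeros, cluster = [], 0, []

    def flush():
        nonlocal zeros, cluster
        if cluster:
            # Count trailing amplitude-1 coefficients, up to the cap.
            trailing = 0
            for c in reversed(cluster):
                if abs(c) == 1 and trailing < max_trailing:
                    trailing += 1
                else:
                    break
            events.append((zeros, len(cluster), trailing))
            zeros, cluster = 0, []

    for c in coeffs:
        if c == 0:
            flush()       # a zero ends any open cluster
            zeros += 1
        else:
            cluster.append(c)
    flush()               # final cluster, if the series ends non-zero
    return events
```

The amplitudes not covered by the joint event (everything except the trailing amplitude-1 coefficients) would then be coded by a separate amplitude-coding pass, as both abstracts describe.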
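The color-histogram frame comparison of application 20140254688 reduces to a small computation. This sketch uses a normalized histogram with an L1 distance over flat lists of 8-bit samples; the bin count, distance metric, and threshold are assumptions, and the skip/render decision follows the abstract's wording (render when the difference does not exceed the threshold, skip when it does).

```python
def color_histogram(pixels, bins=16):
    """Normalized histogram of 8-bit values; `pixels` is a flat list."""
    hist = [0] * bins
    for v in pixels:
        hist[v * bins // 256] += 1
    n = len(pixels)
    return [h / n for h in hist]

def histogram_difference(pixels_a, pixels_b, bins=16):
    """L1 distance between the two normalized histograms (0.0 .. 2.0)."""
    ha = color_histogram(pixels_a, bins)
    hb = color_histogram(pixels_b, bins)
    return sum(abs(a - b) for a, b in zip(ha, hb))

def should_skip(current, previous, threshold=0.5):
    # Per the abstract: skip the current frame N when the difference
    # from the previous frame N-K exceeds the threshold.
    return histogram_difference(current, previous) > threshold
```

A multi-channel image would apply the same comparison per channel and sum the per-channel distances.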
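Applications 20130156332 and 20140160239 filter an image as a function of depth variations between adjacent pixels. A common way to realize that idea, assumed here for illustration, is a bilateral-style kernel whose weights fall off with depth difference, so smoothing stops at depth discontinuities; the Gaussian weight, window radius, and parameter names are not from the patents.

```python
import math

def depth_guided_filter(image, depth, radius=1, sigma_d=10.0):
    """Smooth `image` (2-D list of floats) while weighting each neighbor
    by its depth similarity to the center pixel, preserving edges that
    coincide with depth discontinuities. Sketch, not the patented filter."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc, wsum = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        # Gaussian fall-off in depth difference: neighbors
                        # at a very different depth contribute almost nothing.
                        d = depth[ny][nx] - depth[y][x]
                        wgt = math.exp(-(d * d) / (2.0 * sigma_d ** 2))
                        acc += wgt * image[ny][nx]
                        wsum += wgt
            out[y][x] = acc / wsum
    return out
```

With a small `sigma_d`, pixels on opposite sides of a depth edge effectively do not mix, which is the behavior the abstracts describe for foreground/background boundaries.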
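The background-as-temporal-reference idea in applications 20090324023 and 20120127259 rests on per-pixel change detection. This is a toy thresholded frame difference, far simpler than the patented registration scheme; the threshold value and function name are assumptions.

```python
def foreground_mask(current, reference, threshold=15):
    """Mark pixels whose absolute difference from the registered
    background (reference) frame exceeds a threshold. Both frames are
    2-D lists of 8-bit luma values; returns a 2-D list of booleans."""
    return [
        [abs(c - r) > threshold for c, r in zip(crow, rrow)]
        for crow, rrow in zip(current, reference)
    ]
```

In the patented pipelines the roles alternate over time: the background detected in one frame serves as the temporal reference for segmenting the foreground of the next.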
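The horizontal gaze estimate of application 20100208078 is computed "from a relative position of the sub-region within the head region." One plausible minimal form, assumed here, normalizes the offset of the sub-region's center from the head region's center by the head width, giving a signed value near 0 when the person faces the camera.

```python
def horizontal_gaze(head_left, head_width, sub_left, sub_width):
    """Signed horizontal gaze estimate, roughly in [-1, 1]: 0 means the
    tracked facial sub-region is centered in the head region (facing
    forward); positive means offset to the right. Illustrative formula,
    not the patented estimator."""
    head_center = head_left + head_width / 2.0
    sub_center = sub_left + sub_width / 2.0
    return 2.0 * (sub_center - head_center) / head_width
```

Both regions would be detected and tracked per frame from the video signal, as the abstract describes, before this ratio is taken.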