Patent application number | Description | Published |
20080240237 | REAL-TIME FACE DETECTION - An apparatus, a method, and a computer-readable medium having instructions encoded thereon that when executed cause a method to be carried out. The method includes dividing at least a portion of a picture of a video stream into parts of blocks, and processing the parts in parallel by a plurality of interconnected processors. The processing of a respective part by its respective processor includes edge detection and color segmentation to determine block-level edge features including block-level color-segmented edge features. Each processor also performs coding functions on its respective part of the picture. The method also includes block-level processing using the block-level edge features to determine which blocks in the picture are likely to be that of a face, the block-level processing being at the granularity of at least a block. | 10-02-2008 |
20080240571 | REAL-TIME FACE DETECTION USING TEMPORAL DIFFERENCES - An apparatus, a method, and a computer-readable medium having instructions encoded thereon that when executed cause a method to be carried out. The method includes dividing at least a portion of a picture of a video stream into parts of blocks, and processing the parts in parallel by a plurality of interconnected processors. The processing of a respective part by its respective processor includes determining block-level temporal difference features. Each processor also performs coding functions on its respective part of the picture. The method also includes block-level processing using the block-level temporal difference features to determine which blocks in the picture are likely to be that of a face, the block-level processing being at the granularity of at least a block. In one version, the processing in each processor includes edge detection and color segmentation to determine block-level edge features including block-level color-segmented edge features that are then used in the block level processing. | 10-02-2008 |
20080247463 | LONG TERM REFERENCE FRAME MANAGEMENT WITH ERROR FEEDBACK FOR COMPRESSED VIDEO COMMUNICATION - An apparatus, software encoded in tangible media, and a method at an encoder. The method includes sending compressed video data including a reference frame message to create a long term reference frame to a plurality of decoders at one or more destination points, receiving feedback from the decoders indicative of whether or not the decoders successfully received the reference frame message, and in the case that the received feedback is such that at least one of the decoders did not successfully receive the reference frame message or does not have the indicated recent frame, repeating sending a reference frame message to create the long term reference frame. Using the method can replace I-frame error recovery with long term reference frames, even in the case where the reference frame management messages are lost to at least one decoder. | 10-09-2008 |
20090015717 | IMAGE RESIZER AND RESIZING METHOD - An apparatus embodiment is operative to scale video and includes an input buffer coupled to a real time source of video data and configured to hold a number of lines of video, a horizontal resizer coupled to the input buffer to resize lines of image data, outputting horizontally scaled line(s) to an intermediate buffer configured to store a number of lines. The apparatus has a vertical resizer coupled to the intermediate buffer configured to output vertically and horizontally resized lines of image data. At any given time, some of the lines in the input buffer are scheduled using a DMA controller for replacement via DMA by lines generated by the source of video data, and some or all of the remaining lines in the input buffer are available for processing by the horizontal resizer. A sufficient number of lines are available in the intermediate buffer, such that in operation, the intermediate buffer need not introduce latency. | 01-15-2009 |
20090122867 | Coding Background Blocks in Video Coding that Includes Coding as Skipped - A method and an apparatus to encode a block in a picture of a time sequence of pictures such as video. The method includes selecting the mode for coding the block, one of the modes being to code the block as skipped. The method further includes limiting the number of consecutive times a particular block is coded as skipped without re-setting the quantization level to a relatively fine level of quantization and re-selecting the mode. | 05-14-2009 |
20090238278 | VIDEO COMPRESSION USING SEARCH TECHNIQUES OF LONG-TERM REFERENCE MEMORY - Particular embodiments generally relate to video compression. In one embodiment, a store of reference frames is provided in memory. The reference frames may be classified based on a plurality of classifiers. The classifiers may correspond to features that are found in the reference frame. A frame to encode is then received. The frame is analyzed to determine features found in the frame. As macroblocks in the frame are encoded, a macroblock is analyzed to determine which feature may be included in the macroblock. The feature is used to determine a classifier, which is used to determine a subset of the reference frames. The subset is then searched to determine a reference frame for the macroblock. | 09-24-2009 |
20090244257 | VIRTUAL ROUND-TABLE VIDEOCONFERENCE - A system and method for creating a virtual round table videoconference is described. An embodiment of the system comprises a plurality of displays arranged in an arc configuration with a table to create a virtual round table. Cameras are arranged around the plurality of displays such that when a participant looks at a display with an image of a remote participant, the camera associated with the display captures an image of the participant's gaze, making eye contact with the camera. The image is displayed at the remote participant's endpoint creating the effect of eye contact between the participants. In another embodiment, audio speakers are arranged to provide directional sound such that the video source for a display and the audio source for the associated speaker are from the same endpoint. | 10-01-2009 |
20090251528 | Video Switching Without Instantaneous Decoder Refresh-Frames - A system and method for reducing blurred video caused by intra-coded IDR-frames sent when a destination endpoint in a multipoint videoconference switches to a new video source. An embodiment according to the invention comprises using inter-coded frames that are temporally predicted from a long term reference frame (LTRF) instead of IDR-frames in a multipoint videoconference system. | 10-08-2009 |
20090256901 | Pop-Up PIP for People Not in Picture - A system and method for alerting participants in a videoconference that one or more participants are improperly framed by the videoconference camera is provided. An embodiment comprises a temporary self-view picture-in-picture image appearing when the number of faces detected by the videoconference camera changes. A face detection algorithm is used to determine when the number of faces being detected by the videoconference camera has changed. The self-view picture-in-picture image displays, for a duration of time, a representation of the image being captured by the videoconference camera, allowing participants who are not properly framed by the videoconference camera to adjust their position so that their faces are captured by the videoconference camera. | 10-15-2009 |
20090324023 | Combined Face Detection and Background Registration - Techniques are provided to analyze video frames of a video signal in order to distinguish regions containing a face (and body torso) from regions that contain a relatively static background. The region containing the face is referred to as a foreground region. A current video frame is divided into a plurality of elements and the foreground regions and background regions are detected. The background regions of a subsequent video frame are detected/registered using the foreground regions of the current video frame. The foreground regions of the subsequent video frame are determined using the background regions of the current video frame as a temporal reference. | 12-31-2009 |
20100002006 | Modal Multiview Display Layout - A system and method for providing a plurality of viewing angles and images on a display. An embodiment comprises a display system where a user has the option of determining the number of images to view and the range of viewable angles for each image. A display system is configured to display a maximum number of images at different viewing angles by interlacing a plurality of images so that each viewing angle shows a selected image. The display system provides a method by which an operator can increase the viewing area of an image by interlacing the same image to more than one viewing angle. | 01-07-2010 |
20100061225 | NETWORK-ADAPTIVE PREEMPTIVE REPAIR IN REAL-TIME VIDEO - A network-adaptive error recovery method for real-time video transmission based on sending repair frames preemptively with a frequency that is based on observed run-length of good frames and round trip time. | 03-11-2010 |
20100123770 | MULTIPLE VIDEO CAMERA PROCESSING FOR TELECONFERENCING - A method, an apparatus, and a storage medium with executable code to execute a method including accepting camera views of at least some participants of a teleconference, each view from a corresponding video camera, with the camera views together including at least one view of each participant. The method includes accepting audio from a plurality of microphones, and processing the audio from the plurality of microphones to generate audio data and direction information indicative of the direction of sound received at the microphones. The method further includes generating one or more candidate people views, with each people view being of an area enclosing a head and shoulders view of at least one participant. The method also includes selecting, according to the direction information, at least one of the candidate people views to be transmitted to one or more remote endpoints. | 05-20-2010 |
20100125768 | ERROR RESILIENCE IN VIDEO COMMUNICATION BY RETRANSMISSION OF PACKETS OF DESIGNATED REFERENCE FRAMES - Techniques are provided for video communication between multiple devices. Each of a plurality of video packets is designated as being part of a required reference frame that is subsequently to be used for a repair process. A stream of video packets that includes the packets for the required reference frame is transmitted from a source device over a communication medium for reception by a plurality of destination devices. A determination is made that at least one of the plurality of destination devices did not receive at least one packet of the required reference frame, and the at least one packet is retransmitted to the at least one of the plurality of destination devices. When the retransmitted packet is received at the at least one destination device, it is decoded and stored without using it for generating a picture for display at the time that the at least one packet is received. | 05-20-2010 |
20100208078 | HORIZONTAL GAZE ESTIMATION FOR VIDEO CONFERENCING - Techniques are provided to determine the horizontal gaze of a person from a video signal generated from viewing the person with at least one video camera. From the video signal, a head region of the person is detected and tracked. The dimensions and location of a sub-region within the head region are also detected and tracked from the video signal. An estimate of the horizontal gaze of the person is computed from a relative position of the sub-region within the head region. | 08-19-2010 |
20100246680 | REFERENCE PICTURE PREDICTION FOR VIDEO CODING - A video coder includes a forward coder and a reconstruction module determining a motion compensated predicted picture from one or more previously decoded pictures in a multi-picture store. The reconstruction module includes a reference picture predictor that uses only previously decoded pictures to determine one or more predicted reference pictures. The predicted reference picture(s) are used for motion compensated prediction. The reference picture predictor may include optical flow analysis that uses a current decoded picture and that may use one or more previously decoded pictures together with affine motion analysis and image warping to determine at least a portion of at least one of the reference pictures. | 09-30-2010 |
20110018959 | Automatic Display Latency Measurement System for Video Conferencing - Methods and systems that compensate for display latency when separate speakers are used during video conferencing. A method includes determining whether speakers, which are not controlled by a display, are to be used in connection with a video conferencing session. The method further includes sending, to the display, data that causes the display to generate a predetermined pattern, capturing imagery of the effects of the predetermined pattern shown on the display, calculating a latency of the display based on the difference between the time the data was sent and the time the imagery of the effects is received, and storing a value of the latency of the display in a device that enables the video conferencing. When the speakers that are not controlled by the display are selected, the audio portion of the video conferencing session is redirected to those speakers, but delayed for an amount of time substantially equivalent to the value of the latency of the display. In this way audio and video synchronization may be improved. | 01-27-2011 |
20110228096 | SYSTEM AND METHOD FOR ENHANCING VIDEO IMAGES IN A CONFERENCING ENVIRONMENT - A method is provided in one example and includes receiving image data for a field of view associated with a display. The image data is used to generate a plurality of red green blue (RGB) frames. The method also includes emitting infrared energy onto the field of view in order to generate a plurality of infrared frames, wherein the plurality of RGB frames and the plurality of infrared frames are generated by a single camera. The plurality of RGB frames can be combined with the plurality of infrared frames in order to generate a video data stream. In a more particular embodiment, the emitting of the infrared energy is synchronized with the camera such that the infrared energy is emitted onto the field of view at one half of an existing frame rate of the camera. | 09-22-2011 |
20110279630 | SYSTEM AND METHOD FOR PROVIDING RETRACTING OPTICS IN A VIDEO CONFERENCING ENVIRONMENT - An apparatus is provided in one example and includes a camera configured to receive image data associated with an end user involved in a video session. The apparatus also includes a display and an optics element configured to interface with the camera. The optics element reflects the image data associated with the end user positioned in front of the display. A retracting mechanism is also provided and is configured to retract the optics element in a direction such that the camera moves to an inactive state and the optics element is removed from a view of the display from the perspective of the end user. An effective optical distance from the camera to the end user is increased by manipulating a position of the optics element. In more detailed embodiments, the camera can be configured above the display such that its lens points downward toward the optics element. | 11-17-2011 |
20110285825 | Implementing Selective Image Enhancement - A method that includes capturing depth information associated with a first field of view of a depth camera. The depth information is represented by a first plurality of depth pixels. The method also includes capturing color information associated with a second field of view of a video camera that substantially overlaps with the first field of view of the depth camera. The color information is represented by a second plurality of color pixels. The method further includes enhancing color information represented by at least one color pixel of the second plurality of color pixels to generate an enhanced image. The enhanced image adjusts an exposure characteristic of the color information captured by the video camera. The at least one color pixel is enhanced based on depth information represented by at least one corresponding depth pixel of the first plurality of depth pixels. | 11-24-2011 |
20120057636 | SYSTEM AND METHOD FOR SKIP CODING DURING VIDEO CONFERENCING IN A NETWORK ENVIRONMENT - A method is provided in one example and includes receiving an input video, and identifying values of pixels from noise associated with a current video image within the video input. The method also includes creating a skip-reference video image associated with the identified pixel values, and comparing a portion of the current video image to the skip-reference video image. The method also includes determining a macroblock associated with the current video image to be skipped before an encoding operation occurs. | 03-08-2012 |
20120120270 | SYSTEM AND METHOD FOR PROVIDING ENHANCED AUDIO IN A VIDEO ENVIRONMENT - A method is provided in one example and includes receiving audio data at a microphone array that includes a plurality of microphones. The microphone array is provisioned at a first endpoint, which includes a camera element configured to capture video data associated with a video session involving the first endpoint and a second endpoint. The method also includes formatting the audio data into a time division multiplex (TDM) stream, and communicating the stream to a port for a subsequent communication over a network and to the second endpoint. | 05-17-2012 |
20120127259 | SYSTEM AND METHOD FOR PROVIDING ENHANCED VIDEO PROCESSING IN A NETWORK ENVIRONMENT - A method is provided in one example and includes receiving a video input from a camera element; using change detection statistics to identify background image data; using the background image data as a temporal reference to determine foreground image data of a particular video frame within the video input; using a selected foreground image for a background registration of a subsequent video frame; and providing at least a portion of the subsequent video frame to a next destination. | 05-24-2012 |
20120328202 | METHOD AND APPARATUS FOR ENROLLING A USER IN A TELEPRESENCE SYSTEM USING A FACE-RECOGNITION-BASED IDENTIFICATION SYSTEM - In one embodiment, a method includes obtaining a first image of a party that is stored in a first structure in response to an instruction to enroll the party in a system, and using information associated with the first image to identify a second image stored in a second structure. The second image has a relatively high likelihood of depicting the party. Finally, the method includes enrolling the party in the system, wherein enrolling the party in the system includes associating the second image with the party. | 12-27-2012 |
20130136185 | REFERENCE PICTURE PREDICTION FOR VIDEO CODING - A video coder includes a forward coder and a reconstruction module determining a motion compensated predicted picture from one or more previously decoded pictures in a multi-picture store. The reconstruction module includes a reference picture predictor that uses only previously decoded pictures to determine one or more predicted reference pictures. The predicted reference picture(s) are used for motion compensated prediction. The reference picture predictor may include optical flow analysis that uses a current decoded picture and that may use one or more previously decoded pictures together with affine motion analysis and image warping to determine at least a portion of at least one of the reference pictures. | 05-30-2013 |
20140063177 | Generating and Rendering Synthesized Views with Multiple Video Streams in Telepresence Video Conference Sessions - Techniques are provided for establishing a videoconference session between participants at different endpoints, where each endpoint includes at least one computing device and one or more displays. A plurality of video streams is received at an endpoint, and each video stream is classified as at least one of a people view and a data view. The classified views are analyzed to determine one or more regions of interest for each of the classified views, where at least one region of interest has a size smaller than a size of the classified view. Synthesized views of at least some of the video streams are generated, wherein the synthesized views include at least one view including a region of interest, and views including the synthesized views are rendered at one or more displays of an endpoint device. | 03-06-2014 |
20140085398 | REAL-TIME AUTOMATIC SCENE RELIGHTING IN VIDEO CONFERENCE SESSIONS - Video frames are captured at one or more cameras during a video conference session, where each video frame includes a digital image with a plurality of pixels. Depth values associated with each pixel are determined in at least one video frame, where each depth value represents a distance of a portion of the digital image represented by at least one corresponding pixel from the one or more cameras that capture the at least one video frame. Luminance values of pixels are adjusted within captured video frames based upon the depth values determined for the pixels so as to achieve relighting of the video frames as the video frames are displayed during the video conference session. | 03-27-2014 |
20150042748 | Generating and Rendering Synthesized Views with Multiple Video Streams in Telepresence Video Conference Sessions - Techniques are provided for establishing a videoconference session between participants at different endpoints, where each endpoint includes at least one computing device and one or more displays. A plurality of video streams is received at an endpoint, and each video stream is classified as at least one of a people view and a data view. The classified views are analyzed to determine one or more regions of interest for each of the classified views, where at least one region of interest has a size smaller than a size of the classified view. Synthesized views of at least some of the video streams are generated, wherein the synthesized views include at least one view including a region of interest, and views including the synthesized views are rendered at one or more displays of an endpoint device. | 02-12-2015 |
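Application 20090122867 caps how many times in a row a block may be coded as skipped. The Python sketch below shows one way to track that per block; it is an illustration under assumptions, not the patented encoder. The default limit of 8 and the `reselect_fine` callback (standing in for the encoder re-running mode selection after resetting quantization to a fine level) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class SkipLimiter:
    # Hypothetical limit; the abstract does not give a value.
    max_consecutive_skips: int = 8
    # Consecutive-skip count for each block position, keyed by block index.
    counts: Dict[int, int] = field(default_factory=dict)

    def select(self, block: int, mode: str, reselect_fine: Callable[[], str]) -> str:
        """Return the coding mode for `block`.

        `mode` is the encoder's normal choice. `reselect_fine` is a hypothetical
        callback that re-runs mode selection with quantization reset to a fine level.
        """
        if mode != "skip":
            # Any non-skip coding breaks the run of consecutive skips.
            self.counts[block] = 0
            return mode
        if self.counts.get(block, 0) >= self.max_consecutive_skips:
            # Limit reached: reset to fine quantization and re-select the mode.
            # The count restarts after the forced re-selection.
            self.counts[block] = 0
            return reselect_fine()
        self.counts[block] = self.counts.get(block, 0) + 1
        return "skip"
```

With a limit of 2, a block that the encoder keeps wanting to skip is coded as skipped twice and then forced through a fine-quantization re-selection, so a static background block is periodically refreshed.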
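Application 20090256901 shows a temporary self-view picture-in-picture whenever the detected face count changes. A minimal state-machine sketch in Python follows, assuming the face detector already reports a count per frame. The 5-second default duration is a hypothetical value, since the abstract says only "a duration of time".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SelfViewPopup:
    # Hypothetical display duration in seconds.
    duration_s: float = 5.0
    # Face count from the previous frame; None until the first frame arrives.
    last_count: Optional[int] = None
    # Time until which the self-view PIP stays on screen.
    visible_until: float = float("-inf")

    def update(self, face_count: int, now: float) -> bool:
        """Feed the face count for the current frame; return True while the PIP should show."""
        if self.last_count is not None and face_count != self.last_count:
            # The number of detected faces changed: (re)start the pop-up timer.
            self.visible_until = now + self.duration_s
        self.last_count = face_count
        return now < self.visible_until
```

The first frame only establishes a baseline, so the pop-up appears only after an actual change, for example when a participant drifts out of frame and the count drops.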
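Application 20100208078 estimates horizontal gaze from the relative position of a tracked sub-region inside the head region. The Python sketch below computes that relative position. The linear mapping from normalized offset to an angle, the 60-degree limit, and the `HSpan` type are assumptions made for illustration; the abstract does not specify how position is converted to gaze.

```python
from typing import NamedTuple


class HSpan(NamedTuple):
    # Horizontal extent of a tracked region (hypothetical representation).
    x: float  # left edge, in pixels
    w: float  # width, in pixels


def horizontal_gaze_deg(head: HSpan, sub: HSpan, max_angle_deg: float = 60.0) -> float:
    """Estimate horizontal gaze from where `sub` sits horizontally inside `head`.

    Assumes a linear mapping from offset to angle, which is hypothetical.
    """
    head_cx = head.x + head.w / 2
    sub_cx = sub.x + sub.w / 2
    # Normalized offset: -1 at the head's left edge, 0 at center, +1 at the right edge.
    offset = (sub_cx - head_cx) / (head.w / 2)
    offset = max(-1.0, min(1.0, offset))
    return offset * max_angle_deg
```

For a head region spanning x = 100 to 300 and a sub-region centered at x = 250, the offset is 0.5 and the sketch reports 30 degrees to the right in image coordinates.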
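Application 20110018959 computes display latency as the difference between when the pattern data was sent and when the camera imagery of the pattern was received, then delays audio to external speakers by that amount. A small Python sketch of the arithmetic follows. Representing audio as a list of float samples and delaying it by prepending silence are assumptions, and the sketch does not separate out any capture delay in the camera path.

```python
from typing import List


def display_latency_s(sent_at: float, detected_at: float) -> float:
    """Latency = time the pattern imagery was received minus time the data was sent."""
    if detected_at < sent_at:
        raise ValueError("pattern detected before it was sent")
    return detected_at - sent_at


def delay_audio(samples: List[float], latency_s: float, sample_rate: int) -> List[float]:
    """Delay audio by prepending silence equal to the stored display latency."""
    pad = round(latency_s * sample_rate)
    return [0.0] * pad + samples
```

For example, data sent at t = 10.0 s and detected at t = 10.125 s gives a latency of 0.125 s, which at 48 kHz corresponds to 6000 samples of delay on the external speakers.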
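Application 20140085398 adjusts pixel luminance according to per-pixel depth. The abstract does not give the adjustment function, so the Python sketch below uses a hypothetical one: pixels nearer than `near` receive the full extra gain, pixels beyond `far` are left unchanged, and results are clipped to 8-bit range. The `near`, `far`, and `strength` parameters are illustrative only.

```python
from typing import List


def relight(luma: List[List[int]], depth: List[List[float]],
            near: float = 0.5, far: float = 3.0, strength: float = 0.4) -> List[List[int]]:
    """Brighten pixels in proportion to how close they are to the camera.

    `near` and `far` are depths in meters and `strength` is the maximum extra
    gain; all three are hypothetical choices, not values from the application.
    """
    out = []
    for y_row, d_row in zip(luma, depth):
        row = []
        for y, d in zip(y_row, d_row):
            # Closeness is 1.0 at or nearer than `near`, 0.0 at or beyond `far`.
            closeness = min(1.0, max(0.0, (far - d) / (far - near)))
            gain = 1.0 + strength * closeness
            # Clip to the 8-bit luminance range.
            row.append(min(255, round(y * gain)))
        out.append(row)
    return out
```

Applied to a frame in which the participant is close and the wall is far, this lifts the participant's face relative to the background, which is one plausible reading of "relighting" based on depth.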