Patent application number | Description | Published |
20080240237 | REAL-TIME FACE DETECTION - An apparatus, a method, and a computer-readable medium having instructions encoded thereon that when executed cause a method to be carried out. The method includes dividing at least a portion of a picture of a video stream into parts of blocks, and processing the parts in parallel by a plurality of interconnected processors. The processing of a respective part by its respective processor includes edge detection and color segmentation to determine block-level edge features including block-level color-segmented edge features. Each processor also performs coding functions on its respective part of the picture. The method also includes block-level processing using the block-level edge features to determine which blocks in the picture are likely to be that of a face, the block-level processing being at the granularity of at least a block. | 10-02-2008 |
20080240571 | REAL-TIME FACE DETECTION USING TEMPORAL DIFFERENCES - An apparatus, a method, and a computer-readable medium having instructions encoded thereon that when executed cause a method to be carried out. The method includes dividing at least a portion of a picture of a video stream into parts of blocks, and processing the parts in parallel by a plurality of interconnected processors. The processing of a respective part by its respective processor includes determining block-level temporal difference features. Each processor also performs coding functions on its respective part of the picture. The method also includes block-level processing using the block-level temporal difference features to determine which blocks in the picture are likely to be that of a face, the block-level processing being at the granularity of at least a block. In one version, the processing in each processor includes edge detection and color segmentation to determine block-level edge features including block-level color-segmented edge features that are then used in the block level processing. | 10-02-2008 |
20080247463 | LONG TERM REFERENCE FRAME MANAGEMENT WITH ERROR FEEDBACK FOR COMPRESSED VIDEO COMMUNICATION - An apparatus, software encoded in tangible media, and a method at an encoder. The method includes sending compressed video data including a reference frame message to create a long term reference frame to a plurality of decoders at one or more destination points, receiving feedback from the decoders indicative of whether or not the decoders successfully received the reference frame message, and in the case that the received feedback is such that at least one of the decoders did not successfully receive the reference frame message or does not have the indicated recent frame, repeating sending a reference frame message to create the long term reference frame. Using the method can replace I-frame error recovery with long term reference frames, even in the case where the reference frame management messages are lost to at least one decoder. | 10-09-2008 |
20090109988 | Video Decoder with an Adjustable Video Clock - A method, an apparatus, and logic encoded in a computer-readable medium to carry out a method. The method includes receiving packets containing compressed video information, storing the received packets in a buffer memory, timestamping the received packets according to an adjustable clock, and removing packets from the buffer for decoding and playout of the video information, the removing according to playback order and at a time determined by the adjustable clock. The method includes adjusting the adjustable clock from time to time according to a measure of the amount of time that the packets reside in the buffer memory, such that time latency caused by the buffer memory is limited. An overrun or an underrun of the buffer memory is unlikely. | 04-30-2009 |
20090122867 | Coding Background Blocks in Video Coding that Includes Coding as Skipped - A method and an apparatus to encode a block in a picture of a time sequence of pictures such as video. The method includes selecting the mode for coding the block, one of the modes being to code the block as skipped. The method further includes limiting the number of consecutive times a particular block is coded as skipped without re-setting the quantization level to a relatively fine level of quantization and re-selecting the mode. | 05-14-2009 |
20090207233 | METHOD AND SYSTEM FOR VIDEOCONFERENCE CONFIGURATION - Systems and methods for providing camera configuration for points in a multi-point videoconference system are provided. First configuration information is determined for a first point of a multi-point videoconferencing system. Second configuration information is determined for a second point of the multi-point videoconferencing system. One or more first cameras at the first point or one or more second cameras at the second point of the multi-point videoconferencing system are reconfigured based on the first configuration information or the second configuration information. | 08-20-2009 |
20090244257 | VIRTUAL ROUND-TABLE VIDEOCONFERENCE - A system and method for creating a virtual round table videoconference is described. An embodiment of the system comprises a plurality of displays arranged in an arc configuration with a table to create a virtual round table. Cameras are arranged around the plurality of displays such that when a participant looks at a display with an image of a remote participant, the camera associated with the display captures an image of the participant's gaze, making eye contact with the camera. The image is displayed at the remote participant's endpoint creating the effect of eye contact between the participants. In another embodiment, audio speakers are arranged to provide directional sound such that the video source for a display and the audio source for the associated speaker are from the same endpoint. | 10-01-2009 |
20090251528 | Video Switching Without Instantaneous Decoder Refresh-Frames - A system and method for reducing blurred video caused by intra-coded IDR-frames sent when a destination endpoint in a multipoint videoconference switches to a new video source. An embodiment according to the invention comprises using inter-coded temporal prediction that references a long term reference frame (LTRF) instead of IDR-frames in a multipoint videoconference system. | 10-08-2009 |
20090256901 | Pop-Up PIP for People Not in Picture - A system and method for alerting participants in a videoconference that one or more participants are improperly framed by the videoconference camera is provided. An embodiment comprises a temporary self-view picture-in-picture image appearing when the number of faces detected by the videoconference camera changes. A face detection algorithm is used to determine when the number of faces being detected by the videoconference camera has changed. The self-view picture-in-picture image displays, for a duration of time, a representation of the image being captured by the videoconference camera, allowing participants who are not properly framed by the videoconference camera to adjust their position so that their faces are captured by the videoconference camera. | 10-15-2009 |
20090273661 | METHOD OF LIGHTING - In one embodiment, a method of lighting is disclosed which includes apportioning a video display screen into a content portion and at least one light portion. The content portion displays video content and the light portion provides auxiliary lighting when energized to illuminate a subject adjacent the video display screen. | 11-05-2009 |
20090324023 | Combined Face Detection and Background Registration - Techniques are provided to analyze video frames of a video signal in order to distinguish regions containing a face (and body torso) from regions that contain a relatively static background. The region containing the face is referred to as a foreground region. A current video frame is divided into a plurality of elements and the foreground regions and background regions are detected. The background regions of a subsequent video frame are detected/registered using the foreground regions of the current video frame. The foreground regions of the subsequent video frame are determined using the background regions of the current video frame as a temporal reference. | 12-31-2009 |
20100002006 | Modal Multiview Display Layout - A system and method for providing a plurality of viewing angles and images on a display. An embodiment comprises a display system where a user has the option of determining the number of images to view and the range of viewable angles for each image. A display system is configured to display a maximum number of images at different viewing angles by interlacing a plurality of images so that each viewing angle shows a selected image. The display system provides a method by which an operator can increase the viewing area of an image by interlacing the same image to more than one viewing angle. | 01-07-2010 |
20100061225 | NETWORK-ADAPTIVE PREEMPTIVE REPAIR IN REAL-TIME VIDEO - A network-adaptive error recovery method for real-time video transmission based on sending repair frames preemptively with a frequency derived from the observed run-length of good frames and the round-trip time. | 03-11-2010 |
20100123770 | MULTIPLE VIDEO CAMERA PROCESSING FOR TELECONFERENCING - A method, an apparatus, and a storage medium with executable code to execute a method including accepting camera views of at least some participants of a teleconference, each view from a corresponding video camera, with the camera views together including at least one view of each participant. The method includes accepting audio from a plurality of microphones, and processing the audio from the plurality of microphones to generate audio data and direction information indicative of the direction of sound received at the microphones. The method further includes generating one or more candidate people views, with each people view being of an area enclosing a head and shoulders view of at least one participant. The method also includes making a selection, according to the direction information, of which at least one of the candidate people views are to be transmitted to one or more remote endpoints. | 05-20-2010 |
20100125768 | ERROR RESILIENCE IN VIDEO COMMUNICATION BY RETRANSMISSION OF PACKETS OF DESIGNATED REFERENCE FRAMES - Techniques are provided for video communication between multiple devices. Each of a plurality of video packets is designated as being part of a required reference frame that is subsequently to be used for a repair process. A stream of video packets that includes the packets for the required reference frame is transmitted from a source device over a communication medium for reception by a plurality of destination devices. A determination is made that at least one of the plurality of destination devices did not receive at least one packet of the required reference frame, and the at least one packet is retransmitted to the at least one of the plurality of destination devices. When the retransmitted packet is received at the at least one destination device, it is decoded and stored without using it for generating a picture for display at the time that the at least one packet is received. | 05-20-2010 |
20100208078 | HORIZONTAL GAZE ESTIMATION FOR VIDEO CONFERENCING - Techniques are provided to determine the horizontal gaze of a person from a video signal generated from viewing the person with at least one video camera. From the video signal, a head region of the person is detected and tracked. The dimensions and location of a sub-region within the head region is also detected and tracked from the video signal. An estimate of the horizontal gaze of the person is computed from a relative position of the sub-region within the head region. | 08-19-2010 |
20100239000 | Camera Coupled Reference Frame - Techniques are provided for managing long-term reference frames (LTRFs) for two or more video sources. A first video source is selected from a plurality of video sources. The first video source is encoded to produce an encoded video stream, where a reference frame message identifies a recent video frame as a long-term reference frame (LTRF) associated with the first video stream. The process is repeated for other video streams. The LTRF associated with the first video stream is used as a reference for temporal predictive coding upon receiving a signal that the first video source has been re-selected. | 09-23-2010 |
20100245535 | COMBINING VIEWS OF A PLURALITY OF CAMERAS FOR A VIDEO CONFERENCING ENDPOINT WITH A DISPLAY WALL - A telepresence apparatus, a method of operating a telepresence apparatus, and a tangible computer readable storage medium in a telepresence apparatus that is configured with instructions that when executed cause operation of the telepresence apparatus. The telepresence apparatus includes video cameras distributed co-planar with a display wall and capturing camera views of a scene. The camera views are combined by a video processor to form a video signal for transmission to one or more remote endpoints, the video signal corresponding to a synthetic view from a point that is substantially behind the wall, as if the wall were not there. | 09-30-2010 |
20100246680 | REFERENCE PICTURE PREDICTION FOR VIDEO CODING - A video coder includes a forward coder and a reconstruction module determining a motion compensated predicted picture from one or more previously decoded pictures in a multi-picture store. The reconstruction module includes a reference picture predictor that uses only previously decoded pictures to determine one or more predicted reference pictures. The predicted reference picture(s) are used for motion compensated prediction. The reference picture predictor may include optical flow analysis that uses a current decoded picture and that may use one or more previously decoded pictures together with affine motion analysis and image warping to determine at least a portion of at least one of the reference pictures. | 09-30-2010 |
20100302446 | Video Superposition for Continuous Presence - Techniques are described herein for combining video frames of two or more real-time video streams into combined video frames of a combined real-time video stream for continuous presence. Video frames of at least two real-time video streams are combined into combined video frames of a combined video stream. The combined video stream is supplied to a video display for displaying the combined video stream. Each video stream includes video frames with subject and background images. The subject images of corresponding video frames of the first and second video streams are combined into a combined video frame of a combined video stream such that the subject image of the first video stream is positioned in an anterior portion of the combined frame and the subject image of the second video stream is positioned in a posterior portion of the combined frame. | 12-02-2010 |
20110228096 | SYSTEM AND METHOD FOR ENHANCING VIDEO IMAGES IN A CONFERENCING ENVIRONMENT - A method is provided in one example and includes receiving image data for a field of view associated with a display. The image data is used to generate a plurality of red green blue (RGB) frames. The method also includes emitting infrared energy onto the field of view in order to generate a plurality of infrared frames, where the plurality of RGB frames and the plurality of infrared frames are generated by a single camera. The plurality of RGB frames can be combined with the plurality of infrared frames in order to generate a video data stream. In a more particular embodiment, the emitting of the infrared energy is synchronized with the camera such that the infrared energy is emitted onto the field of view at one half of an existing frame rate of the camera. | 09-22-2011 |
20110279630 | SYSTEM AND METHOD FOR PROVIDING RETRACTING OPTICS IN A VIDEO CONFERENCING ENVIRONMENT - An apparatus is provided in one example and includes a camera configured to receive image data associated with an end user involved in a video session. The apparatus also includes a display and an optics element configured to interface with the camera. The optics element reflects the image data associated with the end user positioned in front of the display. A retracting mechanism is also provided and is configured to retract the optics element in a direction such that the camera moves to an inactive state and the optics element is removed from a view of the display from the perspective of the end user. An effective optical distance from the camera to the end user is increased by manipulating a position of the optics element. In more detailed embodiments, the camera can be configured above the display such that its lens points downward toward the optics element. | 11-17-2011 |
20110285825 | Implementing Selective Image Enhancement - A method that includes capturing depth information associated with a first field of view of a depth camera. The depth information is represented by a first plurality of depth pixels. The method also includes capturing color information associated with a second field of view of a video camera that substantially overlaps with the first field of view of the depth camera. The color information is represented by a second plurality of color pixels. The method further includes enhancing color information represented by at least one color pixel of the second plurality of color pixels to generate an enhanced image. The enhanced image adjusts an exposure characteristic of the color information captured by the video camera. The at least one color pixel is enhanced based on depth information represented by at least one corresponding depth pixel of the first plurality of depth pixels. | 11-24-2011 |
20120092443 | Network Synchronization Video for Composite Video Streams - Techniques are provided for upstream video sources to be synchronized in vertical sync time and in frame rate, so that a downstream device can create a composite image with low latency. At a video compositor device, a plurality of video streams are received that comprise at least first and second video streams. First and second vertical synchronization points associated with the first and second video streams are determined. A difference in time between the first and second vertical synchronization points is determined. At least one control signal or message is generated that is configured to change a video capture frame rate associated with one or both of the first and second video streams to reduce the difference in time and the control message is sent to video capture devices for one or both of the first and second video streams. Techniques are also provided for upstream video sources, e.g., cameras, to receive the control message and respond accordingly. | 04-19-2012 |
20120120270 | SYSTEM AND METHOD FOR PROVIDING ENHANCED AUDIO IN A VIDEO ENVIRONMENT - A method is provided in one example and includes receiving audio data at a microphone array that includes a plurality of microphones. The microphone array is provisioned at a first endpoint, which includes a camera element configured to capture video data associated with a video session involving the first endpoint and a second endpoint. The method also includes formatting the audio data into a time division multiplex (TDM) stream, and communicating the stream to a port for a subsequent communication over a network and to the second endpoint. | 05-17-2012 |
20120127259 | SYSTEM AND METHOD FOR PROVIDING ENHANCED VIDEO PROCESSING IN A NETWORK ENVIRONMENT - A method is provided in one example and includes receiving a video input from a camera element; using change detection statistics to identify background image data; using the background image data as a temporal reference to determine foreground image data of a particular video frame within the video input; using a selected foreground image for a background registration of a subsequent video frame; and providing at least a portion of the subsequent video frame to a next destination. | 05-24-2012 |
20120328202 | METHOD AND APPARATUS FOR ENROLLING A USER IN A TELEPRESENCE SYSTEM USING A FACE-RECOGNITION-BASED IDENTIFICATION SYSTEM - In one embodiment, a method includes obtaining a first image of a party that is stored in a first structure in response to an instruction to enroll the party in a system, and using information associated with the first image to identify a second image stored in a second structure. The second image has a relatively high likelihood of depicting the party. Finally, the method includes enrolling the party in the system, wherein enrolling the party in the system includes associating the second image with the party. | 12-27-2012 |
20130044893 | System and method for muting audio associated with a source - In one embodiment, a method includes receiving audio at a plurality of microphones, identifying a sound source to be muted, processing the audio to remove sound received from the sound source at each of the microphones, and transmitting the processed audio. An apparatus is also disclosed. | 02-21-2013 |
20130136185 | REFERENCE PICTURE PREDICTION FOR VIDEO CODING - A video coder includes a forward coder and a reconstruction module determining a motion compensated predicted picture from one or more previously decoded pictures in a multi-picture store. The reconstruction module includes a reference picture predictor that uses only previously decoded pictures to determine one or more predicted reference pictures. The predicted reference picture(s) are used for motion compensated prediction. The reference picture predictor may include optical flow analysis that uses a current decoded picture and that may use one or more previously decoded pictures together with affine motion analysis and image warping to determine at least a portion of at least one of the reference pictures. | 05-30-2013 |
20130335508 | Adaptive Switching of Views for a Video Conference that Involves a Presentation Apparatus - Techniques are provided for dynamically adapting the view from a conference endpoint that includes a presentation apparatus, such as a whiteboard. A first signal is received that includes a video signal derived from a video camera that is viewing a room during a conference session in which a person is presenting information on a presentation apparatus. During the video conference, switching is performed between the first signal and a second signal representing content being displayed on the presentation apparatus during the conference session for output and transmission to other conference endpoints of the conference session. The determination as to whether to supply the first signal (for a normal view of the conference room) or the second signal may be based on a position determination of the presenter or may instead be based on an external view selection command received from another conference endpoint participating in the conference session. | 12-19-2013 |
20140063177 | Generating and Rendering Synthesized Views with Multiple Video Streams in Telepresence Video Conference Sessions - Techniques are provided for establishing a videoconference session between participants at different endpoints, where each endpoint includes at least one computing device and one or more displays. A plurality of video streams is received at an endpoint, and each video stream is classified as at least one of a people view and a data view. The classified views are analyzed to determine one or more regions of interest for each of the classified views, where at least one region of interest has a size smaller than a size of the classified view. Synthesized views of at least some of the video streams are generated, wherein the synthesized views include at least one view including a region of interest, and views including the synthesized views are rendered at one or more displays of an endpoint device. | 03-06-2014 |
20140085398 | REAL-TIME AUTOMATIC SCENE RELIGHTING IN VIDEO CONFERENCE SESSIONS - Video frames are captured at one or more cameras during a video conference session, where each video frame includes a digital image with a plurality of pixels. Depth values associated with each pixel are determined in at least one video frame, where each depth value represents a distance of a portion of the digital image represented by at least one corresponding pixel from the one or more cameras that capture the at least one video frame. Luminance values of pixels are adjusted within captured video frames based upon the depth values determined for the pixels so as to achieve relighting of the video frames as the video frames are displayed during the video conference session. | 03-27-2014 |
20140085404 | Transition Control in a Videoconference - A method for transition control in a videoconference comprises receiving a plurality of video streams from a plurality of cameras, displaying a first video stream of the plurality of video streams, detecting a stream selection event for display of a second video stream of the plurality of video streams, determining a transition category for a transition from the first video stream to the second video stream, and selecting a display transition based on the transition category for displaying the transition from the first video stream to the second video stream. | 03-27-2014 |
20140112466 | System and Method for Clock Synchronization of Acoustic Echo Canceller (AEC) with Different Sampling Clocks for Speakers and Microphones - Clock synchronization for an acoustic echo canceller (AEC) with a speaker and a microphone connected over a digital link may be provided. A clock difference may be estimated by analyzing the speaker signal and the microphone signal in the digital domain. The clock synchronization may be implemented in a combination of hardware and software. This synchronization may be performed in two stages: first, coarse synchronization in hardware; then, fine synchronization in software with, for example, a re-sampler. | 04-24-2014 |
20150042748 | Generating and Rendering Synthesized Views with Multiple Video Streams in Telepresence Video Conference Sessions - Techniques are provided for establishing a videoconference session between participants at different endpoints, where each endpoint includes at least one computing device and one or more displays. A plurality of video streams is received at an endpoint, and each video stream is classified as at least one of a people view and a data view. The classified views are analyzed to determine one or more regions of interest for each of the classified views, where at least one region of interest has a size smaller than a size of the classified view. Synthesized views of at least some of the video streams are generated, wherein the synthesized views include at least one view including a region of interest, and views including the synthesized views are rendered at one or more displays of an endpoint device. | 02-12-2015 |
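The abstract for application 20090109988 above describes adjusting a decoder's playout clock from time to time according to a measure of how long packets reside in the buffer, so that buffer latency stays limited and overrun or underrun is unlikely. A minimal Python sketch of one such adjustment rule; the proportional policy, the target residency, the gain, and all names are illustrative assumptions, not details from the filing:

```python
def adjust_clock(clock_rate_hz, residency_ms, target_ms=60.0, gain=0.001):
    """Return a nudged playout clock rate given measured buffer residency.

    Packets sitting longer than `target_ms` mean excess latency, so the
    clock is sped up slightly to drain the buffer; shorter residency
    slows it down to guard against underrun. The per-update correction
    is clamped to +/- `gain` so adjustments stay gradual.
    """
    error = residency_ms - target_ms  # positive => too much latency
    correction = max(-gain, min(gain, error * gain / target_ms))
    return clock_rate_hz * (1.0 + correction)
```

Applied repeatedly, the rule converges the measured residency toward the target without abrupt rate changes, which is what keeps playout smooth.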
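The abstract for application 20100061225 above derives a preemptive repair-frame frequency from the observed run-length of good frames and the round-trip time. A toy Python heuristic along those lines; the exact policy (repair at half the mean loss spacing, floored at one round trip) and the function name are assumptions for illustration only:

```python
def repair_interval(good_run_lengths, rtt_ms, frame_interval_ms=33.3):
    """Pick how often, in frames, to send a preemptive repair frame.

    Heuristic: repair at half the observed spacing between losses
    (the mean good-frame run length), but no more often than once per
    round trip, so each repair can land before the next likely loss.
    """
    if not good_run_lengths:
        return None  # no losses observed yet; no preemptive repair needed
    avg_run = sum(good_run_lengths) / len(good_run_lengths)
    rtt_frames = max(1, round(rtt_ms / frame_interval_ms))
    return max(rtt_frames, int(avg_run // 2) or 1)
```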
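The abstract for application 20100208078 above computes horizontal gaze from the relative position of a sub-region within the tracked head region. A toy normalization of that idea; the coordinate convention, the [-1, 1] output range, and all names are assumed, not taken from the filing:

```python
def horizontal_gaze(head_left, head_width, face_left, face_width):
    """Estimate horizontal gaze from a facial sub-region's position
    inside the tracked head box.

    Returns 0.0 when the sub-region is centered (roughly facing the
    camera) and -1.0 / +1.0 when it sits flush against the left /
    right edge of the head box.
    """
    head_center = head_left + head_width / 2.0
    face_center = face_left + face_width / 2.0
    half_margin = (head_width - face_width) / 2.0
    if half_margin <= 0:
        return 0.0  # sub-region fills the head box; no lateral signal
    return (face_center - head_center) / half_margin
```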
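The abstract for application 20140112466 above estimates the clock difference between a speaker path and a microphone path by analyzing both signals in the digital domain. One coarse proxy, assumed here purely for illustration, is to compare sample counts accumulated over the same wall-clock interval:

```python
def clock_drift_ppm(speaker_samples, mic_samples):
    """Coarse estimate of the sampling-clock difference between the
    speaker path and the microphone path, in parts per million, from
    the number of samples each side produced over the same interval.
    A positive result means the speaker clock runs fast relative to
    the microphone clock; a re-sampler would correct by this ratio.
    """
    if mic_samples == 0:
        raise ValueError("need a nonzero microphone sample count")
    return (speaker_samples - mic_samples) / mic_samples * 1e6
```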