
By masking or reconfiguration

Subclass of:

714 - Error detection/correction and fault detection/recovery

714100000 - DATA PROCESSING SYSTEM ERROR OR FAULT HANDLING

714001000 - Reliability and availability

714002000 - Fault recovery

Patent class list (only non-empty classes are listed)

Deeper subclasses:

Class / Patent application number    Description    Number of patent applications / Date published
714005000    Of memory or peripheral subsystem    375
714004000    Of network    209
714010000    Of processor    109
714014000    Of power supply    41
Entries
Document    Title    Date
20100083030 REPAIRING HIGH-SPEED SERIAL LINKS - A method and system for repairing high speed serial links is provided. The system includes a first electronic component, connected to at least a second electronic component via at least one link. At least one of the first or second electronic components has a link controller. The link controller is configured to repair serial links by detecting a link error and mapping out individual lanes of a link where the link error is detected. The link controller resumes operation, i.e., transmission of data, and continues to monitor the lanes for errors. If and when additional link errors occur, the link controller identifies the lanes in which the link error occurs and deactivates those lanes. The deactivated lane(s) cannot be used in further transmissions which, in turn, reduces the occurrence of intermittent link errors. 04-01-2010
20090070622 Multi nodal Computer System and Method for Handling Check Stops in the Multi nodal Computer System - A new multi nodal computer system comprising a number of nodes on which chips of different types reside. The new multi nodal computer system is characterized in that there is one clock chip per node, each clock chip controlling only the chips residing on that node, said chips being appropriate for sending a check stop request to the associated clock chip in case of a malfunction. A new check stop handling method is characterized in that, depending on the source of the check stop request, the clock chip that received the check stop request initiates a system check stop, a node check up, or a chip check stop. 03-12-2009
20130086412 CIPHER-CONTROLLING METHOD, NETWORK SYSTEM AND TERMINAL FOR SUPPORTING THE SAME, AND METHOD OF OPERATING TERMINAL - Disclosed is a cipher control method that supports maintaining a cipher mode between a network system and a terminal. The method of controlling an encryption includes: attempting a connection for operating a communication channel between a terminal and a network system; providing cipher information about a cipher algorithm operation of the terminal to the network system; determining, by the network system, whether the terminal is a problematic terminal operating an abnormal cipher algorithm; and when the terminal is determined to be operating abnormally, instructing, by the network system, the terminal to perform a communication channel operation based on a normally operable cipher algorithm. 04-04-2013
20120246509 GLOBAL DETECTION OF RESOURCE LEAKS IN A MULTI-NODE COMPUTER SYSTEM - A process is disclosed for identifying and recovering from resource leaks on compute nodes of a parallel computing system. A resource monitor stores information about system resources available on a compute node in a clean state. After the compute node runs a job, the resource monitor compares the current resource availability to the clean state. If a resource leak is found, the resource monitor contacts a global resource manager to remove the resource leak. 09-27-2012
20130086411 HARDWARE CONSUMPTION ARCHITECTURE - Various exemplary embodiments relate to a method and related network node including one or more of the following: identifying a hardware failure of a failed component of a plurality of hardware components; determining a set of agent devices currently configured to utilize the failed component; reconfiguring an agent device to utilize a working component of the plurality of hardware components. Various exemplary embodiments additionally or alternatively relate to a method and related network node including one or more of the following: projecting a failure date for the hardware module; determining whether the projected failure date is acceptable based on a target replacement date for the hardware module; if the projected failure date is not acceptable: selecting a parameter adjustment for a hardware component, wherein the parameter adjustment is selected to move the projected failure date closer to the target replacement date, and applying the parameter adjustment to the hardware component. 04-04-2013
20090144579 Methods and Apparatus for Handling Errors Involving Virtual Machines - A virtual machine monitor (VMM) in a data processing system handles errors involving virtual machines (VMs) in the processing system. For instance, an error manager in the VMM may detect an uncorrectable error involving a component associated with a first VM in the processing system. In response to detection of that error, the error manager may terminate the first VM, while allowing a second VM in the processing system to continue operating. In one embodiment, the error manager automatically determines which VM is affected by the uncorrectable error, in response to detecting the uncorrectable error. The error manager may also automatically spawn a new VM to replace the first VM, if the processing system has sufficient resources to support the new VM. Other embodiments are described and claimed. 06-04-2009
20130080822 PROACTIVELY REMOVING CHANNEL PATHS IN ERROR FROM A VARIABLE SCOPE OF I/O DEVICES - A method includes detecting a channel path error event on an identified channel path; recording channel path error data associated with the detected channel path error event; identifying a scope of the channel path error associated with the identified channel path; determining if the identified channel path is a defective channel path based on the scope of the channel path error; and removing the defective channel path from one or more devices. 03-28-2013
20130080821 PROACTIVELY REMOVING CHANNEL PATHS IN ERROR FROM A VARIABLE SCOPE OF I/O DEVICES - A channel path error correction system includes a processor with one or more channels and a switch operatively coupled to the one or more channels of the processor. The system also includes an I/O device including one or more ports, the I/O device being operatively coupled to the switch by the one or more ports, and a plurality of control units. Each control unit includes at least one of the channels and at least one of the ports and a memory operable for storing information relating to detected channel path errors associated with each of the plurality of control units. 03-28-2013
20100100760 COMPUTER SYSTEM AND BOOT CONTROL METHOD - When a primary computer is taken over to a secondary computer in a redundancy configuration computer system where booting is performed via a storage area network (SAN), a management server delivers an information collecting/setting program to the secondary computer before the user's operating system of the secondary computer is started. This program assigns a unique ID (World Wide Name), assigned to the fibre channel port of the primary computer, to the fibre channel port of the secondary computer to allow a software image to be taken over from the primary computer to the secondary computer. 04-22-2010
20090125752 Systems And Methods For Managing A Redundant Management Module - Systems and methods for managing a redundant management module are provided. In this regard, a representative system, among others, includes first and second management modules that are configured to manage a computing device; and a programmable logic device that is configured to: instruct the first management module to manage the computing device responsive to detecting that the first management module is ready to manage the computing device, and instruct the second management module to manage the computing device responsive to detecting that the first management module failed to manage the computing device. 05-14-2009
20090044041 Redundant Data Bus System - A redundant data bus system has two data buses between which at least two failsafe control devices are connected. The two data buses operate with the same data bus protocol at essentially the same transmission frequency, and safety-related control messages are transmitted in parallel via both data buses and processed in the control devices. Each control device performs a separate control task via assigned control software. Each control device has two microcomputers which operate independently of one another and which have software for both the first and the second control tasks. When one control device fails, the control task can also be performed by the other. One data interface is arranged between the two microcomputers, via which result data calculated from the safety-related control messages can be exchanged and compared with one another. Based on such comparison a decision means determines which microcomputer or control device carries out a control task. 02-12-2009
20090094478 RECOVERY OF APPLICATION FAULTS IN A MIRRORED APPLICATION ENVIRONMENT - Provided are a method, system, and article of manufacture for recovery of application faults in a mirrored application environment. Application events are recorded at a primary system executing an instruction for an application. The recorded events are transferred to a buffer. The recorded events are transferred from the buffer to a secondary system, wherein the secondary system implements processes indicated in the recorded events to execute the instructions indicated in the events. An error is detected at the primary system. A determination is made of a primary order in which the events are executed by processes in the primary system. A determination is made of a modified order of the execution of the events comprising a different order of executing the events than the primary order in response to detecting the error. The secondary system processes execute the instructions indicated in the recorded events according to the modified order. 04-09-2009
20130061086 FAULT-TOLERANT SYSTEM, SERVER, AND FAULT-TOLERATING METHOD - To provide a fault-tolerant system requiring only one new server when the number of jobs to be processed concurrently exceeds the number of jobs processable by the current servers and requiring no standby servers. Servers 03-07-2013
20090070621 BROADCAST RECEIVING DEVICE - A broadcast receiving device includes a card slot, a fan, a temperature sensor, a memory component and a control unit. The card slot accepts an IC card. The fan rotates to cool the IC card. The temperature sensor measures a first temperature. The memory component stores correlation information indicating a correlation between the first temperature and a second temperature of the IC card. The control unit acquires the second temperature based on the first temperature and the correlation information. The control unit determines if the second temperature exceeds a predetermined temperature. The control unit switches from a first output mode, in which an audio-video signal is outputted via the IC card, to a second output mode, in which the audio-video signal is outputted by bypassing the IC card, when the control unit determines that the second temperature exceeds the predetermined temperature. 03-12-2009
20090031165 Method for Self-Diagnosing Remote I/O Enclosures with Enhanced FRU Callouts - A method, apparatus, and computer instructions for self-diagnosing remote I/O enclosures with enhanced FRU callouts. When a failure is detected on a RIO drawer, a data processing system uses the bulk power controller to provide an alternate path, rather than using the existing RIO links, to access registers on the I/O drawers. The system logs onto the bulk power controller, which provides a communications path between the data processing system and the RIO drawer. The communications path allows the data processing system to read all of the registers on the I/O drawer. The register information in the I/O drawer is then analyzed to diagnose the I/O failure. Based on the register information, the data processing system identifies a field replacement unit to repair the I/O failure. 01-29-2009
20090031164 Method for Self-Diagnosing Remote I/O Enclosures with Enhanced FRU Callouts - A method, apparatus, and computer instructions for self-diagnosing remote I/O enclosures with enhanced FRU callouts. When a failure is detected on a RIO drawer, a data processing system uses the bulk power controller to provide an alternate path, rather than using the existing RIO links, to access registers on the I/O drawers. The system logs onto the bulk power controller, which provides a communications path between the data processing system and the RIO drawer. The communications path allows the data processing system to read all of the registers on the I/O drawer. The register information in the I/O drawer is then analyzed to diagnose the I/O failure. Based on the register information, the data processing system identifies a field replacement unit to repair the I/O failure. 01-29-2009
20090031163 SPEEDPATH REPAIR IN AN INTEGRATED CIRCUIT - A circuit comprises a first plurality of transistors of a first channel length disposed along a speedpath, the first plurality of transistors providing a first timing performance. The circuit also comprises a second plurality of transistors of a second channel length having an expected equivalent functionality as the first plurality of transistors and disposed in parallel with the first plurality of transistors along the speedpath, wherein the second channel length is different from the first channel length. In addition, the circuit comprises an element configured to selectively replace the first plurality of transistors with the second plurality of transistors in response to a determination that the first timing performance of the first plurality of transistors fails a timing requirement of the speedpath. In one embodiment, the second channel length is a sub-minimal geometry with respect to the first channel length. 01-29-2009
20090235110 INPUT/OUTPUT CONTROL METHOD, INFORMATION PROCESSING APPARATUS, COMPUTER READABLE RECORDING MEDIUM - An input/output control method for an information processing apparatus that is connected to an input/output device through first and second paths, monitors an input/output response to an input/output request issued to the input/output device through the first path, and performs a timeout process when the input/output response is not present within a timeout time. The input/output control method includes predicting a timeout time for the input/output request on the basis of statistical information that the information processing apparatus obtains by monitoring the input/output response, detecting an error on the first path when an input/output response to the input/output request is not present within the predicted timeout time, and disconnecting the first path when the error on the first path is detected. 09-17-2009
20130166942 UNFUSING A FAILING PART OF AN OPERATOR GRAPH - Techniques for managing a fused processing element are described. Embodiments receive streaming data to be processed by a plurality of processing elements. Additionally, an operator graph of the plurality of processing elements is established. The operator graph defines at least one execution path and wherein at least one of the processing elements of the operator graph is configured to receive data from at least one upstream processing element and transmit data to at least one downstream processing element. Embodiments detect an error condition has been satisfied at a first one of the plurality of processing elements, wherein the first processing element contains a plurality of fused operators. At least one of the plurality of fused operators is selected for removal from the first processing element. Embodiments then remove the selected at least one fused operator from the first processing element. 06-27-2013
20110035619 INFORMATION PROCESSING APPARATUS AND INFORMATION PROCESSING APPARATUS CONTROL METHOD - An information processing apparatus includes an execution determination unit and a control unit. The execution determination unit determines whether a series of processes including multiple processes is executable at an execution time of the series of processes. The control unit selectively provides at least one recovery device for substituting for the series of processes when it is determined that the series of processes is not executable. 02-10-2011
20110035618 AUTOMATED TRANSITION TO A RECOVERY KERNEL VIA FIRMWARE-ASSISTED-DUMP FLOWS PROVIDING AUTOMATED OPERATING SYSTEM DIAGNOSIS AND REPAIR - A method (and structure) of operating an operating system (OS) on a computer. When a failure of the OS is detected, the computer automatically performs a diagnosis of the OS failure. The computer also attempts to automatically repair/recover the failed OS, based on the diagnosis, without requiring a reboot. 02-10-2011
20100325471 HIGH AVAILABILITY SUPPORT FOR VIRTUAL MACHINES - A computer implemented method, a tangible computer storage medium, and a data processing system provide high availability support for virtual machines in a logical partitioned platform. A monitoring system detects a failure in the virtual machine. Partition management firmware then restarts the virtual machine in a consistency failover image node utilizing a consistency failover image. If a subsequent failure of the virtual machine is detected within a predetermined time, partition management firmware restarts the virtual machine in a boot failover image node utilizing a boot failover image. 12-23-2010
20090100288 FAST SOFTWARE FAULT DETECTION AND NOTIFICATION TO A BACKUP UNIT - A method and system for quickly informing a backup unit that a primary unit has failed. Normally an exception handler is activated when a software failure occurs and network controller chips or the ASIC interface to a signal bus can operate even though there is a software failure. A software failure notification packet is programmed and stored in a location that is not affected by a software system failure. When a software failure occurs, control is shifted to the exception handler. The exception handler sends a pre-established and pre-addressed packet to the network controller card which transmits this packet to the backup unit. Upon receipt of the packet, the backup unit goes into operation. In some alternate embodiments that include multiple line cards in a single unit, the exception handler sends a signal to a backup unit via a signal bus or a data bus. 04-16-2009
20110296230 MAINTAINING A COMMUNICATION PATH FROM A HOST TO A STORAGE SUBSYSTEM IN A NETWORK - Provided are a method, system, computer storage device, and storage area network for maintaining a communication path from a host to a storage subsystem in a network. A storage subsystem controls data transfer and access to storage devices in a network, wherein the storage subsystem is coupled to a switch and the switch is coupled to a host in the network. A topological storage is coupled to the host, the switch and the storage subsystem, for storing a topological coupling relationship between the host and the switch and a topological coupling relationship between the switch and the storage subsystem. In response to determining a failed path between the storage subsystem and the switch coupled to the storage subsystem, the storage subsystem determines a first port on the storage subsystem in the failed path. The storage subsystem determines from the topology storage the topological coupling relationship between the host and the switch and the topological coupling relationship between the switch and the storage subsystem. The storage subsystem redirects, based on the topological coupling relationships, a message sent to the first port of the storage subsystem to an operational second port in the storage subsystem coupled to the switch. 12-01-2011
20090276656 STORAGE DEVICE AND RECOVERY METHOD - A storage device including a plurality of storage units for storing data dispersively among the storage units, includes: a processor for controlling boot-up of the storage units; and a memory for storing operation history indicative of the sequence of any failure causing any of the storage units to become inoperative, the processor controlling reboot-up of the storage units, when a plurality of the storage units becomes inoperative on account of a plurality of failures, in accordance with process including: determining the order of the reboot up of the storage units that is reversal of the sequence of the failures causing the storage units to become inoperative in reference to the operation history in the memory; rebooting the inoperative storage units successively in accordance with the determined order. 11-05-2009
20090158081 Failover Of Blade Servers In A Data Center - Failover of blade servers in a data center including powering off a failing blade server by a system management server through a blade server management module (‘BSMM’) managing the failing blade server, the failing blade server characterized by a machine type, one or more network addresses, and one or more storage addresses, the addresses being virtual addresses; identifying, by the system management server from a pool of standby blade servers, a replacement blade server, the replacement blade server managed by a BSMM; assigning, by the system management server through the BSMM managing the replacement blade server, the one or more network addresses and the one or more storage addresses of the failing blade server to the replacement blade server, including enabling in the replacement blade server the assigned addresses; and powering on the replacement blade server by the system management server through the BSMM managing the replacement blade server. 06-18-2009
20090313497 Failover Enabled Telemetry Systems - The present invention discloses several techniques for providing failover in telemetry systems. The invention allows the continuous and uninterrupted connection between gathering units and a central data collection server, thereby ensuring the proper operation of telemetry systems. 12-17-2009
20090150715 DELIVERY OF STREAMS TO REPAIR ERRORED MEDIA STREAMS IN PERIODS OF INSUFFICIENT RESOURCES - In one embodiment, a method includes ingesting a program stream from a program source on a first channel. The method also includes storing the program stream, and receiving notification from a client of unrecoverable error in a stream received at the client. The unrecoverable error corresponds to at least a portion of the stored program stream. The method also includes distributing the corresponding portion of the stored program stream to the client on a second channel in response to the notification. 06-11-2009
20100251005 MULTIPROCESSOR SYSTEM AND FAILURE RECOVERING SYSTEM - A multiprocessor system includes a plurality of nodes, each of which includes a plurality of processors, a plurality of memories, and first and second node controllers. Unique identifiers are assigned to all the components. Each of the first and second node controllers includes: each of first and second request control sections configured to determine the identifier of a transmission destination of a request based on a memory address of an access destination of the request; each of first and second registers configured to hold in the first request control section, the identifier of the transmission destination of the request; a first routing table configured to specify one of the first request control section and the second request control section as an output destination of the request based on the identifier held by the first register, the identifier held by the second register, the identifier of the transmission destination of the request, when receiving the request, and a second routing table configured to specify a signal line for the identifier of the transmission destination of the request based on the identifier of the transmission destination which is determined by the first request control section or the second request control section, to transmit the request. 09-30-2010
20100251004 VIRTUAL MACHINE SNAPSHOTTING AND DAMAGE CONTAINMENT - Some embodiments provide a system that manages the execution of a virtual machine. During operation, the system takes a series of snapshots of the virtual machine during execution of the virtual machine. If an abnormal operation of the virtual machine is detected, the system spawns a set of snapshot instances from one of the series of snapshots, wherein each of the snapshot instances is executed with one of a set of limitations. Next, the system determines a source of the abnormal operation using a snapshot instance from the snapshot instances that does not exhibit the abnormal operation. Finally, the system updates a state of the virtual machine using the snapshot instance. 09-30-2010
20100095147 RECONFIGURABLE CIRCUIT WITH REDUNDANT RECONFIGURABLE CLUSTER(S) - Reconfigurable circuits, methods, and systems with reconfigurable interconnect devices, clusters of reconfigurable logic devices, and a programming interface configured to receive configuration data to configure a first combination of the reconfigurable interconnect and logic devices to implement a circuit, and to remap a portion of the received configuration data, corresponding to a defective cluster, from the defective cluster to another non-defective cluster of the plurality of clusters to configure a second combination of the reconfigurable interconnect and logic devices to implement the circuit. 04-15-2010
20100125747 Hardware Recovery Responsive to Concurrent Maintenance - Disclosed is a computer implemented method, data processing system, and apparatus to respond to detection of a hardware interface error on a system bus, for example, during a concurrent maintenance operation. The service processor may receive an error on the system bus. The error identifies at least one field replaceable unit and may inhibit the suppression of clock signal to the field replaceable unit. The service processor adds an identifier of the field replaceable unit to an eligible Field Replaceable Unit (FRU) list. The service processor recursively adds at least one field replaceable unit that the field replaceable unit depends upon. The service processor suppresses the clock signal to the field replaceable unit. The service processor inhibits tagging the field replaceable unit as unusable for next initial program load. 05-20-2010
20100083031 METHOD FOR QUEUING MESSAGE AND PROGRAM RECORDING MEDIUM THEREOF - According to an aspect of the embodiment, a message queuing unit of the message processing apparatus stores received messages. A message reception control unit receives a notification of destinations of messages, extracts only the messages for current processes based on a process control table recording current or standby of processes, and transmits the messages to corresponding applications as current processes. On the other hand, the message reception control unit does not transmit the messages to the applications as standby processes. 04-01-2010
20090083573 Method for detecting sources of faults or defective measuring sensors by positive case modeling and partial suppression of equations - A method establishes a global system model equation including model equations, which contain parameters, of individual components that form the global system. According to said method, the parameters of the individual components are detected using sensor values from the sensors that are allocated to the individual components and it is determined whether it is possible to adapt the parameters to the sensor values and to solve the global system model equation. 03-26-2009
20080244306 STORAGE SYSTEM AND MANAGEMENT METHOD FOR THE SAME - In a storage system performing remote copy, when a failure occurs in a storage apparatus, optimum redundancy configuration is reestablished promptly. In the storage system performing remote copy, when a storage apparatus detects a failure in its disk drive, a storage apparatus capable of providing a logical unit that can be a replacement for the logical unit affected by the failure in the disk drive is searched for based on storage apparatus performance, and a redundancy configuration is reestablished using a new logical unit the found storage apparatus provides. 10-02-2008
20080288810 METHOD AND SYSTEM FOR MANAGING RESOURCES DURING SYSTEM INITIALIZATION AND STARTUP - A method for managing a system's computer resources, includes: detecting an error condition in a computer resource; labeling the computer resource as not usable based on the error condition detected; reconfiguring the remaining computer resources to compensate for the detected error condition based on a failure mode policy; and wherein the failure mode policy manages the computer resources by one of: maximizing the amount of the remaining computer resources (mode 1), and maximizing the speed of the remaining computer resources (mode 2). 11-20-2008
20080235532 REDUCING OVERPOLLING OF DATA IN A DATA PROCESSING SYSTEM - A computer implemented method, apparatus, and computer usable program code for reducing overpolled data in a data processing system is provided. A controller identifies a set of redundant measurements in a cycle. The controller then identifies a number of measurements repeated in the set of redundant measurements. The controller then computes a percentage of redundant polls based on the number of measurements repeated in the set of redundant measurements. The controller then computes a new polling period by reducing an original polling period by the percentage of redundant polls. 09-25-2008
20080229141 Debugging method - The invention provides a debugging method applicable for an embedded system. The system includes a processor, a main memory and a debugging interface. A debugging program is first provided in the main memory. A debugging interruption is subsequently triggered to cause the processor to read the debugging program from the main memory and execute the debugging program. After execution, an execution result of the debugging program is stored into the main memory. The execution result is read and output via the debugging interface for further analysis. Because the architecture does not require a scan chain of ITR 09-18-2008
20090193287 Memory management method, medium, and apparatus based on access time in multi-core system - A memory management method and apparatus based on an access time in a multi-core system. In the memory management method of the multi-core system, it is easy to estimate the execution time of a task to be performed by a processing core and it is possible to secure the same memory access time when a task is migrated between processing cores by setting a memory allocation order according to distances from the processing cores to the memories in correspondence with the processing cores, translating a logical address to be processed by one of the processing cores according to the set memory allocation order into a physical address of one of the memories, and allocating a memory corresponding to the translated physical address to the processing core. 07-30-2009
20100268983 Computer System and Method of Control thereof - A computer system is described having a plurality of hardware resources, a plurality of virtual partitions having allocated thereto some of those hardware resources or parts thereof, said virtual partitions having an operating system loaded thereon, a partition monitoring application layer, which is capable of determining whether one or more of the partitions has failed, wherein said partition monitoring application layer also includes at least one hardware resource diagnostic function which is capable of interrogating at least one of the hardware resources allocated to a partition after failure of said partition, and a hardware resource reallocation function which is triggered when the hardware diagnostic function determines that one or more particular hardware resources associated with a failed partition is healthy, and which reallocates that healthy resource to an alternate healthy partition. A method of reallocating such hardware resources is also disclosed. 10-21-2010
20080209257Modeller For A High-Security System - A modeller for a system for determining a residual error probability. The modeller includes a component modeller, which is adapted to receive an error probability and to model a change of the error probability due to a behaviour of a system component, in order to output a changed error probability as residual error probability.08-28-2008
20100005335MICROPROCESSOR INTERFACE WITH DYNAMIC SEGMENT SPARING AND REPAIR - A processing device, system, method, and design structure for providing a microprocessor interface with dynamic segment sparing and repair. The processing device includes drive-side switching logic including driver multiplexers to select driver data for transmitting on link segments of a bus, and receive-side switching logic including receiver multiplexers to select received data from the link segments of the bus. The bus includes multiple data link segments, a clock link segment, and at least two spare link segments selectable by the drive-side switching logic and the receive-side switching logic to replace one or more of the data link segments and the clock link segment.01-07-2010
20120272090SYSTEM RECOVERY USING EXTERNAL COMMUNICATION DEVICE - A method for computer system recovery is presented. In one embodiment, the method includes establishing a connection, via an interface, to a computer system to support the system recovery of the computer system. The method includes executing an emulation application as a recovery agent. The method includes retrieving, based on identifiers associated with the computer system, remote data via another interface. The method further includes performing the system recovery by using at least a part of the remote data.10-25-2012
20100146325Systems and methods for correcting software errors - Systems and methods consistent with the invention may include receiving an indication that a software error was detected during operation of the application program, generating an error message based on the software error, the error message including an error signature, comparing the error signature with information stored in a patch library database to identify a corresponding correction patch, and correcting, when the corresponding correction patch is identified, the software error by applying the corresponding correction patch.06-10-2010
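The signature-to-patch lookup in the entry above can be sketched as follows. This is a minimal illustration, not the patented method: the hashing scheme, the in-memory `PATCH_LIBRARY` dictionary, and all function names are assumptions introduced here.

```python
import hashlib

# Hypothetical patch library: maps an error signature to a patch identifier.
PATCH_LIBRARY = {}

def error_signature(error_type: str, location: str) -> str:
    """Derive a stable signature from the error type and where it occurred."""
    return hashlib.sha256(f"{error_type}@{location}".encode()).hexdigest()[:16]

def register_patch(error_type: str, location: str, patch_id: str) -> None:
    """Store a correction patch under the signature of the error it fixes."""
    PATCH_LIBRARY[error_signature(error_type, location)] = patch_id

def find_patch(error_type: str, location: str):
    """Return the matching patch id, or None when the library has no match."""
    return PATCH_LIBRARY.get(error_signature(error_type, location))
```

A caller would apply the returned patch only when `find_patch` does not return `None`, matching the "when the corresponding correction patch is identified" condition in the abstract.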
20090049331Error propagation control within integrated circuits - A method of selecting where error detection circuits should be placed within an integrated circuit uses simulation of a reference and test design with errors injected into the test design and then fan out analysis performed upon those injected errors to identify error propagation characteristics. Thus, registers at which propagated errors are highly likely to manifest themselves or which protect key architectural state, or which protect state not otherwise protected can be identified and so an efficient deployment of error detection mechanisms achieved. Within an integrated circuit output signals from inactive circuit elements may be subject to isolation gating in dependence upon a detected current state of the integrated circuit. Thus, inactive circuit elements in which soft errors occur have inappropriate output signals gated from reaching the rest of the integrated circuit and thus reducing erroneous operation.02-19-2009
20090138750REDUNDANT 3-WIRE COMMUNICATION SYSTEM AND METHOD - A redundant communication system and method for providing data communication between a first computing node and a second computing node. A transmitter is provided as part of the first computing node. A receiver is provided as part of the second computing node. A first signal line carries a first data signal. The first signal line electrically couples the transmitter with the receiver. A second signal line carries a second data signal redundant to the first signal. The second signal line electrically couples the transmitter with the receiver. The receiver evaluates the first data signal to determine the presence of an error and the second node uses the second data signal if an error is detected in the first data signal.05-28-2009
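A rough sketch of the receiver-side fallback described in the entry above. The abstract does not say how the receiver detects an error, so the CRC32 framing here is an assumption, as are the function names.

```python
import zlib

def frame(payload: bytes) -> bytes:
    """Append a CRC32 so the receiver can check the frame (assumed scheme)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check(framed: bytes):
    """Return the payload if the trailing CRC matches, else None."""
    payload, crc = framed[:-4], framed[-4:]
    return payload if zlib.crc32(payload).to_bytes(4, "big") == crc else None

def receive(first_line: bytes, second_line: bytes):
    """Prefer the first signal; use the redundant second signal on error."""
    payload = check(first_line)
    return payload if payload is not None else check(second_line)
```

`receive` returns `None` only when both signals fail the check, a case the abstract does not address.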
20090049329REDUCING LIKELIHOOD OF DATA LOSS DURING FAILOVERS IN HIGH-AVAILABILITY SYSTEMS - A method, system, and computer program product for reducing likelihood of data loss during performance of failovers in a high-availability system comprising a primary system and a standby system are provided. The method, system, and computer program product provide for defining a halt duration, periodically determining a halt end time, halting data modifications at the primary system responsive to failure of data replication to the standby system, resuming data modifications at the primary system responsive to a last determined halt end time being reached or data replication to the standby system resuming, and responsive to the primary system failing prior to a previously determined halt end time, determining that a failover to the standby system will not result in data loss on the standby system with respect to the primary system.02-19-2009
20100017643Cluster system and failover method for cluster system - Provided is a failover method for a cluster system for realizing smooth failover of the guest OS's, even when there are many guest OS's, while reducing consumption of computer resources of a server. Smooth failover is realized by preventing competition during failover even when the number of guest OS's is increased. In a cluster configuration in which a slave/master cluster program is operated in a guest OS/host OS, the master cluster program (01-21-2010
20090177912RECONFIGURABLE CIRCUIT WITH REDUNDANT RECONFIGURABLE CLUSTER(S) - A reconfigurable circuit having redundant reconfigurable clusters is described herein.07-09-2009
20090177911APPARATUS, SYSTEM, AND METHOD TO PREVENT QUEUE STALLING - An apparatus, system, and method are disclosed to prevent queue stalling. The apparatus to prevent queue stalling is provided with a plurality of modules configured to functionally execute the necessary steps of detecting a connection failure on a first logical path, wherein the first logical path is associated with a first entry in a queue, and wherein the first logical path is configured to define a communication path between an entity associated with a first entry in the queue and a queue manager, scanning the queue to identify a second entry associated with a second logical path in response to the connection failure, and advancing the second entry to a position within the queue that is ahead of the first entry. These modules in the described embodiments include a detection module, a scanning module, and an advancing module.07-09-2009
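The detect, scan, and advance steps in the entry above might look like the sketch below. The queue representation (a list of `(entry, path)` tuples) and the function name are assumptions made for illustration.

```python
def advance_past_failed(queue, failed_path):
    """Move the first entry on a healthy path ahead of the first stalled entry.

    queue is a list of (entry, logical_path) tuples, head first.
    Returns a new list; the input is not modified.
    """
    # Locate the first entry whose logical path has failed.
    stalled = next(
        (i for i, (_, path) in enumerate(queue) if path == failed_path), None
    )
    if stalled is None:
        return list(queue)
    # Scan behind it for an entry on a different (second) logical path.
    for j in range(stalled + 1, len(queue)):
        if queue[j][1] != failed_path:
            return queue[:stalled] + [queue[j]] + queue[stalled:j] + queue[j + 1:]
    return list(queue)
```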
20090024867Redundant data path - Disclosed are redundant data path(s) for transmission of graphical data between components in a graphical display system. The redundant data path(s) are used to transmit graphical data by at least two independent means, so that if a failure in one data path occurs, data transmitted via a separate data path can be used for display. The system is particularly advantageous for multiple-serial-module configurations. The redundant data path(s) minimize disruption of data display and make repair and maintenance of the display system more efficient. The invention includes apparatus for graphical display systems, and also includes methods of data transmission for graphical display systems, and methods of maintenance of graphical display systems.01-22-2009
20090049330METHOD AND SYSTEM FOR VIRTUAL REMOVAL OF PHYSICAL FIELD REPLACEABLE UNITS - A method of virtually removing field replaceable units (FRUs) from a computer system during concurrent maintenance operations. Firmware within a flexible service processor (FSP) assigns unique resource identification (RID) numbers to each FRU in the computer system. The firmware collects vital product data (VPD) for each FRU and generates a duplicate test shared library, which is stored in a memory directory corresponding to the FSP. When the firmware receives input from a graphical user interface (GUI) that includes at least a first FRU selected for virtual removal from the computer system, the firmware adds the RID number of the selected FRU to the memory directory and recollects VPD. The FSP subsequently ignores any FRUs corresponding to RID numbers stored in the memory directory during operation of the computer system.02-19-2009
20120144228Apparatus and Method to Read Information from an Information Storage Medium - A method to read information from an information storage medium using a read channel, where that read channel includes a data cache, where the method generates an analog waveform comprising the information, provides that analog waveform to a read channel, generates a digital signal from that analog waveform using one or more first operating parameters, corrects that digital signal at an actual error correction rate, and determines if the actual error correction rate is greater than an error correction rate threshold. If the actual error correction rate exceeds the error correction rate threshold, then the method captures the digital signal, stores that captured data in a data cache, reads that digital signal from the cache, generates one or more second operating parameters, and provides those one or more second operating parameters to the read channel.06-07-2012
20110231698BLOCK BASED VSS TECHNOLOGY IN WORKLOAD MIGRATION AND DISASTER RECOVERY IN COMPUTING SYSTEM ENVIRONMENT - Methods and apparatus involve migrating workloads and disaster recovery. A snapshot is taken of a source volume using a volume shadow service. Depending whether a user seeks a migration or disaster recovery action, blocks of data read from the snapshot are transferred to a target volume in various amounts. The amounts of transfer include all of the blocks, only changed blocks between the volumes, or only blocks incrementally changed since a last transfer operation. Users make indications for transfer on a computing device storing and consuming data on the volumes and optionally do so in the context of Novell's Platespin® products. Other features contemplate kernel drivers to monitor the blocks of the volumes, as well as techniques for comparing them. Still other features involve computing systems, volume devices, such as readers, writers and filters, and computer program products, to name a few.09-22-2011
20110231696Method and System for Cluster Resource Management in a Virtualized Computing Environment - Methods and systems for cluster resource management in virtualized computing environments are described. VM spares are used to reserve (or help discover or otherwise obtain) a set of computing resources for a VM. While VM spares may be used for a variety of scenarios, particular uses of VM spares include using spares to ensure resource availability for requests to power on VMs as well as for discovering, obtaining, and defragmenting the resources and VMs on a cluster, e.g., in response to requests to reserve resources for a VM or to respond to a notification of a failure for a given VM.09-22-2011
20090083574METHOD FOR OPERATING A MANAGEMENT SYSTEM OF FUNCTION MODULES - Methods for operating a management system that manages a large number of first function modules and second function modules. An inhibitor module I sets first control statuses to designating blocking when associated events are detected by an event detecting device, and then the management system no longer makes associated first function modules available for execution. The inhibitor module I sets second control statuses to designating executable when associated events are detected by an event detecting device, and then the management system makes associated second function modules available for execution.03-26-2009
20090265577Method of managing paths for an externally-connected storage system and method of detecting a fault site - Provided is a method of controlling a computer system that includes: a computer; a first storage device connected to the computer via a first path and a second path; and a second storage device externally-connected to the first storage system via a third path and connected to the computer via a fourth path, the first storage device providing a first storage area to the computer, the second storage device including a second storage area corresponding to the first storage area, the method including: judging whether or not a fault has occurred in at least one of the first to fourth paths; selecting a path used for access to the first or second storage area; and transmitting the access request for the first or second storage area by using the selected path. Accordingly, in the computer system, an application can be prevented from being stopped despite a fault in a path.10-22-2009
20090222686SELF MAINTAINED COMPUTER SYSTEM UTILIZING ROBOTICS - A self-maintained computer system includes a computer system having a plurality of interconnected computer components and a robot associated with the computer system that is configured to carry a spare computer component and further configured to replace a computer component of the computer system with the spare computer component. The robot automatically replaces an individual computer component when a failure of the individual computer component is detected.09-03-2009
20090132849Method and Computer Program for Selecting Circuit Repairs Using Redundant Elements with Consideration of Aging Effects - A method and computer program for selecting circuit repairs using redundant elements with consideration of aging effects provides a mechanism for raising short-term and long-term performance of memory arrays beyond present levels/yields. Available redundant elements are used as replacements for selected elements in the array. The elements for replacement are selected by BOL (beginning-of-life) testing at a selected operating point that maximizes the end-of-life (EOL) yield distribution as among a set of operating points at which post-repair yield requirements are met at beginning-of-life (BOL). The selected operating point is therefore the “best” operating point to improve yield at EOL for a desired range of operating points or maximize the EOL operating range. For a given BOL repair operating point, the yield at EOL is computed. The operating point having the best yield at EOL is selected and testing is performed at that operating point to select repairs.05-21-2009
20080313490SYSTEM AND ARTICLE OF MANUFACTURE FOR EXECUTING INITIALIZATION CODE TO CONFIGURE CONNECTED DEVICES - Provided are a system and article of manufacture for executing initialization code to configure connected devices. A plurality of segments are provided to configure at least one connected device, wherein each segment includes configuration code to configure the at least one connected device. The segments are executed according to a segment order by executing the configuration code in each segment to perform configuration operations with respect to the at least one connected device. Completion of the segment is indicated in a memory in response to completing execution of the configuration operations for the segment.12-18-2008
20080313489FLASH MEMORY-HOSTED LOCAL AND REMOTE OUT-OF-SERVICE PLATFORM MANAGEABILITY - A method, apparatus, and system are disclosed. In one embodiment, the method determines whether one or more manageability conditions are present in a computer system, and then invokes an out-of-service manageability remediation environment stored within a portion of a flash device in the computer system when one or more manageability conditions are present.12-18-2008
20100162030METHOD AND APPARATUS FOR INITIATING CORRECTIVE ACTION FOR AN ELECTRONIC TERMINAL - A method and device are provided for initiating corrective actions for a terminal, such as an ATM. A method of initiating corrective actions for a terminal comprises monitoring a fault status of a first component, detecting a fault status of the first component with a first trigger plug-in, activating a first action plug-in based upon the detected fault status of the first component, and recycling the first component.06-24-2010
20080307249Digital mixing system with double arrangement for fail safe - A digital mixing system has a console having a display and an operator for transmitting and receiving a control signal, an engine having input channels and output channels for mixing a plurality of audio signals fed from the input channels while exchanging the control signal with the console and feeding the mixed audio signals to the output channels, and peripheral input and output units connected to the input and output channels of the engine, respectively. The console and the engine are located remotely from each other, and a cable connecting therebetween is duplicated for the purpose of fail safe. The engine may be installed in pair. If a main engine fails, a sub engine backs up instantly to continue the mixing operation. The console may be also prepared in pair for the purpose of fail safe.12-11-2008
20100185893Topology Collection Method and Dual Control Board Device For A Stacking System - The invention provides a topology collection method and dual control board device applicable to a stacking system comprising dual control board devices. A master control board of a dual control board device advertises through a stack port the topology information of the member device in which the master control board resides, including information about the master control board and, if a slave control board is present, information about the slave control board; and stores the topology information or updates the existing topology information upon receiving the topology information of the stacking system through the stack port, and backs up the stored topology information of the stacking system to the slave control board after the slave control board is inserted. This invention is applicable for collecting the topology information of a stacking system comprising distributed dual control board devices.07-22-2010
20120246508METHOD AND SYSTEM FOR CONTINUOUSLY PROVIDING A HIGH PRECISION SYSTEM CLOCK - A method is presented for continuously providing a high precision system clock associated with a processing core, wherein the system clock includes a host clock register that is incremented via a high precision oscillator, the method includes: providing a firmware clock register, incrementing the firmware clock register based on the host clock register being incremented, monitoring for failures of the host clock register, and during a failure of the host clock register continuously incrementing the firmware clock register by means of timing signals of the processing core, and upon receipt of a request to provide a clock value, providing the content of the host clock register if no failure was detected, or if failure was detected, providing the content of the firmware clock register.09-27-2012
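The two-register fallback in the entry above can be modelled in a few lines. This is a simplified sketch: the class, the `host_failed` flag (standing in for the failure monitor), and the one-count-per-tick assumption are all introduced here for illustration.

```python
class SystemClock:
    """Toy model of a host clock register backed by a firmware clock register."""

    def __init__(self):
        self.host = 0
        self.firmware = 0
        self.host_failed = False  # set by a (not modelled) failure monitor

    def oscillator_tick(self):
        """The high-precision oscillator increments the host register."""
        if not self.host_failed:
            self.host += 1
            self.firmware += 1  # firmware register follows each host increment

    def core_tick(self):
        """Processing-core timing signal; only counted while the host has failed."""
        if self.host_failed:
            self.firmware += 1

    def read(self):
        """Serve the host register unless a failure was detected."""
        return self.firmware if self.host_failed else self.host
```

Because the firmware register tracks the host register while it is healthy, the value returned by `read` keeps increasing across a failure instead of restarting.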
20100251006SYSTEM AND METHOD FOR FAILOVER OF GUEST OPERATING SYSTEMS IN A VIRTUAL MACHINE ENVIRONMENT - A system and method provides for failover of guest operating systems in a virtual machine environment. During initialization of a computer executing a virtual machine operating system, a first guest operating system allocates a first memory region within a first domain and notifies a second guest operating system operating in a second domain of the allocated first memory region. Similarly, the second guest operating system allocates a second region of memory within the second domain and notifies the first operating system of the allocated second memory region. In the event of a software failure affecting one of the guest operating systems, the surviving guest operating system assumes the identity of the failed operating system and utilizes data stored within the shared memory region to replay to storage devices to render them consistent.09-30-2010
20130219211Elastic Cloud-Driven Task Execution - A method, an apparatus and an article of manufacture for cloud-driven application execution. The method includes determining a plurality of attributes of a failed application, wherein the plurality of attributes comprises at least one policy context attribute and at least one context attribute, correlating each of the plurality of attributes to at least one alternative asset, wherein the at least one alternative asset is a part of an environment on which the failed application can be executed, using the plurality of attributes correlated to the at least one alternative asset to identify an alternative asset set of alternative assets, wherein the alternative asset set is capable of enabling an alternative environment on which to execute the failed application, and provisioning the alternative assets in the alternative asset set from at least one cloud network to create the alternative environment on which the failed application is executed.08-22-2013
20090319823RUN-TIME FAULT RESOLUTION FROM DEVELOPMENT-TIME FAULT AND FAULT RESOLUTION PATH IDENTIFICATION - Embodiments of the present invention address deficiencies of the art in respect to fault handling and provide a method, system and computer program product for run-time fault resolution from development time fault and fault resolution path identification. In an embodiment of the invention, a method for run-time fault resolution from development time fault and fault resolution path identification can be provided. The method can include detecting a recoverable fault condition in a computing system, selecting a fault resolution path from amongst multiple development time specified fault resolution paths to match the recoverable fault condition, prompting an operator with the selected fault resolution path, and resuming operation of the computing system without restart subsequent to the operator performing the selected fault resolution path.12-24-2009
20090319822APPARATUS AND METHOD TO MINIMIZE PERFORMANCE DEGRADATION DURING COMMUNICATION PATH FAILURE IN A DATA PROCESSING SYSTEM - A method to minimize performance degradation during communication path failure in a data processing system, comprising a host computer, a storage controller, and a plurality of physical communication paths in communication with the host computer and the storage controller, where the method establishes a threshold communication path error rate, and determines an (i)th actual communication path error rate for an (i)th physical communication path, wherein that (i)th communication path is one of the plurality of physical communication paths. If the (i)th actual communication path error rate is greater than the threshold communication path error rate, the method discontinues use of the (i)th physical communication path.12-24-2009
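The per-path threshold test in the entry above reduces to a simple filter. In this sketch the error rate is assumed to be errors divided by attempts; the abstract does not define how the rate is measured, and the function name and data layout are hypothetical.

```python
def active_paths(stats, threshold):
    """Return the ids of paths that stay in use.

    stats maps path id -> (errors, attempts). A path is discontinued only
    when its actual error rate is greater than the threshold.
    """
    keep = []
    for path_id, (errors, attempts) in stats.items():
        rate = errors / attempts if attempts else 0.0  # unused path: no errors yet
        if rate <= threshold:
            keep.append(path_id)
    return keep
```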
20080301486Customization conflict detection and resolution - A computer-implemented method is disclosed for managing customization conflicts. The method includes receiving an indication of a conflict. The conflict is indicative of an error created by a customization of a core application. A customization correction is identified as a remedy for the customization conflict. The customization correction is transmitted over a network to a party affiliated with a system affected by the customization conflict.12-04-2008
20100306571MULTIPLE MEDIA ACCESS CONTROL (MAC) ADDRESSES - A method for providing multiple media access control (MAC) addresses in a device of a master/slave system may include providing a first MAC address in a MAC address storage of the device. The method may also include providing a second MAC address in a multicast table entry of a multicast hash filter of the device.12-02-2010
20110145627IMPROVING DATA AVAILABILITY DURING FAILURE DETECTION AND RECOVERY PROCESSING IN A SHARED RESOURCE SYSTEM - A system and method for managing shared resources is disclosed. The system includes a primary coherency processing unit which processes lock requests from a plurality of data processing hosts, the primary coherency processing unit also storing a first current lock state information for the plurality of data processing hosts, the first current lock state information including a plurality of locks held by the plurality of data processing hosts. The system further includes a standby coherency processing unit storing fewer locks than the primary coherency processing unit, the locks stored by the standby coherency processing unit being a subset of locks included in the first current lock state information, the standby coherency unit configured to perform a plurality of activities of the primary coherency processing unit using the subset of locks in response to a failure of the primary coherency processing unit.06-16-2011
20110041001AUTOMATIC SYSTEM FOR POWER AND DATA REDUNDANCY IN A WIRED DATA TELECOMMUNICATIONS NETWORK - Redundancy of data and/or Inline Power in a wired data telecommunications network from a pair of power sourcing equipment (PSE) devices via an automatic selection device is provided by providing redundant signaling to/from each of the pair of PSE devices, and coupling a port of one PSE device and a redundant port of the second PSE device to respective first and second interfaces of a port of the selection device. The selection device initially selects one of the two PSE devices and communicates data and/or Inline Power to a third interface of the selection device. A powered device (PD) coupled to that third interface communicates data and/or Inline Power with the selected one of the first and second PSE device through the selection device. Upon detection of a condition, such as a failure condition, the selection device may select the other of the two interfaces.02-17-2011
20100131793SMALL STORE SYSTEM - Systems and methods of use are disclosed that default routing of an ID read by an ID reader as part of a purchase transaction in a retail store to a first computer system (MCS) instead of the POS computer system for the retail store. The first computer system processes the ID, and the POS computer system receives the results of the processing in the form of IDs recognizable by the POS computer system and for which the POS computer system has associated costs.05-27-2010
20110154098DCAS HEADEND SYSTEM AND METHOD FOR PROCESSING ERROR OF SECURE MICRO CLIENT SOFTWARE - A Downloadable Conditional Access System (DCAS) headend system and method for processing an error of Secure Micro (SM) Client Software are provided to prevent further transmission of SM Client Software where an error occurred, and to prevent unnecessary traffic due to repeat reinstallation of SM Client Software between an Authentication Proxy (AP) server and a terminal, by changing policy information regarding a transmission of SM Client Software associated with error information, when a number of terminals that transmit the error information exceeds a reference value as a result of analyzing result information regarding a reception and an installation of the SM Client Software received from a terminal corresponding to a DCAS headend system.06-23-2011
20120303997Flexible Bus Architecture for Monitoring and Control of Battery Pack - A method for diagnosing a control system for a stacked battery. The control system comprises a plurality of processors, a plurality of controllers, and a monitoring unit (control unit). The method comprises sending a diagnostic information from the central unit to a top processor of the plurality of processors, transmitting a return information from the top processor of the plurality of processors to the central unit, comparing the diagnostic information sent from the central unit with the return information received by the central unit, and indicating a communication problem if the diagnostic information sent from the central unit is different from the return information received by the central unit. The steps are repeated by eliminating the top processor from a previous cycle and assigning a new top processor if there is no problem with the reconfigurable communication system.11-29-2012
20110078489METHOD, APPARATUS, AND SYSTEM FOR MAINTAINING STATUS OF BOOTSTRAP PEER - A method for maintaining the status of a bootstrap peer includes: selecting a bootstrap peer; obtaining the status information of the bootstrap peer; updating a local bootstrap peer list according to the status information of the bootstrap peer. An apparatus and system for maintaining the status of a bootstrap peer are also disclosed. The bootstrap peer list is updated according to the status information of the selected bootstrap peer, which ensures the validity of the bootstrap peer list on the bootstrap server so that the information in the bootstrap peer list obtained by a joining peer is valid. This improves the success rate of joining the overlay network by the joining peer, shortens the joining process time of the joining peer, and implements load balancing between the bootstrap peers.03-31-2011
20110060938COMPUTER INTERLOCKING SYSTEM AND CODE BIT LEVEL REDUNDANCY METHOD THEREFOR - A code bit level redundancy method for a computer interlocking system is provided. The method includes: (1) controlling the output in parallel, and (2) sharing the collected information.03-10-2011
20100325472Autonomous System State Tolerance Adjustment for Autonomous Management Systems - In general, the techniques of this invention are directed to determining whether a component failure in a distributed computing system is genuine. In particular, embodiments of this invention analyze monitoring data from other application nodes in a distributed computing system to determine whether the component failure is genuine. If the component failure is not genuine, the embodiments may adjust a fault tolerance parameter that caused the component failure to be perceived.12-23-2010
20090287953STORAGE SYSTEM - A storage system encrypts plain text from an external device and stores the cryptogram into a disk unit, decrypts stored data in the disk unit and transmits decrypted text to the external device. The plain and decrypted text must be in agreement when seen from the external device. If a failure occurs in the encrypting or decrypting process, the plain and decrypted text disagree. The storage system includes an encryption unit for encrypting first data, a decryption unit for decrypting the encrypted data into second data, and a comparison unit for comparing the first and second data. When the first and second data do not agree, the first data is encrypted by a different encryption unit and the encrypted data is decrypted into third data, whereupon the first and third data are compared. When the first and third data do not agree, a failure report is sent.11-19-2009
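The encrypt, decrypt, and compare check in the entry above can be sketched as below. The XOR function is only a toy stand-in for an encryption unit and is not real encryption; `faulty_unit` simulates a broken unit. All names are assumptions, and the loop generalizes the abstract's two-unit case to any number of units.

```python
def xor_unit(key: int):
    """Toy stand-in for an encryption/decryption unit (XOR is NOT secure)."""
    return lambda data: bytes(b ^ key for b in data)

def faulty_unit(data: bytes) -> bytes:
    """Simulates a failed encryption unit that emits all-zero output."""
    return bytes(len(data))

def store_verified(plain: bytes, encrypt_units, decrypt) -> bytes:
    """Return ciphertext from the first unit whose output decrypts to plain."""
    for unit in encrypt_units:
        cipher = unit(plain)
        if decrypt(cipher) == plain:  # comparison unit: first vs. decrypted data
            return cipher
    # Every unit disagreed with the original data: send a failure report.
    raise RuntimeError("failure report: encrypted data could not be verified")
```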
20100122111DYNAMIC PHYSICAL AND VIRTUAL MULTIPATH I/O - Embodiments that dynamically manage physical and virtual multipath I/O are contemplated. Various embodiments comprise one or more computing devices, such as one or more servers, having at least two HBAs. At least one of the HBAs may be associated with a virtual I/O server that employs the HBA to transfer data between a plurality of virtual clients and one or more storage devices of a storage area network. The embodiments may monitor the availability of the HBAs, such as monitoring the HBAs for a failure of the HBA or a device coupled to the HBA. Upon detecting the unavailability of one of the HBAs, the embodiments may switch, dynamically, from the I/O path associated with the unavailable HBA to the alternate HBA.05-13-2010
20110078488HARDWARE RESOURCE ARBITER FOR LOGICAL PARTITIONS - A computer implemented method, data processing system, and apparatus for hardware resource arbitration in a data processing environment having a plurality of logical partitions. A hypervisor receives a request for a hardware resource from a first logical partition, wherein the request corresponds to an operation. The hypervisor determines the hardware resource is free from contention by a second logical partition. The hypervisor writes the hardware resource to a hardware resource pool data structure, as associated with the first logical partition, in response to a determination the hardware resource is free. The hypervisor presents the hardware resource to the first logical partition. The hypervisor determines that the operation is complete. The hypervisor releases the hardware resource from a hardware resource pool, responsive to the determination that the operation is complete.03-31-2011
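A minimal sketch of the request, grant, and release cycle described in the entry above. The dictionary used as the resource pool data structure, the class, and the method names are assumptions for illustration; a real hypervisor would also handle concurrency, which is omitted here.

```python
class ResourceArbiter:
    """Toy arbiter that tracks which logical partition holds each resource."""

    def __init__(self):
        self.pool = {}  # hardware resource -> owning logical partition

    def request(self, partition, resource) -> bool:
        """Grant the resource only if no other partition currently holds it."""
        owner = self.pool.get(resource)
        if owner is not None and owner != partition:
            return False  # contention with another partition
        self.pool[resource] = partition
        return True

    def complete(self, partition, resource) -> None:
        """Release the resource once the owning partition's operation is done."""
        if self.pool.get(resource) == partition:
            del self.pool[resource]
```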
20100088539System and Method for Providing Fault Tolerant Processing in an Implantable Medical Device - Embodiments herein generally relate to implantable medical devices and, specifically, to a system and method for providing fault tolerant processing in an implantable medical device. In an embodiment a system for providing fault tolerant processing in an implantable medical device is provided. The system can include an implantable medical device comprising a processor and memory store configured to execute a plurality of threads, temporal and spatial constraints assigned to one or more of the threads, and a kernel. The kernel can include a scheduler and a thread monitor configured to monitor execution of threads against the temporal and spatial constraints, and further configured to issue a response upon violation of either of the constraints by one of the plurality of threads. In an embodiment a method for providing fault tolerant processing in an implantable medical device is provided. Other embodiments are also included herein.04-08-2010
20100218032REDUNDANT SYSTEM, CONTROL APPARATUS, AND CONTROL METHOD - A redundant system includes a redundant apparatus and a control unit that controls power supplied to the redundant apparatus. The redundant apparatus includes a state management unit that manages an operational state of the redundant apparatus, and a response unit that returns the operational state to the control unit. The control unit includes a first requesting unit that requests a redundant apparatus that operates as an operation system for the operational state information, a first determination unit that determines whether the response to the request is returned within a predetermined time, a second determination unit that determines whether the operational state is normal if the response is returned within the predetermined time, and a shutdown unit that shuts down the power supply to the redundant apparatus, if the second determination unit determines that the operational state is not normal.08-26-2010
20090300405BACKUP COORDINATOR FOR DISTRIBUTED TRANSACTIONS - A primary coordinator generates a prepare message for a two-phase commit distributed transaction, the prepare message including an address of a backup coordinator. The primary coordinator maintains a transaction log of the distributed transaction, wherein the transaction log is accessible to both the primary coordinator and the backup coordinator. The prepare message is sent to a plurality of participants. The primary coordinator fails over to the backup coordinator without interrupting the distributed transaction.12-03-2009
20090300406INFORMATION PROCESSING SYSTEM AND INFORMATION PROCESSING DEVICE - An information processing system includes a plurality of server devices including a main server device and a standby server device, and a client device coupled to said server devices via a network. The client device includes a monitor unit to asynchronously monitor an operation state of each of the plurality of server devices, and a display control unit to acquire a content from the main server device and display the content in a display area on a screen once the monitor unit detects that an operation state of the main server device is active, and to acquire from the standby server device a content for a process that the standby server device has taken over from the main server device and display the content on the screen once the monitor unit detects that an operation state of the standby server device is switched from standby state to active state.12-03-2009
20090300404Managing Execution Stability Of An Application Carried Out Using A Plurality Of Pluggable Processing Components - Methods, apparatus, and products are disclosed for managing execution stability of an application carried out using a plurality of pluggable processing components. Managing execution stability of an application includes: receiving, by an application manager, component stability metrics for a particular pluggable processing component; determining, by the application manager, that the particular pluggable processing component is unstable in dependence upon the component stability metrics for the particular pluggable processing component; and notifying, by the application manager, a system administrator that the particular pluggable processing component is unstable.12-03-2009
20110138219HANDLING ERRORS IN A DATA PROCESSING SYSTEM - A method of managing errors in a data processing system may involve at least one computer system. Each computer system may include a processor that executes an operating system, firmware, and system memory storing instructions for the operating system. A firmware error handler resident in the firmware may identify an error occurring in the computer system. The firmware error handler may determine whether the operating system is required to take an action in response to the error. If the operating system is not required to take an action in response to the error, the firmware error handler may create an error log accessible to the operating system appropriate to cause the operating system to take no action.06-09-2011
20110154097FIELD REPLACEABLE UNIT FAILURE DETERMINATION - A system and method for fault management in a computer-based system are disclosed herein. A system includes a plurality of field replaceable units (“FRUs”) and fault management logic. The fault management logic is configured to collect error information from a plurality of components of the system. The logic stores, for each component identified as a possible cause of a detected fault, a record assigning one of two different component failure probability indications. The logic identifies a single one of the plurality of FRUs that has failed based on the stored probability indications.06-23-2011
20110307734BIOLOGICALLY INSPIRED HARDWARE CELL ARCHITECTURE - Disclosed is a system comprising: a reconfigurable hardware platform; a plurality of hardware units defined as cells adapted to be programmed to provide self-organization and self-maintenance of the system by means of implementing a program expressed in a programming language defined as DNA language, where each cell is adapted to communicate with one or more other cells in the system, and where the system further comprises a converter program adapted to convert keywords from the DNA language to a binary DNA code; where the self-organization comprises that the DNA code is transmitted to one or more of the cells, and each of the one or more cells is adapted to determine its function in the system; where if a fault occurs in a first cell and the first cell ceases to perform its function, self-maintenance is performed in that the system transmits information to the cells that the first cell has ceased to perform its function, and the self-organization is then performed again so that a second cell undertakes the function of the first cell.12-15-2011
20120042195MANAGING OPERATING SYSTEM DEPLOYMENT FAILURE - A method for managing operating system deployment failure includes, with an operating system deployment server, running an operating system deployment process that comprises running a progressive hardware discovery process of a target machine to which an operating system is deployed, the discovery process to capture inventory information related to the target machine, monitoring the operating system deployment to detect failure in a pre-operating system environment running on the target machine for a predefined period of time, and executing a remediation action in response to generation of a failure code during the period of time, the remediation action related to a Basic Input Output System (BIOS) of the target machine.02-16-2012
20120047393DYNAMICALLY REASSIGNING A CONNECTED NODE TO A BLOCK OF COMPUTE NODES FOR RE-LAUNCHING A FAILED JOB - Methods, systems, and products for dynamically reassigning a connected node to a block of compute nodes for re-launching a failed job that include: identifying that a job failed to execute on the block of compute nodes because connectivity failed between a compute node assigned as at least one of the connected nodes for the block of compute nodes and its supporting I/O node; and re-launching the job, including selecting an alternative connected node that is actively coupled for data communications with an active I/O node; and assigning the alternative connected node as the connected node for the block of compute nodes running the re-launched job.02-23-2012
20110167293NON-DISRUPTIVE I/O ADAPTER DIAGNOSTIC TESTING - A primary I/O adapter and a redundant I/O adapter of a data processing system are assigned to support access to a system resource. While the primary I/O adapter is in service and the redundant I/O adapter is not in service in providing access to the system resource, a fail over command is issued to remove the primary I/O adapter from service and place the redundant I/O adapter in service in supporting access to the system resource. While the redundant I/O adapter is in service and the primary I/O adapter is not in service in providing access to the system resource, diagnostic testing on the primary I/O adapter is performed. In response to the diagnostic testing revealing no fault in the primary I/O adapter, a fail back command is issued to restore the primary I/O adapter to service and to remove the redundant I/O adapter from service.07-07-2011
20120060047COMBINATION OF AN ELECTRIC ROTARY MACHINE AND OF AN ELECTRIC CONTROL UNIT IN AN AUTOMOBILE - A system including at least one electric rotary machine and an integrated control circuit and an electronic control unit, the system being installed on board an automobile. The integrated control circuit of the system includes a RAM connected to the electronic control unit via a data communication link, and the electronic control unit includes a rewritable memory. The system further provides permanent storage of its configuration data in the rewritable memory as well as an upload of the configuration data into the RAM during a configuration phase of the system. The system herein enables the integrated control circuit of the electric rotary machine to be standardized by virtue of the fact that the configuration data are no longer written in a read-only memory but reside in a RAM of this circuit.03-08-2012
20120066541CONTROLLED AUTOMATIC HEALING OF DATA-CENTER SERVICES - Subject matter described herein is directed to reallocating an application component from a faulty data-center resource to a non-faulty data-center resource. Background monitors identify data-center resources that are faulty and schedule migration of application components from the faulty data-center resources to non-faulty data-center resources. Migration is carried out in an automatic manner that allows an application to remain available. Thresholds are in place to control a rate of migration, as well as to detect when resource failure might be resulting from data-center-wide processes or from an application failure.03-15-2012
20120159232FAILURE RECOVERY METHOD FOR INFORMATION PROCESSING SERVICE AND VIRTUAL MACHINE IMAGE GENERATION APPARATUS - An information processing service is allowed to be immediately recovered from failure without clustering information apparatuses that provide an information processing service. A failure recovery method for an information processing service provided by an information apparatus, the method includes: preparing a virtual machine image generation apparatus which generates a virtual machine image, and a virtual machine execution apparatus which runs a virtual machine based on the virtual machine image; generating and storing, by the virtual machine image generation apparatus, the virtual machine image based on system data and hardware configuration information which enable implementation of the information processing service at a time of normal operation of the information processing service; and running, by the virtual machine execution apparatus, a virtual machine which provides a function of the information processing service based on the virtual machine image when a failure occurs in the information processing service.06-21-2012
20110107136Fault Surveillance and Automatic Fail-Over Processing in Broker-Based Messaging Systems and Methods - An exemplary method includes attempting, by a message broker subsystem, to deliver one or more messages intended for a recipient software application to the recipient software application during a predetermined fault interval, determining, by the message broker subsystem, that the recipient software application is in a fault state after failing to deliver the one or more messages to the recipient software application during the predetermined fault interval, and automatically performing, by the message broker subsystem, a fail-over process on one or more other messages intended for the recipient software application in response to the determination that the recipient software application is in the fault state. Corresponding methods and systems are also disclosed.05-05-2011
20090132850ERROR HANDLING SCHEME FOR TIME-CRITICAL PROCESSING ENVIRONMENTS - As a result of detecting an error, command routing logic for device driver logic is reconfigured so that command processing logic of the device driver is not invoked and to return from commands in a manner indicative of successful completion of command processing.05-21-2009
20120216070METHOD AND APPARATUS FOR REALIZING APPLICATION HIGH AVAILABILITY - A method, apparatus, and computer program product for realizing application high availability are provided. The application is installed on both a first node and a second node, the first node being used as an active node, and the second node being used as a passive node. The method includes: monitoring access operations to files by an application during its execution on the active node; replicating the monitored updates to the file by the application from the active node to a storage device accessible to the passive node if the application performs updates to a file during the access operations; sniffing the execution of the application on the active node; and switching the active node to the second node and initiating the application on the second node in response to sniffing a failure in the execution of the application on the active node.08-23-2012
20120216069Data Transfer and Recovery Process - A backup image generator can create a primary image and periodic delta images of all or part of a primary server. The images can be sent to a network attached storage device and a remote storage server. In the event of a failure of the primary server, the failure can be diagnosed to develop a recovery strategy. Based on the diagnosis, at least one delta image may be applied to a copy of the primary image to generate an updated primary image at either the network attached storage or the remote storage server. The updated primary image may be converted to a virtual server in a physical to virtual conversion at either the network attached storage device or remote storage server and users may be redirected to the virtual server. The updated primary image may also be restored to the primary server in a virtual to physical conversion. As a result, the primary data storage may be timely backed-up, recovered and restored with the possibility of providing server and business continuity in the event of a failure.08-23-2012
20100205479INFORMATION SYSTEM, DATA TRANSFER METHOD AND DATA PROTECTION METHOD - Availability of an information system including a storage system that performs remote copy between two or more storage apparatuses and a host computer using such storage system is improved. A third storage apparatus including a third volume is coupled to a first storage apparatus, a fourth storage apparatus including a fourth volume is coupled to a second storage apparatus, the first and third storage apparatuses perform remote copy of copying data stored in a first volume to the third volume, the first and second storage apparatuses perform remote copy of copying data stored in the first volume to a second volume, and the third and fourth storage apparatuses perform remote copy of copying data stored in the third volume to the fourth volume.08-12-2010
20100205478RESOURCE INTEGRITY DURING PARTIAL BACKOUT OF APPLICATION UPDATES - At least one physically inconsistent system resource is identified in response to a failure of an application, where the physically inconsistent system resource was left in a physically inconsistent state as a result of the failure of the application. Available backout operations for any system resources updated by the failed application other than the physically inconsistent system resource are ignored. An automated partial backout of the physically inconsistent system resource is performed. This abstract is not to be considered limiting, since other embodiments may deviate from the features described in this abstract.08-12-2010
20120137163MULTI-CORE SYSTEM, METHOD OF CONTROLLING MULTI-CORE SYSTEM, AND MULTIPROCESSOR - A multi-core system 05-31-2012
20100174938Method for Operating an Industrial Automation System Comprising a Plurality of Networked Computer Units, and Industrial Automation System - In an automation system comprising a plurality of networked computer units, functions of the automation system are provided by services of the computer units in which the services are configured and activated using system configuration data and service configuration data. The system configuration data comprise information for assigning services to providing computer units and for assigning dependencies between services. The system configuration data are accepted and checked by a first service of a control and monitoring unit of the automation system and are forwarded to target computer units. The system configuration data are checked by second services provided by the target computer units and are used to provide resources needed to activate local services. The service configuration data are transmitted to the target computer units following system configuration. A local service is activated by a target computer unit assigned to the service using the service configuration data.07-08-2010
20120317438METHOD AND SYSTEM FOR PROVIDING IMMUNITY TO COMPUTERS - A method and system for providing immunity to a computer system wherein the system includes an immunity module, a recovery module, a maintenance module, an assessment module, and a decision module, wherein the immunity module, the recovery module, the maintenance module and the assessment module are each linked to the decision module. The maintenance module monitors the system for errors and sends an error alert message to the assessment module, which determines the severity of the error and the type of package required to fix the error. The assessment module sends a request regarding the type of package required to fix the error to the recovery module. The recovery module sends the package required to fix the error to the maintenance module, which fixes the error in the system.12-13-2012
20120173918COMMUNICATIONS PATH STATUS DETECTION SYSTEM - Virtual network interface selection manager in a client-server system, in which client and server are connectable through plural alternate networks. System includes plural interfaces connectable to server through the plural networks, current interface indicator identifying a current interface through which data is transmitted to and/or received from the server, and prioritized listing of plural interfaces ranked in a descending order. Event detector detects occurrence of an event including time-out condition; successful interface test; and change in the plurality of interfaces, and a tester tests each plural interface in prioritized listing in a ranked order to test whether server is reachable. Marker identifies which of plural interfaces successfully pass the test, and switch switches from current interface to a higher priority interface when the marker identifies a higher priority interface as having successfully passed the test, whereby current interface indicator will identify the higher priority interface as current interface.07-05-2012
20120233491MAINTAINING A COMMUNICATION PATH FROM A HOST TO A STORAGE SUBSYSTEM IN A NETWORK - Provided are a method, system, computer storage device, and storage area network for maintaining a communication path from a host to a storage subsystem in a network. A storage subsystem controls data transfer and access to storage devices in a network including a switch and a host. A topology storage stores a topological coupling relationship between the host and the switch and a topological coupling relationship between the switch and the storage subsystem. In response to determining a failed path, the storage subsystem determines a first port on the storage subsystem in the failed path. The storage subsystem determines from the topology storage the topological coupling relationships between the host and the switch and the switch and the storage subsystem. The storage subsystem redirects, based on the topological coupling relationships, a message sent to the first port of the storage subsystem to an operational second port in the storage subsystem.09-13-2012
20120317437Ranking Service Units to Provide and Protect Highly Available Services Using the Nway Redundancy Model - Presented are methods and apparatus for protecting a plurality of High Availability (HA) Service Instances (SIs) with a plurality of Service Units (SUs) with an Nway redundancy model. Any of the SUs associated with the Nway redundancy model can simultaneously be assigned an active HA state for some of the SIs and a standby HA state for other SIs. However, only one SU can have the active state for any given SI. The Nway redundancy model is configured prior to runtime operation.12-13-2012
20120221885MONITORING DEVICE, MONITORING SYSTEM AND MONITORING METHOD - A monitoring device including: a receiving unit configured to receive a malfunction notice of a data processing device, the data processing device being connected to the monitoring device which monitors running condition through a network; a malfunction device identification unit configured to identify a data processing device that is malfunctioning based on the received malfunction notice; a data obtaining unit configured to obtain running data and device data of the data processing device that is malfunctioning and another data processing device; and a malfunction cause identification unit configured to identify a cause of the malfunction, based on the obtained running data and the obtained device data.08-30-2012
20120260122Video conferencing with multipoint conferencing units and multimedia transformation units - In one embodiment, a method includes receiving at a multimedia transformation unit, media streams from a plurality of endpoints, transmitting audio components of the media streams to a multipoint conferencing unit, receiving an identifier from the multipoint conferencing unit identifying one of the media streams as an active speaker stream, processing at the multimedia transformation unit, a video component of the active speaker stream, and transmitting the active speaker stream to one or more of the endpoints without transmitting the video component to the multipoint conferencing unit. An apparatus is also disclosed.10-11-2012
20090044042Device Management Method, Analysis System Used for the Device Management Method, Data Structure Used in Management Database, and Maintenance Inspection Support Apparatus Used for the Device Management Method - Either a complete overhaul for replacing with recommended devices the entire number of devices in a large group of managed devices T, or a partial overhaul for repairing or replacing with recommended devices only those managed devices T that are malfunctioning is selectively performed as an initial overhaul. A complete test involving the entire number of the managed devices T is then periodically performed to determine whether the devices are operating normally or have a malfunction. Any devices found to be malfunctioning during any complete test are repaired or replaced with recommended devices.02-12-2009
20080301487VIRTUAL COMPUTER SYSTEM AND CONTROL METHOD THEREOF - When a failure occurs in an LPAR on a physical computer under an SAN environment, a destination LPAR is set in another physical computer to enable migrating of the LPAR and setting change of a security function on the RAID apparatus side is not necessary. When a failure occurs in an LPAR generated on a physical computer under an SAN environment, configuration information including a unique ID (WWN) of the LPAR where the failure occurs is read, a destination LPAR is generated on another physical computer, and the read configuration information of the LPAR is set to the destination LPAR, thereby enabling migrating of the LPAR when the failure occurs, under the control of a management server.12-04-2008
20120324272OPTICAL COMMUNICATION SYSTEM, INTERFACE BOARD AND CONTROL METHOD PERFORMED IN INTERFACE BOARD - An embodiment of the invention is an optical communication system including: a plurality of interface boards which transmit and receive optical signals to and from interface boards facing the plurality of interface boards; and a monitoring control device which monitors states of the plurality of interface boards. A first interface board of the plurality of interface boards includes: a replacement unit capable of monitoring the states of the plurality of interface boards on behalf of the monitoring control device and independently receiving supply of power; and a control unit configured to start the replacement unit in a case where a fault occurs in the monitoring control device and stop or halt the replacement unit in a case where there is no fault in the monitoring control device.12-20-2012
20120272091PARTIAL FAULT PROCESSING METHOD IN COMPUTER SYSTEM - As regards a hardware fault which has occurred in a computer, a hypervisor notifies an LPAR which can continue execution, of a fault occurrence as a hardware fault for which execution can be continued. Upon receiving the notice, the LPAR notifies the hypervisor that it has executed processing to cope with a fault. The hypervisor provides an interface for acquiring a situation of a notice situation. It is made possible to register and acquire a situation of coping with a hardware fault allowing continuation of execution through the interface, and it is made possible to make a decision as to the situation of coping with a fault in the computers as a whole.10-25-2012
20110239038MANAGEMENT APPARATUS, MANAGEMENT METHOD, AND PROGRAM - When a fault occurs in a guest machine 09-29-2011
20110276821METHOD AND SYSTEM FOR MIGRATING DATA FROM MULTIPLE SOURCES - An approach is provided for migrating data. Data is received from a plurality of source systems. The received data is processed for conversion to a target system. A failure condition associated with the processing is detected. An action is selectively initiated from a point of failure corresponding to the detected failure condition. The action includes either retrying the processing, aborting the processing, initiating simulation of the process, forcing completion of the processing, or a combination thereof.11-10-2011
20130013954Detecting Browser Failure - Embodiments are configured to improve the stability of a Web browser by identifying plug-in modules that cause failures. Data in memory at the time of a failure is analyzed, and a failure signature is generated. The failure signature is compared to a database of known failure signatures so that the source of the failure may be identified. If a plug-in module to a Web browser is identified as the source of a failure, options are presented to the user who may update the plug-in module with code that does not produce a failure or disable the plug-in module altogether.01-10-2013
20110246813REPURPOSABLE RECOVERY ENVIRONMENT - A reconfiguration manager is operable to reconfigure a repurposable recovery environment between a recovery environment for a production environment and a second environment different from the recovery environment. A storage system in the repurposable recovery environment periodically saves production information from the production environment while the repurposable recovery environment is operating as the second environment. The production information in the storage system is used to reconfigure the repurposable recovery environment from the second environment to the recovery environment.10-06-2011
20130145203DYNAMICALLY CONFIGUREABLE PLACEMENT ENGINE - A stream application may allocate processing elements to one or more compute nodes (or hosts) to achieve a desired optimization goal. Each optimization mode may define processing element selection criteria and/or host selection criteria. When allocating a processing element to a host, a scheduler may place each processing element individually. Accordingly, the scheduler may use the processing element selection criteria for selecting which processing element in the stream application to allocate next. The scheduler may then determine, based on one or more constraints, which host the processing element can be placed on. If the scheduler determines that multiple hosts are suitable candidates for the processing element, it may use the host selection criteria to pick one of the candidate hosts that further optimize the stream application to meet the desired goal.06-06-2013
20130145204SUPERVISING AND RECOVERING SOFTWARE COMPONENTS ASSOCIATED WITH MEDICAL DIAGNOSTICS INSTRUMENTS - A system for applying a recovery mechanism to a network of medical diagnostics instruments is provided herein. The system includes the following: a plurality of medical diagnostics instruments, each associated with a network connected component; a plurality of communication modules, each associated with a corresponding one of the plurality of network connected components, wherein each one of the plurality of communication modules is arranged to report on malfunctioning components that are network connected with the corresponding component, and a recovery module, configured to: (i) obtain reports from the communication modules; (ii) re-establish the malfunctioning components; and (iii) notify all communication modules of the re-establishment of the malfunctioning components, wherein the communication modules are further configured to re-establish connection between the corresponding components and the re-established components.06-06-2013
20110271138SYSTEM AND METHOD FOR HANDLING SYSTEM FAILURE - A system and a method for handling a system failure are disclosed. The method is adapted for an information handling system having a basic input and output system and a micro-controller. The method includes the following steps: sending, via the micro-controller, a signal; checking, via the micro-controller, whether an acknowledgement is received from the basic input and output system responsive to the signal; and scanning, via the micro-controller, a type of a system failure in response to the acknowledgement being not received.11-03-2011
20080215909APPARATUS, SYSTEM, AND METHOD FOR TRANSACTIONAL PEER RECOVERY IN A DATA SHARING CLUSTERING COMPUTER SYSTEM - The invention provides an apparatus, system, and method for cluster-wide peer recovery in the event of a computer failure. A failure of a first computer is detected and a recovery module is registered as the first computer. In one embodiment, the recovery module is a peer computer. The recovery module retrieves privately held undo log data through the authorized assumption of the failure identity associated with the failed first computer, backs out in-flight transaction updates of the first computer, and frees up data resources locked by the first computer.09-04-2008
20130173952 ELECTRONIC DEVICE AND METHOD FOR LOADING FIRMWARE - An electronic device includes an internal storage module, a baseboard management controller (BMC) and a port. The internal storage module stores a first firmware and a boot application. The port connects to an external storage for storing a second firmware which is a backup of the first firmware. After the electronic device is powered on, the BMC runs the boot application to load the first firmware from the internal storage module. If the first firmware fails to load, the BMC copies the second firmware from the external storage to the internal storage module to replace the first firmware. 07-04-2013
20130091376 SELF-REPAIRING DATABASE SYSTEM - A method, system, and computer program product include generating a database copy from a database of a primary virtual machine (VM), provisioning a standby VM with the database copy, detecting a failure associated with the database, and promoting the standby VM to replace the primary VM. 04-11-2013
20130097455 METHOD AND SYSTEM FOR IMPLEMENTING INTERCONNECTION FAULT TOLERANCE BETWEEN CPU - A system for implementing interconnection fault tolerance between CPUs, in which a first CPU and a second CPU implement interconnection through a first CPU interconnect device and a second CPU interconnect device. The system adds a data channel between a first SerDes interface of the first CPU interconnect device and a second SerDes interface of the second CPU interconnect device, and transmits link connection state information and a link control signal through the added data channel. The system monitors a link state of any one link in a CPU interconnection system, transmits the link state through the added data channel, and recovers any one of the first connection link, the second connection link and the third connection link when determining that it is faulty. 04-18-2013
20130103974 Firmware Management In A Computing System - Managing firmware in a computing system storing a plurality of different firmware images for the same firmware includes: calculating, for each firmware image in dependence upon a plurality of predefined factors, a preference score; responsive to a failure of a particular firmware image, selecting a firmware image having a highest preference score; and failing over to the selected firmware image. 04-25-2013
20110231697 SYSTEMS AND METHODS FOR IMPROVING RELIABILITY AND AVAILABILITY OF AN INFORMATION HANDLING SYSTEM - In one aspect, a method for improving reliability and availability of an information handling system is disclosed. Operational data associated with an operating margin may be captured. A threshold specified by a pre-defined profile may be identified. The pre-defined profile may be useable in adjusting the operating margin. The captured operational data may be compared to the pre-defined threshold. A parameter specified by the pre-defined profile may be identified. The operation of a component of the information handling system may be modified based, at least in part, on the identified parameter specified by the pre-defined profile. The modification may result in adjusting the operating margin. 09-22-2011
20110239037 System And Method For Providing Indexing With High Availability In A Network Based Suite of Services - A suite of network-based services, such as the services corresponding to Microsoft® SharePoint™, are provided to users with high availability. The suite of network-based services may include browser-based collaboration functions, process management functions, index and search functions, document-management functions, and/or other functions. In particular, the indexing service associated with the suite of network-based services may be provided with high availability. 09-29-2011
20100318834 Method and device for avionic reconfiguration - The invention relates in particular to a method and a device for reconfiguring an avionic system comprising at least two computers and a software application, in an aircraft, each of the computers being adapted for running the software application, the aircraft further comprising a module for detecting failure of at least one of the computers as well as a loading module making it possible to load the software application into each of the computers. After an information item relating to the state of one of the computers has been received from the detection module, a failure of one of the computers is detected according to the information item received. A configuration according to which at least one application run by the faulty computer is run by another of the computers is then determined. The system is then reconfigured according to the determined configuration, the reconfiguration comprising the transmission of a request to the loading module to load the application run by the faulty computer into the other computer. 12-16-2010
20130198557 DATA TRANSFER AND RECOVERY - A backup image generator can create a primary image and periodic delta images of all or part of a primary server. The images can be sent to a network attached storage device and one or more remote storage servers. In the event of a failure of the primary server, an updated primary image may be used to provide an up-to-date version of the primary system at a backup or other system. As a result, the primary data storage may be timely backed-up, recovered and restored with the possibility of providing server and business continuity in the event of a failure. 08-01-2013
20120297236 HIGH AVAILABILITY SYSTEM ALLOWING CONDITIONALLY RESERVED COMPUTING RESOURCE USE AND RECLAMATION UPON A FAILOVER - In one embodiment, a method attempts, by a computing device, to determine a placement of a set of virtual machines on available hosts upon failure of a host. The placement considers the set of virtual machines as being not powered on any of the available hosts. The method further determines, by the computing device, a placed list of virtual machines in the set of virtual machines as a recommendation to power on to the available hosts. The determination of the placed list of virtual machines is used to determine a power off list of virtual machines in the set of virtual machines to power off, wherein virtual machines in the power off list of virtual machines are currently powered on available hosts but were considered to be powered off to determine the placement. 11-22-2012
20120047392 DISASTER RECOVERY REPLICATION THROTTLING IN DEDUPLICATION SYSTEMS - Various embodiments for disaster recovery (DR) replication throttling in a computing environment by a processor device are provided. Communication is arrested between a source data entity and a replicated data entity at a location declared in a DR mode. The DR mode is negotiated to a central replication management component as a DR mode entry event. The DR mode entry event is distributed, by the central replication management component, to each member in a shared group. The DR mode is enforced using at least one replication policy. 02-23-2012
20120072765 JOB MIGRATION IN RESPONSE TO LOSS OR DEGRADATION OF A SEMI-REDUNDANT COMPONENT - A computer program product and method of managing the workload in a computer system having one or more semi-redundant hardware components are provided. The method comprises detecting loss or degradation of the level of performance of one or more of the semi-redundant hardware components, identifying hardware components that are affected by the loss or degradation of the one or more semi-redundant components, migrating a critical job from an affected hardware component to an unaffected hardware component, and performing less-critical jobs on an affected hardware component. Loss or degradation of the semi-redundant component reduces the capacity of affected hardware components in the computer system without entirely disabling the computer system. Jobs identified as being critical are run on hardware components having the most capacity and reliability, while allowing less-critical jobs to make use of the remaining capacity of affected hardware components. Optionally, the semi-redundant hardware component may be selected from a memory module, CPU core, Ethernet port, power supply, fan, disk drive, and an input output port. 03-22-2012
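
The job-migration idea in the last entry above (20120072765) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the method claimed in the application: the `Component` and `Job` types, the `affected` flag, and the "least-loaded unaffected component" placement rule are all hypothetical names and choices made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    critical: bool  # assumed flag: True for jobs identified as critical


@dataclass
class Component:
    name: str
    affected: bool  # assumed flag: True if a semi-redundant part it relies on was lost or degraded
    jobs: list = field(default_factory=list)


def rebalance(components):
    """Move critical jobs off affected components; less-critical jobs stay where they are."""
    healthy = [c for c in components if not c.affected]
    if not healthy:
        return []  # no unaffected component to migrate to; placement is left unchanged
    moves = []
    for comp in components:
        if not comp.affected:
            continue
        # Iterate over a copy of the critical jobs so comp.jobs can be modified safely.
        for job in [j for j in comp.jobs if j.critical]:
            # Hypothetical placement rule: pick the unaffected component with the fewest jobs.
            target = min(healthy, key=lambda c: len(c.jobs))
            comp.jobs.remove(job)
            target.jobs.append(job)
            moves.append((job.name, comp.name, target.name))
    return moves
```

For example, with an affected `node-a` holding one critical and one less-critical job and an unaffected, empty `node-b`, `rebalance` moves only the critical job to `node-b` and leaves the less-critical job on `node-a`.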
