Patent application number | Description | Published |
20080256420 | ERROR CHECKING ADDRESSABLE BLOCKS IN STORAGE - Provided are a method, system, and article of manufacture for error checking addressable blocks in storage. Addressable blocks of data are stored in a storage in stripes, wherein each stripe includes a plurality of data blocks for one of the addressable blocks and at least one checksum block including checksum data derived from the data blocks for the addressable block. A write request is received to modify data in one of the addressable blocks. The write is performed, and the checksum is updated, in the stripe having the modified addressable block. An indication is made to perform an error checking operation on the stripe for the modified addressable block in response to the write request, wherein the error checking operation reads the data blocks and the checksum in the stripe to determine if the checksum data is accurate. An error handling operation is initiated in response to determining that the checksum data is not accurate. | 10-16-2008 |
20090006886 | SYSTEM AND METHOD FOR ERROR CORRECTION AND DETECTION IN A MEMORY SYSTEM - A system and method for error correction and detection in a memory system. The system includes a memory controller, a plurality of memory modules and a mechanism. The memory modules are in communication with the memory controller and with a plurality of memory devices. The mechanism detects that one of the memory modules has failed possibly coincident with a memory device failure on another of the memory modules. The mechanism allows the memory system to continue to run unimpaired in the presence of the memory module failure and the memory device failure. | 01-01-2009 |
20090006900 | SYSTEM AND METHOD FOR PROVIDING A HIGH FAULT TOLERANT MEMORY SYSTEM - A system and method for providing a high fault tolerant memory system. The system includes a memory system having a memory controller, a plurality of memory modules and a mechanism. The plurality of memory modules are in communication with the memory controller and with a plurality of memory devices. The plurality of memory devices include at least one spare memory device for providing memory device sparing capability. The mechanism is for detecting that one of the memory modules has failed possibly coincident with a memory device failure on another of the memory modules. The mechanism allows the memory system to continue to run unimpaired in the presence of the memory module failure and the possible memory device failure. | 01-01-2009 |
20090055688 | DETECTION AND CORRECTION OF DROPPED WRITE ERRORS IN A DATA STORAGE SYSTEM - Methods are provided for detecting and correcting dropped writes in a storage system. Data and a checksum are written to a storage device, such as a RAID array. The state of the data is classified as being in a “new data, unconfirmed” state. The state of written data is periodically checked, such as with a timer. If the data is in the “new data, unconfirmed” state, it is checked for a dropped write. If a dropped write has occurred, the state of the data is changed to a “single dropped write confirmed” state and the dropped write error is preferably corrected. If no dropped write is detected, the state is changed to a “confirmed good” state. If the data was updated through a read-modify-write prior to being checked for a dropped write event, its state is changed to an “unquantifiable” state. | 02-26-2009 |
20090216985 | METHODS, SYSTEMS, AND COMPUTER PROGRAM PRODUCTS FOR DYNAMIC SELECTIVE MEMORY MIRRORING - Methods, systems, and computer program products are provided for dynamic selective memory mirroring in solid state devices. An amount of memory is reserved. Sections of the memory to select for mirroring in the reserved memory are dynamically determined. The selected sections of the memory contain critical areas. The selected sections of the memory are mirrored in the reserved memory. | 08-27-2009 |
20100217915 | HIGH AVAILABILITY MEMORY SYSTEM - A memory system with high availability is provided. The memory system includes multiple memory channels. Each memory channel includes at least one memory module with memory devices organized as partial ranks coupled to memory device bus segments. Each partial rank includes a subset of the memory devices accessible as a subchannel on a subset of the memory device bus segments. The memory system also includes a memory controller in communication with the multiple memory channels. The memory controller distributes an access request across the memory channels to access a full rank. The full rank includes at least two of the partial ranks on separate memory channels. Partial ranks on a common memory module can be concurrently accessed. The memory modules can use at least one checksum memory device as a dedicated checksum memory device or a shared checksum memory device between at least two of the concurrently accessible partial ranks. | 08-26-2010 |
20100251072 | DETECTION AND CORRECTION OF DROPPED WRITE ERRORS IN A DATA STORAGE SYSTEM - A RAID system is provided for detecting and correcting dropped writes in a storage system. Data and a checksum are written to a storage device, such as a RAID array. The state of the data is classified as being in a “new data, unconfirmed” state. The state of written data is periodically checked, such as with a timer. If the data is in the “new data, unconfirmed” state, it is checked for a dropped write. If a dropped write has occurred, the state of the data is changed to a “single dropped write confirmed” state and the dropped write error is preferably corrected. If no dropped write is detected, the state is changed to a “confirmed good” state. If the data was updated through a read-modify-write prior to being checked for a dropped write event, its state is changed to an “unquantifiable” state. | 09-30-2010 |
20120185624 | Automated Cabling Process for a Complex Environment - A method is provided for cabling a plurality of hardware components. A chassis controller establishes a wireless connection to a wireless device. The chassis controller, via a wireless interface, transmits a chassis map to the wireless device over the wireless connection. The chassis controller, via the wireless interface, transmits to the wireless device, over the wireless connection, an indication of a first port to be cabled. The first port is of a first hardware component of the plurality of hardware components. The chassis controller tests the first port to determine whether cabling of the first port has been performed correctly. | 07-19-2012 |
20120304025 | DUAL HARD DISK DRIVE SYSTEM AND METHOD FOR DROPPED WRITE DETECTION AND RECOVERY - A system is provided. The system detects a dropped write from a hard disk drive (HDD). The system includes two or more HDDs, each being configured to define a data block spread across the two or more HDDs. The data block is configured to regenerate a checksum across the full data block during a read operation to detect the dropped write. | 11-29-2012 |
20130010419 | REDUCING IMPACT OF REPAIR ACTIONS FOLLOWING A SWITCH FAILURE IN A SWITCH FABRIC - Techniques are disclosed for reducing impact of a switch failure and/or a repair action in a switch fabric. In one embodiment, a server system is provided that includes a first interposer card that operatively connects one or more server cards to a midplane. The first interposer card may include a switch module that switches network traffic for the one or more server cards. The first interposer card may be hot-swappable from the midplane, and the one or more server cards may be hot-swappable from the first interposer card. The server system may further include an interconnect between the first interposer card and a second interposer card. | 01-10-2013 |
20130010639 | SWITCH FABRIC MANAGEMENT - Techniques are disclosed for managing a switch fabric. In one embodiment, a server system is provided that includes a midplane, one or more server cards, switch modules and a management controller. The midplane may include a fabric interconnect for a switch fabric. The one or more server cards and the switch modules may be operatively connected to the midplane. The switch modules may be configured to switch network traffic for the one or more server cards. The management controller may be configured to manage the switch modules via the fabric interconnect. | 01-10-2013 |
20130013759 | MANAGING INVENTORY DATA FOR COMPONENTS OF A SERVER SYSTEM - Techniques are disclosed for managing inventory data for components of a server system. In one embodiment, a global management controller is provided that is operatively connected to a plurality of local management controllers. Each local management controller is configured to manage a subset of the components of the server system. Each local management controller is also configured to generate, for each component, a checksum based on vital product data (VPD) of the component. Each local management controller is also configured to compute a composite checksum based on the checksums generated for the components in the subset. The global management controller is configured to maintain a global view of the VPD in the computer system, based on the checksums and/or composite checksums. | 01-10-2013 |
20130013956 | REDUCING IMPACT OF A REPAIR ACTION IN A SWITCH FABRIC - Techniques are disclosed for reducing impact of a repair action in a switch fabric. In one embodiment, a server system is provided that includes a first interposer card that operatively connects one or more server cards to a midplane. The first interposer card may include a switch module that switches network traffic for the one or more server cards. The first interposer card may be hot-swappable from the midplane, and the one or more server cards may be hot-swappable from the first interposer card. | 01-10-2013 |
20130013957 | REDUCING IMPACT OF A SWITCH FAILURE IN A SWITCH FABRIC VIA SWITCH CARDS - Techniques are disclosed for reducing impact of a switch failure in a switch fabric. In one embodiment, a server system is provided that includes a midplane, one or more server cards and one or more switch cards. The midplane may include a fabric interconnect for a switch fabric. The one or more server cards may be coupled with the midplane, where each server card is hot-swappable from the midplane. The one or more switch cards may also be coupled with the midplane, where each switch card is also hot-swappable from the midplane. Each switch card includes one or more switch modules, and each switch module is configured to switch network traffic for at least one server card. | 01-10-2013 |
20130094348 | SWITCH FABRIC MANAGEMENT - Techniques are disclosed for managing a switch fabric. In one embodiment, a server system is provided that includes a midplane, one or more server cards, switch modules and a management controller. The midplane may include a fabric interconnect for a switch fabric. The one or more server cards and the switch modules may be operatively connected to the midplane. The switch modules may be configured to switch network traffic for the one or more server cards. The management controller may be configured to manage the switch modules via the fabric interconnect. | 04-18-2013 |
20130094351 | REDUCING IMPACT OF A SWITCH FAILURE IN A SWITCH FABRIC VIA SWITCH CARDS - Techniques are disclosed for reducing impact of a switch failure in a switch fabric. In one embodiment, a server system is provided that includes a midplane, one or more server cards and one or more switch cards. The midplane may include a fabric interconnect for a switch fabric. The one or more server cards may be coupled with the midplane, where each server card is hot-swappable from the midplane. The one or more switch cards may also be coupled with the midplane, where each switch card is also hot-swappable from the midplane. Each switch card includes one or more switch modules, and each switch module is configured to switch network traffic for at least one server card. | 04-18-2013 |
20130097314 | MANAGING INVENTORY DATA FOR COMPONENTS OF A SERVER SYSTEM - Techniques are disclosed for managing inventory data for components of a server system. In one embodiment, a global management controller is provided that is operatively connected to a plurality of local management controllers. Each local management controller is configured to manage a subset of the components of the server system. Each local management controller is also configured to generate, for each component, a checksum based on vital product data (VPD) of the component. Each local management controller is also configured to compute a composite checksum based on the checksums generated for the components in the subset. The global management controller is configured to maintain a global view of the VPD in the computer system, based on the checksums and/or composite checksums. | 04-18-2013 |
20130100799 | REDUCING IMPACT OF REPAIR ACTIONS FOLLOWING A SWITCH FAILURE IN A SWITCH FABRIC - Techniques are disclosed for reducing impact of a switch failure and/or a repair action in a switch fabric. In one embodiment, a server system is provided that includes a first interposer card that operatively connects one or more server cards to a midplane. The first interposer card may include a switch module that switches network traffic for the one or more server cards. The first interposer card may be hot-swappable from the midplane, and the one or more server cards may be hot-swappable from the first interposer card. The server system may further include an interconnect between the first interposer card and a second interposer card. | 04-25-2013 |
20130103329 | REDUCING IMPACT OF A REPAIR ACTION IN A SWITCH FABRIC - Techniques are disclosed for reducing impact of a repair action in a switch fabric. In one embodiment, a server system is provided that includes a first interposer card that operatively connects one or more server cards to a midplane. The first interposer card may include a switch module that switches network traffic for the one or more server cards. The first interposer card may be hot-swappable from the midplane, and the one or more server cards may be hot-swappable from the first interposer card. | 04-25-2013 |
20130117601 | IMPLEMENTING ULTRA HIGH AVAILABILITY PERSONALITY CARD - A method and circuit for implementing an enhanced availability personality card for a chassis computer system, and a design structure on which the subject circuit resides, are provided. The personality card includes a first erasable programmable read-only memory (EPROM) and a second EPROM, each EPROM storing Vital Product Data (VPD), and a first temperature sensor and a second temperature sensor sensing temperature. A primary bidirectional bus connects the first EPROM and the first temperature sensor, and a redundant bidirectional bus connects the second EPROM and the second temperature sensor, to a pair of chassis management modules. Each chassis management module includes a switch connected to both the primary bidirectional bus and the redundant bidirectional bus, providing redundant paths and enabling continued function with failure of any critical personality card component. | 05-09-2013 |
20140053013 | HANDLING INTERMITTENT RECURRING ERRORS IN A NETWORK - Embodiments relate to a computer for transmitting data in a network. The computer includes at least one data transmission port configured to be connected to at least one storage device via a plurality of paths of a network. The computer further includes a processor configured to detect recurring intermittent errors in one or more paths of the plurality of paths and to disable access to the one or more paths based on detecting the recurring intermittent errors. | 02-20-2014 |
20140053014 | HANDLING INTERMITTENT RECURRING ERRORS IN A NETWORK - Embodiments relate to a computer for transmitting data in a network. The computer includes at least one data transmission port configured to be connected to at least one storage device via a plurality of paths of a network. The computer further includes a processor configured to detect recurring intermittent errors in one or more paths of the plurality of paths and to disable access to the one or more paths based on detecting the recurring intermittent errors. | 02-20-2014 |
20140089725 | PHYSICAL MEMORY FAULT MITIGATION IN A COMPUTING ENVIRONMENT - Effects of a physical memory fault are mitigated. In one example, to facilitate mitigation, memory is allocated to processing entities of a computing environment, such as applications, operating systems, or virtual machines, in a manner that minimizes impact to the computing environment in the event of a memory failure. Allocation includes using memory structure information, including information regarding fault containment zones, to allocate memory to the processing entities. By allocating memory based on fault containment zones, a fault only affects a minimum number of processing entities. | 03-27-2014 |
20150074302 | Automated Cabling Process for a Complex Environment - A method is provided for cabling a plurality of hardware components. A chassis controller establishes a wireless connection to a wireless device. The chassis controller, via a wireless interface, transmits a chassis map to the wireless device over the wireless connection. The chassis controller, via the wireless interface, transmits to the wireless device, over the wireless connection, an indication of a first port to be cabled. The first port is of a first hardware component of the plurality of hardware components. The chassis controller tests the first port to determine whether cabling of the first port has been performed correctly. | 03-12-2015 |
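The stripe-level error checking described in application 20080256420 can be illustrated with a small sketch. It assumes the checksum block is a byte-wise XOR parity of the data blocks (the abstract only says "checksum data derived from the data blocks"), and the class and method names are hypothetical. A write updates the data block and the checksum, then flags the stripe, and a later pass re-reads the stripe and calls an error handler if the checksum no longer matches.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks (assumed checksum function)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Stripe:
    """One addressable block: several data blocks plus one checksum block."""

    def __init__(self, data_blocks):
        self.data = [bytes(b) for b in data_blocks]
        self.checksum = xor_blocks(self.data)


class StripedStore:
    """Hypothetical store that flags written stripes for later verification."""

    def __init__(self):
        self.stripes = {}            # addressable block id -> Stripe
        self.pending_checks = set()  # stripes indicated for error checking

    def put(self, addr, data_blocks):
        self.stripes[addr] = Stripe(data_blocks)

    def write(self, addr, index, new_block):
        stripe = self.stripes[addr]
        old_block = stripe.data[index]
        stripe.data[index] = bytes(new_block)
        # Update the parity incrementally: XOR out the old block, XOR in the new one.
        stripe.checksum = xor_blocks([stripe.checksum, old_block, new_block])
        # Indicate that this stripe needs an error checking operation.
        self.pending_checks.add(addr)

    def check_pending(self, on_error):
        """Re-read each flagged stripe and report stripes whose checksum is wrong."""
        for addr in sorted(self.pending_checks):
            stripe = self.stripes[addr]
            if xor_blocks(stripe.data) != stripe.checksum:
                on_error(addr)  # hand off to an error handling operation
        self.pending_checks.clear()
```

Deferring verification to `check_pending` mirrors the abstract's separation between performing the write and indicating that an error check should run; when that pass is scheduled is left open here.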
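The four data states named in applications 20090055688 and 20100251072 form a small state machine. The sketch below is one possible reading of the abstract, not the patented implementation: the detection and correction steps are passed in as hypothetical callables, and a read-modify-write that arrives before the timer check moves the data straight to the "unquantifiable" state.

```python
from enum import Enum, auto


class WriteState(Enum):
    NEW_UNCONFIRMED = auto()          # "new data, unconfirmed"
    DROPPED_WRITE_CONFIRMED = auto()  # "single dropped write confirmed"
    CONFIRMED_GOOD = auto()           # "confirmed good"
    UNQUANTIFIABLE = auto()           # "unquantifiable"


class TrackedRegion:
    """Hypothetical per-region tracker for newly written data and its checksum."""

    def __init__(self):
        self.state = WriteState.NEW_UNCONFIRMED

    def on_write(self):
        # Fresh data and checksum written: not yet confirmed.
        self.state = WriteState.NEW_UNCONFIRMED

    def on_read_modify_write(self):
        # Updated before the dropped-write check ran: outcome can no longer be judged.
        if self.state is WriteState.NEW_UNCONFIRMED:
            self.state = WriteState.UNQUANTIFIABLE

    def on_timer(self, detect_dropped_write, correct):
        """Periodic check; only unconfirmed data is examined."""
        if self.state is not WriteState.NEW_UNCONFIRMED:
            return self.state
        if detect_dropped_write():
            self.state = WriteState.DROPPED_WRITE_CONFIRMED
            correct()  # the abstract says the error is "preferably" corrected
        else:
            self.state = WriteState.CONFIRMED_GOOD
        return self.state
```

Keeping detection behind a callable leaves open how a dropped write is actually recognized (for example, by re-reading data and comparing it against the stored checksum), which the abstract does not specify.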
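Applications 20130013759 and 20130097314 describe per-component VPD checksums rolled up into a composite checksum per local management controller. A minimal sketch follows, assuming CRC-32 (Python's `zlib.crc32`) as the checksum and a sorted fold as the composite rule; both choices are illustrative, as are the class names. The global controller compares composites so that it only needs to revisit subsets whose inventory changed.

```python
import zlib


def component_checksum(vpd: bytes) -> int:
    """Checksum of one component's vital product data (CRC-32 assumed)."""
    return zlib.crc32(vpd)


def composite_checksum(checksums: dict) -> int:
    """Fold per-component checksums in sorted order so the result is stable."""
    crc = 0
    for name in sorted(checksums):
        crc = zlib.crc32(name.encode() + checksums[name].to_bytes(4, "big"), crc)
    return crc


class LocalController:
    """Hypothetical local management controller for a subset of components."""

    def __init__(self, vpd_by_component):
        self.vpd = dict(vpd_by_component)

    def checksums(self):
        return {name: component_checksum(v) for name, v in self.vpd.items()}

    def composite(self):
        return composite_checksum(self.checksums())


class GlobalController:
    """Hypothetical global controller keeping a checksum-based inventory view."""

    def __init__(self, locals_by_id):
        self.locals = locals_by_id
        self.known = {}  # local id -> last composite checksum seen
        self.view = {}   # local id -> per-component checksums

    def refresh(self):
        """Return local ids whose composite changed; refetch detail only for those."""
        changed = []
        for lid, lc in self.locals.items():
            comp = lc.composite()
            if self.known.get(lid) != comp:
                changed.append(lid)
                self.known[lid] = comp
                self.view[lid] = lc.checksums()
        return changed
```

Comparing one integer per local controller is cheap, and the per-component checksums are only pulled when a composite differs.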
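For applications 20140053013 and 20140053014, "recurring intermittent errors" needs some concrete definition before a path can be disabled. This sketch assumes a sliding time window with a hypothetical error-count threshold (three errors within 60 seconds by default); the abstract does not state the actual criterion, and the class name is illustrative.

```python
from collections import defaultdict, deque


class PathMonitor:
    """Hypothetical tracker that disables paths with recurring intermittent errors."""

    def __init__(self, threshold=3, window_s=60.0):
        self.threshold = threshold           # errors needed to call a path "recurring"
        self.window_s = window_s             # sliding window length in seconds
        self.errors = defaultdict(deque)     # path -> timestamps of recent errors
        self.disabled = set()

    def record_error(self, path, now):
        q = self.errors[path]
        q.append(now)
        # Drop errors that fell out of the window; isolated errors age away.
        while q and now - q[0] > self.window_s:
            q.popleft()
        if len(q) >= self.threshold:
            self.disabled.add(path)

    def usable_paths(self, paths):
        """Paths to the storage device that are still allowed for I/O."""
        return [p for p in paths if p not in self.disabled]
```

The window keeps a path with occasional, widely spaced errors in service while cutting off one that keeps failing, which is the distinction the abstract draws between intermittent and recurring errors.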
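Application 20140089725 allocates memory using fault containment zone information so that a fault touches as few processing entities as possible. The sketch below uses one simple heuristic, which is an assumption and not the patent's stated method: never let an entity's memory span two zones, and place the largest requests first into the zone with the tightest fit. Entity and zone names are hypothetical.

```python
def allocate(entities, zones):
    """Place each entity's memory wholly inside one fault containment zone.

    entities: entity name -> memory needed; zones: zone id -> free capacity.
    Largest requests are placed first, each into the best-fitting zone.
    """
    free = dict(zones)
    placement = {}
    for name, need in sorted(entities.items(), key=lambda kv: (-kv[1], kv[0])):
        candidates = [z for z, cap in free.items() if cap >= need]
        if not candidates:
            raise MemoryError(f"no single zone can hold {name}")
        # Best fit: least leftover capacity, ties broken by zone id.
        zone = min(candidates, key=lambda z: (free[z] - need, z))
        free[zone] -= need
        placement[name] = zone
    return placement


def affected_by_fault(placement, zone):
    """Entities that lose memory if the given zone fails."""
    return sorted(n for n, z in placement.items() if z == zone)
```

Because no entity spans zones, a single-zone fault affects only the entities listed by `affected_by_fault` for that zone; an optimal placement would need a more thorough search than this greedy pass.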