Patent application number | Description | Published |
20090006886 | SYSTEM AND METHOD FOR ERROR CORRECTION AND DETECTION IN A MEMORY SYSTEM - A system and method for error correction and detection in a memory system. The system includes a memory controller, a plurality of memory modules and a mechanism. The memory modules are in communication with the memory controller and with a plurality of memory devices. The mechanism detects that one of the memory modules has failed possibly coincident with a memory device failure on an other of the memory modules. The mechanism allows the memory system to continue to run unimpaired in the presence of the memory module failure and the memory device failure. | 01-01-2009 |
20090006900 | SYSTEM AND METHOD FOR PROVIDING A HIGH FAULT TOLERANT MEMORY SYSTEM - A system and method for providing a high fault tolerant memory system. The system includes a memory system having a memory controller, a plurality of memory modules and a mechanism. The plurality of memory modules are in communication with the memory controller and with a plurality of memory devices. The plurality of memory devices include at least one spare memory device for providing memory device sparing capability. The mechanism is for detecting that one of the memory modules has failed possibly coincident with a memory device failure on an other of the memory modules. The mechanism allows the memory system to continue to run unimpaired in the presence of the memory module failure and the possible memory device failure. | 01-01-2009 |
20090187794 | SYSTEM AND METHOD FOR PROVIDING A MEMORY DEVICE HAVING A SHARED ERROR FEEDBACK PIN - A system and method for providing a memory device having a shared error feedback pin. The system includes a memory device having a data interface configured to receive data bits and CRC bits, CRC receiving circuitry, CRC creation circuitry, a memory device pad, and driver circuitry. The CRC receiving circuitry utilizes a CRC equation for the detection of errors in one or more of the received data and the received CRC bits. The CRC creation circuitry utilizes the CRC equation for the creation of CRC bits consistent with data to be transmitted to a separate device bits. The memory device pad is configured for reporting of any errors detected in the received data and the received CRC bits. The driver circuitry is connected to the memory device pad and merged with one or more other driver circuitries resident on one or more other memory devices into an error reporting line. | 07-23-2009 |
20100005366 | CASCADE INTERCONNECT MEMORY SYSTEM WITH ENHANCED RELIABILITY - A hub device, memory system, and method for providing a cascade interconnect memory system with enhanced reliability. The hub device includes an interface to a high-speed bus for communicating with a memory controller. The memory controller and the hub device are included in a cascade interconnect memory system and the high-speed bus includes bit lanes and one or more clock lanes. The hub device also includes a bi-directional fault signal line in communication with the memory controller and readable by a service interface. The hub device also includes a fault isolation register (FIR) for storing information about failures detected at the hub device, the information including severity levels of the detected failures. In addition, the hub device includes error recovery logic for responding to a failure detected at the hub device. Responding to the error includes recording a severity level of the failure in the FIR and taking an action at the hub device that is responsive to the severity level of the failure. The action includes one or more of fast clock stop, setting the bi-directional fault indicator, setting cyclical redundancy code (CRC) bits and transmitting them to the memory controller, re-try, sparing out a bit lane and sparing out a clock lane. | 01-07-2010 |
20100005375 | CYCLICAL REDUNDANCY CODE FOR USE IN A HIGH-SPEED SERIAL LINK - A system and method for providing a cyclical redundancy code (CRC) for use in a high-speed serial link. The system includes a cascade interconnect memory system including a memory controller, a memory hub device and a downstream link. The downstream link is in communication with the memory controller and the memory hub device and includes at least thirteen signal lanes for transmitting a multiple transfer downstream frame from the memory controller to the memory hub device. A portion of the downstream frame includes downstream CRC bits to detect errors in the downstream frame. The downstream CRC bits capable of detecting any one of a lane failure, a transfer failure and up to five bit random errors. | 01-07-2010 |
20100287445 | System to Improve Memory Reliability and Associated Methods - A system to improve memory reliability in computer systems that may include memory chips, and may rely on a error control encoder to send codeword symbols for storage in each of the memory chips. At least two symbols from a codeword are assigned to each memory chip and therefore failure of any of the memory chips could affect two symbols or more. The system may also include a table to record failures and partial failures of the codeword symbols for each of the memory chips so the error control encoder can correct subsequent partial failures based upon the previous partial failures. The error control coder is capable of correcting and/or detecting more errors if only a fraction of a chip is noted in the table as having a failure as opposed to a full chip noted as having a failure. | 11-11-2010 |
20100287454 | System to Improve Error Code Decoding Using Historical Information and Associated Methods - A system to improve error code decoding using historical information may include storage partitioned into memory ranks, and a table to record symbols having failures for each memory rank. The system may also generate a memory rank score for each memory rank. The system may also include an error control decoder that may use the memory rank score when each memory rank is accessed in order to determine whether an error should be corrected or not. | 11-11-2010 |
20100299576 | System to Improve Miscorrection Rates in Error Control Code Through Buffering and Associated Methods - A system to improve miscorrection rates in error control code may include an error control decoder with a safe decoding mode that processes at least two data packets. The system may also include a buffer to receive the processed at least two data packets from the error control decoder. The error control decoder may apply a logic OR operation to the uncorrectable error signal related to the processing of the at least two data packets to produce a global uncorrectable error signal. The system may further include a recipient to receive the at least two data packets and the global uncorrectable error signal. | 11-25-2010 |
20130138901 | IIMPLEMENTING MEMORY PERFORMANCE MANAGEMENT AND ENHANCED MEMORY RELIABILITY ACCOUNTING FOR THERMAL CONDITIONS - A method, system and computer program product implement memory performance management and enhanced memory reliability of a computer system accounting for system thermal conditions. When a primary memory temperature reaches an initial temperature threshold, reads are suspended to the primary memory and reads are provided to a mirrored memory in a mirrored memory pair, and writes are provided to both the primary memory and the mirrored memory. If the primary memory temperature reaches a second temperature threshold, write operations to the primary memory are also stopped and the primary memory is turned off with DRAM power saving modes such as self timed refresh (STR), and the reads and writes are limited to the mirrored memory in the mirrored memory pair. When the primary memory temperature decreases to below the initial temperature threshold, coherency is recovered by writing a coherent copy from the mirrored memory to the primary memory. | 05-30-2013 |
20140025223 | Performance Management of Subsystems in a Server by Effective Usage of Resources - An approach is provided in which a subsystem cooling manager detects an increased workload indicator corresponding to a computer subsystem's forthcoming workload requirement. The forthcoming workload requirement corresponds to future computing resources required by the subsystem to support one or more software programs executing on the computer system. The subsystem cooling manager determines that the forthcoming workload requirement exceeds a utilization threshold and in turn, directs one or more cooling systems towards the corresponding subsystem according. | 01-23-2014 |
20140063987 | MEMORY OPERATION UPON FAILURE OF ONE OF TWO PAIRED MEMORY DEVICES - A method and apparatus for continued operation of a memory module, including a first and second memory device, when one of memory devices has failed. The method includes receiving a write operation request to write a data word, having first and second sections, by a first memory module. The memory module may have a first memory device and a second memory device, for respectively storing the first and second sections of the data word. A determination if one of the first and second memory devices is inoperable is made. If one of the first and second memory devices is inoperable, a write operation is performed by writing the first and second sections of the data word to the operable one of the first and second memory devices. | 03-06-2014 |
20140157044 | IMPLEMENTING DRAM FAILURE SCENARIOS MITIGATION BY USING BUFFER TECHNIQUES DELAYING USAGE OF RAS FEATURES IN COMPUTER SYSTEMS - A method, system and computer program product are provided for implementing dynamic random access memory (DRAM) failure scenarios mitigation using buffer techniques delaying usage of RAS features in computer systems. A buffer is provided with a memory controller. Physical address data read/write failures are analyzed. Responsive to identifying predefined failure types for physical address data read/write failures, the buffer is used to selectively store and retrieve data. | 06-05-2014 |
20140164819 | MEMORY OPERATION OF PAIRED MEMORY DEVICES - A method and apparatus for operation of a memory module for storage of a data word is provided. The apparatus includes a memory module having a set of paired memory devices including a first memory device to store a first section of a data word and a second memory device to store a second section of the data word when used in failure free operation. The apparatus may further include a first logic module to perform a write operation by writing the first and second sections of the data word to both the first memory device and the second memory device upon the determination of certain types of failure. The determination may include that a failure exists in the word section storage of either the first or second memory devices but that no failures exist in equivalent locations of word section storage in the two memory devices. | 06-12-2014 |
20140164853 | MEMORY OPERATION OF PAIRED MEMORY DEVICES - A method and apparatus for operation of a memory module for storage of a data word is provided. The apparatus includes a memory module having a set of paired memory devices including a first memory device to store a first section of a data word and a second memory device to store a second section of the data word when used in failure free operation. The apparatus may further include a first logic module to perform a write operation by writing the first and second sections of the data word to both the first memory device and the second memory device upon the determination of certain types of failure. The determination may include that a failure exists in the word section storage of either the first or second memory devices but that no failures exist in equivalent locations of word section storage in the two memory devices. | 06-12-2014 |
20140250340 | SELF MONITORING AND SELF REPAIRING ECC - Exemplary embodiments of the present invention disclose a method and system for monitoring a first Error Correcting Code (ECC) device for failure and replacing the first ECC device with a second ECC device if the first ECC device begins to fail or fails. In a step, an exemplary embodiment detects that a specified number of correctable errors is exceeded. In another step, an exemplary embodiment detects the occurrence of an uncorrectable error. In another step, an exemplary embodiment performs a loopback test on an ECC device if a specified number of correctable errors is exceeded or if an uncorrectable error occurs. In another step, an exemplary embodiment replaces an ECC device that fails the loopback test with an ECC device that passes a loopback test. | 09-04-2014 |
20140317473 | IMPLEMENTING ECC REDUNDANCY USING RECONFIGURABLE LOGIC BLOCKS - A method, system and computer program product are provided for implementing ECC (Error Correction Codes) redundancy using reconfigurable logic blocks in a computer system. When a fail is detected when reading from memory, it is determined if the incorrect data is in the data or the ECC component of the data. When incorrect data is found in the ECC component of the data, and an actionable threshold is not reached, a predetermined Reliability, Availability, and Serviceability (RAS) action is taken. When the actionable threshold is reached with incorrect data identified in the ECC component of the data, an analysis process is performed to determine if the ECC logic is faulty. When a fail in the ECC logic is detected, the identified ECC failed logic is replaced with a spare block of logic. | 10-23-2014 |
20140328132 | MEMORY MARGIN MANAGEMENT - A method for testing and correcting a memory system is described. The method includes selecting a target memory unit of the memory system having a timing margin in response to a trigger to start a timing margin measurement. The stored data in the target memory unit is moved to a spare memory unit. The memory system performs reads and writes of user data from the spare memory unit while measuring the target memory unit. The timing margins of the target memory unit are measured. The reliability of the measured timing margins of the target memory unit based on a timing margin profile is determined. | 11-06-2014 |
20140355369 | MEMORY OPERATION UPON FAILURE OF ONE OF TWO PAIRED MEMORY DEVICES - A method and apparatus for continued operation of a memory module, including a first and second memory device, when one of memory devices has failed. The method includes receiving a write operation request to write a data word, having first and second sections, by a first memory module. The memory module may have a first memory device and a second memory device, for respectively storing the first and second sections of the data word. A determination if one of the first and second memory devices is inoperable is made. If one of the first and second memory devices is inoperable, a write operation is performed by writing the first and second sections of the data word to the operable one of the first and second memory devices. | 12-04-2014 |
20140359241 | MEMORY DATA MANAGEMENT - A method and computer-readable storage media are provided for rearranging data in physical memory units. In one embodiment, a method may include monitoring utilization counters. The method may further include, comparing the utilization counters for a match with an instance in a first table containing one or more instances when data may be rearranged in the physical memory units. The table may further include where the data should be relocated by a rearrangement. The method may also include, continuing to monitor the utilization counters if a match is not found with an instance in the first table. The method may further include, rearranging the data in the physical memory units if a match between the utilization counters and an instance in the first table is found. | 12-04-2014 |