Stuecheli, TX

Erica Stuecheli, Austin, TX US

Patent application number	Description	Published
20130110490	Verifying Processor-Sparing Functionality in a Simulation Environment	05-02-2013
20140074451	VERIFYING PROCESSOR-SPARING FUNCTIONALITY IN A SIMULATION ENVIRONMENT - A simulation environment verifies processor-sparing functions in a simulated processor core. The simulation environment executes a first simulation for a simulated processor core. During the simulation, the simulation environment creates a simulation model dump file. At a later point in time, the simulation environment executes a second simulation for the simulated processor core. The simulation environment saves the state of the simulated processor core. The simulation environment then replaces the state of the simulated processor core by loading the previously created simulation model dump file. The simulation environment then sets the state of the simulated processor core to execute processor-sparing code and resumes the second simulation.	03-13-2014

Jeff A. Stuecheli, Austin, TX US

Patent application number	Description	Published
20130205087	FORWARD PROGRESS MECHANISM FOR STORES IN THE PRESENCE OF LOAD CONTENTION IN A SYSTEM FAVORING LOADS - A multiprocessor data processing system includes a plurality of cache memories including a cache memory. In response to the cache memory detecting a storage-modifying operation specifying a same target address as that of a first read-type operation being processed by the cache memory, the cache memory provides a retry response to the storage-modifying operation. In response to completion of the read-type operation, the cache memory enters a referee mode. While in the referee mode, the cache memory temporarily dynamically increases priority of any storage-modifying operation targeting the target address in relation to any second read-type operation targeting the target address.	08-08-2013
20130205096	FORWARD PROGRESS MECHANISM FOR STORES IN THE PRESENCE OF LOAD CONTENTION IN A SYSTEM FAVORING LOADS BY STATE ALTERATION - A multiprocessor data processing system includes a plurality of cache memories including a cache memory. The cache memory issues a read-type operation for a target cache line. While waiting for receipt of the target cache line, the cache memory monitors to detect a competing store-type operation for the target cache line. In response to receiving the target cache line, the cache memory installs the target cache line in the cache memory, and sets a coherency state of the target cache line installed in the cache memory based on whether the competing store-type operation is detected.	08-08-2013
20130205098	FORWARD PROGRESS MECHANISM FOR STORES IN THE PRESENCE OF LOAD CONTENTION IN A SYSTEM FAVORING LOADS - A multiprocessor data processing system includes a plurality of cache memories including a cache memory. In response to the cache memory detecting a storage-modifying operation specifying a same target address as that of a first read-type operation being processed by the cache memory, the cache memory provides a retry response to the storage-modifying operation. In response to completion of the read-type operation, the cache memory enters a referee mode. While in the referee mode, the cache memory temporarily dynamically increases priority of any storage-modifying operation targeting the target address in relation to any second read-type operation targeting the target address.	08-08-2013
20130205099	FORWARD PROGRESS MECHANISM FOR STORES IN THE PRESENCE OF LOAD CONTENTION IN A SYSTEM FAVORING LOADS BY STATE ALTERATION - A multiprocessor data processing system includes a plurality of cache memories including a cache memory. The cache memory issues a read-type operation for a target cache line. While waiting for receipt of the target cache line, the cache memory monitors to detect a competing store-type operation for the target cache line. In response to receiving the target cache line, the cache memory installs the target cache line in the cache memory, and sets a coherency state of the target cache line installed in the cache memory based on whether the competing store-type operation is detected.	08-08-2013
20130262769	DATA CACHE BLOCK DEALLOCATE REQUESTS - A data processing system includes a processor core supported by upper and lower level caches. In response to executing a deallocate instruction in the processor core, a deallocation request is sent from the processor core to the lower level cache, the deallocation request specifying a target address associated with a target cache line. In response to receipt of the deallocation request at the lower level cache, a determination is made if the target address hits in the lower level cache. In response to determining that the target address hits in the lower level cache, the target cache line is retained in a data array of the lower level cache and a replacement order field in a directory of the lower level cache is updated such that the target cache line is more likely to be evicted from the lower level cache in response to a subsequent cache miss.	10-03-2013
20130262770	DATA CACHE BLOCK DEALLOCATE REQUESTS IN A MULTI-LEVEL CACHE HIERARCHY - In response to executing a deallocate instruction, a deallocation request specifying a target address of a target cache line is sent from a processor core to a lower level cache. In response, a determination is made if the target address hits in the lower level cache. If so, the target cache line is retained in a data array of the lower level cache, and a replacement order field of the lower level cache is updated such that the target cache line is more likely to be evicted in response to a subsequent cache miss in a congruence class including the target cache line. In response to the subsequent cache miss, the target cache line is cast out to the lower level cache with an indication that the target cache line was a target of a previous deallocation request of the processor core.	10-03-2013
20130262777	DATA CACHE BLOCK DEALLOCATE REQUESTS - A data processing system includes a processor core supported by upper and lower level caches. In response to executing a deallocate instruction in the processor core, a deallocation request is sent from the processor core to the lower level cache, the deallocation request specifying a target address associated with a target cache line. In response to receipt of the deallocation request at the lower level cache, a determination is made if the target address hits in the lower level cache. In response to determining that the target address hits in the lower level cache, the target cache line is retained in a data array of the lower level cache and a replacement order field in a directory of the lower level cache is updated such that the target cache line is more likely to be evicted from the lower level cache in response to a subsequent cache miss.	10-03-2013
20130262778	DATA CACHE BLOCK DEALLOCATE REQUESTS IN A MULTI-LEVEL CACHE HIERARCHY - In response to executing a deallocate instruction, a deallocation request specifying a target address of a target cache line is sent from a processor core to a lower level cache. In response, a determination is made if the target address hits in the lower level cache. If so, the target cache line is retained in a data array of the lower level cache, and a replacement order field of the lower level cache is updated such that the target cache line is more likely to be evicted in response to a subsequent cache miss in a congruence class including the target cache line. In response to the subsequent cache miss, the target cache line is cast out to the lower level cache with an indication that the target cache line was a target of a previous deallocation request of the processor core.	10-03-2013
20140149681	COHERENT PROXY FOR ATTACHED PROCESSOR - A coherent attached processor proxy (CAPP) of a primary coherent system receives a memory access request from an attached processor (AP) and an expected coherence state of a target address of the memory access request with respect to a cache memory of the AP. In response, the CAPP determines a coherence state of the target address and whether or not the expected state matches the determined coherence state. In response to determining that the expected state matches the determined coherence state, the CAPP issues a memory access request corresponding to that received from the AP on a system fabric of the primary coherent system. In response to determining that the expected state does not match the coherence state determined by the CAPP, the CAPP transmits a failure message to the AP without issuing on the system fabric a memory access request corresponding to that received from the AP.	05-29-2014
20140149682	PROGRAMMABLE COHERENT PROXY FOR ATTACHED PROCESSOR - A coherent attached processor proxy (CAPP) within a primary coherent system participates in an operation on a system fabric of the primary coherent system on behalf of an attached processor (AP) that is external to the primary coherent system and that is coupled to the CAPP. The operation includes multiple components communicated with the CAPP including a request and at least one coherence message. The CAPP determines one or more of the components of the operation by reference to at least one programmable data structure within the CAPP that can be reprogrammed.	05-29-2014
20140149683	PROGRAMMABLE COHERENT PROXY FOR ATTACHED PROCESSOR - A coherent attached processor proxy (CAPP) within a primary coherent system participates in an operation on a system fabric of the primary coherent system on behalf of an attached processor (AP) that is external to the primary coherent system and that is coupled to the CAPP. The operation includes multiple components communicated with the CAPP including a request and at least one coherence message. The CAPP determines one or more of the components of the operation by reference to at least one programmable data structure within the CAPP that can be reprogrammed.	05-29-2014
20140149686	COHERENT ATTACHED PROCESSOR PROXY SUPPORTING MASTER PARKING - In response to receiving a memory access request and expected coherence state at an attached processor at a coherent attached processor proxy (CAPP), the CAPP determines that a conflicting request is being serviced. In response to determining that the CAPP is servicing a conflicting request and that the expected state matches, a master machine of the CAPP is allocated in a Parked state to service the memory access request after completion of service of the conflicting request. The Parked state prevents servicing by the CAPP of a further conflicting request snooped on the system fabric. In response to completion of service of the conflicting request, the master machine transitions out of the Parked state and issues on the system fabric a memory access request corresponding to that received from the AP.	05-29-2014
20140149688	COHERENT ATTACHED PROCESSOR PROXY SUPPORTING MASTER PARKING - In response to receiving a memory access request and expected coherence state at an attached processor at a coherent attached processor proxy (CAPP), the CAPP determines that a conflicting request is being serviced. In response to determining that the CAPP is servicing a conflicting request and that the expected state matches, a master machine of the CAPP is allocated in a Parked state to service the memory access request after completion of service of the conflicting request. The Parked state prevents servicing by the CAPP of a further conflicting request snooped on the system fabric. In response to completion of service of the conflicting request, the master machine transitions out of the Parked state and issues on the system fabric a memory access request corresponding to that received from the AP.	05-29-2014
20140149689	COHERENT PROXY FOR ATTACHED PROCESSOR - A coherent attached processor proxy (CAPP) of a primary coherent system receives a memory access request from an attached processor (AP) and an expected coherence state of a target address of the memory access request with respect to a cache memory of the AP. In response, the CAPP determines a coherence state of the target address and whether or not the expected state matches the determined coherence state. In response to determining that the expected state matches the determined coherence state, the CAPP issues a memory access request corresponding to that received from the AP on a system fabric of the primary coherent system. In response to determining that the expected state does not match the coherence state determined by the CAPP, the CAPP transmits a failure message to the AP without issuing on the system fabric a memory access request corresponding to that received from the AP.	05-29-2014
20140201460	DATA RECOVERY FOR COHERENT ATTACHED PROCESSOR PROXY - A coherent attached processor proxy (CAPP) that participates in coherence communication in a primary coherent system on behalf of an attached processor external to the primary coherent system tracks delivery of data to destinations in the primary coherent system via one or more entries in a data structure. Each of the one or more entries specifies with a destination tag a destination in the primary coherent system to which data is to be delivered from the attached processor. In response to initiation of recovery operations for the CAPP, the CAPP performs data recovery operations, including transmitting, to at least one destination indicated by the destination tag of one or more entries, an indication of a data error in data to be delivered to that destination from the attached processor.	07-17-2014
20140201464	EPOCH-BASED RECOVERY FOR COHERENT ATTACHED PROCESSOR PROXY - A coherent attached processor proxy (CAPP) participates in coherence communication in a primary coherent system on behalf of an attached processor external to the primary coherent system. The CAPP includes an epoch timer that advances at regular intervals to define epochs of operation of the CAPP. Each of one or more entries in a data structure in the CAPP are associated with a respective epoch. Recovery operations for the CAPP are initiated based on a comparison of an epoch indicated by the epoch timer and the epoch associated with one of the one or more entries in the data structure.	07-17-2014
20140201465	ACCELERATED RECOVERY FOR SNOOPED ADDRESSES IN A COHERENT ATTACHED PROCESSOR PROXY - A coherent attached processor proxy (CAPP) that participates in coherence communication in a primary coherent system on behalf of an external attached processor maintains, in each of a plurality of entries of a CAPP directory, information regarding a respective associated cache line of data from the primary coherent system cached by the attached processor. In response to initiation of recovery operations, the CAPP transmits, in a generally sequential order with respect to the CAPP directory, multiple memory access requests indicating an error for addresses indicated by the plurality of entries. In response to a snooped memory access request that targets a particular address hitting in the CAPP directory during the transmitting, the CAPP performs a coherence recovery operation for the particular address prior to a time indicated by the generally sequential order.	07-17-2014
20140201466	DATA RECOVERY FOR COHERENT ATTACHED PROCESSOR PROXY - A coherent attached processor proxy (CAPP) that participates in coherence communication in a primary coherent system on behalf of an attached processor external to the primary coherent system tracks delivery of data to destinations in the primary coherent system via one or more entries in a data structure. Each of the one or more entries specifies with a destination tag a destination in the primary coherent system to which data is to be delivered from the attached processor. In response to initiation of recovery operations for the CAPP, the CAPP performs data recovery operations, including transmitting, to at least one destination indicated by the destination tag of one or more entries, an indication of a data error in data to be delivered to that destination from the attached processor.	07-17-2014
20140201467	EPOCH-BASED RECOVERY FOR COHERENT ATTACHED PROCESSOR PROXY - A coherent attached processor proxy (CAPP) participates in coherence communication in a primary coherent system on behalf of an attached processor external to the primary coherent system. The CAPP includes an epoch timer that advances at regular intervals to define epochs of operation of the CAPP. Each of one or more entries in a data structure in the CAPP are associated with a respective epoch. Recovery operations for the CAPP are initiated based on a comparison of an epoch indicated by the epoch timer and the epoch associated with one of the one or more entries in the data structure.	07-17-2014
20140201468	ACCELERATED RECOVERY FOR SNOOPED ADDRESSES IN A COHERENT ATTACHED PROCESSOR PROXY - A coherent attached processor proxy (CAPP) that participates in coherence communication in a primary coherent system on behalf of an external attached processor maintains, in each of a plurality of entries of a CAPP directory, information regarding a respective associated cache line of data from the primary coherent system cached by the attached processor. In response to initiation of recovery operations, the CAPP transmits, in a generally sequential order with respect to the CAPP directory, multiple memory access requests indicating an error for addresses indicated by the plurality of entries. In response to a snooped memory access request that targets a particular address hitting in the CAPP directory during the transmitting, the CAPP performs a coherence recovery operation for the particular address prior to a time indicated by the generally sequential order.	07-17-2014
20140365733	INTEGRATED CIRCUIT SYSTEM HAVING DECOUPLED LOGICAL AND PHYSICAL INTERFACES - An integrated circuit system including a first integrated circuit chip including first logic, a second integrated circuit chip, and second logic distributed across the first and second integrated circuit chips. The second logic includes a first unit integrated in the first integrated circuit chip and a second unit integrated in the second integrated circuit chip. The integrated circuit system further includes a physical communication link coupling the first unit in the first integrated circuit chip and the second unit in the second integrated circuit chip and a request interface between the first logic and first unit of the second logic. The request interface is implemented in the first integrated circuit such that communication via the request interface between the first logic and the first unit of the second logic has low latency and such that the request interface is decoupled from the physical communication link.	12-11-2014

Jeffrey Stuecheli, Austin, TX US

Patent application number	Description	Published
20110161587	PROACTIVE PREFETCH THROTTLING - According to a method of data processing, a memory controller receives a plurality of data prefetch requests from multiple processor cores in the data processing system, where the plurality of prefetch load requests include a data prefetch request issued by a particular processor core among the multiple processor cores. In response to receipt of the data prefetch request, the memory controller provides a coherency response indicating an excess number of data prefetch requests. In response to the coherency response, the particular processor core reduces a rate of issuance of data prefetch requests.	06-30-2011
20110161588	FORMATION OF AN EXCLUSIVE OWNERSHIP COHERENCE STATE IN A LOWER LEVEL CACHE - In response to a memory access request of a processor core that targets a target cache line, the lower level cache of a vertical cache hierarchy associated with the processor core supplies a copy of the target cache line to an upper level cache in the vertical cache hierarchy and retains a copy in a shared coherence state. The upper level cache holds the copy of the target cache line in a private shared ownership coherence state indicating that each cached copy of the target memory block is cached within the vertical cache hierarchy associated with the processor core. In response to the upper level cache signaling replacement of the copy of the target cache line in the private shared ownership coherence state, the lower level cache updates its copy of the target cache line to the exclusive ownership coherence state without coherency messaging with other vertical cache hierarchies.	06-30-2011
20120203973	SELECTIVE CACHE-TO-CACHE LATERAL CASTOUTS - A data processing system includes first and second processing units and a system memory. The first processing unit has first upper and first lower level caches, and the second processing unit has second upper and lower level caches. In response to a data request, a victim cache line to be castout from the first lower level cache is selected, and the first lower level cache selects between performing a lateral castout (LCO) of the victim cache line to the second lower level cache and a castout of the victim cache line to the system memory based upon a confidence indicator associated with the victim cache line. In response to selecting an LCO, the first processing unit issues an LCO command on the interconnect fabric and removes the victim cache line from the first lower level cache, and the second lower level cache holds the victim cache line.	08-09-2012

Jeffrey A. Stuecheli, Round Rock, TX US

Patent application number	Description	Published
20110276762	COORDINATED WRITEBACK OF DIRTY CACHELINES - A data processing system includes a processor core and a cache memory hierarchy coupled to the processor core. The cache memory hierarchy includes at least one upper level cache and a lowest level cache. A memory controller is coupled to the lowest level cache and to a system memory and includes a physical write queue from which the memory controller writes data to the system memory. The memory controller initiates accesses to the lowest level cache to place into the physical write queue selected cachelines having spatial locality with data present in the physical write queue.	11-10-2011
20110276763	MEMORY BUS WRITE PRIORITIZATION - A data processing system includes a multi-level cache hierarchy including a lowest level cache, a processor core coupled to the multi-level cache hierarchy, and a memory controller coupled to the lowest level cache and to a memory bus of a system memory. The memory controller includes a physical read queue that buffers data read from the system memory via the memory bus and a physical write queue that buffers data to be written to the system memory via the memory bus. The memory controller grants priority to write operations over read operations on the memory bus based upon a number of dirty cachelines in the lowest level cache memory.	11-10-2011
20120203968	COORDINATED WRITEBACK OF DIRTY CACHELINES - A data processing system includes a processor core and a cache memory hierarchy coupled to the processor core. The cache memory hierarchy includes at least one upper level cache and a lowest level cache. A memory controller is coupled to the lowest level cache and to a system memory and includes a physical write queue from which the memory controller writes data to the system memory. The memory controller initiates accesses to the lowest level cache to place into the physical write queue selected cachelines having spatial locality with data present in the physical write queue.	08-09-2012
20120203969	MEMORY BUS WRITE PRIORITIZATION - A data processing system includes a multi-level cache hierarchy including a lowest level cache, a processor core coupled to the multi-level cache hierarchy, and a memory controller coupled to the lowest level cache and to a memory bus of a system memory. The memory controller includes a physical read queue that buffers data read from the system memory via the memory bus and a physical write queue that buffers data to be written to the system memory via the memory bus. The memory controller grants priority to write operations over read operations on the memory bus based upon a number of dirty cachelines in the lowest level cache memory.	08-09-2012

Jeffrey Adam Stuecheli, Austin, TX US

Patent application number	Description	Published
20080235459	PROCESSOR, METHOD, AND DATA PROCESSING SYSTEM EMPLOYING A VARIABLE STORE GATHER WINDOW - A processor includes at least one instruction execution unit that executes store instructions to obtain store operations and a store queue coupled to the instruction execution unit. The store queue includes a queue entry in which the store queue gathers multiple store operations during a store gathering window to obtain a data portion of a write transaction directed to lower level memory. In addition, the store queue includes dispatch logic that varies a size of the store gathering window to optimize store performance for different store behaviors and workloads.	09-25-2008
20080244187	PIPELINING D STATES FOR MRU STEERAGE DURING MRU-LRU MEMBER ALLOCATION - A method and apparatus for preventing selection of Deleted (D) members as an LRU victim during LRU victim selection. During each cache access targeting the particular congruence class, the deleted cache line is identified from information in the cache directory. A location of a deleted cache line is pipelined through the cache architecture during LRU victim selection. The information is latched and then passed to MRU vector generation logic. An MRU vector is generated and passed to the MRU update logic, which is selects/tags the deleted member as a MRU member. The make MRU operation affects only the lower level LRU state bits arranged in a tree-based structure state bits so that the make MRU operation only negates selection of the specific member in the D state, without affecting LRU victim selection of the other members.	10-02-2008
20090006758	SYSTEM BUS STRUCTURE FOR LARGE L2 CACHE ARRAY TOPOLOGY WITH DIFFERENT LATENCY DOMAINS - A cache memory which loads two memory values into two cache lines by receiving separate portions of a first requested memory value from a first data bus over a first time span of successive clock cycles and receiving separate portions of a second requested memory value from a second data bus over a second time span of successive clock cycles which overlaps with the first time span. In the illustrative embodiment a first input line is used for loading both a first byte array of the first cache line and a first byte array of the second cache line, a second input line is used for loading both a second byte array of the first cache line and a second byte array of the second cache line, and the transmission of the separate portions of the first and second memory values is interleaved between the first and second data busses.	01-01-2009
20090006759	SYSTEM BUS STRUCTURE FOR LARGE L2 CACHE ARRAY TOPOLOGY WITH DIFFERENT LATENCY DOMAINS - A cache memory which loads two memory values into two cache lines by receiving separate portions of a first requested memory value from a first data bus over a first time span of successive clock cycles and receiving separate portions of a second requested memory value from a second data bus over a second time span of successive clock cycles which overlaps with the first time span. In the illustrative embodiment a first input line is used for loading both a first byte array of the first cache line and a first byte array of the second cache line, a second input line is used for loading both a second byte array of the first cache line and a second byte array of the second cache line, and the transmission of the separate portions of the first and second memory values is interleaved between the first and second data busses.	01-01-2009
20090070556	STORE STREAM PREFETCHING IN A MICROPROCESSOR - In a microprocessor having a load/store unit and prefetch hardware, the prefetch hardware includes a prefetch queue containing entries indicative of allocated data streams. A prefetch engine receives an address associated with a store instruction executed by the load/store unit. The prefetch engine determines whether to allocate an entry in the prefetch queue corresponding to the store instruction by comparing entries in the queue to a window of addresses encompassing multiple cache blocks, where the window of addresses is derived from the received address. The prefetch engine compares entries in the prefetch queue to a window of 2	03-12-2009

Patent applications by Jeffrey Adam Stuecheli, Austin, TX US

Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees

Stuecheli, TX

Erica Stuecheli, Austin, TX US

Jeff A. Stuecheli, Austin, TX US

Jeffrey Stuecheli, Austin, TX US

Jeffrey A. Stuecheli, Round Rock, TX US

Jeffrey Adam Stuecheli, Austin, TX US