Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees

Hasenplaugh

William Hasenplaugh, Boston, MA US

Patent application number	Description	Published
20100138609	Technique for controlling computing resources - A technique to enable resource allocation optimization within a computer system. In one embodiment, a gradient partition algorithm (GPA) module is used to continually measure performance and adjust allocation to shared resources among a plurality of data classes in order to achieve optimal performance.	06-03-2010
20110106872	Method and apparatus for providing an area-efficient large unsigned integer multiplier - An area efficient multiplier having high performance at modest clock speeds is presented. The performance of the multiplier is based on optimal choice of a number of levels of Karatsuba decomposition. The multiplier may be used to perform efficient modular reduction of large numbers greater than the size of the multiplier.	05-05-2011

William C. Hasenplaugh, Jamaica Plain, MA US

Patent application number	Description	Published
20080235457	Dynamic quality of service (QoS) for a shared cache - In one embodiment, the present invention includes a method for associating a first priority indicator with data stored in a first entry of a shared cache memory by a core to indicate a priority level of a first thread, and associating a second priority indicator with data stored in a second entry of the shared cache memory by a graphics engine to indicate a priority level of a second thread. Other embodiments are described and claimed.	09-25-2008
20110264720	CRYPTOGRAPHIC SYSTEM, METHOD AND MULTIPLIER - In general, in one aspect, the disclosure describes a multiplier that includes a set of multiple multipliers configured in parallel where the set of multiple multipliers have access to a first operand and a second operand to multiply, the first operand having multiple segments and the second operand having multiple segments. The multiplier also includes logic to repeatedly supply a single segment of the second operand to each multiplier of the set of multiple multipliers and to supply multiple respective segments of the first operand to the respective ones of the set of multiple multipliers until each segment of the second operand has been supplied with each segment of the first operand. The logic shifts the output of different ones of the set of multiple multipliers based, at least in part, on the position of the respective segments within the first operand. The multiplier also includes an accumulator coupled to the logic.	10-27-2011

Patent applications by William C. Hasenplaugh, Jamaica Plain, MA US

William C. Hasenplaugh, Boston, MA US

Patent application number	Description	Published
20110145501	CACHE SPILL MANAGEMENT TECHNIQUES - An apparatus and method is described herein for intelligently spilling cache lines. Usefulness of cache lines previously spilled from a source cache is learned, such that later evictions of useful cache lines from a source cache are intelligently selected for spill. Furthermore, another learning mechanism—cache spill prediction—may be implemented separately or in conjunction with usefulness prediction. The cache spill prediction is capable of learning the effectiveness of remote caches at holding spilled cache lines for the source cache. As a result, cache lines are capable of being intelligently selected for spill and intelligently distributed among remote caches based on the effectiveness of each remote cache in holding spilled cache lines for the source cache.	06-16-2011
20110248755	CROSS-FEEDBACK PHASE-LOCKED LOOP FOR DISTRIBUTED CLOCKING SYSTEMS - According to various embodiments, a cross-feedback phase-locked loop (XF-PLL) may include a secondary phase/frequency detector to detect the phase/frequency differences between two adjacent domains and feed the phase/frequency differences back into the main feedback loop of the XF-PLL, thereby reducing accumulated jitter and inter-domain clock skew in a distributed clocking system. Other embodiments may be described and claimed.	10-13-2011
20120079208	PROBE SPECULATIVE ADDRESS FILE - An apparatus to resolve cache coherency is presented. In one embodiment, the apparatus includes a microprocessor comprising one or more processing cores. The apparatus also includes a probe speculative address file unit, coupled to a cache memory, comprising a plurality of entries. Each entry includes a timer and a tag associated with a memory line. The apparatus further includes control logic to determine whether to service an incoming probe based at least in part on a timer value.	03-29-2012
20120159077	METHOD AND APPARATUS FOR OPTIMIZING THE USAGE OF CACHE MEMORIES - A method and apparatus to reduce unnecessary write backs of cached data to a main memory and to optimize the usage of a cache memory tag directory. In one embodiment of the invention, the power consumption of a processor can be saved by eliminating write backs of cache memory lines that has information that has reached its end-of-life. In one embodiment of the invention, when a processing unit is required to clear one or more cache memory lines, it uses a write-zero command to clear the one or more cache memory lines. The processing unit does not perform a write operation to move or pass data values of zero to the one or more cache memory lines. By doing so, it reduces the power consumption of the processing unit.	06-21-2012
20130173893	HARDWARE COMPILATION AND/OR TRANSLATION WITH FAULT DETECTION AND ROLL BACK FUNCTIONALITY - Hardware compilation and/or translation with fault detection and roll back functionality are disclosed. Compilation and/or translation logic receives programs encoded in one language, and encodes the programs into a second language including instructions to support processor features not encoded into the original language encoding of the programs. In one embodiment, an execution unit executes instructions of the second language including an operation-check instruction to perform a first operation and record the first operation result for a comparison, and an operation-test instruction to perform a second operation and a fault detection operation by comparing the second operation result to the recorded first operation result. In some embodiments, an execution unit executes instructions of the second language including commit instructions to record execution checkpoint states of registers mapped to architectural registers, and roll-back instructions to restore the registers mapped to architectural registers to previously recorded execution checkpoint states.	07-04-2013
20130297883	EFFICIENT SUPPORT OF SPARSE DATA STRUCTURE ACCESS - Method and apparatus to efficiently organize data in caches by storing/accessing data of varying sizes in cache lines. A value may be assigned to a field indicating the size of usable data stored in a cache line. If the field indicating the size of the usable data in the cache line indicates a size less than the maximum storage size, a value may be assigned to a field in the cache line indicating which subset of the data in the field to store data is usable data. A cache request may determine whether the size of the usable data in a cache line is equal to the maximum data storage size. If the size of the usable data in the cache line is equal to the maximum data storage size the entire stored data in the cache line may be returned.	11-07-2013
20130326147	SHORT CIRCUIT OF PROBES IN A CHAIN - A multi-core processing apparatus may provide a cache probe and data retrieval method. The method may comprise sending a memory request from a requester to a record keeping structure. The memory request may have a memory address of a memory that stores requested data. The method may further comprise determining that a local last accessor of the memory address may have a copy of the requested data up to date with the memory. The local last accessor may be within a local domain that the requester belongs to. The method may further comprise sending a cache probe to the local last accessor and retrieving a latest value of the requested data from the local last accessor to the requester.	12-05-2013
20130339621	ADDRESS RANGE PRIORITY MECHANISM - Method and apparatus to efficiently manage data in caches. Data in caches may be managed based on priorities assigned to the data. Data may be requested by a process using a virtual address of the data. The requested data may be assigned a priority by a component in a computer system called an address range priority assigner (ARP). The ARP may assign a particular priority to the requested data if the virtual address of the requested data is within a particular range of virtual addresses. The particular priority assigned may be high priority and the particular range of virtual addresses may be smaller than a cache's capacity.	12-19-2013
20140006716	DATA CONTROL USING LAST ACCESSOR INFORMATION	01-02-2014
20140006717	SIGNATURE BASED HIT-PREDICTING CACHE	01-02-2014
20140052920	DOMAIN STATE - Method and apparatus to efficiently maintain cache coherency by reading/writing a domain state field associated with a tag entry within a cache tag directory. A value may be assigned to a domain state field of a tag entry in a cache tag directory. The cache tag directory may belong to a hierarchy of cache tag directories.	02-20-2014
20140189251	UPDATE MASK FOR HANDLING INTERACTION BETWEEN FILLS AND UPDATES - A multi core processor implements a cash coherency protocol in which probe messages are address-ordered on a probe channel while responses are un-ordered on a response channel. When a first core generates a read of an address that misses in the first core's cache, a line fill is initiated. If a second core is writing the same address, the second core generates an update on the addressed ordered probe channel. The second core's update may arrive before or after the first core's line fill returns. If the update arrived before the fill returned, a mask is maintained to indicate which portions of the line were modified by the update so that the late arriving line fill only modifies portions of the line that were unaffected by the earlier-arriving update.	07-03-2014
20140189413	DISTRIBUTED POWER MANAGEMENT FOR MULTI-CORE PROCESSORS - A system and method for performing distributed power control in a processor comprising an array of cores enables each core to regulate power at least partially independently. Global power management settings are made accessible to all cores and communication between cores propagates power consumption information between nearest neighbors in the array. Each core attempts to best regulate its own power consumption in accordance with global power consumption information and/or specific instructions from a global power manager. In this manner local opportunistic load balancing may be achieved in a scalable manner suitable for a large array of cores.	07-03-2014
20140201446	HIGH BANDWIDTH FULL-BLOCK WRITE COMMANDS - A micro-architecture may provide a hardware and software of a high bandwidth write command. The micro-architecture may invoke a method to perform the high bandwidth write command. The method may comprise sending a write request from a requester to a record keeping structure. The write request may have a memory address of a memory that stores requested data. The method may further determine copies of the requested data being present in a distributed cache system outside the memory, sending invalidation requests to elements holding copies of the requested data in the distributed cache system, sending a notification to the requester to inform presence of copies of the requested data and sending a write response message after a latest value of the requested data and all invalidation acknowledgements have been received.	07-17-2014
20140215162	RETRIEVAL OF PREVIOUSLY ACCESSED DATA IN A MULTI-CORE PROCESSOR - A multi-core processing apparatus may provide a cache probe and data retrieval method. The method may comprise sending a memory request from a requester to a record keeping structure. The memory request may have a memory address of a memory that stores requested data. The method may further comprise determining a last accessor of the memory address, sending a cache probe to the last accessor, determining the last accessor no longer has a copy of the line; and sending a request for the previously accessed version of the line. The request may bypass the tag-directories and obtain the requested data from memory.	07-31-2014
20150046910	HARDWARE COMPILATION AND/OR TRANSLATION WITH FAULT DETECTION AND ROLL BACK FUNCTIONALITY - Hardware compilation and/or translation with fault detection and roll back functionality are disclosed. Compilation and/or translation logic receives programs encoded in one language, and encodes the programs into a second language including instructions to support processor features not encoded into the original language encoding of the programs. In one embodiment, an execution unit executes instructions of the second language including an operation-check instruction to perform a first operation and record the first operation result for a comparison, and an operation-test instruction to perform a second operation and a fault detection operation by comparing the second operation result to the recorded first operation result. In some embodiments, an execution unit executes instructions of the second language including commit instructions to record execution checkpoint states of registers mapped to architectural registers, and roll-back instructions to restore the registers mapped to architectural registers to previously recorded execution checkpoint states.	02-12-2015

Patent applications by William C. Hasenplaugh, Boston, MA US

William C. Hasenplaugh, Cambridge, MA US

Patent application number	Description	Published
20120246407	METHOD AND SYSTEM TO IMPROVE UNALIGNED CACHE MEMORY ACCESSES - A method and system to improve unaligned cache memory accesses. In one embodiment of the invention, a processing unit has logic to facilitate access of at least two cache memory lines of a cache memory in a single read operation. By doing so, it avoids additional read operations or cycles to read the required data that is cached in more than one cache memory line. Embodiments of the invention facilitate the streaming of unaligned vector loads that does not require substantially more power than streaming aligned vector loads. For example, in one embodiment of the invention, the streaming of unaligned vector loads consumes less than two times the power requirements of streaming aligned vector loads.	09-27-2012
20140188968	VARIABLE PRECISION FLOATING POINT MULTIPLY-ADD CIRCUIT - Embodiments of the present invention may provide methods and circuits for energy efficient floating point multiply and/or add operations. A variable precision floating point circuit may determine the certainty of the result of a multiply-add floating point calculation in parallel with the floating-point calculation. The variable precision floating point circuit may use the certainty of the inputs in combination with information from the computation, such as, binary digits that cancel, normalization shifts, and rounding, to perform a calculation of the certainty of the result. A floating point multiplication circuit may determine whether a lowest portion of a multiplication result could affect the final result and may induce a replay of the multiplication operation when it is determined that the result could affect the final result.	07-03-2014