Paul E. Mckenney, Beaverton US

Paul E. Mckenney, Beaverton, OR US

Patent application number	Description	Published
20080209433	Adaptive Reader-Writer Lock - A method and computer system for dynamically selecting an optimal synchronization mechanism for a data structure in a multiprocessor environment. The method determines a quantity of read-side and write-side acquisitions, and evaluates the data to determine an optimal mode for efficiently operating the computer system while maintaining reduced overhead. The method incorporates data received from the individual units within a central processing system, the quantity of write-side acquisitions in the system, and data which has been subject to secondary measures, such as formatives of digital filters. The data subject to secondary measures includes, but is not limited to, a quantity of read-side acquisitions, a quantity of write-side acquisitions, and a quantity of read-hold durations. Based upon the individual unit data and the system-wide data, including the secondary measures, the operating system may select the most efficient synchronization mechanism from among the mechanisms available. Accordingly, efficiency of a computer system may be enhanced with the ability to selectively choose an optimal synchronization mechanism based upon selected and calculated parameters.	08-28-2008
20080215784	REALTIME-SAFE READ COPY UPDATE WITH PER-PROCESSOR READ/WRITE LOCKS - A technique for realtime-safe detection of a grace period for deferring the destruction of a shared data element until pre-existing references to the data element have been removed. A per-processor read/write lock is established for each of one or more processors. When reading a shared data element at a processor, the processor's read/write lock is acquired for reading, the shared data element is referenced, and the read/write lock that was acquired for reading is released. When starting a new grace period, all of the read/write locks are acquired for writing, a new grace period is started, and all of the read/write locks are released.	09-04-2008
20080229309	REALTIME-SAFE READ COPY UPDATE WITH LOCK-FREE READERS - A technique for realtime-safe detection of a grace period for deferring the destruction of a shared data element until pre-existing references to the data element have been removed. A pair of counters is established for each of one or more processors. A global counter selector determines which counter of each per-processor counter pair is a current counter. When reading a shared data element at a processor, the processor's current counter is incremented. Following counter incrementation, the processor's counter pair is tested for reversal to ensure that the incremented counter is still the current counter. If a counter reversal has occurred, such that the incremented counter is no longer current, the processor's other counter is incremented. Following referencing of the shared data element, any counter that remains incremented is decremented. Following an update to the shared data element wherein a pre-update version of the element is maintained, the global counter selector is switched to establish a new current counter of each per-processor counter pair. The non-current counter of each per-processor counter pair is tested for zero. The shared data element's pre-update version is destroyed upon the non-current counter of each per-processor counter pair being zero.	09-18-2008
20080263337	INSTRUCTIONS FOR ORDERING EXECUTION IN PIPELINED PROCESSES - Ordering instructions for specifying the execution order of other instructions improve throughput in a pipelined multiprocessor. Memory write operations local to a CPU are allowed to occur in an arbitrary order, and constraints are placed on shared memory operations. Multiple sets of instructions are provided in which order of execution of the instructions is maintained through the use of CPU registers, write buffers in conjunction with assignment of sequence numbers to the instruction, or a hierarchical ordering system. The system ensures that an earlier designated instruction has reach a specified state of execution prior to a latter instruction reaching a specified state of execution. The ordering of operations allows memory operations local to a CPU to occur in conjunction with other memory operations that are not affected by such execution.	10-23-2008
20080288749	READ-COPY UPDATE GRACE PERIOD DETECTION WITHOUT ATOMIC INSTRUCTIONS THAT GRACEFULLY HANDLES LARGE NUMBERS OF PROCESSORS - A method, system and computer program product for avoiding unnecessary grace period token processing while detecting a grace period without atomic instructions in a read-copy update subsystem or other processing environment that requires deferring removal of a shared data element until pre-existing references to the data element are removed. Detection of the grace period includes establishing a token to be circulated between processing entities sharing access to the data element. A grace period elapses whenever the token makes a round trip through the processing entities. A distributed indicator associated with each processing entity indicates whether there is a need to perform removal processing on any shared data element. The distributed indicator is processed at each processing entity before the latter engages in token processing. Token processing is performed only when warranted by the distributed indicator. In this way, unnecessary token processing can be avoided when the distributed indicator does not warrant such processing.	11-20-2008
20080313238	Read-Copy Update System And Method - A method, system and computer program product for managing requests for deferred updates to shared data elements while minimizing grace period detection overhead associated with determining whether pre-existing references to the data elements have been removed. Plural update requests that are eligible for grace period detection are buffered without performing grace period detection processing. One or more conditions that could warrant commencement of grace period detection processing are monitored while the update requests are buffered. If warranted by such a condition, grace period detection is performed relative to the update requests so that they can be processed. In this way, grace period detection overhead can be amortized over plural update requests while being sensitive to conditions warranting prompt grace period detection.	12-18-2008
20080320262	READ/WRITE LOCK WITH REDUCED READER LOCK SAMPLING OVERHEAD IN ABSENCE OF WRITER LOCK ACQUISITION - An improved reader-writer locking for synchronizing access to shared data. When writing the shared data, a writer flag is set and a lock is acquired on the shared data. The shared data may be accessed following the expiration of a grace period and a determination that there are no data readers accessing the shared data. When reading the shared data, the writer flag is tested that indicates whether a data writer is attempting to access the shared data. If the writer flag is not set, the shared data is accessed using a relatively fast read mechanism. If the writer flag is set, the shared data is accessed using a relatively slow read mechanism.	12-25-2008
20090006403	EFFICIENTLY BOOSTING PRIORITY OF READ-COPY UPDATE READERS WHILE RESOLVING RACES WITH EXITING AND UNLOCKING PROCESSES - A technique for efficiently boosting the priority of a preemptable data reader while resolving races between the priority boosting and the reader exiting a critical section or terminating in order to eliminate impediments to grace period processing that defers the destruction of one or more shared data elements that may be referenced by the reader until the reader is no longer capable of referencing the one or more data elements. A determination is made that the reader is in a read-side critical section and the reader is designated as a candidate for priority boosting. A verification is made that the reader has not exited its critical section or terminated, and the reader's priority is boosted to expedite its completion of the critical section. The reader's priority is decreased following its completion of the critical section.	01-01-2009
20090055601	Efficient Sharing Of Memory Between Applications Running Under Different Operating Systems On A Shared Hardware System - A system, method and computer program product for efficient sharing of memory between first and second applications running under first and second operating systems on a shared hardware system. The hardware system runs a hypervisor that supports concurrent execution of the first and second operating systems, and further includes a region of shared memory managed on behalf of the first and second applications. Techniques are used to avoid preemption when the first application is accessing the shared memory region. In this way, the second application will not be unduly delayed when attempting to access the shared memory region due to delays stemming from the first application's access of the shared memory region. This is especially advantageous when the second application and operating system are adapted for real-time processing. Additional benefits can be obtained by taking steps to minimize memory access faults.	02-26-2009
20090063826	Quad aware Locking Primitive - A method and computer system for efficiently handling high contention locking in a multiprocessor computer system. At least some of the processors in the system are organized into a hierarchy, and process an interruptible lock in response to the hierarchy. The method utilizes two alternative methods of acquiring the lock, including a conditional lock acquisition primitive and an unconditional lock acquisition primitive, and an unconditional lock release primitive for releasing the lock from a particular processor. To prevent races between processors requesting a lock acquisition and a processor releasing the lock, a release flag is utilized. Furthermore, in order to ensure that the a processor utilizing the unconditional lock acquisition primitive is granted the lock, a handoff flag is utilized.	03-05-2009
20090077080	FAST PATH FOR GRACE-PERIOD DETECTION FOR READ-COPY UPDATE SYSTEM - A technique for implementing fast path grace period detection for deferring the destruction of a shared data element until pre-existing references to the data element are removed. A check is made, without using locks to exclude other updaters, for the presence of readers that are accessing the shared data elements. Grace period detection is terminated to initiate deferred destruction of the data element if there are no readers accessing the shared data element. If there are readers accessing the shared data element, a lock is implemented and another check is made for the presence of the readers.	03-19-2009
20090138896	Providing a computing system with real-time capabilities - A computing system is provided with real-time capabilities so that the system is capable of running applications such that one or more real-time criteria are satisfied. An interrupt architecture of the computing system is disabled. The interrupt architecture generates interrupts sent to a firmware of the computing system in response to events. A different architecture is substituted within the computing system for the interrupt architecture. The different architecture is responsive to the events without violating the real-time criteria. In response to the events occurring, the different architecture causes one or more corrective actions to be performed.	05-28-2009
20090147612	SYSTEM FOR CONTROLLING MEMORY POWER CONSUMPTION IN INTEGRATED DEVICES - A method for reducing power consumption in integrated devices is provided. The method comprising: locating a plurality of unused blocks of memory and a plurality of used blocks of memory in a memory device tlhrough a tracking mechanism; performing a refresh operation on the plurality of used blocks of memory at a constant frequency and suppressing the refresh operation on the plurality of unused blocks of memory through a memory controller; and suppressing error correction codes or parity errors on at least one of the plurality of unused blocks of memory when the at least one of the plurality of unused blocks of memory is accessed through the memory controller.	06-11-2009
20090254764	Optimizing Preemptible Read-Copy Update for Low-Power Usage by Avoiding Unnecessary Wakeups - A technique for low-power detection of a grace period for deferring the destruction of a shared data element until pre-existing references to the data element have been removed. A grace period processing action is implemented that requires a response from a processor that may be running a preemptible reader of said shared data element before further grace period processing can proceed. A power and reader status of the processor is also determined. Grace period processing may proceed despite the absence of a response from the processor if the power and reader status indicates that an actual response from the processor is unnecessary.	10-08-2009
20090292705	EFFICIENT SUPPORT OF CONSISTENT CYCLIC SEARCH WITH READ-COPY UPDATE AND PARALLEL UPDATES - A method, system and computer program product for supporting concurrent updates to a shared data element group while preserving group integrity on behalf of one or more readers that are concurrently referencing group data elements without using locks or atomic instructions. Two or more updaters may be invoked to generate new group data elements. Each new data element created by the same updater is assigned a new generation number that is different than a global generation number associated with the data element group and which allows a reader of the data element group to determine whether the new data element is a correct version for the reader. The new generation numbers are different for each updater and assigned according to an order in which the updaters respectively begin update operations. The global generation number is updated so that when all of the updaters have completed data element update processing, the global generation number will correspond to the new generation number that is associated with the last of the updaters to begin update operations.	11-26-2009
20100010783	Moving physical objects from original physical site to user-specified locations at destination physical site - Dimensions of each physical object to be moved from an original physical site to a destination physical site are at least approximately determined. Each physical object is tagged with an identifier. A virtual layout of the destination physical site and the dimensions and the identifier of each physical object are input into a computer program. A user-specified location of where within the destination physical site each physical object is to be placed when moved to the destination physical site is input using the computer program, based on the virtual layout of the destination physical site and the dimensions and the identifier of each physical object. At the destination physical site, each physical object is looked up using the identifier of the physical object to determine the user-specified location of where to place the physical object within the destination physical site.	01-14-2010
20100023946	USER-LEVEL READ-COPY UPDATE THAT DOES NOT REQUIRE DISABLING PREEMPTION OR SIGNAL HANDLING - A user-level read-copy update (RCU) technique. A user-level RCU subsystem executes within threads of a user-level multithreaded application. The multithreaded application may include reader threads that read RCU-protected data elements in a shared memory and updater threads that update such data elements. The reader and updater threads may be preemptible and comprise signal handlers that process signals. Reader registration and unregistration components in the RCU subsystem respectively register and unregister the reader threads for RCU critical section processing. These operations are performed while the reader threads remain preemptible and with their signal handlers being operational. A grace period detection component in the RCU subsystem considers a registration status of the reader threads and determines when it is safe to perform RCU second-phase update processing to remove stale versions of updated data elements that are being referenced by the reader threads, or take other RCU second-phase update processing actions.	01-28-2010
20100318502	UPDATING FIRST DATA VALUE BEFORE SECOND DATA VALUE - A flag and a wait period are used to guarantee that readers of two data values see the updated first value before they see the updated second value, where the second value has to be updated after the first value is updated and thus is dependent on the first value. The first value is updated, and a flag associated with the first data value is set. The flag effectively prevents further updating of the first data value until it has been cleared. A length of time is waited for, such that any reading of the first data value and the second data value is guaranteed to not see the second data value as updated unless the first data value is also seen as updated. The flag is then cleared, such that further updating of the first data value can again occur. The second data value is finally updated.	12-16-2010
20110055183	High Performance Real-Time Read-Copy Update - A technique for reducing reader overhead when referencing a shared data element while facilitating realtime-safe detection of a grace period for deferring destruction of the shared data element. The grace period is determined by a condition in which all readers that are capable of referencing the shared data element have reached a quiescent state subsequent to a request for a quiescent state. Common case local quiescent state tracking may be performed using only local per-reader state information for all readers that have not blocked while in a read-side critical section in which the data element is referenced. Uncommon case non-local quiescent state tracking may be performed using non-local multi-reader state information for all readers that have blocked while in their read-side critical section. The common case local quiescent state tracking requires less processing overhead than the uncommon case non-local quiescent state tracking.	03-03-2011
20110055630	Safely Rolling Back Transactions In A Transactional Memory System With Concurrent Readers - A technique for safely rolling back transactional memory transactions without impacting concurrent readers of the uncommitted transaction data. An updater uses a transactional memory technique to perform an data update on data that is shared with a reader. The update is implemented as a transaction in which the updated data is initially uncommitted due to the transaction being subject to roll back. The reader is allowed to perform a data read on the uncommitted data during the transaction. Upon a rollback of the transaction, reclamation of memory locations used by the uncommitted data is deferred until a grace period has elapsed after which the reader can no longer be referencing the uncommitted data.	03-03-2011
20110137962	Applying Limited-Size Hardware Transactional Memory To Arbitrarily Large Data Structure - A technique for applying hardware transaction memory to an arbitrarily large data structure is disclosed. A data updater traverses the data structure to locate an update point using a lockless synchronization technique that synchronizes the data updater with other updaters that may be concurrently updating the data structure. At the update point, the updater performs an update on the data structure using a hardware transactional memory transaction that operates at the update point.	06-09-2011
20110283082	Scalable, Concurrent Resizing Of Hash Tables - A system, method and computer program product for resizing a hash table while supporting hash table scalability and concurrency. The hash table has one or more hash buckets each containing one or more items that are chained together in a linked list. Each item in the hash table is processed to determine if the item requires relocation from a first bucket associated with a first table size to second bucket associated with a second table size. If the item requires relocation, it is linked to the second bucket without moving or copying the item in memory. The item is unlinked from the first bucket after waiting until there is no current hash table reader whose search of the hash table could be affected by the unlinking, again without moving or copying the item in memory.	11-17-2011
20120047140	Cluster-Wide Read-Copy Update System And Method - A system, method and computer program product for synchronizing updates to shared mutable data in a clustered data processing system. A data element update operation is performed at each node of the cluster while preserving a pre-update view of the shared mutable data, or an associated operational mode, on behalf of readers that may be utilizing the pre-update view. A request is made for detection of a grace period, and grace period detection processing is performed for detecting when the cluster-wide grace period has occurred. When it does, a deferred action associated with the update operation it taken, such as removal of a pre-update view of the data element or termination of an associated mode of operation.	02-23-2012
20120079301	Making Read-Copy Update Free-Running Grace Period Counters Safe Against Lengthy Low Power State Sojourns - A technique for making a free-running grace period counter safe against lengthy low power state processor sojourns. The grace period counter tracks grace periods that determine when processors that are capable executing read operations have passed through a quiescent state that guarantees the readers will no longer maintain references to shared data. Periodically, one or more processors may be placed in a low power state in which the processors discontinue performing grace period detection operations. Such processors may remain in the low power state for a complete cycle of the grace period counter. This scenario can potentially disrupt grace period detection operations if the processors awaken to see the same grace period counter value. To rectify this situation, processors in a low power state may be periodically awakened at a predetermined point selected prevent the low power state from extending for an entire roll over of the grace period counter.	03-29-2012
20120144129	High Performance Real-Time Read-Copy Update - A technique for reducing reader overhead when referencing a shared data element while facilitating realtime-safe detection of a grace period for deferring destruction of the shared data element. The grace period is determined by a condition in which all readers that are capable of referencing the shared data element have reached a quiescent state subsequent to a request for a quiescent state. Common case local quiescent state tracking may be performed using only local per-reader state information for all readers that have not blocked while in a read-side critical section in which the data element is referenced. Uncommon case non-local quiescent state tracking may be performed using non-local multi-reader state information for all readers that have blocked while in their read-side critical section. The common case local quiescent state tracking requires less processing overhead than the uncommon case non-local quiescent state tracking.	06-07-2012
20120324170	Read-Copy Update Implementation For Non-Cache-Coherent Systems - A technique for implementing read-copy update in a shared-memory computing system having two or more processors operatively coupled to a shared memory and to associated incoherent caches that cache copies of data stored in the memory. According to example embodiments disclosed herein, cacheline information for data that has been rendered obsolete due to a data update being performed by one of the processors is recorded. The recorded cacheline information is communicated to one or more of the other processors. The one or more other processors use the communicated cacheline information to flush the obsolete data from all incoherent caches that may be caching such data.	12-20-2012
20120324461	Effective Management Of Blocked-Tasks In Preemptible Read-Copy Update - A technique for managing read-copy update readers that have been preempted while executing in a read-copy update read-side critical section. A single blocked-tasks list is used to track preempted reader tasks that are blocking an asynchronous grace period, preempted reader tasks that are blocking an expedited grace period, and preempted reader tasks that require priority boosting. In example embodiments, a first pointer may be used to segregate the blocked-tasks list into preempted reader tasks that are and are not blocking a current asynchronous grace period. A second pointer may be used to segregate the blocked-tasks list into preempted reader tasks that are and are not blocking an expedited grace period. A third pointer may be used to segregate the blocked-tasks list into preempted reader tasks that do and do not require priority boosting.	12-20-2012
20120324473	Effective Management Of Blocked-Tasks In Preemptible Read-Copy Update - A technique for managing read-copy update readers that have been preempted while executing in a read-copy update read-side critical section. A single blocked-tasks list is used to track preempted reader tasks that are blocking an asynchronous grace period, preempted reader tasks that are blocking an expedited grace period, and preempted reader tasks that require priority boosting. In example embodiments, a first pointer may be used to segregate the blocked-tasks list into preempted reader tasks that are and are not blocking a current asynchronous grace period. A second pointer may be used to segregate the blocked-tasks list into preempted reader tasks that are and are not blocking an expedited grace period. A third pointer may be used to segregate the blocked-tasks list into preempted reader tasks that do and do not require priority boosting.	12-20-2012
20120331237	Asynchronous Grace-Period Primitives For User-Space Applications - A technique for implementing user-level read-copy update (RCU) with support for asynchronous grace periods. In an example embodiment, a user-level RCU subsystem is established that executes within threads of a user-level multithreaded application. The multithreaded application may comprise one or more reader threads that read RCU-protected data elements in a shared memory. The multithreaded application may further comprise one or more updater threads that perform updates to the RCU-protected data elements in the shared memory and register callbacks to be executed following a grace period in order to free stale data resulting from the updates. The RCU subsystem may implement two or more helper threads (helpers) that are created or selected as needed to track grace periods and execute the callbacks on behalf of the updaters instead of the updaters performing such work themselves.	12-27-2012
20120331238	Asynchronous Grace-Period Primitives For User-Space Applications - A technique for implementing user-level read-copy update (RCU) with support for asynchronous grace periods. In an example embodiment, a user-level RCU subsystem is established that executes within threads of a user-level multithreaded application. The multithreaded application may comprise one or more reader threads that read RCU-protected data elements in a shared memory. The multithreaded application may further comprise one or more updater threads that perform updates to the RCU-protected data elements in the shared memory and register callbacks to be executed following a grace period in order to free stale data resulting from the updates. The RCU subsystem may implement two or more helper threads (helpers) that are created or selected as needed to track grace periods and execute the callbacks on behalf of the updaters instead of the updaters performing such work themselves.	12-27-2012
20130061071	Energy Efficient Implementation Of Read-Copy Update For Light Workloads Running On Systems With Many Processors - A technique for determining if a processor in a multiprocessor system implementing a read-copy update (RCU) subsystem may be placed in low power state. The technique may include determining whether the processor has any RCU callbacks that are ready for invocation or the RCU subsystem requires grace period advancement processing from the processor. The processor may be placed in a low power state if either (1) a first condition holds wherein the processor has one or more pending RCU callbacks, but does not have any RCU callbacks that are ready for invocation and the RCU subsystem does not require grace period advancement processing from the processor, (2) a second condition holds wherein the processor does not have any pending RCU callbacks.	03-07-2013
20130138896	Reader-Writer Synchronization With High-Performance Readers And Low-Latency Writers - Data writers desiring to update data without unduly impacting concurrent readers perform a synchronization operation with respect to plural processors or execution threads. The synchronization operation is parallelized using a hierarchical tree having a root node, one or more levels of internal nodes and as many leaf nodes as there are processors or threads. The tree is traversed from the root node to a lowest level of the internal nodes and the following node processing is performed for each node: (1) check the node's children, (2) if the children are leaf nodes, perform the synchronization operation relative to each leaf node's associated processor or thread, and (3) if the children are internal nodes, fan out and repeat the node processing with each internal node representing a new root node. The foregoing node processing is continued until all processors or threads associated with the leaf nodes have performed the synchronization operation.	05-30-2013
20130151488	Optimized Resizing For RCU-Protected Hash Tables - A technique for resizing a first RCU-protected hash table stored in a memory. A second RCU-protected hash table is allocated in the memory as a resized version of the first hash table having a different number of hash buckets, with the hash buckets being defined but initially having no hash table elements. The second hash table is populated by linking each hash bucket thereof to all hash buckets of the first hash table containing elements that hash to the second hash bucket. The second hash table is then published so that it is available for searching by hash table readers. The first table is freed from memory after waiting for a grace period which guarantees that no readers searching the first hash table will be affected by the freeing.	06-13-2013
20130151489	Optimized Deletion And Insertion For High-Performance Resizable RCU-Protected Hash Tables - Concurrent resizing and modification of a first RCU-protected hash table includes allocating a second RCU-protected hash table, populating it by linking each hash bucket of the second hash table to all hash buckets of the first hash table containing elements that hash to the second hash table bucket, and publishing the second hash table. If the modifying comprises insertion, a new element is inserted at the head of a corresponding bucket in the second hash table. If the modifying comprises deletion, then within an RCU read-side critical section: (1) all pointers in hash buckets of the first and second hash tables that reference the element being deleted are removed or redirected, and (2) the element is freed following a grace period that protects reader references to the deleted element. The first table is freed from memory after awaiting a grace period that protects reader references to the first hash table.	06-13-2013
20130151524	Optimized Resizing For RCU-Protected Hash Tables - A technique for resizing a first RCU-protected hash table stored in a memory. A second RCU-protected hash table is allocated in the memory as a resized version of the first hash table having a different number of hash buckets, with the hash buckets being defined but initially having no hash table elements. The second hash table is populated by linking each hash bucket thereof to all hash buckets of the first hash table containing elements that hash to the second hash bucket. The second hash table is then published so that it is available for searching by hash table readers. The first table is freed from memory after waiting for a grace period which guarantees that no readers searching the first hash table will be affected by the freeing.	06-13-2013
20130151798	Expedited Module Unloading For Kernel Modules That Execute Read-Copy Update Callback Processing Code - A technique for expediting the unloading of an operating system kernel module that executes read-copy update (RCU) callback processing code in a computing system having one or more processors. According to embodiments of the disclosed technique, an RCU callback is enqueued so that it can be processed by the kernel module's callback processing code following completion of a grace period in which each of the one or more processors has passed through a quiescent state. An expediting operation is performed to expedite processing of the RCU callback. The RCU callback is then processed and the kernel module is unloaded.	06-13-2013
20130151811	Optimized Deletion And Insertion For High-Performance Resizable RCU-Protected Hash Tables - Concurrent resizing and modification of a first RCU-protected hash table includes allocating a second RCU-protected hash table, populating it by linking each hash bucket of the second hash table to all hash buckets of the first hash table containing elements that hash to the second hash table bucket, and publishing the second hash table. If the modifying comprises insertion, a new element is inserted at the head of a corresponding bucket in the second hash table. If the modifying comprises deletion, then within an RCU read-side critical section: (1) all pointers in hash buckets of the first and second hash tables that reference the element being deleted are removed or redirected, and (2) the element is freed following a grace period that protects reader references to the deleted element. The first table is freed from memory after awaiting a grace period that protects reader references to the first hash table.	06-13-2013
20130152095	Expedited Module Unloading For Kernel Modules That Execute Read-Copy Update Callback Processing Code - A technique for expediting the unloading of an operating system kernel module that executes read-copy update (RCU) callback processing code in a computing system having one or more processors. According to embodiments of the disclosed technique, an RCU callback is enqueued so that it can be processed by the kernel module's callback processing code following completion of a grace period in which each of the one or more processors has passed through a quiescent state. An expediting operation is performed to expedite processing of the RCU callback. The RCU callback is then processed and the kernel module is unloaded.	06-13-2013
20130311995	Resolving RCU-Scheduler Deadlocks - A technique for resolving deadlocks between an RCU subsystem and an operating system scheduler. An RCU reader manipulates a counter when entering and exiting an RCU read-side critical section. At the entry, the counter is incremented. At the exit, the counter is manipulated differently depending on the counter value. A first counter manipulation path is taken when the counter indicates a task-context RCU reader is exiting an outermost RCU read-side critical section. This path includes condition-based processing that may result in invocation of the operating system scheduler. The first path further includes a deadlock protection operation that manipulates the counter to prevent an intervening RCU reader from taking the same path. The second manipulation path is taken when the counter value indicates a task-context RCU reader is exiting a non-outermost RCU read-side critical section, or an RCU reader is nested within the first path. This path bypasses the condition-based processing.	11-21-2013
20140006820	Energy Efficient Implementation Of Read-Copy Update For Light Workloads Running On Systems With Many Processors	01-02-2014
20140089596	Read-Copy Update Implementation For Non-Cache-Coherent Systems - A technique for implementing read-copy update in a shared-memory computing system having two or more processors operatively coupled to a shared memory and to associated incoherent caches that cache copies of data stored in the memory. According to example embodiments disclosed herein, cacheline information for data that has been rendered obsolete due to a data update being performed by one of the processors is recorded. The recorded cacheline information is communicated to one or more of the other processors. The one or more other processors use the communicated cacheline information to flush the obsolete data from all incoherent caches that may be caching such data.	03-27-2014
20140089606	Reader-Writer Synchronization With High-Performance Readers And Low-Latency Writers - Data writers desiring to update data without unduly impacting concurrent readers perform a synchronization operation with respect to plural processors or execution threads. The synchronization operation is parallelized using a hierarchical tree having a root node, one or more levels of internal nodes and as many leaf nodes as there are processors or threads. The tree is traversed from the root node to a lowest level of the internal nodes and the following node processing is performed for each node: (1) check the node's children, (2) if the children are leaf nodes, perform the synchronization operation relative to each leaf node's associated processor or thread, and (3) if the children are internal nodes, fan out and repeat the node processing with each internal node representing a new root node. The foregoing node processing is continued until all processors or threads associated with the leaf nodes have performed the synchronization operation.	03-27-2014
20140089939	Resolving RCU-Scheduler Deadlocks - A technique for resolving deadlocks between an RCU subsystem and an operating system scheduler. An RCU reader manipulates a counter when entering and exiting an RCU read-side critical section. At the entry, the counter is incremented. At the exit, the counter is manipulated differently depending on the counter value. A first counter manipulation path is taken when the counter indicates a task-context RCU reader is exiting an outermost RCU read-side critical section. This path includes condition-based processing that may result in invocation of the operating system scheduler. The first path further includes a deadlock protection operation that manipulates the counter to prevent an intervening RCU reader from taking the same path. The second manipulation path is taken when the counter value indicates a task-context RCU reader is exiting a non-outermost RCU read-side critical section, or an RCU reader is nested within the first path. This path bypasses the condition-based processing.	03-27-2014
20140108365	Performance Of RCU-Based Searches And Updates Of Cyclic Data Structures - A technique for improving the performance of RCU-based searches and updates to a shared data element group where readers must see consistent data with respect to the group as a whole. An updater creates one or more new group data elements and assigns each element a new generation number that is different than a global generation number associated with the data element group, allowing readers to track update versions. The updater links the new data elements into the data element group and then updates the global generation number so that referential integrity is maintained. This is done using a generation number element that is referenced by a header pointer for the data element group, and which in turn references or forms part of one of the data elements. After a grace period has elapsed, the any prior version of the generation number element may be freed.	04-17-2014
20140108366	Performance Of RCU-Based Searches And Updates Of Cyclic Data Structures - A technique for improving the performance of RCU-based searches and updates to a shared data element group where readers must see consistent data with respect to the group as a whole. An updater creates one or more new group data elements and assigns each element a new generation number that is different than a global generation number associated with the data element group, allowing readers to track update versions. The updater links the new data elements into the data element group and then updates the global generation number so that referential integrity is maintained. This is done using a generation number element that is referenced by a header pointer for the data element group, and which in turn references or forms part of one of the data elements. After a grace period has elapsed, the any prior version of the generation number element may be freed.	04-17-2014
20140223119	In-Kernel SRCU Implementation With Reduced OS Jitter - A technique for implementing SRCU with reduced OS jitter may include: (1) providing a pair of critical section counters for each CPU; (2) when entering an SRCU read-side critical section, incrementing one of the critical section counters associated with a first grace period; (3) when exiting an SRCU read-side critical section, decrementing one of the critical section counters associated with the first grace period; (4) when performing a data update, initiating the second grace period and performing a counter summation operation that sums the critical section counters associated with the first grace period to generate a critical section counter sum; (5) storing a snapshot value for each critical section counter during the summing; and (6) if the critical section counter sum indicates there are no active SRCU read-side critical sections for the first grace period, rechecking by comparing the snapshot values to current values of the critical section counters.	08-07-2014
20140223242	Motivating Lazy RCU Callbacks Under Out-Of-Memory Conditions - A technique for motivating lazy RCU callbacks under out-of-memory conditions. In response to detecting an actual or potential OOM condition, non-lazy callback processing is performed for all processors whose RCU callback lists are non-empty due to at least one callback permitting lazy callback processing being present.	08-07-2014
20140281267	Enabling Hardware Transactional Memory To Work More Efficiently With Readers That Can Tolerate Stale Data - A technique for enabling hardware transactional memory (HTM) to work more efficiently with readers that can tolerate stale data. In an embodiment, a pre-transaction load request is received from one of the readers, the pre-transaction load request signifying that the reader can tolerate pre-transaction data. A determination is made whether the pre-transaction load request comprises data that has been designated for update by a concurrent HTM transaction. If so, a cache line containing the data is marked as pre-transaction data. The concurrent HTM transaction proceeds without aborting notwithstanding the pre-transaction load request.	09-18-2014
20140281268	Enabling Hardware Transactional Memory To Work More Efficiently With Readers That Can Tolerate Stale Data - A technique for enabling hardware transactional memory (HTM) to work more efficiently with readers that can tolerate stale data. In an embodiment, a pre-transaction load request is received from one of the readers, the pre-transaction load request signifying that the reader can tolerate pre-transaction data. A determination is made whether the pre-transaction load request comprises data that has been designated for update by a concurrent HTM transaction. If so, a cache line containing the data is marked as pre-transaction data. The concurrent HTM transaction proceeds without aborting notwithstanding the pre-transaction load request.	09-18-2014
20140281295	Expediting RCU Grace Periods Under User Mode Control - A technique for supporting user mode specification of RCU grace period latency to an operating system kernel-level RCU implementation. Non-expedited and expedited RCU grace period mechanisms are provided for invocation by RCU updaters performing RCU update operations to respectively initiate non-expedited and expedited grace periods. An expedited grace period indicator in a kernel memory space is provided for indicating whether a non-expedited RCU grace period or an expedited RCU grace period should be invoked. The non-expedited RCU grace period mechanism is adapted to check the expedited grace period indicator, and if an expedited RCU grace period is indicated, to invoke the expedited grace period mechanism. A communication mechanism is provided for use by a user mode application executing in a user memory space to manipulate the expedited grace period indicator in the kernel memory space, and thereby control whether an expedited or non-expedited RCU grace period should be used.	09-18-2014
20140351231	Low Overhead Contention-Based Switching Between Ticket Lock And Queued Lock - A technique for low overhead contention-based switching between ticket locking and queued locking to access shared data may include establishing a ticket lock, establishing a queue lock, operating in ticket lock mode using the ticket lock to access the shared data during periods of relatively low data contention, and operating in queue lock mode using the queue lock to access the shared data during periods of relatively high data contention.	11-27-2014
20140379676	Highly Scalable Tree-Based Trylock - A tree-based trylock technique for reducing contention on a root trylock includes attempting to acquire a trylock at each node of a tree-based hierarchical node structure while following a traversal path that begins at a leaf node, passes through one or more of internal nodes, and ends at a root node having the root trylock. The trylock acquisition operation succeeds if each trylock on the traversal path is acquired, and fails if any trylock on the traversal path cannot be acquired. A trylock housekeeping operation releases all non-root trylocks visited by the trylock acquisition operation, such that if the trylock acquisition operation succeeds, only the root trylock will be remain acquired at the end of the operation, and if the trylock acquisition operation fails, none of the trylocks will be remain acquired at the end of the operation.	12-25-2014
20140379678	Highly Scalable Tree-Based Trylock - A tree-based trylock technique for reducing contention on a root trylock includes attempting to acquire a trylock at each node of a tree-based hierarchical node structure while following a traversal path that begins at a leaf node, passes through one or more of internal nodes, and ends at a root node having the root trylock. The trylock acquisition operation succeeds if each trylock on the traversal path is acquired, and fails if any trylock on the traversal path cannot be acquired. A trylock housekeeping operation releases all non-root trylocks visited by the trylock acquisition operation, such that if the trylock acquisition operation succeeds, only the root trylock will be remain acquired at the end of the operation, and if the trylock acquisition operation fails, none of the trylocks will be remain acquired at the end of the operation.	12-25-2014
20140380084	Detecting Full-System Idle State In Adaptive-Tick Kernels - A technique for detecting full-system idle state in an adaptive-tick kernel includes detecting non-timekeeping CPU idle state, initiating a hysteresis period, waiting for the hysteresis period to end, manipulating a data structure whose state indicates whether a scheduling clock tick may be disabled on all CPUs, and disabling the scheduling clock tick if the data structure is in an appropriate state. In a first embodiment, non-timekeeping CPUs manipulate a global counter when entering an idle state, but add hysteresis to avoid thrashing the counter. Timekeeping is turned off based on the count maintained on the global counter. In a second embodiment, a Read-Copy Update (RCU) dynticks-idle subsystem running on a timekeeping CPU manipulates a global state variable whose states indicate whether all non-timekeeping CPUs are in an idle state, and if so, for how long. Timekeeping is turned off based on the state of the global state variable.	12-25-2014
20150074309	SIGNAL INTERRUPTS IN A TRANSACTIONAL MEMORY SYSTEM - In some embodiments, an apparatus includes a processor that is configured to execute computer usable program code to perform operations. The operations include executing an atomic transaction in a system having a transactional memory. The operations include receiving a signal interrupt during executing of the atomic transaction. The operations include storing a state of the signal interrupt to enable subsequent execution of the signal interrupt. The operations include returning to executing the atomic transaction until the atomic transaction is at least one of completed and aborted. The operations include after executing the atomic transaction is at least one of completed and aborted, determining whether the signal interrupt is received during executing of the atomic transaction. The operations include after determining that the signal interrupt is received during executing of the atomic transaction, retrieving the state of the signal interrupt.	03-12-2015
20150074311	SIGNAL INTERRUPTS IN A TRANSACTIONAL MEMORY SYSTEM - In some embodiments, a method includes executing an atomic transaction in a system having a transactional memory. The method includes receiving a signal interrupt during executing of the atomic transaction. The method includes storing a state of the signal interrupt to enable subsequent execution of the signal interrupt. The method includes returning to executing the atomic transaction until the atomic transaction is at least one of completed and aborted. The method includes after executing the atomic transaction is at least one of completed and aborted, determining whether the signal interrupt is received during executing of the atomic transaction. The method includes after determining that the signal interrupt is received during executing of the atomic transaction, retrieving the state of the signal interrupt. The method includes executing an interrupt handler for processing the signal interrupt and returning from executing of the atomic transaction.	03-12-2015

Patent applications by Paul E. Mckenney, Beaverton, OR US

Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees

Paul E. Mckenney, Beaverton US

Paul E. Mckenney, Beaverton, OR US