Tong Chen, Yorktown Heights US

Tong Chen, Yorktown Heights, NY US

Patent application number	Description	Published
20080229291	Compiler Implemented Software Cache Apparatus and Method in which Non-Aliased Explicitly Fetched Data are Excluded - A compiler implemented software cache apparatus and method in which non-aliased explicitly fetched data are excluded are provided. With the mechanisms of the illustrative embodiments, a compiler uses a forward data flow analysis to prove that there is no alias between the cached data and explicitly fetched data. Explicitly fetched data that has no alias in the cached data are excluded from the software cache. Explicitly fetched data that has aliases in the cached data are allowed to be stored in the software cache. In this way, there is no runtime overhead to maintain the correctness of the two copies of data. Moreover, the number of lines of the software cache that must be protected from eviction is decreased. This leads to a decrease in the amount of computation cycles required by the cache miss handler when evicting cache lines during cache miss handling.	09-18-2008
20090070753	INCREASE THE COVERAGE OF PROFILING FEEDBACK WITH DATA FLOW ANALYSIS - The present invention provides a system and method for profiling based optimization of a computer program. The system includes an optimization module that profiles feedback from profiled part of a program to a part of the program that was not reached, an identical expressions model that identifies at least one identical expression in the program that have not been profiled and copies alias profiling result from a profiled reference to the reference that has not been profiled, a speculative identical expressions model that identifies at least one speculative identical expression in the program that have not been profiled and copies alias profiling result from a profiled speculative identical reference to the speculative identical reference that has not been profiled, and a similar expressions model that identifies at least one similar expression in the program that have not been profiled and copies alias profiling result from a similar profiled reference to the similar reference that has not been profiled.	03-12-2009
20090254711	Reducing Cache Pollution of a Software Controlled Cache - Reducing cache pollution of a software controlled cache is provided. A request is received to prefetch data into the software controlled cache. A first designator is set for a first cache access to a first value. If there is the second cache access to prefetch, a determination is made as to whether data associated with the second cache access exists in the software controlled cache. If the data is in the software controlled cache, a determination is made as to whether a second value of a second designator is greater than the first value of the first cache access. If the second value fails to be greater than the first value, the position of the first cache access and the second cache access in a cache line is swapped. The first value is decremented by a predetermined amount and the second value is replaced to equal the first value.	10-08-2009
20090254733	Dynamically Controlling a Prefetching Range of a Software Controlled Cache - Dynamically controlling a prefetching range of a software controlled cache is provided. A compiler analyzes source code to identify at least one of a plurality of loops that contain irregular memory references. For each irregular memory reference in the source code, the compiler determines whether the irregular memory reference is a candidate for optimization. Responsive to identifying an irregular memory reference that may be optimized, the complier determines whether the irregular memory reference is valid for prefetching. If the irregular memory reference is valid for prefetching, a store statement for an address of the irregular memory reference is inserted into the at least one loop. A runtime library call is inserted into a prefetch runtime library to dynamically prefetch the irregular memory references. Data associated with the irregular memory references are dynamically prefetched into the software controlled cache when the runtime library call is invoked.	10-08-2009
20090254895	Prefetching Irregular Data References for Software Controlled Caches - Prefetching irregular memory references into a software controlled cache is provided. A compiler analyzes source code to identify at least one of a plurality of loops that contain an irregular memory reference. The compiler determines if the irregular memory reference within the at least one loop is a candidate for optimization. Responsive to an indication that the irregular memory reference may be optimized, the compiler determines if the irregular memory reference is valid for prefetching. Responsive to an indication that the irregular memory reference is valid for prefetching, a store statement for an address of the irregular memory reference is inserted into the at least one loop. A runtime library call is inserted into a prefetch runtime library for the irregular memory reference. Data associated with the irregular memory reference is prefetched into the software controlled cache when the runtime library call is invoked.	10-08-2009
20090293047	Reducing Runtime Coherency Checking with Global Data Flow Analysis - Reducing runtime coherency checking using global data flow analysis is provided. A determination is made as to whether a call is for at least one of a DMA get operation or a DMA put operation in response to the call being issued during execution of a compiled and optimized code. A determination is made as to whether a software cache write operation has been issued since a last flush operation in response to the call being the DMA get operation. A DMA get runtime coherency check is then performed in response to the software cache write operation being issued since the last flush operation.	11-26-2009
20090293048	Computer Analysis and Runtime Coherency Checking - Compiler analysis and runtime coherency checking for reducing coherency problems is provided. Source code is analyzed to identify at least one of a plurality of loops that contains a memory reference. A determination is made as to whether the memory reference is an access to a global memory that should be handled by at least one of a software controlled cache or a direct buffer. A determination is made as to whether there is a data dependence between the memory reference and at least one reference from at least one of other direct buffers or other software controlled caches in response to an indication that the memory reference is an access to the global memory that should be handled by either the software controlled cache or the direct buffer. A direct buffer transformation is applied to the memory reference in response to a negative indication of the data dependence.	11-26-2009
20100023700	Dynamically Maintaining Coherency Within Live Ranges of Direct Buffers - Reducing coherency problems in a data processing system is provided. Source code that is to be compiled is received and analyzed to identify at least one of a plurality of loops that contain a memory reference. A determination is made as to whether the memory reference is an access to a global memory that should be handled by a direct buffer. Responsive to an indication that the memory reference is an access to the global memory that should be handled by the direct buffer, the memory reference is marked for direct buffer transformation. The direct buffer transformation is then applied to the memory reference.	01-28-2010
20100088673	Optimized Code Generation Targeting a High Locality Software Cache - Mechanisms for optimized code generation targeting a high locality software cache are provided. Original computer code is parsed to identify memory references in the original computer code. Memory references are classified as either regular memory references or irregular memory references. Regular memory references are controlled by a high locality cache mechanism. Original computer code is transformed, by a compiler, to generate transformed computer code in which the regular memory references are grouped into one or more memory reference streams, each memory reference stream having a leading memory reference, a trailing memory reference, and one or more middle memory references. Transforming of the original computer code comprises inserting, into the original computer code, instructions to execute initialization, lookup, and cleanup operations associated with the leading memory reference and trailing memory reference in a different manner from initialization, lookup, and cleanup operations for the one or more middle memory references.	04-08-2010
20110093687	MANAGING MULTIPLE SPECULATIVE ASSIST THREADS AT DIFFERING CACHE LEVELS - An illustrative embodiment provides a computer-implemented process for managing multiple speculative assist threads for data pre-fetching that sends a command from an assist thread of a first processor to second processor and a memory, wherein parameters of the command specify a processor identifier of the second processor, responsive to receiving the command, reply by the second processor indicating an ability to receive a cache line that is a target of a pre-fetch, responsive to receiving the command replying by the memory indicating a capability to provide the cache line, responsive to receiving replies from the second processor and the memory, sending, by the first processor, a combined response to the second processor and the memory, wherein the combined response indicates an action, and responsive to the action indicating a transaction can continue sending the requested cache line, by the memory, to the second processor into a target cache level on the second processor.	04-21-2011
20110093838	MANAGING SPECULATIVE ASSIST THREADS - An illustrative embodiment provides a computer-implemented process for managing speculative assist threads for data pre-fetching that analyzes collected source code and cache profiling information to identify a code region containing a delinquent load instruction and generates an assist thread, including a value for a local version number, at a program entry point within the identified code region. Upon activation of the assist thread the local version number of the assist thread is compared to the global unique version number of the main thread for the identified code region and an iteration distance between the assist thread relative to the main thread is compared to a predefined value. The assist thread is executed when the local version number of the assist thread matches the global unique version number of the main thread, and the iteration distance between the assist thread relative to the main thread is within a predefined range of values.	04-21-2011
20110283067	Target Memory Hierarchy Specification in a Multi-Core Computer Processing System - Target memory hierarchy specification in a multi-core computer processing system is provided including a system for implementing prefetch instructions. The system includes a first core processor, a dedicated cache corresponding to the first core processor, and a second core processor. The second core processor includes instructions for executing a prefetch instruction that specifies a memory location and the dedicated local cache corresponding to the first core processor. Executing the prefetch instruction includes retrieving data from the memory location and storing the retrieved data on the dedicated local cache corresponding to the first core processor.	11-17-2011
20110320785	Binary Rewriting in Software Instruction Cache - Mechanisms are provided for dynamically rewriting branch instructions in a portion of code. The mechanisms execute a branch instruction in the portion of code. The mechanisms determine if a target instruction of the branch instruction, to which the branch instruction branches, is present in an instruction cache associated with the processor. Moreover, the mechanisms directly branch execution of the portion of code to the target instruction in the instruction cache, without intervention from an instruction cache runtime system, in response to a determination that the target instruction is present in the instruction cache. In addition, the mechanisms redirect execution of the portion of code to the instruction cache runtime system in response to a determination that the target instruction cannot be determined to be present in the instruction cache.	12-29-2011
20110320786	Dynamically Rewriting Branch Instructions in Response to Cache Line Eviction - Mechanisms are provided for evicting cache lines from an instruction cache of the data processing system. The mechanisms store, for a portion of code in a current cache line, a linked list of call sites that directly or indirectly target the portion of code in the current cache line. A determination is made as to whether the current cache line is to be evicted from the instruction cache. The linked list of call sites is processed to identify one or more rewritten branch instructions having associated branch stubs, that either directly or indirectly target the portion of code in the current cache line. In addition, the one or more rewritten branch instructions are rewritten to restore the one or more rewritten branch instructions to an original state based on information in the associated branch stubs.	12-29-2011
20110321002	Rewriting Branch Instructions Using Branch Stubs - Mechanisms are provided for rewriting branch instructions in a portion of code. The mechanisms receive a portion of source code having an original branch instruction. The mechanisms generate a branch stub for the original branch instruction. The branch stub stores information about the original branch instruction including an original target address of the original branch instruction. Moreover, the mechanisms rewrite the original branch instruction so that a target of the rewritten branch instruction references the branch stub. In addition, the mechanisms output compiled code including the rewritten branch instruction and the branch stub for execution by a computing device. The branch stub is utilized by the computing device at runtime to determine if execution of the rewritten branch instruction can be redirected directly to a target instruction corresponding to the original target address in an instruction cache of the computing device without intervention by an instruction cache runtime system.	12-29-2011
20110321021	Arranging Binary Code Based on Call Graph Partitioning - Mechanisms are provided for arranging binary code to reduce instruction cache conflict misses. These mechanisms generate a call graph of a portion of code. Nodes and edges in the call graph are weighted to generate a weighted call graph. The weighted call graph is then partitioned according to the weights, affinities between nodes of the call graph, and the size of cache lines in an instruction cache of the data processing system, so that binary code associated with one or more subsets of nodes in the call graph are combined into individual cache lines based on the partitioning. The binary code corresponding to the partitioned call graph is then output for execution in a computing device.	12-29-2011
20120005457	USING SOFTWARE-CONTROLLED SMT PRIORITY TO OPTIMIZE DATA PREFETCH WITH ASSIST THREAD - A method for optimizing data prefetch using assist threads is disclosed herein. In one embodiment, such a method includes executing a main application thread substantially simultaneously with an assist thread. The assist thread is configured to prefetch data for the main application thread. The method further includes monitoring, at runtime, the progress of the main application thread and the assist thread. Depending on the progress of the main application thread and the assist thread, the method dynamically adjusts, at runtime, the priority of the main application thread, the priority of the assist thread, or both. This will help to ensure that the progress of the main application thread and the assist thread are substantially synchronized while executing so that the assist thread increases the performance of the main application thread as initially intended. A corresponding computer program product, apparatus, and system and also disclosed herein.	01-05-2012
20120198169	Binary Rewriting in Software Instruction Cache - Mechanisms are provided for dynamically rewriting branch instructions in a portion of code. The mechanisms execute a branch instruction in the portion of code. The mechanisms determine if a target instruction of the branch instruction, to which the branch instruction branches, is present in an instruction cache associated with the processor. Moreover, the mechanisms directly branch execution of the portion of code to the target instruction in the instruction cache, without intervention from an instruction cache runtime system, in response to a determination that the target instruction is present in the instruction cache. In addition, the mechanisms redirect execution of the portion of code to the instruction cache runtime system in response to a determination that the target instruction cannot be determined to be present in the instruction cache.	08-02-2012
20120198170	Dynamically Rewriting Branch Instructions in Response to Cache Line Eviction - Mechanisms are provided for evicting cache lines from an instruction cache of the data processing system. The mechanisms store, for a portion of code in a current cache line, a linked list of call sites that directly or indirectly target the portion of code in the current cache line. A determination is made as to whether the current cache line is to be evicted from the instruction cache. The linked list of call sites is processed to identify one or more rewritten branch instructions having associated branch stubs, that either directly or indirectly target the portion of code in the current cache line. In addition, the one or more rewritten branch instructions are rewritten to restore the one or more rewritten branch instructions to an original state based on information in the associated branch stubs.	08-02-2012
20120198429	Arranging Binary Code Based on Call Graph Partitioning - Mechanisms are provided for arranging binary code to reduce instruction cache conflict misses. These mechanisms generate a call graph of a portion of code. Nodes and edges in the call graph are weighted to generate a weighted call graph. The weighted call graph is then partitioned according to the weights, affinities between nodes of the call graph, and the size of cache lines in an instruction cache of the data processing system, so that binary code associated with one or more subsets of nodes in the call graph are combined into individual cache lines based on the partitioning. The binary code corresponding to the partitioned call graph is then output for execution in a computing device.	08-02-2012
20120204016	Rewriting Branch Instructions Using Branch Stubs - Mechanisms are provided for rewriting branch instructions in a portion of code. The mechanisms receive a portion of source code having an original branch instruction. The mechanisms generate a branch stub for the original branch instruction. The branch stub stores information about the original branch instruction including an original target address of the original branch instruction. Moreover, the mechanisms rewrite the original branch instruction so that a target of the rewritten branch instruction references the branch stub. In addition, the mechanisms output compiled code including the rewritten branch instruction and the branch stub for execution by a computing device. The branch stub is utilized by the computing device at runtime to determine if execution of the rewritten branch instruction can be redirected directly to a target instruction corresponding to the original target address in an instruction cache of the computing device without intervention by an instruction cache runtime system.	08-09-2012
20120265941	Prefetching Irregular Data References for Software Controlled Caches - Prefetching irregular memory references into a software controlled cache is provided. A compiler analyzes source code to identify at least one of a plurality of loops that contain an irregular memory reference. The compiler determines if the irregular memory reference within the at least one loop is a candidate for optimization. Responsive to an indication that the irregular memory reference may be optimized, the compiler determines if the irregular memory reference is valid for prefetching. Responsive to an indication that the irregular memory reference is valid for prefetching, a store statement for an address of the irregular memory reference is inserted into the at least one loop. A runtime library call is inserted into a prefetch runtime library for the irregular memory reference. Data associated with the irregular memory reference is prefetched into the software controlled cache when the runtime library call is invoked.	10-18-2012
20120303907	Dynamically Maintaining Coherency Within Live Ranges of Direct Buffers - Reducing coherency problems in a data processing system is provided. Source code that is to be compiled is received and analyzed to identify at least one of a plurality of loops that contain a memory reference. A determination is made as to whether the memory reference is an access to a global memory that should be handled by a direct buffer. Responsive to an indication that the memory reference is an access to the global memory that should be handled by the direct buffer, the memory reference is marked for direct buffer transformation. The direct buffer transformation is then applied to the memory reference.	11-29-2012
20140068581	OPTIMIZED DIVISION OF WORK AMONG PROCESSORS IN A HETEROGENEOUS PROCESSING SYSTEM - A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.	03-06-2014
20140068582	OPTIMIZED DIVISION OF WORK AMONG PROCESSORS IN A HETEROGENEOUS PROCESSING SYSTEM - A compiler implemented by a computer performs optimized division of work across heterogeneous processors. The compiler divides source code into code sections and characterizes each of the code sections based on pre-defined criteria. Each of the code sections is characterized as at least one of: allocate to a main processor, allocate to a processing element, allocate to one of a parameterized main processor and a parameterized processing element, and indeterminate. The compiler analyzes side-effects and costs of executing the code sections on allocated processors, and transforms the code sections based on results of the analyzing. The transforming includes re-characterizing the code sections for alternate execution in a runtime environment.	03-06-2014
20140129787	DATA PLACEMENT FOR EXECUTION OF AN EXECUTABLE - According to one embodiment, a method for a compiler to produce an executable module to be executed by a computer system including a main processor and active memory devices includes dividing source code into code sections, identifying a first code section to be executed by the active memory devices, wherein the first code section is one of the code sections and identifying data structures that are used by the first code section. The method also includes classifying the data structures based on pre-defined attributes, formulating, by the compiler, a storage mapping plan for the data structures based on the classifying and generating, by the compiler, mapping code that implements the storage mapping plan, wherein the mapping code is part of the executable module and wherein the mapping code maps storing of the data structures to storage locations in the active memory devices.	05-08-2014
20140130022	PLACEMENT OF INSTRUCTIONS IN A MEMORY SYSTEM - According to one embodiment of the present invention, a method for operation of a computer system including a main processor, a first and a second active memory device includes receiving an executable module generated by a compiler, wherein the executable module includes a code section identified as executable by a first processing element in the first active memory device and a second processing element in the second active memory device. The method further includes copying the code section to memory in the first device based on the code section being executable on the first device, copying the code section from the memory in the first active memory device to an instruction buffer of the first processing element and copying the code section from the memory in the first device to the second device based on the code section being executable on the second device.	05-08-2014
20140130023	PLACEMENT OF INSTRUCTIONS IN A MEMORY SYSTEM - According to one embodiment of the present invention, a computer system is provided where the computer system includes a main processor, first and second active memory device. The computer system is configured to perform a method including receiving an executable module generated by a compiler, wherein the executable module includes a code section identified as executable by a first processing element in the first active memory device and a second processing element in the second active memory device. The method includes copying the code section to memory in the first device based on the code section being executable on the first device, copying the code section from the first active memory device to an instruction buffer of the first processing element and copying the code section from the first device to the second device based on the code section being executable on the second device.	05-08-2014
20140130027	DATA PLACEMENT FOR EXECUTION OF AN EXECUTABLE - According to one embodiment, a system including a compiler to produce an executable module to be executed by a computer system including a main processor and active memory devices is provided. The system configured to perform a method including dividing source code into code sections, identifying a first code section to be executed by the active memory devices and identifying data structures that are used by the first code section. The method also includes classifying the data structures based on pre-defined attributes, formulating, by the compiler, a storage mapping plan for the data structures based on the classifying and generating, by the compiler, mapping code that implements the storage mapping plan, wherein the mapping code is part of the executable module and wherein the mapping code maps storing of the data structures to storage locations in the active memory devices.	05-08-2014

Patent applications by Tong Chen, Yorktown Heights, NY US

Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees

Tong Chen, Yorktown Heights US

Tong Chen, Yorktown Heights, NY US