Inventors list

Assignees list

Classification tree browser

Top 100 Inventors

Top 100 Assignees

Yonghong Song, Palo Alto US

Yonghong Song, Palo Alto, CA US

Patent application number	Description	Published
20090144746	ADJUSTING WORKLOAD TO ACCOMMODATE SPECULATIVE THREAD START-UP COST - Methods and apparatus provide for a workload adjuster to estimate the startup cost of one or more non-main threads of loop execution and to estimate the amount of workload to be migrated between different threads. Upon deciding to parallelize the execution of a loop, the workload adjuster creates a scheduling policy with a workload for a main thread and workloads for respective non-main threads. The scheduling policy distributes iterations of a parallelized loop to the workload of the main thread and iterations of the parallelized loop to the workloads of the non-main threads. The workload adjuster evaluates a start-up cost of the workload of a non-main thread and, based on the start-up cost, migrates a portion of the workload for that non-main thread to the main thread's workload.	06-04-2009
20090217253	COMPILER FRAMEWORK FOR SPECULATIVE AUTOMATIC PARALLELIZATION WITH TRANSACTIONAL MEMORY - A computer program is speculatively parallelized with transactional memory by scoping program variables at compile time, and inserting code into the program at compile time. Determinations of the scoping can be based on whether scalar variables being scoped are involved in inter-loop non-reduction data dependencies, are used outside loops in which they were defined, and at what point in a loop a scalar variable is defined. The inserted code can include instructions for execution at a run time of the program to determine loop boundaries of the program, and issue checkpoint instructions and commit instructions that encompass transaction regions in the program. A transaction region can include an original function of the program and a spin-waiting loop with a non-transactional load, wherein the spin-waiting loop is configured to wait for a previous thread to commit before the current transaction commits.	08-27-2009
20090276758	STATIC PROFITABILITY CONTROL FOR SPECULATIVE AUTOMATIC PARALLELIZATION - A compilation method and mechanism for parallelizing program code. A method for compilation includes analyzing source code and identifying candidate code for parallelization. Having identified one or more suitable candidates, the profitability of parallelizing the candidate code is determined. If the profitability determination meets a predetermined criteria, then the candidate code may be parallelized. If, however, the profitability determination does not meet the predetermined criteria, then the candidate code may not be parallelized. Candidate code may comprises a loop, and determining profitability of parallelization may include computing a probability of transaction failure for the loop. Additionally, a determination of an execution time of a parallelized version of the loop is made. If the determined execution time is less than an execution time of a non-parallelized version of said loop by at least a given amount, then the loop may be parallelized. If the determined execution time is not less than an execution time of a non-parallelized version of said loop by at least a given amount, then the loop may not be parallelized.	11-05-2009
20090276766	RUNTIME PROFITABILITY CONTROL FOR SPECULATIVE AUTOMATIC PARALLELIZATION - A compilation method and mechanism for parallelizing program code. A method for compilation includes analyzing source code and identifying candidate code for parallelization. The method includes parallelizing the candidate code, in response to determining said profitability meets a predetermined criteria; and generating object code corresponding to the source code. The generated object code includes both a non-parallelized version of the candidate code and a parallelized version of the candidate code. During execution of the object code, a dynamic selection between execution of the non-parallelized version of the candidate code and the parallelized version of the candidate code is made. Changing execution from said parallelized version of the candidate code to the non-parallelized version of the candidate code, may be in response to determining a transaction failure count meets a pre-determined threshold. Additionally, changing execution from one version to the other may be in further response to determining an execution time of the parallelized version of the candidate code is greater than an execution time of the non-parallelized version of the candidate code.	11-05-2009
20090288075	PARALLELIZING NON-COUNTABLE LOOPS WITH HARDWARE TRANSACTIONAL MEMORY - A system and method for speculatively parallelizing non-countable loops in a multi-threaded application. A multi-core processor receives instructions for a multi-threaded application. The application may contain non-countable loops. Non-countable loops have an iteration count value that cannot be determined prior to the execution of the non-countable loop, a loop index value that cannot be non-speculatively determined prior to the execution of an iteration of the non-countable loop, and control that is not transferred out of the loop body by a code line in the loop body. The compiler replaces the non-countable loop with a parallelized loop pattern that uses outlined function calls defined in a parallelization library (PL) in order to speculatively execute iterations of the parallelized loop. The parallelized loop pattern is configured to squash and re-execute any speculative thread of the parallelized loop pattern that is signaled to have a transaction failure.	11-19-2009
20100146480	COMPILER IMPLEMENTATION OF LOCK/UNLOCK USING HARDWARE TRANSACTIONAL MEMORY - A system and method for automatic efficient parallelization of code combined with hardware transactional memory support. A software application may contain a transaction synchronization region (TSR) utilizing lock and unlock transaction synchronization function calls for a shared region of memory within a shared memory. The TSR is replaced with two portions of code. The first portion comprises hardware transactional memory primitives in place of lock and unlock function calls. Also, the first portion ensures no other transaction is accessing the shared region without disabling existing hardware transactional memory support. The second portion performs a fail routine, which utilizes lock and unlock transaction synchronization primitives in response to an indication that a failure occurs within said first portion.	06-10-2010
20100146495	METHOD AND SYSTEM FOR INTERPROCEDURAL PREFETCHING - A computing system has an amount of shared cache, and performs runtime automatic parallelization wherein when a parallelized loop is encountered, a main thread shares the workload with at least one other non-main thread. A method for providing interprocedural prefetching includes compiling source code to produce compiled code having a main thread including a parallelized loop. Prior to the parallelized loop in the main thread, the main thread includes prefetching instructions for the at least one other non-main thread that shares the workload of the parallelized loop. As a result, the main thread prefetches data into the shared cache for use by the at least one other non-main thread.	06-10-2010
20100153959	CONTROLLING AND DYNAMICALLY VARYING AUTOMATIC PARALLELIZATION - A system and method for automatically controlling run-time parallelization of a software application. A buffer is allocated during execution of program code of an application. When a point in program code near a parallelized region is reached, demand information is stored in the buffer in response to reaching a predetermined first checkpoint. Subsequently, the demand information is read from the buffer in response to reaching a predetermined second checkpoint. Allocation information corresponding to the read demand information is computed and stored the in the buffer for the application to later access. The allocation information is read from the buffer in response to reaching a predetermined third checkpoint, and the parallelized region of code is executed in a manner corresponding to the allocation information.	06-17-2010
20100325618	FAULT TOLERANT COMPILATION WITH AUTOMATIC ERROR CORRECTION - A compilation method is provided for automated user error correction. The method includes using a compiler driver run by a processor to receive a source file for compilation. With a compiler component invoked by the compiler driver, the method includes identifying an error in the source file such as a linking problem or syntax error in the user's program. The method includes receiving with the compiler driver an error message corresponding to the identified error. With an error corrector module run by the processor, the method includes processing the error message to determine an error correction for the identified error in the source file. The compiler driver modifies the source file based on the error correction and compiles the modified source file with the compiler component.	12-23-2010
20100325619	FAULT TOLERANT COMPILATION WITH AUTOMATIC OPTIMIZATION ADJUSTMENT - A compilation method is provided for correcting compiler errors that include compiler internal errors and errors produced by running a validation suite. The method includes running a compiler on a computer and storing a set of optimization levels in memory accessible by the compiler. The method includes receiving a source file with the compiler that includes a user-defined optimization level to be used in compiling the source file. The method includes identifying a set of functions within the source file and using compiler components to compile these functions using the original optimization level. When the compiling results in an internal error occurring and being reported for one or more of the functions, the method includes using an optimization adjustment module to process the internal error and assign an adjusted or lower optimization level to the one or more functions and recompiling of these functions again with the lower optimization level.	12-23-2010
20110067014	PIPELINED PARALLELIZATION WITH LOCALIZED SELF-HELPER THREADING - A system and method for automatically parallelizing a computer program for multi-threaded execution. A compiler identifies and parallelizes non-DOALL parallel regions, such as loops, within a computer program. The compiler determines enhanced helper thread instructions based upon the main body instructions of the non-DOALL region. These helper thread instructions are inserted ahead of the main body instructions within each of the plurality of threads, rather than within a single main thread. Next, synchronization instructions are inserted in one or more threads such that the main body of work of each thread is performed in a pipelined manner. The helper thread instructions within each thread may reduce the total execution time of each thread.	03-17-2011
20110161945	Minimizing Register Spills by Using Register Moves - A system and method for minimizing register spills during compilation. A compiler reallocates spilled variables from stack memory to other available registers. Although a corresponding register file may not have available registers for storage, the compiler identifies available registers in other locations for storage. The compiler identifies available registers in an alternate register file, wherein the alternate register file may be a floating-point register file which is then used for spilled integer variables. Other instruction type combinations between spilled variables and alternate register files are possible. When an available register within the alternate register file is identified, the compiler modifies the program instructions to allocate the corresponding spilled variable to the available register.	06-30-2011

Patent applications by Yonghong Song, Palo Alto, CA US