Mauricio Breternitz, Austin, TX US

Patent application number	Description	Published
20120291040	AUTOMATIC LOAD BALANCING FOR HETEROGENEOUS CORES - A system and method for efficient automatic scheduling of the execution of work units between multiple heterogeneous processor cores. A processing node includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data micro-architecture. A computer program comprises one or more compute kernels, or function calls. A compiler computes pre-runtime information of the given function call. A runtime scheduler produces one or more work units by matching each of the one or more kernels with an associated record of data. The scheduler assigns work units either to the first or to the second processor core based at least in part on the computed pre-runtime information. In addition, the scheduler is able to change an original assignment for a waiting work unit based on dynamic runtime behavior of other work units corresponding to a same kernel as the waiting work unit.	11-15-2012
20120297163	AUTOMATIC KERNEL MIGRATION FOR HETEROGENEOUS CORES - A system and method for automatically migrating the execution of work units between multiple heterogeneous cores. A computing system includes a first processor core with a single instruction multiple data micro-architecture and a second processor core with a general-purpose micro-architecture. A compiler predicts that execution of a function call in a program will migrate at a given location to a different processor core. The compiler creates a data structure to support moving live values associated with the execution of the function call at the given location. An operating system (OS) scheduler schedules at least code before the given location in program order to the first processor core. In response to receiving an indication that a condition for migration is satisfied, the OS scheduler moves the live values to a location indicated by the data structure for access by the second processor core and schedules code after the given location to the second processor core.	11-22-2012
20120331278	BRANCH REMOVAL BY DATA SHUFFLING - A system and method for automatically optimizing parallel execution of multiple work units in a processor by reducing a number of branch instructions. A computing system includes a first processor core with a general-purpose micro-architecture and a second processor core with a single instruction multiple data (SIMD) micro-architecture. A compiler detects and evaluates branches within function calls with one or more records of data used to determine one or more outcomes. Multiple compute sub-kernels are generated, each comprising code from the function corresponding to a unique outcome of the branch. Multiple work units are produced by assigning one or more records of data corresponding to a given outcome of the branch to one of the multiple compute sub-kernels associated with the given outcome. The branch is removed. An operating system scheduler schedules each of the one or more compute sub-kernels to the first processor core or to the second processor core.	12-27-2012
20130159623	PROCESSOR WITH GARBAGE-COLLECTION BASED CLASSIFICATION OF MEMORY - Improved memory management in a processor is provided using garbage collection utilities. The processor includes higher performance memory units and lower performance memory units and a memory management unit. The memory management unit includes a garbage collection utility programmed to identify high use memory blocks and low use memory blocks within the higher and lower performance memory units. The memory management unit is also configured to move the high use memory blocks to higher performance memory and move the low use memory blocks to lower performance memory. The method comprises determining performance characteristics of available memory to identify higher performance memory and lower performance memory. Next, memory block use metrics are analyzed to identify high use memory blocks and low use memory blocks. Finally, high use memory blocks are moved to the higher performance memory while the low use memory blocks are moved to the lower performance memory.	06-20-2013
20140047079	SYSTEM AND METHOD FOR EMULATING A DESIRED NETWORK CONFIGURATION IN A CLOUD COMPUTING SYSTEM - The present disclosure relates to a method and system for configuring a computing system, such as a cloud computing system. A method includes selecting a cluster of nodes for the computing system from a plurality of available nodes coupled to a communication network based on a comparison of a communication network configuration of an emulated node cluster and an actual communication network configuration of the plurality of available nodes. The method further includes modifying a network configuration of at least one node of a cluster of nodes to modify network performance of the at least one node on a communication network coupled to the cluster of nodes.	02-13-2014
20140047084	SYSTEM AND METHOD FOR MODIFYING A HARDWARE CONFIGURATION OF A CLOUD COMPUTING SYSTEM - The present disclosure relates to a method and system for configuring a computing system, such as a cloud computing system. A method includes determining, based on a shared execution of a workload by a cluster of nodes of the computing system, that at least one node of the cluster of nodes operated at less than a threshold operating capacity during the shared execution of the workload. The method further includes selecting a modified hardware configuration of the cluster of nodes based on the determining such that the cluster of nodes with the modified hardware configuration has at least one of a reduced computing capacity and a reduced storage capacity.	02-13-2014
20140047095	SYSTEM AND METHOD FOR TUNING A CLOUD COMPUTING SYSTEM - The present disclosure relates to a method and system for configuring a computing system, such as a cloud computing system. A method includes initiating a plurality of executions of a workload on a cluster of nodes based on a plurality of different sets of configuration parameters. The configuration parameters include at least one of an operational parameter of a workload container, a boot-time parameter of at least one node, and a hardware configuration parameter of at least one node. A set of configuration parameters is selected for the cluster of nodes from the plurality of different sets of configuration parameters based on a comparison of at least one performance characteristic of the cluster of nodes monitored during each execution of the workload and at least one desired performance characteristic. The workload is provided to the cluster of nodes for execution.	02-13-2014
20140047227	SYSTEM AND METHOD FOR CONFIGURING BOOT-TIME PARAMETERS OF NODES OF A CLOUD COMPUTING SYSTEM - The present disclosure relates to a method and system for configuring a computing system, such as a cloud computing system. A method includes providing a user interface comprising selectable boot-time configuration data and selecting, based on at least one user selection of the boot-time configuration data, a boot-time configuration for at least one node of a cluster of nodes of the computing system. The method further includes configuring the at least one node of the cluster of nodes with the selected boot-time configuration to modify at least one boot-time parameter of the at least one node.	02-13-2014
20140047272	SYSTEM AND METHOD FOR CONFIGURING A CLOUD COMPUTING SYSTEM WITH A SYNTHETIC TEST WORKLOAD - The present disclosure relates to a method and system for configuring a computing system, such as a cloud computing system. A method includes selecting, based on a user selection received via a user interface, a workload for execution on a cluster of nodes of the computing system. The workload is selected from a plurality of available workloads including an actual workload and a synthetic test workload. The method further includes configuring the cluster of nodes of the computing system to execute the selected workload such that processing of the selected workload is distributed across the cluster of nodes. The synthetic test workload may be generated by a code synthesizer based on a set of user-defined workload parameters provided via a user interface that identify execution characteristics of the synthetic test workload.	02-13-2014
20140047341	SYSTEM AND METHOD FOR CONFIGURING CLOUD COMPUTING SYSTEMS - The present disclosure relates to a method, system, and apparatus for configuring a computing system, such as a cloud computing system. A method includes, based on user selections received via a user interface, configuring a cluster of nodes by selecting the cluster of nodes from a plurality of available nodes, selecting a workload container module from a plurality of available workload container modules for operation on each node of the selected cluster of nodes, and selecting a workload for execution with the workload container on the cluster of nodes. Each node of the cluster of nodes includes at least one processing device and memory, and the cluster of nodes is operative to share processing of a workload.	02-13-2014
20140047342	SYSTEM AND METHOD FOR ALLOCATING A CLUSTER OF NODES FOR A CLOUD COMPUTING SYSTEM BASED ON HARDWARE CHARACTERISTICS - The present disclosure relates to a method and system for configuring a computing system, such as a cloud computing system. A method includes initiating a hardware performance assessment test on a group of available nodes to obtain actual hardware performance characteristics of the group of available nodes. The method further includes selecting a subset of nodes for the computing system from the group of available nodes based on a comparison of the actual hardware performance characteristics of the group of available nodes and desired hardware performance characteristics.	02-13-2014
20140108828	SEMI-STATIC POWER AND PERFORMANCE OPTIMIZATION OF DATA CENTERS - A device may receive information that identifies a first task to be processed, may determine a performance metric value indicative of a behavior of a processor while processing a second task, and may assign, based on the performance metric value, the first task to a bin for processing the first task, the bin including a set of processors that operate based on a power characteristic.	04-17-2014
20140136870	TRACKING MEMORY BANK UTILITY AND COST FOR INTELLIGENT SHUTDOWN DECISIONS - A device receives an indication that a memory bank is to be powered down, and determines, based on receiving the indication, shutdown scores corresponding to powered up memory banks. Each shutdown score is based on a shutdown metric associated with powering down a powered up memory bank. The device may power down a selected memory bank based on the shutdown scores.	05-15-2014
20140136873	TRACKING MEMORY BANK UTILITY AND COST FOR INTELLIGENT POWER UP DECISIONS - A device receives an indication that a memory bank is to be powered up, and determines, based on receiving the indication, power scores corresponding to powered down memory banks. Each power score corresponds to a power metric associated with powering up a powered down memory bank. The device powers up a selected memory bank based on the plurality of power scores.	05-15-2014
20140156941	Tracking Non-Native Content in Caches - The described embodiments include a cache with a plurality of banks that includes a cache controller. In these embodiments, the cache controller determines a value representing non-native cache blocks stored in at least one bank in the cache, wherein a cache block is non-native to a bank when a home for the cache block is in a predetermined location relative to the bank. Then, based on the value representing non-native cache blocks stored in the at least one bank, the cache controller determines at least one bank in the cache to be transitioned from a first power mode to a second power mode. Next, the cache controller transitions the determined at least one bank in the cache from the first power mode to the second power mode.	06-05-2014
20140181411	PROCESSING DEVICE WITH INDEPENDENTLY ACTIVATABLE WORKING MEMORY BANK AND METHODS - A data processing device is provided that includes an array of working memory banks and an associated processing engine. The working memory bank array is configured with at least one independently activatable memory bank. A dirty data counter (DDC) is associated with the independently activatable memory bank and is configured to reflect a count of dirty data migrated from the independently activatable memory bank upon selective deactivation of the independently activatable memory bank. The DDC is configured to selectively decrement the count of dirty data upon the reactivation of the independently activatable memory bank in connection with a transient state. In the transient state, each dirty data access by the processing engine to the reactivated memory bank is also conducted with respect to another memory bank of the array. Upon a condition that dirty data is found in the other memory bank, the count of dirty data is decremented.	06-26-2014
20140181414	MECHANISMS TO BOUND THE PRESENCE OF CACHE BLOCKS WITH SPECIFIC PROPERTIES IN CACHES - A system and method for efficiently limiting storage space for data with particular properties in a cache memory. A computing system includes a cache array and a corresponding cache controller. The cache array includes multiple banks, wherein a first bank is powered down. In response to a write request to a second bank for data indicated to be stored in the powered down first bank, the cache controller determines a respective bypass condition for the data. If the bypass condition exceeds a threshold, then the cache controller invalidates any copy of the data stored in the second bank. If the bypass condition does not exceed the threshold, then the cache controller stores the data with a clean state in the second bank. The cache controller writes the data in a lower-level memory for both cases.	06-26-2014
20140223445	Selecting a Resource from a Set of Resources for Performing an Operation - The described embodiments comprise a selection mechanism that selects a resource from a set of resources in a computing device for performing an operation. In some embodiments, the selection mechanism is configured to perform a lookup in a table selected from a set of tables to identify a resource from the set of resources. When the identified resource is not available for performing the operation and until a resource is selected for performing the operation, the selection mechanism is configured to identify a next resource in the table and select the next resource for performing the operation when the next resource is available for performing the operation.	08-07-2014
20140258688	BENCHMARK GENERATION USING INSTRUCTION EXECUTION INFORMATION - Methods and systems are provided for generating a benchmark representative of a reference process. One method involves obtaining execution information for a subset of the plurality of instructions of the reference process from a pipeline of a processing module during execution of those instructions by the processing module, determining performance characteristics quantifying the execution behavior of the reference process based on the execution information, and generating the benchmark process that mimics the quantified execution behavior of the reference process based on the performance characteristics.	09-11-2014
20140333638	POWER-EFFICIENT NESTED MAP-REDUCE EXECUTION ON A CLOUD OF HETEROGENEOUS ACCELERATED PROCESSING UNITS - An approach and a method for efficient execution of nested map-reduce framework workloads to take advantage of the combined execution of central processing units (CPUs) and graphics processing units (GPUs) and lower latency of data access in accelerated processing units (APUs) is described. In embodiments, metrics are generated to determine whether a map or reduce function is more efficiently processed on a CPU or a GPU. A first metric is based on a ratio of a number of branch instructions to a number of non-branch instructions, and a second metric is based on the comparison of execution times on each of the CPU and the GPU. Selecting execution of map and reduce functions based on the first and second metrics results in accelerated computations. Some embodiments include scheduling pipelined executions of functions on the CPU and functions on the GPU concurrently to achieve power-efficient nested map-reduce framework execution.	11-13-2014
20140359126	WORKLOAD PARTITIONING AMONG HETEROGENEOUS PROCESSING NODES - A method of computing is performed in a first processing node of a plurality of processing nodes of multiple types with distinct processing capabilities. The method includes, in response to a command, partitioning data associated with the command among the plurality of processing nodes. The data is partitioned based at least in part on the distinct processing capabilities of the multiple types of processing nodes.	12-04-2014
20140359633	THREAD ASSIGNMENT FOR POWER AND PERFORMANCE EFFICIENCY USING MULTIPLE POWER STATES - A method is performed in a computing system that includes a plurality of processing nodes of multiple types configurable to run in multiple performance states. In the method, an application executes on a thread assigned to a first processing node. Power and performance of the application on the first processing node are estimated. Power and performance of the application in multiple performance states on other processing nodes of the plurality of processing nodes besides the first processing node are also estimated. It is determined that the estimated power and performance of the application on a second processing node in a respective performance state of the multiple performance states is preferable to the power and performance of the application on the first processing node. The thread is reassigned to the second processing node, with the second processing node in the respective performance state.	12-04-2014
20140372782	COMBINED DYNAMIC AND STATIC POWER AND PERFORMANCE OPTIMIZATION ON DATA CENTERS - Various datacenter or other computing center control apparatus and methods are disclosed. In one aspect, a method of computing is provided that includes defining plural processor performance bins where each processor performance bin has a processor performance state. At least one processor is assigned to each of the plural processor performance bins. Processor performance metrics of at least one of the processors are monitored while the at least one of the processors executes an incoming task. Processor power is modeled based on the monitored performance metrics. Future incoming tasks are assigned to one of the processor performance bins based on the modeled processor power.	12-18-2014
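
As an illustration of the two-metric CPU/GPU selection summarized for application 20140333638 in the table above, the following is a minimal sketch, not the method claimed in the application. The abstract names only the two metrics (a branch to non-branch instruction ratio and a comparison of execution times). The `FunctionProfile` type, the `choose_device` function, the 0.2 cutoff, and the order in which the metrics are checked are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class FunctionProfile:
    """Hypothetical profile of one map or reduce function (not from the source)."""
    branch_instructions: int
    non_branch_instructions: int
    cpu_seconds: float  # measured execution time on the CPU
    gpu_seconds: float  # measured execution time on the GPU


def branch_ratio(profile: FunctionProfile) -> float:
    """First metric: branch instructions per non-branch instruction."""
    if profile.non_branch_instructions == 0:
        return float("inf")
    return profile.branch_instructions / profile.non_branch_instructions


def choose_device(profile: FunctionProfile, max_branch_ratio: float = 0.2) -> str:
    """Return "CPU" or "GPU" for a function.

    The 0.2 cutoff and checking the branch ratio first are assumptions.
    """
    # Assumption: branch-heavy code is kept on the CPU regardless of timing.
    if branch_ratio(profile) > max_branch_ratio:
        return "CPU"
    # Second metric: compare execution times on each device; ties go to the CPU.
    return "GPU" if profile.gpu_seconds < profile.cpu_seconds else "CPU"
```

Under these assumptions, a function with 5 branches per 100 non-branch instructions that runs faster on the GPU would be sent to the GPU, while one with 50 branches per 100 would stay on the CPU even if its GPU time were lower.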
