# Patent application title: METHOD AND APPARATUS FOR PROVIDING A MEASUREMENT OF PERFORMANCE FOR A NETWORK

##
Inventors:
Joel Sommers (Hamilton, NY, US)
Nicholas Duffield (Summit, NJ, US)
Paul Barford (Madison, WI, US)
Amos Ron (Madison, WI, US)

Assignees:
Wisconsin Alumni Research Foundation
AT&T Intellectual Property I, L.P.

IPC8 Class: AH04L1226FI

USPC Class:
370252

Class name: Multiplex communications diagnostic testing (other than synchronization) determination of communication parameters

Publication date: 2012-05-03

Patent application number: 20120106377

## Abstract:

Performance for a network is measured by sending multi-objective probes
on a path, receiving at least one of the multi-objective probes for the
path, and determining performance measurements for at least two
parameters of the path determined from the at least one of the
multi-objective probes. Separate algorithms are simultaneously executed
to measure the at least two parameters of the path determined from the at
least one of the multi-objective probes.## Claims:

**1.**A method for providing a measurement of performance for a network, comprising: sending a plurality of multi-objective probes on a path; receiving at least one of the plurality of multi-objective probes for the path; and determining a plurality of performance measurements for at least two parameters of the path determined from the at least one of the plurality of multi-objective probes, wherein separate algorithms are simultaneously executed to measure the at least two parameters of the path determined from the at least one of the plurality of multi-objective probes.

**2.**The method of claim 1, wherein one of the parameters comprises a packet loss.

**3.**The method of claim 2, further comprising: wherein one of the performance measurements comprises a frequency of packet loss.

**4.**The method of claim 1, wherein one of the performance measurements comprises a duration of a period in which packets are lost.

**5.**The method of claim 1, wherein the parameters comprise at least a delay variation and a packet loss rate.

**6.**The method of claim 5, wherein the delay variation comprises a time difference between when two probes are sent, and a time difference between when the two probes are received.

**7.**The method of claim 1, wherein each of the plurality of multi-objective probes includes a size of the multi-objective probe.

**8.**The method of claim 1, wherein the plurality of multi-objective probes comprises a single probing stream.

**9.**A computer-readable medium having stored thereon a plurality of instructions, the plurality of instructions including instructions which, when executed by a processor, cause the processor to perform a method for providing a measurement of performance for a network, comprising: sending a plurality of multi-objective probes on a path; receiving at least one of the plurality of multi-objective probes for the path; and determining a plurality of performance measurements for at least two parameters of the path determined from the at least one of the plurality of multi-objective probes, wherein separate algorithms are simultaneously executed to measure the at least two parameters of the path determined from the at least one of the plurality of multi-objective probes.

**10.**The computer-readable medium of claim 9, further comprising: determining at least one network quantile bound of distribution for at least one of the at least two parameters.

**11.**The computer-readable medium of claim 10, wherein the at least one network quantile bound is based on an additive network performance model for a network delay.

**12.**The computer-readable medium of claim 9, wherein the path comprises an end-to-end path.

**13.**The computer-readable medium of claim 9, wherein the parameters comprise at least a delay variation and a packet loss rate.

**14.**The computer-readable medium of claim 13, wherein the delay variation comprises a time difference between two probes when the two probes are sent, and a time difference between the two probes when the two probes are received.

**15.**The computer-readable medium of claim 9, wherein each of the plurality of multi-objective probes includes a spacing of packets within the multi-objective probe.

**16.**The computer-readable medium of claim 9, wherein the plurality of multi-objective probes comprises a single probing stream.

**17.**An apparatus for providing a measurement of performance for a network, comprising: a transmitter that transmits a plurality of multi-objective probes on a path; a receiver that receives at least one of the plurality of multi-objective probes for the path; and a processor that determines a plurality of performance measurements for at least two parameters of the path from the at least one of the plurality of multi-objective probes, wherein separate algorithms are simultaneously executed to measure the at least two parameters of the path determined from the at least one of the plurality of multi-objective probes.

**18.**The apparatus of claim 17, wherein the processor determines at least one network quantile bound of distribution for at least one of the at least two parameters.

**19.**The apparatus of claim 17, wherein the processor determines at least a delay and a packet loss rate.

**20.**The apparatus of claim 17, wherein each of the plurality of multi-objective probes includes a sequence number within the multi-objective probe.

## Description:

**CROSS**-REFERENCE TO RELATED APPLICATIONS

**[0001]**The present application is a continuation application of pending U.S. application Ser. No. 12/109,743, filed on Apr. 25, 2008, the contents of which are expressly incorporated by reference herein in its entirety.

**[0002]**The present invention relates generally to communications networks and, more particularly, to a method and apparatus for providing a measurement of performance for a packet network, e.g., a Local Area Network (LAN), a Virtual Private Network (VPN), an Internet Protocol (IP) network, and the like.

**BACKGROUND OF THE INVENTION**

**[0003]**A customer may request a network service provider to guarantee the performance of one or more network services. For example, the service provider and the customer may detail the level of network performance for a service to be received by the customer in a Service Level Agreement (SLA). For example, a SLA may detail transport level service assurances for a service using performance parameters such as: frequency of loss events, duration of loss events, packet loss rates, delay, delay variation, etc. Failing to meet an SLA guarantee may result in a service disruption for the customer. In addition, the failure to meet an SLA guarantee may result in a loss of revenue for the customer and/or the network service provider.

**[0004]**The service provider may perform compliance monitoring to ensure that SLA targets are met. The compliance monitoring may be performed passively by collecting network data or actively by injecting measurement probes. Passive measurements are performed on a link-by-link basis and are not well suited to end-to-end performance targets. Current active measurements are focused on assessing a specific parameter (e.g., delay or loss), and are often not sufficiently accurate to determine whether an SLA has been violated.

**SUMMARY OF THE INVENTION**

**[0005]**In one embodiment, the present invention discloses a method and apparatus for providing a measurement of performance for a network. For example, the method sends a plurality of multi-objective probes on a path, and receives one or more of said plurality of multi-objective probes for the path. The method then determines a plurality of performance measurements.

**BRIEF DESCRIPTION OF THE DRAWINGS**

**[0006]**The teaching of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:

**[0007]**FIG. 1 illustrates an illustrative network related to the current invention;

**[0008]**FIG. 2 illustrates an illustrative network of the current invention for providing a measurement of performance for a network;

**[0009]**FIG. 3 illustrates an illustrative architecture of a multi-objective probe scheduler;

**[0010]**FIG. 4 illustrates a flowchart of a method for providing a measurement of performance for a network; and

**[0011]**FIG. 5 illustrates a high level block diagram of a general purpose computer suitable for use in performing the functions described herein.

**[0012]**To facilitate understanding, identical reference numerals have been used, where possible, to designate identical elements that are common to the figures.

**DETAILED DESCRIPTION**

**[0013]**The present invention a method and apparatus for providing a measurement of performance for a packet network, e.g., a Local Area Network (LAN), a Virtual Private Network (VPN), an Internet Protocol (IP) network, and the like. For example, the present invention discloses a method for providing SLA compliance monitoring.

**[0014]**FIG. 1 illustrates an exemplary packet network 100 related to the current invention. Exemplary packet networks include Internet protocol (IP) networks, Asynchronous Transfer Mode (ATM) networks, frame-relay networks, and the like. An IP network is broadly defined as a network that uses Internet Protocol such as IPv4 or IPv6 and the like to exchange data packets. It should be noted that the present invention is not limited to a particular type of network.

**[0015]**In one embodiment, the packet network may comprise a plurality of endpoint devices 102-104 configured for communication with the core packet network 110 (e.g., an IP based core backbone network supported by a service provider) via an access network 101. Similarly, a plurality of endpoint devices 105-107 are configured for communication with the core packet network 110 via an access network 108. The network elements (NEs) 109 and 111 may serve as gateway servers or edge routers for the network 110.

**[0016]**The endpoint devices 102-107 may comprise customer endpoint devices such as personal computers, laptop computers, Personal Digital Assistants (PDAs), servers, routers, and the like. The access networks 101 and 108 serve as a means to establish a connection between the endpoint devices 102-107 and the NEs 109 and 111 of the IP/MPLS core network 110. The access networks 101 and 108 may each comprise a Digital Subscriber Line (DSL) network, a broadband cable access network, a Local Area Network (LAN), a Wireless Access Network (WAN), and the like.

**[0017]**The access networks 101 and 108 may be either directly connected to NEs 109 and 111 of the IP/MPLS core network 110 or through an Asynchronous Transfer Mode (ATM) and/or Frame Relay (FR) switch network 130. If the connection is through the ATM/FR network 130, the packets from customer endpoint devices 102-104 (traveling towards the IP/MPLS core network 110) traverse the access network 101 and the ATM/FR switch network 130 and reach the border element 109.

**[0018]**The ATM/FR network 130 may contain Layer 2 switches functioning as Provider Edge Routers (PERs) and/or Provider Routers (PRs). The PERs may also contain an additional Route Processing Module (RPM) that converts Layer 2 frames to Layer 3 Internet Protocol (IP) frames. An RPM enables the transfer of packets from a Layer 2 Permanent Virtual Connection (PVC) circuit to an IP network which is connectionless.

**[0019]**Some NEs (e.g., NEs 109 and 111) reside at the edge of the core infrastructure and interface with customer endpoints over various types of access networks. An NE that resides at the edge of a core infrastructure is typically implemented as an edge router, a media gateway, a border element, a firewall, a switch, and the like. An NE may also reside within the network (e.g., NEs 118-120) and may be used as a mail server, honeypot, a router, or like device. The IP/MPLS core network 110 may also comprise an application server 112 that contains a database 115. The application server 112 may comprise any server or computer that is well known in the art, and the database 115 may be any type of electronic collection of data that is also well known in the art. Those skilled in the art will realize that although only six endpoint devices, two access networks, five network elements, one application server are depicted in FIG. 1, the communication network 100 may be expanded by including any number of additional endpoint devices, access networks, network elements, application servers without altering the scope of the present invention.

**[0020]**The above IP network is described to provide an illustrative environment in which packets for voice and data services are transmitted on networks. In one embodiment, the network service provider and the customer may detail the network performance level for a network used to transport packets for the customer in a Service Level Agreement (SLA). For example, the SLA may detail transport level service assurances using network performance parameters. For example, the SLA may provide minimum performance targets for frequency of loss events, duration of loss events, packet loss rates, delay, delay variation, etc.

**[0021]**The network service provider may then perform compliance monitoring to determine if the performance parameters are meeting the minimum performance targets. The compliance monitoring may be based on passive or active measurements. For example, a passive measurement may be performed by collecting data from one or more counters (e.g., Simple Network Management Protocol (SNMP) counters) located within the network. However, the packets from and/or to the customer may traverse multiple links and/or networks. Hence, the passive measurement is insufficient for estimating the performance level experienced by the customer traffic on an end-to-end basis.

**[0022]**It should be noted that although the present invention is disclosed in the context of a single service provider, the present invention is not so limited. Namely, the present invention can be adapted to the end-to-end monitoring through multiple networks owned by different service providers.

**[0023]**An active measurement may be performed by injecting one or more measurement probes. A measurement probe refers to one or more packets sent on an end-to-end path. However, current active measurement probes are limited in focus to estimation and optimization of a single performance parameter. For example, one or more packets may be injected to measure the end-to-end delay for a customer traffic traversing a network. Furthermore, current active measurements make assumptions about underlying network conditions and are not suitable for realistic networks with abrupt changes.

**[0024]**In one embodiment, the present invention broadly discloses a method and apparatus for providing a measurement of performance for a network using a multi-objective probe. To better understand the present invention, the following networking terminology will first be provided:

**[0025]**Probe;

**[0026]**Multi-objective probing;

**[0027]**Packet loss rate;

**[0028]**Delay; and

**[0029]**Delay Variation.

**[0030]**A probe refers to one or more packets sent on an end-to-end path. For example, a delay measurement probe may be sent from a source to a destination to determine the delay being experienced by packets in traversing the network from the source to the destination. It should be noted that various systems can be used to send or receive probes, e.g., general purpose computers or NEs (e.g., switches, routers, and the like).

**[0031]**Multi-objective probing refers to simultaneous estimation of multiple performance metrics using a single probing stream of packets.

**[0032]**Packet loss rate refers to the total number of lost packets divided by the total number of transmitted packets on a given path (consisting of one or more links) over a specified time interval. For example, if one packet is lost out of 1000 packets sent in one second, the packet loss rate is 0.001 per second.

**[0033]**Delay refers to the duration of time between when a packet is sent by an origination device and when the packet is received by a destination device. For example, a delay may be the time a packet takes to travel from the origination device to the destination device.

**[0034]**Delay Variation refers to a measurement of the variation of delay experienced by multiple packets. For example, if two packets are sent one second apart and they reach the destination at one second apart, then the delay variation is zero. However, if the packets are received 1.1 seconds apart (in the order sent), then there is a delay variation in the network. Delay variation is sometimes refers to as a jitter.

**[0035]**Note that an SLA may specify availability as a performance parameter. Availability can be loosely defined as the capability of the network to successfully transmit any end-to-end probe over an interval of time. Thus, availability may be considered as a special case of a loss rate. The current invention addresses availability as a special case of a loss rate and as such, it is not discussed separately.

**[0036]**FIG. 2 illustrates an illustrative network 200 of the current invention for providing a measurement of performance for a network. For example, the customer endpoint devices 102 and 105 are communicating via the IP/MPLS core network 110. The IP/MPLS core network 110 may contain various routers 201-206 that support the communications between customer endpoint devices 102 and 105. FIG. 2 broadly shows that packets associated with the communications between customer endpoint devices 102 and 105 may traverse over different paths or links within the network 110.

**[0037]**In one embodiment, the network service provider implements the current invention for providing a measurement of performance parameters in an application server 212. The application server 212 implements one or more performance measurement algorithms for various network performance parameters, e.g., delay, delay variation, packet loss rate, and the like.

**[0038]**FIG. 3 illustrates an illustrative architecture 300 of a multi-objective probe scheduler. In one embodiment, one or more discrete-time performance measurement algorithms operating at the same time may schedule measurement probes to be sent at the same time slot. The requests may be accommodated by tagging probes according to one or more relevant estimator. Thus, a single probe stream 306 may be used for concurrent estimation of performance parameters such as delay, delay variation, packet loss, and other quantities, thereby reducing the impact of measurement traffic on a network.

**[0039]**In one embodiment, the discrete-time scheduler 304 provides callback and probe scheduling mechanisms. Probe modules 301, 302 and 303 provide various path-oriented performance measurement methods for delay, delay variation and packet loss, respectively. These path-oriented performance measurement methods will be described further below. The probe scheduler 304 interacts with network elements for sending and receiving probes via the network interface 305, i.e., generating a single probe stream 306 for measuring delay, delay variation and packet loss. This architecture allows for logical separation among multiple, simultaneously operating performance measurement methods and for optimizations of network bandwidth.

**[0040]**In one embodiment, a multi-objective probe may include one or more of: a time stamp indicating when the probe was sent, a time stamp indicating when the probe is received, one or more markers to indicate the parameter to be measured, size of the probe (e.g., the number of packets in each probe), the time spacing of the packets within a probe, and/or the sequence number within a probe.

**[0041]**In one embodiment, the current invention provides a measurement of delay for a network. For example, the measurement of delay may be performed to determine various statistical values for delay parameters such as mean delay, median delay, or high quantiles of the delay distribution such as 95

^{th}percentile, 99

^{th}percentile, and so on.

**[0042]**In one embodiment, the current method provides an estimate of mean delay along a path. For example, the method first models the delay as a continuous function f(t) whose independent variable is the time that a probe packet is sent and the dependent variable is measured as a one-way delay. Based on this model, a method to provide the mean delay estimation is to use Simpson's method for numerical integration. The Simpson's formulation is straightforward: once the domain of integration is partitioned, the integral of the function f over the subinterval I

_{j}is estimated by

**1 6 ( f ( a j ) + f ( b j ) + 4 f ( c j ) ) , ##EQU00001##**

**with a**

_{j,b}

_{j}the endpoints of I

_{j}, and with c

_{j}its midpoint. The error of the Simpson estimate is known to be

**e j**= f ( 4 ) ( ξ j ) 2880 I j 5 , ##EQU00002##

**with**ξ

_{j}some point in the interval I

_{j}. Thus, if the fourth derivative of f exists and is not too large, it is safe to state that the local error is of order 5; i.e., if the number of samples are doubled, the error in the estimate is reduced locally by a factor of 32, and globally by a factor of 16.

**[0043]**To apply Simpson's method to a discrete-time probe process for estimating mean end-to-end delay, the method does the following: at time sloti, the method draws a value k from a geometric distribution with parameter p

_{delay}. The geometric distribution is the discrete analog of the exponential distribution and yields unbiased samples. Probes representing the endpoints a

_{j}and b

_{j}are sent at time slots i and i+2(k+1)with the midpoint probe sent a time slot i+(k+1). At time sloti+2(k+1) the next subinterval begins, thus the last probe of a given subinterval is the first probe of the next one. Simpson's estimates from each subinterval are summed to form the total area under the delay function. The mean delay estimate is then obtained by dividing the integral estimate by the number of subintervals.

**[0044]**With the above formulation, the subintervals are not of equal lengths. In fact, the lengths form a geometric distribution. Thus, the method may either directly apply Simpson's method to estimate the mean delay, or the method may apply relative weights to the subintervals according to their lengths. In one embodiment, the current invention uses weighted subintervals to get more accurate results.

**[0045]**The above method to estimate mean delay described probes that are received at the destination device. However, probes may be lost in transit. In one embodiment, the current method discards subintervals where a probe loss occurs.

**[0046]**The above method assumed that delay largely behaves as a smooth function. However, for some network conditions it may be more accurate to account for random spikes in delay by modeling the process as the sum of two processes, one smooth and one random. For example, if the function f(t) is written as f

_{1}(t)+f

_{2}(t), with f

_{1}(t) smooth and f

_{2}(t) random, then the numerical integration does much better on f

_{1}(t) and slightly worse on f

_{2}(t) as compared to straight averaging. The Simpson's approach is effective for this model as well: if the values of the random part are quite small compared to the smooth part, then the estimate is better than simple averaging.

**[0047]**Note that there is little risk in using Simpson's method. Even if delay is a completely random process (which is highly unlikely), the variance of the Simpson's rule estimator for mean delay is increased only slightly as compared to simple averaging.

**[0048]**In one embodiment, the current method provides an estimate of one or more high quantiles for a delay along a path, such as the 95th percentile. Mathematically defined, let {x, :i=1, . . . , n} be n independent samples drawn at random from a common distribution F, sorted in increasing order. For simplicity, assume F is continuous. Let Q

_{p}denote the p

^{th}quantile of that distribution, i.e., the unique solution of F(Q

_{p}=p).

**[0049]**The method obtains confidence intervals for Q

_{p}based on {x

_{i}}. One approach is to start with the empirical distribution function: {circumflex over (F)}(x)=n

^{-1}#{i: x

_{i}≦x} and use a quantile estimate of the form: {circumflex over (Q)}

_{p}=max{x:{circumflex over (F)}(x)≦p}. Analysis of the variance of this estimator might then give asymptotic confidence intervals as n becomes large. Instead, the method seeks probabilistic bounds on Q

_{p}that hold for all n.

**[0050]**Now {x

_{k}≦x} is the event that at least k of the samples are less than or equal to x, an event which has probability G(n,F(x),k), where

**G**( n , p , k ) = j ≧ k p j ( 1 - p ) n - j ( n j ) . ##EQU00003##

**Taking x**=Q

_{p}, Pr[x

_{k}≦Q

_{p}]=G(n,p,k).

**[0051]**Based on the x

_{i}, the method now determines a level X.sup.+(n,p,ε) that the true quantile Q

_{p}is guaranteed to exceed only with some small probability ε.

**Thus**, the method chooses X.sup.+(n,p,ε)=x

_{K}.sub.+.sub.(n,p,ε) with K.sup.+(n,p,ε)=min{k:G(n,p,k)≦ε}.

**[0052]**Similarly, Pr[x

_{k}≧Q

_{p}]=1-G(n,p,k). Based on the x

_{i}, the method determines a level X

^{-}(n,p,e) that the true quantile Q

_{p}is guaranteed to fall below only with some small probability ε. Thus, the method chooses X

^{-}(n,p,ε)=x

_{K}.sub.-.sub.(n,p,ε) with K

^{-}(n,p,ε)=max{k:1-G(n,p,k)≦ε].

**[0053]**In other words, K.sup.+(n,p,ε) is the 1-ε quantile of the binomial B

_{n,p}distribution, while K

^{-}(n,p,ε) is the ε quantile of the binomial B

_{n,p}distribution. The K.sup.± can be computed exactly.

**[0054]**Table 1 provides some example quantile indices K.sup.± for various sample sizes n, and quantiles p. Confidence level is 1-ε=90%. Also shown is the reference quantile index K

^{0}=np. The dash (shown as -) indicates that no upper bound K.sup.+ was available, which may occur when the top atom has mass greater than the desired significance level, i.e., p

^{n}>ε.

**TABLE**-US-00001 TABLE 1 Quantile p = 50 p = 90 p = 99 n K

^{-}K

^{0}K.sup.+ K

^{-}K

^{0}K.sup.+ K

^{-}K

^{0}K.sup.+ 100 44 50 57 86 90 95 98 99 -- 1000 480 500 521 888 900 913 986 990 995 10000 4936 5000 5065 8961 9000 9039 9887 9900 9914

**[0055]**In one embodiment, the current invention provides a measurement of delay variation. The method first considers a stream of probes of length k, e.g., 100 probes. The method denotes the time difference between two probes i and j when they are sent as S

_{i,j}, and the time difference between the same two probes when they are received as r

_{i,j}. The method then constructs a matrix M where each cell M

_{i,j}contains the ratio r

_{i,j}/s

_{i,j}. Thus, M

_{i,j}is 1 if the spacing between probes i and j does not change; is greater than 1 if the measured spacing increases; or is less than 1 if the measured spacing decreases as the probes traverse the network path. Ratio r

_{i,j}/s

_{i,j}is defined as 1 for i=j and it is defined as 0 if probe i or j is lost. Note that computing the above ratio r

_{i,j}/s

_{i,j}with respect to consecutive probes in the stream gives a more accurate description of the instantaneous nature of delay variation while probes farther apart give a description of delay variation over longer time intervals.

**[0056]**The method then computes the eigenvalues of the matrix M, resulting in a vector e of length k, with values sorted from largest to smallest. If the probe stream traverses the network undisturbed, the matrix M would consist entirely of ones, with the largest eigenvalue as k and all other eigenvalues as 0. The method denotes the vector of these "expected" eigenvalues as e'. The method then subtracts e' from e, taking the L

_{l}norm of the resulting vector: Σ

_{i}=1

^{k}|e

_{i}-e

_{i}'|. The method refers to this L

_{l}norm as the delay variation matrix parameter. The delay variation matrix is useful for relative comparisons over time. The delay variation matrix formulation relies on and is motivated by the fact that there is a notion of what is expected in the absence of turbulence along the path, i.e., that probe spacing should remain undisturbed. By looking at the eigen-structure of the delay variation matrix, the method may extract the amount of distortion from what the method expects.

**[0057]**In one embodiment, the current invention provides a measurement of packet loss rate. As defined above, a packet loss rate is derived from the number of lost packets and the number of sent packets for a given time. The difficulty in determining a packet loss rate is estimating the number of sent packets (i.e. the demand for a path). The current invention uses a heuristic approach which initiates a probe pair at a given time slot with probability p

_{loss}for estimation of the end-to-end frequency of congestion episodes {circumflex over (F)} and the mean duration of congestion episodes {circumflex over (D)}.

**[0058]**Frequency of congestion episodes refers to the number of loss events over a period of time. The frequency may also be referred to as the frequency of loss event. Duration of congestion episodes refers to the period of time a loss event persists. The duration of congestion episode may also be referred to as the duration of loss event. For example, a network may experience a packet loss event with frequency of one time per 24 hour period with a mean duration of the loss event being 15 seconds. Various methods of measuring loss event frequency and duration are well known. For example, one method of measuring loss event frequency and duration is disclosed in Sommers et al., "Improving Accuracy In End-To-End Packet Loss Measurement", Proceedings of ACM SIGCOMM '05, 2005, which is herein incorporated by reference.

**[0059]**In this heuristic approach, each probe consists of three packets, sent back-to-back. The method measures the loss rate {circumflex over (l)} of the probes during congestion episodes. Since the methodology does not identify individual congestion episodes, the method then takes an empirical approach, treating consecutive probes in which at least one packet is lost as an indication of a congestion episode. The method assumes that the end-to-end loss rate L is stationary and ergodic. Given an estimate of the frequency of congestion {circumflex over (F)}, the method estimates the end-to-end loss rate as {circumflex over (L)}={circumflex over (F)}{circumflex over (l)}. The key assumption of this heuristic is that the method treats the probe stream as a marker flow, viz., that the loss rate observed by this flow has a meaningful relationship to other flows along the path.

**[0060]**Note that the probes comprise of multiple packets (e.g., a default of 3 packets per probe). In one embodiment, the probe stream can be a Transmission Control Protocol (TCP) stream. However, the present invention is not limited to a particular type of probe stream.

**[0061]**A set of methodologies for efficient per-path monitoring have been described above. In one embodiment, SLA compliance monitoring requires accurate and efficient measurement on a network-wide basis. For example, a service provider may wish to deploy a network wide SLA compliance monitoring. However, the measurement overhead of sending probes over a full n

^{2}mesh of paths is quite expensive.

**[0062]**In one embodiment, the current method provides an economical monitoring over a subset of network paths by inferring lower bounds on the quantiles of a distribution of path performance measures using measurements from a subset of network paths. In order to describe the methodology for network wide SLA monitoring more clearly, the mathematical foundation is first provided below.

**[0063]**Let G=(V,E) be a directed graph comprising vertices (nodes) V and directed edges (links) (ν,ν

_{2}).di-elect cons.E .OR right.V×V. Let R be a set of paths (routes) i.e., each r.di-elect cons.R is an ordered set of n.sub.>0 contiguous links (ν

_{0},ν

_{1}),(ν

_{1},ν

_{2}), . . . , (ν

_{n-1},ν

_{n}). The routing matrix A associated with R is the incidence matrix of the links in the routes, namely, A

_{re}=1 if link e occurs in route r and zero otherwise.

**[0064]**The method now describes what is referred to as a scalar additive network performance model. Let x:E→ be a function on the links. This gives rise to the path function y:R→ defined as y

_{r}=Σ

_{e}.di-elect cons.rx

_{e}=Σ

_{e}.di-elect cons.EA

_{rex}

_{e}. This relation is a prototype for additive network performance models. Two examples are additive network performance model for a network delay and additive network performance model for a network loss.

**[0065]**Additive network performance model for a network delay refers to a performance model wherein the latency of a packet traversing the path r is the sum of the latencies incurred on each link of the path. This may be understood either as the x

_{e}being individual measurements, or as x

_{e}being mean latencies.

**[0066]**Additive network performance model for a network loss refers to a performance model, wherein x

_{e}is the log transmission probability of traversing link e. If there is no spatial correlation between link losses, the method can determine y

_{r}as the log transmission probability along the path r.

**[0067]**For scalar additive performance measures, suppose that the matrix A is not of full (row) rank. Then, the set of row vectors is not linearly independent. Consequently, there exists a minimal set of paths S.OR right.RS≠R which span, such that every row of a

_{r}={A

_{re}:e.di-elect cons.E} of A can be expressed as a linear combination of the {a

_{r}:rεS}. For the scalar additive performance model, this translates to saying that all {y

_{r}: r.di-elect cons.R} can be determined from the subset{y

_{r}:r.di-elect cons.S}.

**[0068]**The current method extends the computational approach described above to infer distributions of a set of path performance measures from a subset. The method assumes in a given network setting the existence of the set of paths S.OR right.RS≠R with the properties detailed above established. This means in particular that for every network path in R, every link in this path is traversed by some path in the subset R. The distributions of delay in R can be inferred from only those in S. This inference depends on the assumption that any packet traversing a given link will experience the same delay distribution, even if the actual delays differ.

**[0069]**There are two challenges in trying to extend the scalar approach to distributions. The first is dependence among link measurements. Dependence is not an issue in the linear algebra of mean quantities since the average of a linear combination of random variables is equal to same linear combination of respective averages even when the variables are dependent. Working with distributions is more complex, for example the distribution of a sum of random variables is not equal to the convolution of their distributions unless the random variables are independent. A second complexity is algebraic: there is no simple subtraction operation for distributions. For example, if X and Y are independent random variables and X=Y in distribution, it is not the case that X-Y is identically zero.

**[0070]**Now, assume the routing (and hence the matrix A) is static over a measurement interval. On each path r a stream of measurement packets labeled i=1,2 , . . . , n

_{r}is launched along the path. Packet i incurs a latency X

_{re}

^{i}on traversing the link e.di-elect cons.r. The latency of the packet on the path is Y

_{r}

^{i}=Σ

_{r}.di-elect cons.rX

_{re}

^{i}.

**[0071]**The statistical assumption is that all X

_{re}

^{i}are independent. The assumption reflects either a delay encountered by a single multicast packet or a train of closely spaced unicast packets prior to branching to distinct endpoints.

**[0072]**In the present case, the method can consider two types of dependence. In the first case, the method considers dependence between different measurements. Provided probe packets are dispatched at intervals longer than the duration of a network congestion event, then probes on the same path or on intersecting paths are unlikely to exhibit delay dependence, even if individual packets experience the distribution of congestion events similarly on the same link. Thus, it seems reasonable to model the Y

_{re}

^{i}as independent. The second case to consider is dependence among the individual link delays X

_{re}

^{i}on a given path r. Violation of this property might occur in packet streams traversing a set of links congested by the same background traffic. The link delay correlation should not be significant in a large network with a diverse traffic.

**[0073]**For r.di-elect cons.R let {b

_{rr}.sub.',:r'.di-elect cons.S} be the coefficients of the spanning set {a

_{r}.sub.'.di-elect cons.S} in the expression of a

_{r}, i.e.,

**a r**= r ' .di-elect cons. S b rr ' a r ' . ( 1 ) ##EQU00004##

**[0074]**Let S

_{r}.sup.+={r'.di-elect cons.S:b

_{rr}.sub.'>0} and S

_{r}

^{-}={r'.di-elect cons.S:b

_{rr}.sub.'<0}. Assume {a

_{r}.sub.':r'.di-elect cons.S R} is a minimal spanning set. For each r.di-elect cons.R there exist positive integers d

_{r}and {d

_{rr}.sub.':r'.di-elect cons.S} such that

**d r a r**+ r ' .di-elect cons. S r - d rr ' a r ' = r ' .di-elect cons. S r + d rr ' a r ' . ( 2 ) ##EQU00005##

**[0075]**For each r.di-elect cons.R,e.di-elect cons.E let X

_{re}.sup.(i),i=1,2, . . . denote the sum of i independent copies of a single delay on link e,e.g., X

_{re}

^{1}; likewise let Y

_{r}.sup.(i) denote the sum of i independent copies of Y

_{r}

^{i}. The symbol =

^{d}will denote equality in distribution. Then,

**Y r**( d r ) + r ' .di-elect cons. S r - Y r ' ( d rr ' ) = d r ' .di-elect cons. S r + Y r ' ( d rr ' ) . ( 3 ) ##EQU00006##

**[0076]**One can see in equation (3) a basic feature of the results that follows merely from the partition of S into S

_{r}

^{-}and S

_{r}.sup.+. Suppose one is primarily interested in determining whether Y

_{r}often takes some large value. Suppose measurements indicate that some of the {Y

_{r}, :r'εS

_{r}.sup.+} tend to take large values, but that none of the {Y

_{r}.sub.':r'.di-elect cons.S

_{r}

^{-}} do. Then, from the equality (3), Y

_{r}must also tend to take large values. If none of the {Y

_{r}.sub.':r'.di-elect cons.S} tend to take large values, then neither does Y

_{r}. But when some Y

_{r}.sub.' for r' in both S

_{r}.sup.+ and S

_{r}

^{-}tend to take large values, then it is difficult to draw conclusions about Y

_{r}.

**[0077]**Then, let y

_{r}denote the common distribution of the Y

_{r}

^{i}, and {tilde over (y)}

_{r}its Laplace transform, i.e., {tilde over (y)}

_{r}(s)=∫

_{0}.sup.∞y

_{r}(dy)e

^{-}sy. Let * denote convolution. In terms of distributions, equation (3) becomes

**y r*** d r * r ' .di-elect cons. S r - y r ' * d rr ' = * r ' .di-elect cons. S r + y r ' * d rr ' . ( 4 ) ##EQU00007##

**[0078]**In Laplace transform space, from equation (4):

**y**~ r d r r ' .di-elect cons. S r - y ~ r ' d rr ' = r ' .di-elect cons. S r + y ~ r ' d rr ' . ( 5 ) ##EQU00008##

**[0079]**Given empirical estimates of {y

_{r}.sub.':r'.di-elect cons.S} one can use numerical Laplace transform inversion to recover all y

_{r}. The current method uses equation (4) directly in order to obtain bounds on the distributions y

_{r}.

**[0080]**Let V

_{i,i}=1,2, . . . , n be independent random variables and let V=Σ

_{i}=1

^{n}V

_{i}be their sum. Let Q

_{p}(V

_{i}.) denote the p-quantile of V

_{i}, i.e.,

**Pr**[V≦s]≧pQ

_{p}(V)≦x (6)

**[0081]**The following result formalizes the statement that if one knows that V

_{1}≦x a fraction p of the time, and V

_{2}≦y a fraction q of the time, then one can conclude that ν

_{1}+ν

_{2}is less than x+y no less than a fraction pq of the time.

**[0082]**Let V

_{i,i}=1,2, . . . , n be independent random variables with sum V=Σ

_{i}=1

^{n}V

_{i}, and let p

_{i}.di-elect cons.(0,1] with p=Π

_{i}=1

^{np}

_{i}, then

**Q p**( V ) ≦ i = 1 n Q p i ( V i ) . ( 7 ) ##EQU00009##

**[0083]**For network quantile bounds, denote

**Y**

_{r}.sup.±=Σ

_{r}.sub.'.sub..di-elect cons.S:Y

_{r}.sub.'.sup.(d

^{n}.sup.'), then

**Q**

_{p}(Y

_{r})≧(d

_{r})

^{-1}Q

_{p}

_{dr}(Y

_{r}.sup.(d,))- : (i)

**Q**

_{p}(Y

_{r}.sup.(d,))≧Q

_{pq}(Y

_{r}.sup.+)-Q

_{q}(Y

_{r}-

^{-}); and (ii)

**Q**

_{p}(Y

_{r})≧(d

_{r})

^{-1}sup

_{q}.di-elect cons.(0,1](Q

_{p}

_{dr}

_{1}(Y

_{r}.sup.+)-Q

_{q}(Y

_{r}

^{-})). (iii)

**[0084]**The above inequalities for network quantiles provide a lower bound on the quantiles, or, equivalently, an upper bound on the cumulative distribution. Thus, the inequalities underestimate the frequency with which a given level is exceeded. This may or may not be desirable if the measured quantiles are to be used for detecting service level agreement violations (i.e., raising alarms). On the one hand false positives will be reduced, while at the same time some high quantiles may be underestimated. A knowledge of the topology of measured paths may be used to adjust alarm thresholds in order to mitigate the effects of quantile underestimation.

**[0085]**The method first discretizes the distributions before performing convolution. A positive discrete mass distribution is specified by a tuple (ε,n,m={m

_{i}:i=0, . . . , n}) where ε is the bin width, with a mass m

_{i}in bin [iε,(i+1)ε) for i=0,1, . . . , n-1, and mass m

_{n}in [nε,∞). The two such distributions (ε,n,m) and (ε',n',m') have convolution:

**(e,n,m)*(e',n',m')=(ε+ε',1+(n-1)(n'-1),m') (9)**

**where m**

_{j}

^{n}=Σ

_{i}=0

^{jm}

_{i}m

_{j}-i.sup.'.

**[0086]**Given ε, n, a set of measurements {Y

_{r}

^{i}:i=1,2, . . . , n

_{r}} give rise to an empirical discrete mass distribution (ε,n,m) with m

_{i}=#{Y

_{r}

^{i}:Y

_{r}

^{i}.di-elect cons.[iε,(i+1)ε)} for i=0,1, . . . , n-1 and m

_{n}=#{Y

_{r}

^{i}:Y

_{r}

^{i}≧nε}. The distribution of each {Σ

_{r}.sub.'.sub..di-elect cons.S

_{r}.sub.±Σ

_{i}=1

^{d}

^{rr}.sup.'y

_{rr}.sub.':y.s- ub.rr.sub.'.di-elect cons.Ω

_{r}} is then estimated by taking the grand convolution over r'.di-elect cons.S

_{r}.sup.± of the d

_{rr}.sub.'-fold convolutions of the empirical mass distribution generated from each #{Y

_{r}.sub.'

^{i}:Y

_{r}.sub.'

^{i}.di-elect cons.[iε,(i+1)ε)}. A target resolution ε in the final distribution is achieved by choosing resolutions ε' for the constituent distribution that sum to ε, for example, ε'=ε/Σ

_{r}.sub.'.sub..di-elect cons.S

_{r}.sub.±d

_{rr}.sub.'. Finally, the method normalizes to a probability distribution by dividing each mass element by n

_{r}.sup.±. The resulting variables are

_{r}.sup.±, and the method uses them in place of the Y

_{r}.sup.± in the inequalities (8) above.

**[0087]**FIG. 4 illustrates a flowchart of a method 400 for providing a measurement of performance for a network. For example method 400 can be implemented by application server 212. Method 400 starts in step 405 and proceeds to step 410.

**[0088]**In step 410, method 400 sends one or more probes for a path. For example, the method sends a multi-objective probe for measuring one or more performance parameters on an end-to-end path.

**[0089]**In step 420, method 400 receives the one or more probes for the path. For example, the method receives packets that were part of the above multi-objective probe. Note that some of the packets may be lost.

**[0090]**In step 430, method 400 detects or determines one or more performance measurements. For example, the method detects or determines one or more of mean delay, delay variation, packet loss rate, frequency of loss events, duration of loss events, etc. for the path. For example, one or more performance measurement methods as disclosed above can be used.

**[0091]**In optional step 440, method 400 determines network quantile bounds. For example, the method may determine bounds on network delay and/or network loss for network wide compliance monitoring. The method then proceeds back to step 410 to send additional probes or ends in step 450.

**[0092]**It should be noted that the steps that define method 400 can be conducted within the context of a larger set of measurement objectives that are constrained by the amount of probe traffic that is introduced into the network over a given period of time. This budget can be defined by the accuracy of the resulting measurements where, roughly speaking, the larger the probe budget, the higher the resulting accuracy. The budget can also be defined by the probes consumption of network bandwidth where, roughly speaking, the lower the probe bandwidth consumption, the lower the resulting accuracy. The probe budget specification can also take both impact and accuracy into consideration.

**[0093]**It should be noted that although not specifically specified, one or more steps of method 400 may include a storing, displaying and/or outputting step as required for a particular application. In other words, any data, records, fields, and/or intermediate results discussed in the method 400 can be stored, displayed and/or outputted to another device as required for a particular application. Furthermore, steps or blocks in FIG. 4 that recite a determining operation, or involve a decision, do not necessarily require that both branches of the determining operation be practiced. In other words, one of the branches of the determining operation can be deemed as an optional step.

**[0094]**FIG. 5 depicts a high level block diagram of a general purpose computer suitable for use in performing the functions described herein. As depicted in FIG. 5, the system 500 comprises a processor element 502 (e.g., a CPU), a memory 504, e.g., random access memory (RAM) and/or read only memory (ROM), a module 505 for providing a measurement of performance for a network, and various input/output devices 506 (e.g., storage devices, including but not limited to, a tape drive, a floppy drive, a hard disk drive or a compact disk drive, a receiver, a transmitter, a speaker, a display, a speech synthesizer, an output port, and a user input device (such as a keyboard, a keypad, a mouse, and the like)).

**[0095]**It should be noted that the present invention can be implemented in software and/or in a combination of software and hardware, e.g., using application specific integrated circuits (ASIC), a general purpose computer or any other hardware equivalents. In one embodiment, the present module or process 505 for providing a measurement of performance for a network can be loaded into memory 504 and executed by processor 502 to implement the functions as discussed above. As such, the present process 505 for providing a measurement of performance for a network (including associated data structures) of the present invention can be stored on a computer readable medium or carrier, e.g., RAM memory, magnetic or optical drive or diskette and the like.

**[0096]**While various embodiments have been described above, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of a preferred embodiment should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents.

User Contributions:

Comment about this patent or add new information about this topic:

People who visited this patent also read: | |

Patent application number | Title |
---|---|

20120122730 | Methods for Quantitative Analyses of Kinase Inhibitor Selectivity Using Small Size Panels |

20120122729 | Methods and Tests for Screening Bacterial Biofilms |

20120122728 | Method Of Screening Drugs For Reversal Of Amyloid Beta Neurotoxicity |

20120122727 | IN VITRO METHOD FOR PREDICTING WHETHER A COMPOUND IS GENOTOXIC IN VIVO |

20120122726 | MARKERS FOR ENDOMETRIAL CANCER |