Top Document: Mailing list management software FAQ
Previous Document: 2.04 Performance and system-load issues related to server activity
Next Document: 2.06 Features and usability for administrators

The main issue in distributing to large lists is how quickly you can get the mail out. Most MLMs leave routing and optimization decisions up to the MTA (Mail Transport Agent, usually Sendmail under Unix), but some other systems -- notably ListProc, LISTSERV, and SmartList -- take a more active approach in managing network load. To illustrate, let's look at the path mail takes to delivery in each of these four systems.

Majordomo (a good example of the "leave-it-to-Sendmail" model), once it decides to forward a message to a list, passes it to a single Sendmail process along with the addresses for the entire list. Sendmail then does what it can to optimize delivery (i.e. sorting by MX record), and starts connecting to each machine in series. The result on a 200-subscriber list, with everyone in the U.S. and most with different mail exchangers, is that there's about an hour's delay between the first delivery and the last. This delay depends more on the speed of the recipient machines than on the speed of the host or the network link, so it varies pretty much linearly with the number of systems Sendmail has to connect to.

This can be a big problem on a large and active list, because people late in the delivery queue will see and reply to "new" messages that others have seen (and perhaps replied to) several hours before -- conversations will get "out of sync." On the other hand, much longer delays might not matter for a list that consisted only of infrequent announcements, without discussion.
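
The roughly linear relationship described above can be put into numbers. The following is a minimal sketch, not a model of Sendmail itself: it treats each distinct domain as one host to connect to (an assumption; real MX lookups can map several domains to one exchanger), and the function names and the per-host time are hypothetical, chosen only to match the 200-subscriber, one-hour example.

```python
from collections import defaultdict


def group_by_domain(addresses):
    """Group addresses by domain part (a crude stand-in for the MX host)."""
    groups = defaultdict(list)
    for addr in addresses:
        domain = addr.rsplit("@", 1)[1].lower()
        groups[domain].append(addr)
    return dict(groups)


def serial_delay_seconds(addresses, seconds_per_host):
    """Estimate the first-to-last delay when hosts are contacted in series."""
    return len(group_by_domain(addresses)) * seconds_per_host


# 200 subscribers on 200 different hosts taking about one hour in total
# works out to 3600 / 200 = 18 seconds per host (an illustrative figure).
subscribers = [f"user@host{i}.example" for i in range(200)]
delay = serial_delay_seconds(subscribers, seconds_per_host=18)
print(delay / 60, "minutes")
```

Under this model, doubling the number of distinct hosts doubles the delay, which is why the serial approach becomes painful for large discussion lists.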
ListProc speeds up the delivery process for large lists by passing each message off to your MTA (usually Sendmail, but ListProc doesn't care; it just connects to port 25 with SMTP) in chunks of N addresses, where N is definable. Depending on how the MTA is configured, delivery will usually be faster because deliveries are parallelized -- the longest address list any one MTA process handles is of size N. However, Sendmail can't optimize deliveries as much, because no particular Sendmail process has the entire list to work with. Thus network efficiency can go down (not actually a big problem, because ListProc produces "chunks" sorted by domain). Also, the peak load on your system increases because several Sendmails are running at once. (To avoid extreme loads on your system, ListProc can be configured to wait some number of seconds every X addresses -- this gives Sendmail a breather. Note that CREN recommends that Zmailer be used as the MTA for large lists; Zmailer does more flexible and efficient queueing than Sendmail, eliminating many of these problems.)

SmartList uses a method similar to ListProc's, splitting up the address list into chunks of a configurable size before passing them to the MTA (via the command line, not by port 25). However, it goes one better: instead of simply controlling the number of addresses per chunk, SmartList lets the system administrator control maximum and minimum sizes per chunk, thus letting SmartList attempt to make "smart" choices about breaks between the chunks -- e.g. it would try to avoid breaking apart a group of addresses within a domain. Also, instead of simply specifying "send this many addresses and wait," with SmartList the administrator can specify, per list, the maximum number of MTA processes that should run simultaneously, thus managing the load-vs.-speed issues more directly.

Finally, LISTSERV uses a specialized "distribute" routine (described briefly in RFC 1429) that takes advantage of the cooperation of LISTSERV systems around the world.
First, LISTSERV sends copies of the message to all "local" subscribers, connecting to your MTA through port 25 as ListProc does. Then, it scans the list to find recipients who are in other LISTSERV sites' local areas, and it passes all those addresses, with a single copy of the message, to the nearest LISTSERV core site, which ensures delivery with no further involvement by your system. Finally, it takes the "leftover" addresses -- the ones it couldn't recognize as "belonging" to any particular LISTSERV site -- and delivers them as if they were local. ("Local" deliveries, passed to your own MTA, are batched using the same method as ListProc.)

Especially when LISTSERV can map many of your subscribers to other LISTSERV territories (it works best with non-U.S. and .edu addresses), and if many of those subscribers are across slow or expensive links from you, this can result in very high efficiency -- for example, a U.S. message destined for 1000 European users on different hosts would cross the Atlantic just once and then "fan out." Thus, DISTRIBUTE can both reduce network load and make it possible for small machines to handle large lists -- it can also result in high-speed delivery, because in the best case more than 100 servers will work on your job simultaneously. However, it helps less for lists with few distant users, and it also results in each site taking on local deliveries for others, thus increasing the local delivery load.

Why doesn't everyone use DISTRIBUTE, you might ask? There are a few reasons. A big one is that it currently runs only on LISTSERV, and many people associate that with BITNET and big IBM iron (distasteful to many Unix natives). Another reason is that DISTRIBUTE isn't an *ideal* solution to mass mailings on the Internet -- it's a BITNET tool that's been moved over and works pretty well, but not ideally. The big problem is that DISTRIBUTE computes routes using static tables (updated periodically by L-Soft).
This is a fine method on BITNET, where network topology is known and stable, but not perfect for the Dynamic Internet(tm). You just can't get any granularity with a static table when things are always changing around you -- in fact, *any* centralized database will be a hassle. This problem was recognized a long time ago in relation to Internet host names, and the solution was the Domain Name System. The DNS is likely to play a big part in the ideal mail-distribution system, as well: there has been talk of extending the MX-record system with a bulk-mail-recipient record, but nothing definitive has been proposed. The bottom line is that someday, whether it's another proprietary system or a new Internet standard, something will replace DISTRIBUTE ... but for now it's the most efficient widely used mass-mailing distribution system out there.

If you're willing to do the work to coordinate with other people, you can set your list up with something like DISTRIBUTE on a custom basis. Just arrange for "exploder" MLMs in appropriate places (that is, across slow or expensive links and centered in a group of subscribers), and send mail to each of these exploders as part of your mail job. They will then distribute to their own people, and you'll get the network efficiency of DISTRIBUTE plus parallelized delivery -- a nice deal! The problem is that you have to make sure everyone is subscribed at the correct site, but with a disciplined subscribership it can work. (LISTSERV actually supports this arrangement as well, as a holdover from pre-DISTRIBUTE days -- the various lists are considered to be "peered," and on request the system allocates users among the servers based on network proximity and system load.)
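
The exploder arrangement described above comes down to deciding, for each subscriber, which site should carry that subscription. The following is only an illustration under stated assumptions: the `EXPLODERS` table, its domain suffixes, and the exploder addresses are made up for the example, and matching on a domain suffix is a crude stand-in for real network proximity.

```python
# Hypothetical table: domain suffix -> address of the exploder list serving it.
EXPLODERS = {
    ".uk": "mylist@uk-exploder.example",
    ".de": "mylist@de-exploder.example",
}


def plan_delivery(subscribers, exploders=EXPLODERS):
    """Split subscribers into per-exploder groups and a directly served rest."""
    via_exploder = {}
    direct = []
    for addr in subscribers:
        domain = addr.rsplit("@", 1)[1].lower()
        for suffix, exploder in exploders.items():
            if domain.endswith(suffix):
                via_exploder.setdefault(exploder, []).append(addr)
                break
        else:
            # No exploder claims this domain, so the home site keeps it.
            direct.append(addr)
    return via_exploder, direct


def outgoing_recipients(subscribers, exploders=EXPLODERS):
    """Addresses the home site mails: one per exploder, plus the rest."""
    via_exploder, direct = plan_delivery(subscribers, exploders)
    return sorted(via_exploder) + direct


people = ["a@cs.ox.example.uk", "b@tu.example.de", "c@mit.example.edu"]
print(outgoing_recipients(people))
```

In this sketch the home site sends one copy to each exploder and delivers only the unmatched addresses itself; keeping the per-exploder groups in sync with who is actually subscribed at each site is the coordination work mentioned above.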
Send corrections/additions to the FAQ Maintainer: naleks@Library.UMMED.EDU
Last Update March 27 2014 @ 02:11 PM