Mailing list management software FAQ
Section - 2.05 Performance and system-load issues related to mail delivery

The main issue in distributing to large lists is how quickly you can get the
mail out.  Most MLMs leave routing and optimization decisions up to the MTA
(Mail Transport Agent, usually Sendmail under Unix), but some other systems
-- notably ListProc, LISTSERV, and SmartList -- take a more active approach
in managing network load.  To illustrate, let's look at the path mail takes
to delivery in four systems: Majordomo, ListProc, SmartList, and LISTSERV.

Majordomo (a good example of the "leave-it-to-Sendmail" model), once it
decides to forward a message to a list, passes it to a single Sendmail
process along with the addresses for the entire list.  Sendmail then does
what it can to optimize delivery (e.g. sorting by MX record), and starts
connecting to each machine in series.  The result on a 200-subscriber list,
with everyone in the U.S. and most with different mail exchangers, is that
there's about an hour's delay between the first delivery and the last.  This
delay depends more on the speed of the recipient machines than on the speed
of the host or the network link, so it grows roughly linearly with the
number of systems Sendmail has to connect to.  This can be a big problem on
a large and active list, because people late in the delivery queue will see
and reply to "new" messages that others have seen (and perhaps replied to)
several hours before -- conversations will get "out of sync."  On the other
hand, much longer delays might not matter for a list that consists only of
infrequent announcements, without discussion.
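
The linear relationship described above can be put into a back-of-the-envelope
model.  This is only a sketch: the 18-second average per host is a
hypothetical figure chosen so that 200 hosts come out to about an hour, and
the function name is made up for illustration.

```python
def serial_delivery_delay(num_hosts, seconds_per_host):
    """Seconds until the last host is reached when one process
    connects to every host in series (one after another)."""
    return num_hosts * seconds_per_host


# 200 distinct mail exchangers at an assumed 18 seconds each.
delay = serial_delivery_delay(200, 18)
print(delay / 60)  # 60.0 (minutes)
```

Doubling the number of distinct hosts doubles the delay in this model, which
is why the lag becomes a problem on large, busy lists.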

ListProc speeds up the delivery process for large lists by passing each
message off to your MTA in chunks of N addresses, where N is configurable.
(The MTA is usually Sendmail, but ListProc doesn't care; it just connects to
port 25 and speaks SMTP.)  Depending on how the MTA is configured, delivery
will usually be faster because deliveries are parallelized -- the longest
list any one MTA process handles is of size N.  However, Sendmail can't
optimize deliveries as much, because no single Sendmail process has the
entire list to work with.  Thus network efficiency can go down (not actually
a big problem, because ListProc produces "chunks" sorted by domain).  Also,
the peak load on your system increases because several Sendmails are running
at once.  (To avoid extreme loads on your system, ListProc can be configured
to wait some number of seconds every X addresses -- this gives Sendmail a
breather.  Note that CREN recommends that Zmailer be used as the MTA for
large lists; Zmailer does more flexible and efficient queueing than Sendmail,
eliminating many of these problems.)
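
As a rough illustration of the chunking idea, here is a minimal sketch in
Python.  It is not ListProc's code: the function names, the `submit`
callback (standing in for "hand this chunk to the MTA"), and the parameter
names `pause_every` and `pause_seconds` are all assumptions made for the
example.

```python
import time


def domain_of(address):
    """Return the lower-cased domain part of an e-mail address."""
    return address.rsplit("@", 1)[-1].lower()


def fixed_chunks(addresses, chunk_size):
    """Sort addresses by domain, then cut the list every chunk_size entries."""
    ordered = sorted(addresses, key=domain_of)
    return [ordered[i:i + chunk_size]
            for i in range(0, len(ordered), chunk_size)]


def deliver_in_chunks(addresses, chunk_size, submit,
                      pause_every=0, pause_seconds=0.0, sleep=time.sleep):
    """Hand each chunk to submit(); rest after every pause_every addresses."""
    sent_since_pause = 0
    for chunk in fixed_chunks(addresses, chunk_size):
        submit(chunk)  # hypothetical hook: one MTA submission per chunk
        sent_since_pause += len(chunk)
        if pause_every and sent_since_pause >= pause_every:
            sleep(pause_seconds)  # give the MTA a breather
            sent_since_pause = 0


subscribers = ["a@x.org", "b@y.org", "c@x.org", "d@z.org", "e@y.org"]
deliver_in_chunks(subscribers, 2, print)
```

Because the cut points fall every N addresses regardless of domain, a domain
can end up split across two chunks, which is the loss of optimization the
paragraph above mentions.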

SmartList uses a method similar to ListProc's, splitting up the address list
into chunks of a configurable size before passing them to the MTA (via the
command line, not by port 25).  However, it goes one better in that instead
of simply controlling the number of addresses per chunk, SmartList lets the
system administrator control maximum and minimum sizes per chunk, thus
letting SmartList attempt to make "smart" choices about breaks between the
chunks -- e.g. it would try to avoid breaking apart a group of addresses
within a domain.  Also, instead of simply specifying "send this many
addresses and wait," with SmartList the administrator can specify, per list,
the maximum number of MTA processes that should run simultaneously, thus
managing the load-vs.-speed issues more directly.
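
The two controls described above can be sketched as follows.  This is one
plausible way to pick break points, not SmartList's actual algorithm; the
names `smart_chunks`, `deliver_concurrently`, `min_size`, `max_size`, and
`max_processes` are invented for the example, and `submit` again stands in
for invoking the MTA.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby


def domain_of(address):
    """Return the lower-cased domain part of an e-mail address."""
    return address.rsplit("@", 1)[-1].lower()


def smart_chunks(addresses, min_size, max_size):
    """Split addresses into chunks of at most max_size, preferring to
    break between domains once a chunk holds at least min_size entries."""
    if not 1 <= min_size <= max_size:
        raise ValueError("need 1 <= min_size <= max_size")
    ordered = sorted(addresses, key=domain_of)
    chunks, current = [], []
    for _, group in groupby(ordered, key=domain_of):
        group = list(group)
        # Break early if the chunk is big enough and the next domain
        # would not fit in it whole.
        if len(current) >= min_size and len(current) + len(group) > max_size:
            chunks.append(current)
            current = []
        for address in group:
            current.append(address)
            if len(current) == max_size:  # hard upper limit
                chunks.append(current)
                current = []
    if current:
        chunks.append(current)  # the final chunk may be below min_size
    return chunks


def deliver_concurrently(chunks, submit, max_processes):
    """Run submit(chunk) for every chunk, at most max_processes at a time."""
    with ThreadPoolExecutor(max_workers=max_processes) as pool:
        list(pool.map(submit, chunks))  # list() surfaces any exceptions
```

With three addresses at a.org, three at b.org, and two at c.org, a minimum
of 2 and a maximum of 4 yields one chunk per domain, whereas a fixed chunk
size of 4 would split b.org across two chunks.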

Finally, LISTSERV uses a specialized "distribute" routine (described briefly
in RFC 1429) that takes advantage of the cooperation of LISTSERV systems
around the world.  First, it sends copies of the message to all "local"
subscribers, connecting to your MTA through port 25 as ListProc does.  Then,
it scans the list to find recipients who are in other LISTSERV sites' local
areas, and it passes all those addresses, with a single copy of the message,
to the nearest LISTSERV core site, which assures delivery with no further
involvement by your system.  Finally, it takes the "leftover" addresses --
the ones it couldn't recognize as "belonging" to any particular LISTSERV site
-- and delivers them as if they were local.  ("Local" deliveries, passed to
your own MTA, are batched using the same method as ListProc.)  Especially
when LISTSERV can map many of your subscribers to other LISTSERV territories
(it works best with non-U.S. and .edu addresses), and if many of those
subscribers are across slow or expensive links from you, this can result in
very high efficiency -- for example, a message from the U.S. destined for
1000 European users on different hosts would cross the Atlantic just once and
then "fan out."  Thus, DISTRIBUTE can both reduce network load and make it
possible for small machines to handle large lists.  It can also result in
high-speed delivery, because in the best case more than 100 servers will work
on your job simultaneously.  However, it helps less for lists with few
distant users, and it also results in each site taking on local deliveries
for others, thus increasing the local delivery load.
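
The three-way split described above (local subscribers, subscribers another
LISTSERV site can take over, and leftovers) might be sketched like this.  It
is an illustration only, not the RFC 1429 protocol: the static `site_table`
mapping domain suffixes to sites, the host names in it, and the function
names are all hypothetical.

```python
def domain_of(address):
    """Return the lower-cased domain part of an e-mail address."""
    return address.rsplit("@", 1)[-1].lower()


def lookup_site(domain, site_table):
    """Find the site for the longest matching domain suffix, if any."""
    labels = domain.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in site_table:
            return site_table[suffix]
    return None


def partition_recipients(addresses, local_domains, site_table):
    """Split recipients into local, per-remote-site, and leftover groups."""
    local, remote, leftover = [], {}, []
    for address in addresses:
        domain = domain_of(address)
        if domain in local_domains:
            local.append(address)
            continue
        site = lookup_site(domain, site_table)
        if site is None:
            leftover.append(address)  # delivered as if it were local
        else:
            remote.setdefault(site, []).append(address)
    return local, remote, leftover


# Made-up table and addresses, for illustration only.
site_table = {"de": "listserv.example.de", "edu": "listserv.example.edu"}
local, remote, leftover = partition_recipients(
    ["a@home.example.com", "b@uni.de", "c@tech.de",
     "d@college.edu", "e@isp.example.net"],
    {"home.example.com"},
    site_table,
)
print(local, remote, leftover)
```

In this sketch the addresses in `remote` are grouped by the site that would
finally deliver them; as the text says, your own server hands them all, with
a single copy of the message, to the nearest core site and leaves the fan-out
to the LISTSERV network.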

Why doesn't everyone use DISTRIBUTE, you might ask.  There are a few reasons.
A big one is that it currently runs only on LISTSERV, and many people
associate that with BITNET and big IBM iron (distasteful to many Unix
natives).  Another reason is that DISTRIBUTE isn't an *ideal* solution to
mass mailings on the Internet -- it's a BITNET tool that's been moved over
and works pretty well, but not ideally.  The big problem is that DISTRIBUTE
computes routes using static tables (updated periodically by L-Soft).  This
is a fine method on BITNET, where network topology is known and stable, but
not perfect for the Dynamic Internet(tm).  You just can't get any granularity
with a static table when things are always changing around you -- in fact,
*any* centralized database will be a hassle.  This problem was recognized a
long time ago in relation to Internet host names, and the solution was the
Domain Name System.  The DNS is likely to play a big part in the ideal
mail-distribution system, as well: there has been talk of extending the
MX-record system with a bulk-mail-recipient record, but nothing definitive
has been proposed.  The bottom line for now is that someday, whether it's
another proprietary system or it's a new Internet standard, something will
replace DISTRIBUTE ...  but for now it's the most efficient, widely used
mass-mailing distribution system out there.

If you're willing to do the work to coordinate with other people, you can set
your list up with something like DISTRIBUTE on a custom basis.  Just arrange
for "exploder" MLMs in appropriate places (that is, across slow or expensive
links and centered in a group of subscribers), and send mail to each of these
exploders as part of your mail job.  They will then distribute to their own
people, and you'll get the network efficiency of DISTRIBUTE plus parallelized
delivery -- a nice deal!  The problem is that you have to make sure everyone
is subscribed at the correct site, but with a disciplined subscribership it
can work.  (LISTSERV actually supports this arrangement as well, as a
holdover from pre-DISTRIBUTE days -- the various lists are considered to be
"peered," and on request the system allocates users among the servers based
on network proximity and system load.)

Send corrections/additions to the FAQ Maintainer:
naleks@Library.UMMED.EDU

Last Update March 27 2014 @ 02:11 PM
