When a web site gets more requests than it can handle, it can become slow and unresponsive. In the worst case, too many requests to a web site can cause the server to overload completely, stop handling requests, and possibly even crash. This can be a problem for any kind of server application, not just Zope. The obvious solution to this problem is to use more than one computer, so that if one computer fails, another computer can continue to serve up your web site.
Using multiple computers has obvious benefits, but it also has some drawbacks. For example, if you have five computers running Zope, then you must ensure that all five Zope installations have the same information on them. This is not a very hard task if you're the only user and you have only a few static objects, but for large organizations with thousands of rapidly changing objects, keeping five separate Zope installations synchronized manually would be a nightmare. To solve this problem, Zope Corporation created Zope Enterprise Objects, or ZEO. This chapter gives you a brief overview of installing ZEO, but there are many other options we don't cover. For more in-depth information, see the documentation that comes with the ZEO package, and also take a look at the ZEO discussion area.
ZEO is a system that allows you to run your site on more than one computer. This is often called clustering and load balancing. By running Zope on multiple computers, you can spread the requests evenly around and add more computers as the number of requests grows. Further, if one computer fails or crashes, other computers can still service requests while you fix the broken one.
ZEO runs Zope on multiple computers and takes care of making sure all the Zope installations share the exact same database at all times. ZEO uses a client/server architecture. The Zope installations on multiple computers are the ZEO Clients. All of the clients connect to one, central ZEO Storage Server, as shown in Figure 11-1.
Figure 11-1 Simple ZEO illustration
The terminology can be a bit confusing, because normally you think of Zope as a server, not a client. When using ZEO, your Zope processes act as both servers (for web requests) and clients (for data from the ZEO server).
ZEO clients and servers communicate using standard Internet protocols, so they can be in the same room or in different countries. ZEO, in fact, can distribute a Zope site all over the world. In this chapter we'll explore some interesting ways you can distribute your ZEO clients.
ZEO serves many hits in a fail-safe way. If your site does not get millions of hits, then you probably don't need ZEO. There is no hard-and-fast rule about when you should and should not use ZEO, but for the most part you should not need to run ZEO unless:
All of these cases are fairly advanced, high-end uses of Zope. Installing, configuring, and maintaining systems like these requires advanced system administration knowledge and resources. Most Zope users will not need ZEO, or may not have the expertise necessary to maintain a distributed server system like ZEO. ZEO is fun, and can be very useful, but before jumping head-first and installing ZEO in your system you should weigh the extra administrative burden ZEO creates against the simplicity of running just a simple, stand-alone Zope.
The most common ZEO setup is one ZEO server and multiple ZEO clients. Before installing and configuring ZEO though, consider the following issues:
ZEO is not distributed with Zope; you must download it from the Products section of Zope.org.
Installing ZEO requires a little bit of manual preparation. To install ZEO, download the ZEO-1.0.tgz from the Zope.org web site and place it in your Zope installation directory. Now, unpack the tarball. On Unix, this can be done with the following command:
$ tar -zxf ZEO-1.0.tgz
On Windows, you can unpack the archive with WinZip. Before installing ZEO, make sure you back up your Zope system first.
Now you should have a ZEO-1.0 directory. Next, you have to copy some files into your Zope top level lib/python directory. This can be done on UNIX with:
$ cp -R ZEO-1.0/ZEO lib/python
If you're running Windows, you can use the following DOS command to copy your ZEO files:
C:\...Zope\>xcopy ZEO-1.0\* lib\python /S
Now, you have to create a special file in your Zope root directory called custom_zodb.py. In that file, put the following Python code:
import ZEO.ClientStorage
Storage=ZEO.ClientStorage.ClientStorage(('localhost',7700))
This will configure your Zope to run as a ZEO client. If you pass ClientStorage a tuple, as this code does, the tuple must have two elements: a string that contains the address of the server, and the port that the server is listening on. In this example, we're going to show you how to run both the clients and the servers on the same machine, so the machine name is set to localhost.
Now, you have ZEO properly configured to run on one computer. Try it out by first starting the server. Go to your Zope top level directory in a terminal window or DOS box and type:
python lib/python/ZEO/start.py -p 7700
This will start the ZEO server listening on port 7700 on your computer. Now, in another window, start up Zope like you normally would, with the z2.py script:
$ python z2.py -D
------
2000-10-04T20:43:11 INFO(0) client Trying to connect to server
------
2000-10-04T20:43:11 INFO(0) ClientStorage Connected to storage
------
2000-10-04T20:43:12 PROBLEM(100) ZServer Computing default pinky
------
2000-10-04T20:43:12 INFO(0) ZServer Medusa (V1.19) started at Wed Oct 4 15:43:12 2000
        Hostname: pinky.zopezoo.org
        Port:8080
Notice how in the above example, Zope tells you client Trying to connect to server and then ClientStorage Connected to storage. This means your ZEO client has successfully connected to your ZEO server. Now, you can visit http://localhost:8080/manage (or whatever URL your ZEO client is listening on) and log into Zope as usual.
As you can see, everything looks the same. Go to the Control Panel and click on Database Management. Here, you see that Zope is connected to a ZEO Storage and that its state is connected.
Running ZEO on one computer is a great way to familiarize yourself with ZEO and how it works. Running ZEO on one computer does not, however, improve the speed of your site, and in fact, it may slow it down just a little. To really get the speed benefits that ZEO provides, you need to run ZEO on several computers, which is explained in the next section.
Setting up ZEO to run on multiple computers is very similar to running ZEO on one computer. There are generally two steps: the first is to start the ZEO server, and the second is to start one or more ZEO clients.
For example, let's say you have four computers. One computer named zooserver will be your ZEO server, and the other three computers, named zeoclient1, zeoclient2 and zeoclient3, will be your ZEO clients.
The first step is to run the server on zooserver. To tell your ZEO server to listen on the TCP socket at port 9999 on the zooserver interface, run the server with the start.py script like this:
$ python lib/python/ZEO/start.py -p 9999 -h zooserver.zopezoo.org
This will start the ZEO server. Now, you can start up your clients by going to each client and configuring each of them with the following custom_zodb.py:
import ZEO.ClientStorage
Storage=ZEO.ClientStorage.ClientStorage(('zooserver.zopezoo.org',9999))
Now, you can start each client's z2.py script as shown in the previous section, Installing and Running ZEO. Notice that the host and port are the same for each client; this is so they all connect to the same server. By following this procedure for each of your three clients, you will have three different Zopes all serving the same Zope site. You can verify this by visiting port 8080 on all three of your ZEO client machines.
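The check on port 8080 can also be scripted. The following is a minimal sketch, assuming a current Python 3 interpreter rather than the Python that ships with this Zope release; the function names is_listening and check_clients are hypothetical helpers, not part of Zope or ZEO. It only confirms that something accepts TCP connections on the port, not that the page served is correct.

```python
import socket


def is_listening(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unknown host names.
        return False


def check_clients(hosts, port=8080):
    """Map each host name to whether its Zope port accepts connections."""
    return {host: is_listening(host, port) for host in hosts}
```

Calling check_clients(["zeoclient1", "zeoclient2", "zeoclient3"]) returns a dictionary that maps each client name to True or False, which makes it easy to spot a client that did not start.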
You probably want to run ZEO on more than one computer so that you can take advantage of the speed increase this gives you. Running more computers means that you can serve more hits per second than with just one computer. Distributing the load of your web site's visitors however does require a bit more elaboration in your system. The next section describes why, and how, you distribute the load of your visitors among many computers.
In the previous example you have a ZEO server named zooserver and three ZEO clients named zeoclient1, zeoclient2, and zeoclient3. The three ZEO clients are connected to the ZEO server and each client is verified to work properly.
Now you have three computers that serve content to your users. The next problem is how to actually spread the incoming web requests evenly among the three ZEO clients. Your users only know about www.zopezoo.org, not zeoclient1, zeoclient2 or zeoclient3. It would be a hassle to tell only some users to use zeoclient1, and others to use zeoclient3, and it wouldn't be very good use of your computing resources. You want to automate, or at least make very easy, the process of evenly distributing requests to your various ZEO clients.
There are a number of solutions to this problem, some easy, some advanced, and some expensive. The next section goes over the more common ways of spreading web requests around various computers using different kinds of technology, some of them based on freely-available or commercial software, and some of them based on special hardware.
The easiest way to distribute requests across many web servers is to let your users pick from a list of mirrored sites, each of which is a ZEO client. This method requires no extra software or hardware; it just requires the maintenance of a list of mirror servers. You present your users with a menu of mirrors, and they choose which server to use.
Note that this method of distributing requests is passive (you have no active control over which clients are used) and voluntary (your users need to make a voluntary choice to use another ZEO client). If your users do not use a mirror, then the requests will go to your ZEO client that serves www.zopezoo.org.
If you do not have any administrative control over your mirrors, then this can be a pretty easy solution. If your mirrors go off-line, your users can always choose to come back to the master site which you do have administrative control over and choose a different mirror.
On a global level, this method improves performance. Your users can choose to use a server that is geographically closer to them, which probably results in faster access. For example, if your main server was in Portland, Oregon on the west coast of the USA and you had users in London, England, they could choose your London mirror and their request would not have to go half-way across the world and back.
To use this method, create a property in your root folder of type lines named "mirror_servers". On each line of this property, put the URL to your various ZEO clients, as shown in Figure 11-2.
Figure 11-2 Figure of property with URLs to mirrors
Now, add some simple DTML to your site to display a list of your mirrors:
<h2>Please choose from the following mirrors:</h2>
<ul>
<dtml-in mirror_servers>
<li><a href="&dtml-sequence-item;"><dtml-var sequence-item></a></li>
</dtml-in>
</ul>
This DTML displays a list of all mirrors your users can choose from. When using this model, it is good to name your computers in ways that assist your users in their choice of mirror. For example, if you spread the load geographically, then choose names of countries for your computer names.
Alternatively, if you do not want users voluntarily choosing a mirror, you can have the index_html method of your www.zopezoo.org site issue HTTP redirects. For example, use the following code in your www.zopezoo.org site's index_html method:
This code will redirect any visitors to www.zopezoo.org to a random mirror server.
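The selection step behind such a redirect can be sketched in plain Python. This is an illustration only, not the index_html code itself: the MIRROR_SERVERS list is a hypothetical stand-in for the mirror_servers property, its URLs are made up, and pick_mirror is a hypothetical helper name.

```python
import random

# Hypothetical stand-in for the mirror_servers lines property; the URLs
# are invented for illustration.
MIRROR_SERVERS = [
    "http://zeoclient1.zopezoo.org:8080/",
    "http://zeoclient2.zopezoo.org:8080/",
    "http://zeoclient3.zopezoo.org:8080/",
]


def pick_mirror(mirrors):
    """Return one mirror URL chosen uniformly at random."""
    if not mirrors:
        raise ValueError("no mirror servers configured")
    return random.choice(list(mirrors))
```

Inside Zope, the chosen URL would then be passed to the response object's redirect method. Because the choice is random rather than load-aware, requests even out only on average over many visitors.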
The Domain Name System, or DNS, is the Internet mechanism that translates computer names (like "www.zope.org") into numeric addresses. This mechanism can map one name to many addresses.
The simplest method for load-balancing is to use round-robin DNS, as illustrated in Figure 11-3.
Figure 11-3 Load balancing with round-robin DNS.
When www.zopezoo.org gets resolved, the name server (BIND, for example) answers with the address of either zeoclient1, zeoclient2, or zeoclient3 - but in a rotated order every time. For example, one user may resolve www.zopezoo.org and get the address for zeoclient1, and another user may resolve www.zopezoo.org and get the address for zeoclient2. This way your users are spread over the various ZEO clients.
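The rotation can be pictured with a toy model in Python. This is a sketch of the idea only, not how any real name server is implemented; the class name RoundRobinResolver and the addresses used with it are hypothetical.

```python
from collections import deque


class RoundRobinResolver:
    """Toy name server that rotates its address list on every lookup."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        """Return the current address order, then rotate for the next caller."""
        answer = list(self._addresses)
        # Move the first address to the end so the next lookup starts
        # with a different ZEO client.
        self._addresses.rotate(-1)
        return answer
```

A client typically connects to the first address in the answer, so successive lookups land on successive ZEO clients.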
This is not a perfect load balancing scheme, because DNS resolution results get cached by the other name servers on the net. Once a user has resolved www.zopezoo.org to a particular ZEO client, all subsequent requests for that user also go to the same ZEO client. The final result is generally all right, because the requests as a whole are still spread over your various ZEO clients.
One down-side to this solution is that it can take from hours to days for name servers to refresh their cached copy of what they think the address of www.zopezoo.org is. If you are not responsible for the maintenance of your ZEO clients and one fails, then 1/Nth of your users (where N is the number of ZEO clients) will not be able to reach your site until their name server cache refreshes.
Configuring your DNS server to do round-robin name resolution is a pretty advanced technique that is not covered in this book. A good reference on how to do this can be found in the Apache Documentation.
Distributing the load with round-robin DNS is useful, and cheap, but not 100% effective. DNS servers can have strange caching policies, and you are relying on a particular quirk in the way DNS works to distribute the load. The next section describes a more complex, but much more powerful way of distributing load called Layer 4 Switching.
Layer 4 switching lets one computer transparently hand requests to a farm of computers. This is a pretty advanced technique that is beyond the scope of this book, but it is worth pointing out several products that do Layer 4 switching for you.
Layer 4 switching involves a switch that, according to your preferences, chooses from a group of ZEO clients whenever a request comes in, as shown in Figure 11-4.
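One common preference is to send each new connection to the client that currently has the fewest open connections, skipping clients that are known to be down. The sketch below models that policy in Python under simplifying assumptions; the class LeastConnectionsSwitch and its methods are hypothetical and do not correspond to the interface of any particular switch product.

```python
class LeastConnectionsSwitch:
    """Toy model of a switch that prefers the least busy healthy backend."""

    def __init__(self, backends):
        # Number of open connections per backend, in configuration order.
        self.active = {name: 0 for name in backends}
        self.healthy = set(backends)

    def mark_down(self, name):
        """Stop sending new connections to a failed backend."""
        self.healthy.discard(name)

    def mark_up(self, name):
        """Resume sending connections to a recovered backend."""
        if name in self.active:
            self.healthy.add(name)

    def open_connection(self):
        """Pick the healthy backend with the fewest open connections."""
        candidates = [n for n in self.active if n in self.healthy]
        if not candidates:
            raise RuntimeError("no healthy backends available")
        # Ties go to the backend listed first.
        chosen = min(candidates, key=lambda n: self.active[n])
        self.active[chosen] += 1
        return chosen

    def close_connection(self, name):
        """Record that a connection to the given backend has finished."""
        if self.active[name] > 0:
            self.active[name] -= 1
```

Unlike round-robin DNS, the decision is made per connection at the switch, so a failed client can be taken out of rotation immediately.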
Figure 11-4 Illustration of Layer 4 switching
There are hardware and software Layer 4 switches. There are a number of software solutions, but one in particular that stands out is the Linux Virtual Server (LVS). This is an extension to the free Linux operating system that lets you turn a Linux computer into a Layer 4 switch. More information on the LVS can be found on its web site.
There are also a number of hardware solutions that claim higher performance than software based solutions like LVS. Cisco Systems has a hardware router called LocalDirector that works as a Layer 4 switch, and Alteon also makes a popular Layer 4 switch.
Without ZEO, your entire Zope system is a single point of failure. ZEO allows you to spread that point of failure around to many different computers. If one of your ZEO clients fails, other clients can answer requests on the failed client's behalf.
Note that as of this writing, the single point of failure can't be entirely eliminated, because there is still one central storage server. The methods described in this section, however, do minimize the risks of failure by spreading most of Zope across many computers.
What this means is that, while this does remove a lot of risk away from your web servers as a single point of failure, it does not eliminate all risk because now the ZEO server is a single point of failure. There are several ways of dealing with this issue.
One popular method is to accept the single point of failure risk and mitigate that risk as much as possible by using very high-end, reliable equipment for your ZEO server, frequently backing up your data, and using inexpensive, off-the-shelf hardware for your ZEO clients. By investing the bulk of your infrastructure budget on making your ZEO server rock solid (redundant power supplies, RAID, and other fail-safe methods) you can be pretty well assured that your ZEO server will remain up, even if a handful of your inexpensive ZEO clients fail.
Some applications, however, require absolute 100% up-time. There is still a chance, with the solution described above, that your ZEO server will fail. If this happens, you want a backup ZEO server to jump in and take over for the failed server right away.
Like Layer 4 switching, there are a number of products, software and hardware, that help you mitigate this kind of risk. One popular software solution for Linux is called fake. Fake is a Linux-based utility that can make a backup computer take over for a failed primary computer by "faking out" network addresses. When used in conjunction with monitoring utilities like heartbeat, fake can guarantee almost 100% up-time of your ZEO server and Layer 4 switches. Using fake in this way is beyond the scope of this book.
So far, we've explained these techniques for mitigating a single point of failure:
The final piece of the puzzle is the ZEO server itself, and where it stores its information. If your primary ZEO server fails, how can your backup ZEO server ensure it has the most recent information that was contained in the primary server? As usual, there are several ways to solve this problem, and they are covered in the next section.
Before explaining the details of how the ZEO server works, it is worth understanding some details about how Zope storages work in general.
Zope does not save any of its objects or information directly to disk. Instead, Zope uses a storage component that takes care of all the details of where objects should be saved.
This is a very flexible model, because Zope no longer needs to be concerned about opening files, or reading and writing from databases, or sending data across a network (in the case of ZEO). Each particular storage takes care of that task on Zope's behalf.
For example, a plain, stand-alone Zope system can be illustrated in Figure 11-5.
Figure 11-5 Zope connected to a filestorage
You can see there is one Zope application which plugs into a FileStorage. This storage, as its name implies, saves all of its information to a file on the computer's filesystem.
When using ZEO, you simply replace the FileStorage with a ClientStorage, as illustrated in Figure 11-6.
Figure 11-6 Zope with a Client Storage and Storage server
Instead of saving objects to a file, a ClientStorage sends objects over a network connection to a Storage Server. As you can see in the illustration, the Storage Server uses a FileStorage to save that information to a file on the ZEO server's filesystem.
Storages are interchangeable and easy to implement. Because of their interchangeable nature, ZEO Storage Servers can use ZEO ClientStorages to pass on object data to yet another ZEO Storage Server. This is illustrated in Figure 11-7.
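The chaining idea can be shown with two toy classes in Python. This is a deliberately simplified sketch: DictStorage and ForwardingStorage are hypothetical stand-ins for a FileStorage and a ClientStorage, and their store and load methods do not match the real Zope storage interface, which carries more arguments and handles transactions.

```python
class DictStorage:
    """Stand-in for a FileStorage: keeps records in memory, not on disk."""

    def __init__(self):
        self._records = {}

    def store(self, oid, data):
        self._records[oid] = data

    def load(self, oid):
        return self._records[oid]


class ForwardingStorage:
    """Stand-in for a ClientStorage: passes every call to another storage."""

    def __init__(self, upstream):
        self._upstream = upstream

    def store(self, oid, data):
        self._upstream.store(oid, data)

    def load(self, oid):
        return self._upstream.load(oid)


# Because both classes offer the same two methods, they can be stacked
# to any depth: client -> intermediate server -> central server.
central = DictStorage()
middle = ForwardingStorage(central)
client = ForwardingStorage(middle)
client.store("oid-1", b"pickled object state")
```

The application at the top only ever talks to the storage it was given, so it cannot tell whether one tier or several sit between it and the place where the data finally lands.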
Figure 11-7 Multi-tiered ZEO system
Here, you can see a number of ZEO clients funnel down through three ZEO servers, which in turn act as ZEO clients themselves and funnel down into the final, central ZEO server that saves its information in a FileStorage. Now, that central ZEO server is the single point of failure in the system. If any of your other clients or intermediate servers fail, the system will still continue to work, but if the central server fails, then you need an alternative.
With fake you can have a back-up storage server strategy, but this method is not very well proven and hasn't been explored by the authors. In the future, ZEO will have a "multiple-server" feature that allows a group of storage servers to act as a quorum, so if one or more storage servers fail, the remaining servers in the quorum can continue to serve objects.
There are a number of advantages to approaches like these, especially if you are interested in creating a massively distributed network object database. Of course, as with any set of advantages, there are some drawbacks as well, which are discussed in the next section.
For the most part, running ZEO is exactly like running Zope by itself, but there are a few issues to keep in mind.
First, it takes longer for information to be written to the Zope object database. This does not slow down your ability to use Zope (because Zope does not block you during this write operation) but it does increase your chances of getting a ConflictError. Conflict errors happen when two ZEO clients try to write to the same object at the same time. One of the ZEO clients wins the conflict and continues on normally. The other ZEO client loses the conflict and has to try again.
Conflict errors should be as infrequent as possible because they could slow down your system. While it's normal to have a few conflict errors (due to the concurrent nature of Zope) it is abnormal to have a lot of conflict errors. The pathological case is when more than one ZEO client tries to write to the same object over and over again very quickly. In this case, there will be lots of conflict errors, and therefore lots of retries. If a ZEO client tries to write to the database three times and gets three conflict errors in a row, then the request is aborted and the data is not written.
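The retry behavior described above can be sketched as a small loop in Python. This is an illustration of the policy, not Zope's actual request-handling code: the ConflictError class defined here is a hypothetical stand-in for the real exception, and run_with_retries is a hypothetical helper.

```python
class ConflictError(Exception):
    """Hypothetical stand-in for the conflict error raised on a lost write."""


MAX_ATTEMPTS = 3  # three conflicts in a row abort the request


def run_with_retries(transaction, max_attempts=MAX_ATTEMPTS):
    """Call transaction() until it succeeds or max_attempts conflicts occur."""
    for attempt in range(1, max_attempts + 1):
        try:
            return transaction()
        except ConflictError:
            if attempt == max_attempts:
                # Give up: the request is aborted and nothing is written.
                raise
```

Each retry repeats the whole request, which is why a steady stream of conflicts on one object slows the system down even when most requests eventually succeed.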
Because ZEO takes longer to write this information, the chances of getting a ConflictError are higher than if you are not running ZEO. Because of this, ZEO is more write sensitive than running Zope without ZEO. You may have to keep this in mind when you are designing your network or application. As a rule of thumb, more and more frequent writes to the database increase your chances of getting a ConflictError. On the flip side, faster and more reliable network connections and computers lower your chances of getting a ConflictError. By taking these two factors into account, conflict errors can be mostly avoided.
Finally, as of this writing, there is no built in encryption or authentication between ZEO servers and clients. This means that you must be very careful about who you expose your ZEO servers to. If you leave your ZEO servers open to the whole Internet, then anyone can connect to your ZEO server and write data into your database, and that can be bad news.
This is not an unsolvable problem, however, because you can use other tools, like firewalls, to protect your ZEO servers. If you are running a ZEO client/server connection over an insecure network and you want to guarantee that your information is kept private, you can use tools like OpenSSH and stunnel to set up secure, encrypted communication channels between your ZEO clients and servers. How these tools work and how to set them up is beyond the scope of this book, but both packages are adequately documented on their web sites. For more information on firewalls, with Linux in particular, we recommend the book "Linux Firewalls" by Robert Ziegler, which is published by New Riders.
In this chapter we looked at ZEO, and how ZEO can substantially increase the capacity of your web site. In addition to running ZEO on one computer to get familiar with it, we looked at running ZEO on many computers, and at various techniques for spreading the load of your visitors among those many computers.
ZEO is not a magic bullet solution, and like other systems designed to work with many computers, it adds another level of complexity to your web site. This complexity pays off, however, when you need to serve up lots of dynamic content to your audience.