SQL Statements

Conceptual information on relational databases and tables is of course entirely moot if you don't have any idea of how to directly interact with your data. From a general perspective, SQL consists entirely of structured statements, with which all data in the database is added, modified, and removed. These statements form the basis for your communication with the PostgreSQL server.

The following sections dissect the anatomy of a SQL statement into its structural pieces, explaining the significance of each, and their relation to one another. The standard PostgreSQL command-line client, psql, provides output to display example PostgreSQL statements.

Our SQL examples commonly take place within an example database called booktown, the database for our imaginary bookstore, Book Town. The output from psql is consistently prefixed with a default prompt style, which looks like this:

booktown=#

Some simpler examples may use our generic test database, testdb, if not specific to the Book Town examples. By default, the psql prompt displays only the name of the connected database and the =# characters indicating that the system is ready for a new command (though you will see that the = symbol will change dynamically as psql tracks the status of SQL input). We display this prompt along with the SQL input and output in order to help familiarize you with the psql output.

Chapter 4 documents psql in more detail, and it is only mentioned here to explain the source and style of this book's SQL examples using PostgreSQL.

Note: The schema (with sample data) for the booktown database can be found in the booktown.sql file, on the CD-ROM. To install this database, type psql -U postgres template1 -f /mnt/cdrom/booktown.sql from the command line (where /mnt/cdrom is the path to your mounted CD, and postgres is your PostgreSQL superuser).

The Anatomy of a SQL Statement

SQL statements always begin with a command (a word, or group of words, that describes what action the statement will initiate). The command can be called the verb of the SQL statement, as it always describes an action to be taken. Statements typically contain one or more clauses, which are formal modifiers that further describe the function of the SQL statement.

Table 3-2 contains a list of some of the most commonly used PostgreSQL commands.

Table 3-2. Fundamental PostgreSQL commands

Command

Description

CREATE DATABASE

Creates a new database

CREATE INDEX

Creates a new index on a table column

CREATE SEQUENCE

Creates a new sequence in an existing database

CREATE TABLE

Creates a new table in an existing database

CREATE TRIGGER

Creates a new trigger definition

CREATE VIEW

Creates a new view on an existing table

SELECT

Retrieves records from a table

INSERT

Adds one or more new records into a table

UPDATE

Modifies the data in existing table records

DELETE

Removes existing records from a table

DROP DATABASE

Destroys an existing database

DROP INDEX

Removes a column index from an existing table

DROP SEQUENCE

Destroys an existing sequence generator

DROP TABLE

Destroys an existing table

DROP TRIGGER

Destroys an existing trigger definition

DROP VIEW

Destroys an existing table view

CREATE USER

Adds a new PostgreSQL user account to the system

ALTER USER

Modifies an existing PostgreSQL user account

DROP USER

Removes an existing PostgreSQL user account

GRANT

Grant rights on a database object to a user

REVOKE

Deny rights on a database object from a user

CREATE FUNCTION

Creates a new SQL function within a database

CREATE LANGUAGE

Creates a new language definition within a database

CREATE OPERATOR

Creates a new SQL operator within a database

CREATE TYPE

Creates a new SQL data type within a database

While obviously code-like in nature, SQL was designed with ease of use and readability in mind. As a result, SQL statements often bear a strong resemblance to simple, instructional English sentences. A strong feature of SQL is that its statements are designed to instruct the server what data to find, not literally how to find it, as you would be forced to do in an ordinary programming language. Reading a well-designed SQL query should be nearly as easy as reading an ordinary sentence.

Note: In SQL texts, the word query is frequently used interchangeably with statement. In order to be clear, within this book the term query is used only to refer to statements which return data (e.g., SELECT statements), rather than general SQL statements, which may instead create, add, or modify data.

Internally, PostgreSQL interprets structured SQL statements as a sequence of tokens, usually delimited by whitespace (spaces or newlines, outside of quotes), though some tokens may be placed adjacently if there is no chance of ambiguity (such as when operators are placed directly next to identifiers). A token in this context is a word or character that can be identified meaningfully by the server when the SQL statement is parsed, or interpreted.

Technically, each token can either be considered a keyword, an identifier, a quoted identifier, a constant (also called a literal ), or one of several special character symbols. Keywords are words PostgreSQL recognizes as words with pre-defined SQL or PostgreSQL-specific meanings; these include SQL commands, clauses, function names, and special noise terms, which are often accompanied optionally with SQL commands (e.g., the noise term WORK in the COMMIT command). In contrast, identifiers represent variable names for tables, columns, and any other database object.

Both keywords and identifiers reference internally defined functions, values, or records, as far as PostgreSQL is concerned. Constants, on the other hand, describe pieces of data that are interpreted literally, such as a number or character string.

Finally, a SQL statement contains special character symbols. These are reserved characters (such as parentheses, the semicolon, and square brackets) that logically affect the meaning and arrangement of your keywords, identifiers, and literals. You can think of these characters as the punctuation for your SQL statements.

Operators fall under the category of special character symbols; they can be used to imply logical operations or evaluations between data values (either literals, or represented by identifiers), and are generally between one and four characters in length.

The following sections explain and expand upon the nature of these elementary components of SQL.

Token Formatting Considerations

As described in the preceding section, each sequential element of a SQL statement is considered a token. What may not be immediately clear, however, is that tokens may be kept all on the same line, or they may be split across several lines, as extra whitespace is ignored by PostgreSQL's parser.

Consider the SQL statement in Example 3-1, which is executed first on a single line, and then executed again, split across two separate lines. Both SELECT statements instruct the database to display the entire contents of the my_list table:

Example 3-1. Spaces and newlines

testdb=# SELECT * FROM my_list;
                     todos
------------------------------------------------
 Pick up laundry.
 Send out bills.
 Wrap up Grand Unifying Theory for publication.
(3 rows)

testdb=# SELECT *
testdb-#        FROM
testdb-#        my_list;
                     todos
------------------------------------------------
 Pick up laundry.
 Send out bills.
 Wrap up Grand Unifying Theory for publication.
(3 rows)

In Example 3-1 there are several newlines and spaces between the second statement's tokens. As you can see by the identical output, PostgreSQL ignores the extra newlines and spaces, making both statements semantically equivalent. You can take advantage of this behavior by splitting a long string of tokens across numerous lines for improved readability of your SQL statement. This probably isn't necessary for statements as simple as those in Example 3-1, but it can be quite helpful when dealing with complex SQL statements with numerous clauses, expressions, and conditions. Throughout this book we will periodically split some statements over several lines to help show what each part of the statement is intended to accomplish.

Keywords and Identifiers

Keywords are any reserved SQL terms which have a reserved syntactic meaning to the server. Some common keywords are INSERT, UPDATE, SELECT, and DELETE.

All SQL commands are keywords, though many keywords themselves are not complete commands. For instance, the command INSERT INTO is a valid SQL command, and the word INTO is a reserved keyword. As you might guess, however, the word INTO has no particular significance when used out of context.

Identifiers, as described earlier, are variable names that reference database objects. These names are arbitrarily designated by the creator of the database object upon creation. The objects which can be referred to by identifiers in PostgreSQL may be databases, tables, columns, indices, views, sequences, rules, triggers, or functions.

Example 3-2 adds three pieces of information about Oregon into a simple table called states.

Example 3-2. Keywords and commands

booktown=# INSERT INTO states VALUES (33, 'Oregon', 'OR');
INSERT 3389701 1

In Example 3-2, the INSERT INTO SQL command makes use of the SQL keywords INSERT, INTO, and VALUES.

The INSERT INTO command modifies the table referenced by the states identifier. The modification in this case is the insertion of a new record.

Quoted identifiers

While not normally required, quotes can be used around identifiers, meaning they should be interpreted literally. For example, if we want to view each of the columns from a table called states, a simple statement to achieve this would ordinarily read:

booktown=# SELECT * FROM states;
 id |    name    | abbreviation
----+------------+--------------
 33 | Oregon     | OR
 42 | Washington | WA
(2 rows)

The keywords in this statement are SELECT and FROM, while the identifiers are the asterisk * (indicating all columns), and states (the table name). With this command, we are selecting all columns from a table named states and thereby viewing its contents.

You can accomplish the same thing by putting quotes around the identifier, with the following statement:

booktown=# SELECT * FROM "states";
 id |    name    | abbreviation
----+------------+--------------
 33 | Oregon     | OR
 42 | Washington | WA
(2 rows)

As you can see, the output is identical when applying quotes to a lowercase identifier. However, the following statement, which uses quotes around the stAtes identifier, will fail:

booktown=# SELECT * FROM "stAtEs"; 
ERROR: Relation 'stAtEs' does not exist

This statement fails because it instructs PostgreSQL to look for a table called, literally, stAtEs (rather than states). In other words, with the use of quotes, the statement has explicitly requested that PostgreSQL interpret the identifier name literally.

All non-quoted identifiers are folded, or converted, to lowercase. When specifying stAtEs, or STATES (i.e., any combination of uppercase or lowercase letters) without quotes, PostgreSQL automatically converts the identifier to lowercase (states) before processing the statement.

Note: The folding of unquoted identifiers to lowercase names is a PostgreSQL-specific convention. The SQL92 standard specifies that unquoted identifiers always be converted to uppercase. For both legacy and readability reasons, PostgreSQL does not intend to move to this part of the SQL92 standard.

This should be of special note to database administrators familiar with other SQL products, such as Oracle, who expect case to automatically change to uppercase. If you are a developer, and you are interested in writing easily portable applications, be sure to consider this case issue to avoid conflicts over this convention.

Since the parser can still read and understand mixed-case statements (provided that they are formed with the correct syntax), you should use uppercase and lowercase terminology carefully. Your use of case can both help and hinder your efficiency when working with a large amount of SQL.

We recommend that, for readability, you try typing identifiers in lowercase and keywords in uppercase, the convention used throughout this book. By visually separating the fixed, systematic terminology from the user-defined data objects, you make it a great deal easier to quickly read and understand complex SQL statements.

When quotes are required

The only instances where you are required to use quotes are either when a database object's identifier is identical to a keyword, or when the identifier has at least one capitalized letter in its name. In either of these circumstances, you must remember to quote the identifier both when creating the object, as well as in any subsequent references to that object (e.g., in SELECT, DELETE, or UPDATE statements).

If you do not quote an identifier that is spelled identically to an existing keyword, PostgreSQL will return an error message because it interprets the intended identifier as a keyword. For instance, if you had a table whose name was literally select, you would get an error message if you tried querying it with the following statement:

testdb=# SELECT * FROM select; 
ERROR: parser: parse error at or near "select"

As you can see, an unquoted query on a table called select produces an error message. To specify that select is in fact a table, and not a keyword, it needs to be placed inside of quotes. Therefore, the correct syntax to view a table named select is as follows.

testdb=# SELECT * FROM "select";
 selected
----------
        0
        1
       52
      105
(4 rows)

Remember that any identifiers with at least one capitalized letter must be treated similarly. For example, if you've for some reason created a table named ProDucts (notice the capitalized "P" and "D"), and you want to destroy it (as you probably should, with a name like that!), then once again the identifier needs to be quoted in order to accurately describe its name to PostgreSQL, as follow:

booktown=# DROP TABLE ProDucts;
ERROR: table "products" does not exist
booktown=# DROP TABLE "ProDucts";
DROP

This technique can be extremely useful in some circumstances, even if you never name database objects with these criteria yourself. For example, importing data through an external ODBC connection (e.g., via Microsoft Access) can result in table names with all capitalized letters. Without the functionality of quoted identifiers, you would have no way to accurately reference these tables.

Identifier validity

Both keywords and identifier names in PostgreSQL have a maximum length limit of 31 characters. Parsed keywords or identifiers over that length limit are automatically truncated. Identifiers may begin with any letter (a through z), or with an underscore, and may then be followed by letters, numbers (0 through 9), or underscores. While keywords are not permitted to start or end with an underscore, identifier names are permitted to do so. Neither keywords nor identifiers should ever begin with a number.

In the Section called When quotes are required" we described how quoted identifiers could be used to "overrule" the case insensitivity of identifiers by placing quotes around them. The same rule-bending can apply to the assertion that an identifier cannot begin with a number. While PostgreSQL will not allow you to create a table using the name 1st_bent_rule without quotes, the name is acceptable if it is surrounded with quotes.

Example 3-3 first fails in trying to create an illegally named table. It then proceeds to bend the rules with quotes.

Example 3-3. Bending rules

booktown=# CREATE TABLE 1st_bent_rule (rule_name text);
ERROR:  parser: parse error at or near "1"
booktown=# CREATE TABLE "1st_bent_rule" (rule_name text);
CREATE

Furthermore, while quotes themselves are, of course, not allowed within the set of quotes to refer to a table name, other normally illegal characters are allowed, such as spaces and ampersands. Take note that while the ANSI/ISO SQL standard forbids using identifiers with the same names as SQL keywords, PostgreSQL (like many other SQL implementations) has a similarly relaxed view on this, allowing you to force such names with quoted identifiers.

Remember that while the use of quotes can be a useful trick to know for unusual circumstances, if you wish to design portable, standard SQL statements and relations, it is best to adhere to ANSI/SIO standards whenever possible.

Constants

While much of the data in working with a database is stored on the disk and referred to via identifiers (e.g., table names, column names, and functions), there are obviously times when new data must be introduced to the system. This may be observed when inserting new records, when forming clauses to specify criteria to delete or modify, or even when performing calculations on existing records. This data is input through constants, which are sometimes called literals because they literally represent a value in a SQL statement (rather than referencing an existing value by identifier).

An implicitly typed constant is one whose type is recognized automatically by PostgreSQL's parser merely by its syntax. PostgreSQL supports five types of implicitly typed constants:

String constants

A string constant is an arbitrary sequence of characters bound by single quotes (apostrophes). These are typically used when inserting character data into a table or passing character data to any other database object. A practical example of the necessity of string constants is updating the first and last names of authors in Book Town's authors table:

booktown=# SELECT * FROM authors;
  id   | last_name |    first_name
-------+-----------+------------------
  1809 | Geisel    | Theodor Seuss
  1111 | Denham    | Ariel
 15990 | Bourgeois | Paulette
 25041 | Bianco    | Margery Williams
    16 | Alcott    | Luoisa May
   115 | Poe       | Edgar Allen
(6 rows)

Looking at this table's contents, it might stand out to you that the first_name with id 16, Louisa May has been misspelled as Luoisa May. To correct this, an UPDATE statement can be made with a string constant, as shown in Example 3-4.

Example 3-4. Using string constants

booktown=# UPDATE authors 
booktown-#        SET first_name = 'Louisa May'
booktown-#        WHERE first_name = 'Luoisa May';
UPDATE 1
booktown=# SELECT * FROM authors;
  id   | last_name |    first_name
-------+-----------+------------------
  1809 | Geisel    | Theodor Seuss
  1111 | Denham    | Ariel
 15990 | Bourgeois | Paulette
 25041 | Bianco    | Margery Williams
    15 | Poe       | Edgar Allen
    16 | Alcott    | Louisa May
(6 rows)

The UPDATE statement made in Example 3-4 uses the string constants Louisa May and Luoisa May in conjunction with the SET and WHERE keywords. This statement updates the contents of the table referenced by the authors identifier and, as shown, corrects the misspelling.

The fact that string constants are bound by single quotes presents an obvious semantic problem, however, in that if the sequence itself contains a single quote, the literal bounds of the constant are made ambiguous. To escape (make literal) a single quote within the string, you may type two adjacent single quotes. The parser will interpret the two adjacent single quotes within the string constant as a single, literal quote. PostgreSQL will also allow single quotes to be embedded by using a C-style backslash:

testdb=# SELECT 'PostgreSQL''s great!' AS example;
       example
---------------------
 PostgreSQL's great!
(1 row)
booktown=# SELECT 'PostgreSQL\'s C-style slashes are great!' AS example;
                 example
-----------------------------------------
 PostgreSQL's C-style slashes are great!
(1 row)

PostgreSQL also supports the C-style "backslash escape" sequences, which are listed in Table 3-3.

Table 3-3. PostgreSQL supported C-style escape sequences

Escape sequence

Description

\\

Literal backslash

\'

Literal apostrophe

\b

Backspace

\f

Form feed

\n

Newline

\r

Carriage return

\t

Tab

\xxx

ASCII character with the corresponding octal number xxx

Warning

As a result of the backslashes' special meaning described in Table 3-3, in order to include a backslash in the string you must escape it using a another backslash (e.g., 'A single backslash is: \\' will transform the pair of backslashes into a single backslash).

When entering two quoted character strings to PostgreSQL that are separated by some amount of whitespace, and where that whitespace includes at least one newline, the strings are concatenated and viewed as if they had been typed as one constant. This is illustrated in Example 3-5.

Example 3-5. Multiline string constants

booktown=# SELECT 'book'
booktown-#
booktown-# 'end' AS example;
 example
---------
 bookend
(1 row)

booktown=# SELECT 'bookend' AS example;
 example
---------
 bookend
(1 row)

As you can see, the semantics of the two statements is equivalent. However, at least one newline is required for this interpretation to be possible, as spaces alone would result in the following error:

booktown=# SELECT 'book' 'end' AS mistake;
ERROR:  parser: parse error at or near "'"

This error occurs because without a newline, PostgreSQL will assume that you are referring to two separate constants. If you wish to concatenate two string constants this way on a single line, PostgreSQL supports the || operator for text concatenation (see Chapter 5, for more details on this operator).

booktown=# SELECT 'book' || 'end' AS example;
 example
---------
 bookend
(1 row)

Bit string constants

Bit string constants provide a way to directly represent a binary value with an arbitrary sequence of ones and zeroes. Similarly to string constants, they are bound by single quotes, but they also must be preceded by a leading B character (which may be uppercase or lowercase). This character identifies to PostgreSQL that the forthcoming constant is a bit string, and not a normal string of character data.

Syntactically, the opening single quote must follow immediately after the leading B, and the bit string may not contain any character other than 0 or 1. While there cannot be whitespace within this string of bits, it can be continued across multiple lines just like regular string constants, as documented in the Section called String constants."

Bit string constants are generally only useful when working with tables or functions that require binary values. Example 3-6 demonstrates the use of a bit string constant upon a simple table containing raw bytes. A bit string byte is inserted into a list of bytes in the my_bytes table, and insertion is verified with a simple query.

Example 3-6. Using bit string constants

testdb=# INSERT INTO my_bytes VALUES (B'00000000');
testdb=# SELECT my_byte FROM my_bytes;
 my_byte
----------
 10000000
 10000001
 10000101
 11111111
 00000000
(5 rows)

Integer constants

Integer constants are far more frequently used than bit string constants. PostgreSQL identifies an integer constant as any token that consists solely of a sequence of numbers (without a decimal point) and that is outside of single-quotes. Technically, SQL defines integer constants as a sequence of decimal digits with no decimal point. The range of values available for an integer constant depends largely on the context within which it is used, but PostgreSQL's default for the integer data type is a 4-byte signed integer, with range from –2147483648 to 2147483647.

Integer constants are used anywhere you wish to represent a literal integer value. They are used frequently within mathematical operations, as well as in SQL commands that reference a column with an integer data type. Example 3-7 is a simple demonstration of the use of integer constants to update an author's numeric identifier via an UPDATE command.

Consider once again the authors table used in previous sections, which correlates a numeric author identifier with two character strings representing the author's first and last name. Suppose that, for administrative reasons, it has been deemed necessary that any author with an identifier of less than 100 must be modified to a value of more than 100.

The first step to correct this would be to locate any author with such an id value. An integer constant can first be used in a SELECT statement's WHERE clause to perform a less-than comparison to check.

Example 3-7. Using integer constants

booktown=# SELECT * FROM authors WHERE id < 100;
  id   | last_name |    first_name
-------+-----------+------------------
    16 | Alcott    | Louisa May
(1 row)

booktown=# SELECT * FROM authors WHERE id = 116;
  id   | last_name |    first_name
-------+-----------+------------------
(0 rows)

booktown=# UPDATE authors 
booktown-#    SET id = 116 
booktown-#  WHERE id = 16;
UPDATE 1
booktown=# SELECT * FROM authors WHERE id = 116;
  id   | last_name |    first_name
-------+-----------+------------------
   116 | Alcott    | Louisa May
(1 row)

In Example 3-7, the WHERE clause in the SELECT statement compares the id column identifier against an integer constant of 100, returning one row. Once the author with the offending id is found, a second SELECT statement is issued to check for an existing author with an id of 116. This is to verify that the new id is not in use by another author within the authors table, as this column has been specified as requiring a unique identifier. Finally, an UPDATE statement is executed, again using integer constants in both the SET and WHERE clauses.

Floating-point constants

A floating-point constant is similar to an integer constant, but it is used to represent decimal values as well as whole integers. These are required whenever such a floating-point value must be represented literally within a SQL statement.

A floating-point constant can be represented in several forms, as shown in Table 3-4. Each occurrence of ## represents one or more digits.

Table 3-4. Floating-point representations

Representation

Example

##.##

6.4

##e[+-]##]

8e-8

[##].##[e[+-]##]

.04e8

##.[##][e[+-]##]

4.e5

In the first form, there must be at least one digit before or after the decimal point for PostgreSQL to recognize the value as a floating-point constant versus an integer constant. The other options involve having at least one digit before or after an exponent clause, denoted by the e in the list. The presence of either the decimal point, the exponent clause, or both, distinguishes an integer constant from a floating-point.

Each of these valid formats is represented in Example 3-8 through a simple SQL SELECT statement illustrating a variety of floating-point conventions.

Example 3-8. Valid floating-point values

booktown=# SELECT .04 AS small_float,
booktown-#    -16.63 AS negative_float,
booktown-#       4e3 AS exponential_float,
booktown-#    6.1e-2 AS negative_exponent;
 small_float | negative_float | exponential_float | negative_exponent
-------------+----------------+-------------------+-------------------
        0.04 |         -16.63 |              4000 |             0.061
(1 row)

Boolean constants

Boolean constants are much simpler than any other constant values recognized by PostgreSQL, as they may consist only of two possible values: true and false. When PostgreSQL encounters either of these terms outside of single quotes, they are implicitly interpreted as Boolean constants, rather than a string constant. Example 3-9 shows this important distinction.

Example 3-9. The difference between true and 'true'

testdb=# SELECT true AS boolean_t,
testdb-#        'true' AS string_t,
testdb-#        false AS boolean_f,
testdb-#        'false' AS string_f;
 bool_t | string_t | bool_f | string_f
--------+----------+--------+----------
 t      | true     | f      | false
(1 row)

When the terms true and false are parsed by PostgreSQL outside of single quotes, they are implied Boolean values. As shown in Example 3-9, PostgreSQL displays values which are literally of the type boolean as t or f, though be careful not to try to use only t or f as Boolean constant values, as this will not be interpreted correctly by PostgreSQL, and will cause an error.

Special Character Symbols

Special character symbols are characters with a pre-defined syntactic meaning in PostgreSQL. They are typically disallowed from being used in identifier names for this reason, though as mentioned in the section on quoted identifiers, this restriction can usually be worked around with quotes if need be.

Punctuation symbols

Some special character symbols help to make up the "punctuation" of a SQL statement, much like parentheses, periods and commas do in the English language. Table 3-5 shows some common PostgreSQL-recognized syntactic symbols.

Table 3-5. Punctuation Symbols

Character

Definition

* (asterisk)

Used with the SELECT command to query all columns in the table, and with the count() aggregate function to count all rows in a table.

() (parentheses)

Used to group expressions, enforce operator precedence, and to make function calls. The use of parentheses is highly subjective to the context in which they are used.

[] (brackets)

Used in the selection of specific elements in an array, or in the declaration of an array type (e.g., with the CREATE TABLE command).

; (semicolon)

Used to terminate a SQL command. The only place it can be used within a statement is within a string constant or quoted identifier.

, (comma)

Some commands use the comma to separate elements within a list.

. (period)

Used in floating-point constants (e.g., 3.1415), as well as to reference column names as children of tables (e.g., table_name.column_name).

: (colon)

Used to select slices from arrays.

$ (dollar sign)

Used in the body of a function definition to represent a positional parameter, or argument.

Operator symbols

An operator is another type of special character symbol; it is used to perform operations on identifiers or constants, returning resultant values. Operators can be used for mathematical operations, such as addition, as well as to perform comparison and logical operations.

Consider again the books table, and its numeric author_id field. Recall that the author_id column is an integer used to identify an author. Now imagine that, due to a system modification, all author identifiers must be incremented by 1,500. This can be achieved by evaluating the result of an operation (an operator expression) in an UPDATE statement upon the author_id column. This requires use of the addition (+) operator. An example of this can be seen in Example 3-10.

Example 3-10. Operators in statements

booktown=# SELECT * FROM books;
  id  |            title            | author_id | subject_id
------+-----------------------------+-----------+------------
 7808 | The Shining                 |      4156 |          9
  156 | The Tell-Tale Heart         |        15 |          9
 4513 | Dune                        |      1866 |         15
 4267 | 2001: A Space Odyssey       |      2001 |         15
 1608 | The Cat in the Hat          |      1809 |          2
 1590 | Bartholomew and the Oobleck |      1809 |          2
(6 rows)

booktown=# UPDATE books SET author_id = author_id + 1500; 
UPDATE 6
booktown=# SELECT * FROM books;
  id  |            title            | author_id | subject_id
------+-----------------------------+-----------+------------
 7808 | The Shining                 |      5656 |          9
  156 | The Tell-Tale Heart         |      1515 |          9
 4513 | Dune                        |      3366 |         15
 4267 | 2001: A Space Odyssey       |      3501 |         15
 1608 | The Cat in the Hat          |      3309 |          2
 1590 | Bartholomew and the Oobleck |      3309 |          2
(6 rows)

As you can see in Example 3-10, each author_id record is modified with the results of the + operator's operation upon the previous author_id value.

Common operators that you are may already familiar with include the basic mathematical operators: the + sign for the addition of two numeric values, the - sign for the subtraction of one numeric value from another, etc. Some of the more esoteric operators include the bitwise & and | operators, which modify binary values at the bit level.

In addition to these character symbol operators, it's important to remember the SQL keywords, which are frequently called operators as well. Most notably, this includes the logical operators AND, OR, and NOT. While technically keywords, these terms are grouped with the operators because of their operational effect upon constants and identifiers.

Table 3-6 lists some fundamental PostgreSQL operators.

Table 3-6. Fundamental PostgreSQL operators

Category

Operator

Definition

Mathematical operators

+ (addition)

Adds two numeric types

- (subtraction)

Subtracts one numeric type from another

/ (division)

Divides one numeric type by another

* (multiplication)

Multiplies one numeric type by another

! (factorial)

Returns an integer's factorial

@ (absolute value)

Returns the absolute value of a numeric value

Comparison operators

= (equivalence)

Compares two values for equivalence

< (less than)

Evaluates whether or not one number is less than another

> (greater than)

Evaluates whether or not one number is larger than another

~ (regular expression)

Performs a regular expression comparison on text values

Logical operators

AND

Returns true if both Boolean conditions are true

OR

Returns true if at least one of two Boolean conditions is true

NOT

Returns the opposite of a Boolean condition

While many operators have various connotations depending on their context, the = operator is an especially important one due to its meaning when used with an UPDATE statement's SET clause.

While in most expressions the = operator is an equivalence operator (used to compare two values for equivalence), when following the SET clause and an identifier name in an UPDATE statement, the = is read as an assignment operator. This means that it is used to assign a new value to an existing identifier, as the SET term implies.

For more information on operators, see the Section called Operators in Chapter 5."

Comments

Comments are blocks of text that, through special character sequences, can embed non-SQL text within SQL code. These can be used within blocks of code, because PostgreSQL removes the commented areas from the input stream and treats it as whitespace. There are two styles of comments available: single-line comments, and multiline comments.

Single-line comments are preceded by two dashes (- -) and may either be on a line by themselves, or they may follow valid SQL tokens. (The comments themselves are not considered tokens to PostgreSQL's parser, as any character data following the - - sequence, up to the end of the line, is treated as whitespace.) This is demonstrated in Example 3-11.

Example 3-11. Single-line comments

testdb=# SELECT 'Test' -- This can follow valid SQL tokens,
testdb-#               -- or be on a line of it own.
testdb-# AS example;
 example
---------
 Test
(1 row)

Multiline comments begin with a sequential slash-asterisk (/*) sequence, and terminate with a sequential asterisk-slash (*/) sequence. This style of commenting may already be familiar to C programmers, but there is one key difference between PostgreSQL's interpreter and the C language interpreter: PostgreSQL comments may be nested. Therefore, when you create a multiline comment within another multiline comment, the */ used to close the inner comment does not also close the outer comment. Example 3-12 provides a comment explanation.

Example 3-12. Multiline comments

testdb=# SELECT 'Multi' /* This comment extends across
testdb*#                 * numerous lines, and can be
testdb*#                 * /* nested safely */ */
testdb-# || '-test' AS example;
  example
------------
 Multi-test
(1 row)

Nesting comments can be useful if you have a file containing SQL syntax of which you wish to comment a large portion before sending to PostgreSQL for interpreting and execution. If you have already used multiline comments within that document and you wish to comment a large section which includes those comments, PostgreSQL is intelligent enough to recognize that a closing comment sequence (*/) closes only the most recently opened comment, not the entire commented region.

Note: The asterisk character by itself (without an adjacent slash character) has no special meaning within a comment. The extra asterisks in Example 3-12 on multiline comments are provided only for aesthetic purposes and readability.

Putting It All Together

In summary, a SQL statement is comprised of tokens, where each token can represent either a keyword, identifier, quoted identifier, constant, or special character symbol. Table 3-7 uses a simple SELECT statement to illustrate a basic, but complete, SQL statement and its components.

Table 3-7. A simple SQL query

SELECT

id, name

FROM

states

Token Type

Keyword

Identifiers

Keyword

Identifier

Description

Command

Id and name columns

Clause

Table name

As shown in the table, the SELECT statement contains the keywords SELECT and FROM. Together, the FROM keyword and states token compose a clause, as they modify and further describe the SELECT command.

The id, name, and states tokens are the identifiers of the statement. The id and name identifiers specify the selected columns, while the states identifier specifies the table name to select from. Therefore, with the preceding SQL query, you are instructing PostgreSQL to display the columns named id and name for each row from the states table. Example 3-13 shows the output this query generates within the booktown database.

Example 3-13. Example SQL query

booktown=# SELECT id, name FROM states;
 id |    name
----+------------
 42 | Washington
 51 | Oregon
(2 rows)

booktown=#

Getting more complicated, Table 3-8 and Table 3-9 break down another example statement. This statement uses the UPDATE command, along with SET and WHERE clauses, which respectively specify with what to update the records, and how to find the records to update.

Table 3-8. UPDATE example: the SET clause

UPDATE

states

SET

id

=

51

keyword

identifier

keyword

identifier

operator

integer constant

command

table name

clause

column

assignment

new id value

Table 3-9. UPDATE example: the WHERE clause

WHERE

name

=

'Oregon'

keyword

identifier

operator

string constant

clause

column name

equivalence

string value to match

When executed, this statement examines each record's name column to find matches for the WHERE clause's stated condition (equivalence to the string constant 'Oregon'). Then, for each row which matches that condition, it updates the id column with the value 51.

Breaking it down, this UPDATE statement has three keywords, three identifiers, two operators, and two constants. The keywords are UPDATE (the SQL command), SET (specifies the updates to make), and WHERE (identifies the rows to update). The identifiers are the states table name, the id column name, and the name column name.

The operators are both represented by the = operator. When used with the SET clause, this operator is used for assignment (to assign a new value to an existing record's identified column); this is a special use which is unique to the SET clause. In contrast, when used with the WHERE clause, the = operator is used to check equivalence between values. In this case, this means that the equivalence operator will check the value of a record's name column against a string constant with the value of Oregon.

Finally, the constants in this statement are the integer constant 51 (the new value for the id column), and the string constant Oregon (compared to the name column through the WHERE clause).

Example 3-14 therefore updates the states table by setting the id column to 51 whenever the name column matches the value Oregon. It then checks the results of that UPDATE statement with another SELECT statement.

Example 3-14. A SQL update

booktown=# UPDATE states 
booktown-#    SET id = 51
booktown-#  WHERE name = 'Oregon';
UPDATE 1
booktown=# SELECT * FROM states 
booktown-#  WHERE name = 'Oregon';
 id |  name  | abbreviation
----+--------+--------------
 51 | Oregon | OR
(1 row)