3.3. Manipulating files

3.3.1. Viewing file properties

More about ls

Besides the name of the file, ls can give a lot of other information, such as the file type, as we already discussed. It can also show the permissions on a file, the file size, the inode number, the date and time of the last modification, the owners and the number of links to the file. With the -a option to ls, files that are normally hidden from view are displayed as well. These are files whose names start with a dot. Typical examples are the configuration files in your home directory. When you've worked with a certain system for a while, you will notice that tens of files and directories have been created that are not automatically listed in a directory index. Next to that, every directory contains an entry named just dot (.) and one with two dots (..), which are used in combination with their inode number to determine the directory's position in the file system's tree structure.

You should really read the Info pages about ls, since it is a very common command with a lot of useful options. Options can be combined, as is the case with most UNIX commands and their options. A common combination is ls -al; it shows a long list of files and their properties as well as the destinations that any symbolic links point to. ls -latr displays the same files, only now in reversed order of the last change, so that the file changed most recently occurs at the bottom of the list. Here are a couple of examples:

krissie:~/mp3>ls
Albums/  Radio/  Singles/  gene/  index.html

krissie:~/mp3>ls -a
./   .thumbs  Radio     gene/
../  Albums/  Singles/  index.html

krissie:~/mp3>ls -l Radio/
total 8
drwxr-xr-x    2 krissie krissie  4096 Oct 30  1999 Carolina/
drwxr-xr-x    2 krissie krissie  4096 Sep 24  1999 Slashdot/

krissie:~/mp3>ls -ld Radio/
drwxr-xr-x    4 krissie krissie  4096 Oct 30  1999 Radio/

krissie:~/mp3>ls -ltr
total 20
drwxr-xr-x    4 krissie krissie  4096 Oct 30  1999 Radio/
-rw-r--r--    1 krissie krissie   453 Jan  7  2001 index.html
drwxrwxr-x   30 krissie krissie  4096 Oct 20 17:32 Singles/
drwxr-xr-x    2 krissie krissie  4096 Dec  4 23:22 gene/
drwxrwxr-x   13 krissie krissie  4096 Dec 21 11:40 Albums/
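The option combinations shown above can be tried out safely in a throw-away directory. The following is only a sketch: the scratch directory and the file names in it are made up for the occasion, and the exact columns you see will depend on your system.

```shell
# Build a small scratch directory (all names are illustrative only).
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir Albums Radio
touch index.html .hidden

ls             # plain listing: .hidden is not shown
ls -a          # also shows ., .. and .hidden
ls -l          # long format: type, permissions, links, owners, size, time
ls -ld Radio   # properties of the directory itself, not of its content
ls -ltr        # long format, most recently changed entry at the bottom

# Keep two listings around so they can be compared.
plain=$(ls)
all=$(ls -a)

# Clean up the scratch directory.
cd / && rm -rf "$scratch"
```

Comparing the two saved listings shows that the hidden file only turns up when -a is given.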

On most Linux versions ls is aliased to color-ls by default. This feature allows you to see the file type without using any options to ls. To achieve this, every file type has its own color. The standard scheme is in /etc/DIR_COLORS:

Table 3-5. Color-ls default color scheme

Color         File type
red           compressed archives
white         text files
flashing red  broken links

More information is in the man page. In earlier days the same information was displayed by adding a suffix to the name of every file that is not a regular file. For mono-color use (like printing a directory listing) this scheme is still in use:

Table 3-6. Default suffix scheme for ls

Character  File type
nothing    regular file
*          executable file
|          named pipe
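The suffixes can be made visible with the -F option to ls. Below is a small sketch in a scratch directory; the file names are invented, and a directory (marked with a slash) is included next to the types listed in the table.

```shell
# Scratch directory with one entry of each kind (names are illustrative).
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir music                        # directory: marked with /
touch notes.txt                    # regular file: no suffix
touch run.sh && chmod +x run.sh    # executable file: marked with *
mkfifo queue                       # named pipe: marked with |

# -F appends the type suffix to each name.
marked=$(ls -F)
echo "$marked"

cd / && rm -rf "$scratch"
```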

A description of the full functionality and features of the ls command can be read with info ls.

More tools

To find out more about the kind of data we are dealing with, we use the file command. By applying certain tests that check properties of a file in the file system, magic numbers and language tests, file tries to make an educated guess about the format of a file. Some examples:

mike:~>file Documents/
Documents/: directory

mike:~>file high-tech-stats.pdf
high-tech-stats.pdf: PDF document, version 1.2

mike:~>file Nari-288.rm
Nari-288.rm: RealMedia file

mike:~>file bijlage10.sdw
bijlage10.sdw: Microsoft Office Document

mike:~>file logo.xcf
logo.xcf: GIMP XCF image data, version 0, 150 x 38, RGB Color

mike:~>file cv.txt
cv.txt: ISO-8859 text

mike:~>file image.png
image.png: PNG image data, 616 x 862, 8-bit grayscale, non-interlaced

mike:~>file figure
figure: ASCII text

mike:~>file me+tux.jpg
me+tux.jpg: JPEG image data, JFIF standard 1.01, resolution (DPI),
            "28 Jun 1999", 144 x 144

mike:~>file 42.zip.gz
42.zip.gz: gzip compressed data, deflated, original filename,
         `42.zip', last modified: Thu Nov  1 23:45:39 2001, os: Unix

mike:~>file vi.gif
vi.gif: GIF image data, version 89a, 88 x 31

mike:~>file slide1
slide1: HTML document text

mike:~>file template.xls
template.xls: Microsoft Office Document

mike:~>file abook.ps
abook.ps: PostScript document text conforming at level 2.0

mike:~>file /dev/log
/dev/log: socket

mike:~>file /dev/hda
/dev/hda: block special (3/0)

The file command has a series of options, among others the -z option to look into compressed files. See info file for a detailed description. Keep in mind that the results of file are not absolute: they are only a guess. In other words, file can be tricked.
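To get a feeling for what a magic number test looks at, and why a file name proves nothing, consider the sketch below. It writes a PDF signature into a file with a misleading name (holiday.jpg is a made-up name) and reads the first bytes back by hand. Whether file is installed, and what exactly it prints, depends on your system, so the call is guarded.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# A file with a .jpg name whose content starts with a PDF signature.
printf '%%PDF-1.2\n' > holiday.jpg

# Read the first five bytes ourselves; a magic number test inspects
# leading bytes like these rather than the file name.
signature=$(head -c 5 holiday.jpg)
echo "first bytes: $signature"

# If file is available, let it make its own guess.
if command -v file >/dev/null 2>&1; then
    file holiday.jpg
fi

cd / && rm -rf "$scratch"
```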

3.3.2. Creating and deleting files and directories

Making a mess...

... Is not a difficult thing to do. Today almost every system is networked, so naturally files get copied from one machine to another. And especially when working in a graphical environment, creating new files is a piece of cake and is often done without the approval of the user. To illustrate the problem, here's the full content of a new user's directory, created on a standard RedHat system:

[newuser@blob user]$ ls -al
total 32
drwx------   3 user     user     4096 Jan 16 13:32 .
drwxr-xr-x   6 root     root     4096 Jan 16 13:32 ..
-rw-r--r--   1 user     user       24 Jan 16 13:32 .bash_logout
-rw-r--r--   1 user     user      191 Jan 16 13:32 .bash_profile
-rw-r--r--   1 user     user      124 Jan 16 13:32 .bashrc
drwxr-xr-x   3 user     user     4096 Jan 16 13:32 .kde
-rw-r--r--   1 user     user     3511 Jan 16 13:32 .screenrc
-rw-------   1 user     user       61 Jan 16 13:32 .xauthDqztLr

At first sight, the content of a "used" home directory doesn't look that bad either:

app-defaults/ crossover/   Fvwm@     mp3/      OpenOffice.org638/
articles/     Desktop/     GNUstep/  Nautilus/ staroffice6.0/
bin/          Desktop1/    images/   nqc/      training/
brol/         desktoptest/ Machines@ ns_imap/  webstart/
C/            Documents/   mail/     nsmail/   xml/
closed/       Emacs@       Mail/     office52/ Xrootenv.0

But when all the directories and files starting with a dot are included, there are 185 items in this directory. This is because most applications have their own directories and/or files, containing user-specific settings, in the home directory of that user. Usually these files are created the first time you start an application. In some cases you will be notified when a non-existent directory needs to be created, but most of the time everything is done automatically.

Furthermore, new files are created seemingly continuously because users want to save files, keep different versions of their work, use Internet applications, and download files and attachments to their local machine. It doesn't stop. It is clear that one definitely needs a scheme to keep an overview of things.

In the next section, we will discuss our means of keeping order. We only discuss text tools available to the shell, since the graphical tools are very intuitive and have the same look and feel as the well-known point-and-click MS Windows-style file managers, including graphical help functions and other features you expect from this kind of application. The following list is an overview of the most popular file managers for GNU/Linux. Most file managers can be started from the menu of your desktop manager, by clicking your home directory icon, or from the command line by issuing these commands:

  • nautilus: The default file manager in Gnome, the GNU desktop. Excellent documentation about working with this tool can be found at http://www.gnome.org.

  • konqueror: The file manager typically used on a KDE desktop. The handbook is at http://docs.kde.org.

  • mc: Midnight Commander, the Unix file manager after the fashion of Norton Commander. All documentation available from http://gnu.org/directory/.

These applications are certainly worth giving a try and usually impress newcomers to Linux, if only because there is such a wide variety: these are only the most popular tools for managing directories and files, and many other projects are being developed. Now let's find out about the internals and see how these graphical tools use common UNIX commands.

The tools

Creating directories

A way of keeping things in place is to give certain files specific default locations by creating directories and subdirectories (or folders and sub-folders if you wish). This is done with the mkdir command:

richard:~>mkdir archive

richard:~>ls -ld archive
drwxrwxrwx  2 richard richard           4096 Jan 13 14:09 archive/

Creating directories and subdirectories in one step is done using the -p option:

richard:~>cd archive

richard:~/archive>mkdir 1999 2000 2001

richard:~/archive>ls
1999/  2000/  2001/

richard:~/archive>mkdir 2001/reports/Restaurants-Michelin/
mkdir: cannot create directory `2001/reports/Restaurants-Michelin/':
No such file or directory

richard:~/archive>mkdir -p 2001/reports/Restaurants-Michelin/

richard:~/archive>ls 2001/reports/
Restaurants-Michelin/

If the new directory needs other permissions than the default file creation permissions, the new access rights can be set in one move, still using the mkdir command (the -m option); see the Info pages for more. We are going to discuss access modes in the next section on File Security.
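The difference between plain mkdir and mkdir -p, and the use of -m to set the access mode in the same move, can be checked with a short experiment. This is a sketch in a scratch directory; the directory name private and the mode 700 are arbitrary choices for the illustration.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Without -p, a missing parent directory is an error.
if mkdir 2001/reports/Restaurants-Michelin 2>/dev/null; then
    plain_mkdir=succeeded
else
    plain_mkdir=failed
fi

# With -p, the missing parents are created along the way.
mkdir -p 2001/reports/Restaurants-Michelin
if [ -d 2001/reports/Restaurants-Michelin ]; then nested=yes; else nested=no; fi

# -m sets the access mode at creation time (700: owner only).
mkdir -m 700 private
mode=$(ls -ld private | cut -c1-10)
echo "plain mkdir $plain_mkdir, nested directory created: $nested, mode: $mode"

cd / && rm -rf "$scratch"
```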

The name of a directory has to comply with the same rules as those applied to regular file names. One of the most important restrictions is that you can't have two files with the same name in one directory (but keep in mind that Linux is, like UNIX, a case sensitive operating system). File names can be long (up to 255 characters on most Linux file systems), but they are usually kept shorter than 80 characters, so they fit on one line of a terminal. You can use almost any character you want in a file name, although it is advised to exclude characters that have a special meaning to the shell. When in doubt, check with Appendix C.

Moving files

Now that we have properly structured our home directory, it is time to clean up unclassified files using the mv command:

richard:~/archive>mv ../report[1-4].doc 2001/reports/Restaurants-Michelin/

This command is also applicable when renaming files:

richard:~>ls -l To_Do
-rw-rw-r--    1 richard richard      5 Jan 15 12:39 To_Do

richard:~>mv To_Do done

richard:~>ls -l done
-rw-rw-r--    1 richard richard      5 Jan 15 12:39 done

It is clear that only the name of the file changes. All other properties remain the same.
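That a rename within the same directory touches nothing but the name can also be seen from the inode number, which stays the same. A minimal sketch, reusing the file names from the example in a scratch directory:

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
echo "todo" > To_Do

# ls -i prints the inode number in front of the name.
before=$(ls -i To_Do | awk '{print $1}')
mv To_Do done
after=$(ls -i done | awk '{print $1}')

echo "inode before: $before, inode after: $after"

cd / && rm -rf "$scratch"
```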

Detailed information about the syntax and features of the mv command can be found in the man or Info pages. Consulting this documentation should always be your first reflex when confronted with a problem: the answer is likely to be in the system documentation. Even experienced users read man pages every day, so beginning users should read them all the time. After a while, you will get to know the most common options to the common commands, but you will still need the documentation as a primary source of information. Note that the information contained in the HOWTOs, FAQs, man pages and such is slowly being merged into the Info pages, which are today the most up-to-date source of online (as in readily available on the system) documentation.

Copying files

Copying files and directories is done with the cp command. A useful option is recursive copy (copy all underlying files and subdirectories), using the -R option to cp. The general syntax is

cp [-R] fromfile tofile
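A recursive copy is easy to try on a small tree first. The sketch below builds a made-up settings directory, copies it with -R and lists what arrived; none of these names refer to real configuration files.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# A small tree to copy (names are illustrative only).
mkdir -p settings/panel
echo "theme=blue" > settings/panel/colors
echo "font=fixed" > settings/fonts

# -R copies the directory with all underlying files and subdirectories.
cp -R settings settings-copy

# List the regular files that arrived in the copy.
copied=$(find settings-copy -type f | sort)
echo "$copied"

# Check that the content of a copied file is identical to the original.
if cmp -s settings/panel/colors settings-copy/panel/colors; then
    identical=yes
else
    identical=no
fi

cd / && rm -rf "$scratch"
```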

An example is the case of user newguy, who wants the same Gnome desktop settings user oldguy has. One way to solve the problem is to copy the settings of oldguy to the home directory of newguy:

victor:~>cp -R ../oldguy/.gnome/ .

This gives some errors involving file permissions, but all the errors have to do with private files that newguy doesn't need anyway. We will discuss in the next part how to change these permissions in case they really are a problem.

Removing files

Use the rm command to remove single files and rmdir to remove empty directories. (Use ls -a to check whether a directory is empty or not.) The rm command also has options for removing non-empty directories with all their subdirectories; read the Info pages for these rather dangerous options.

Note: How empty can a directory be?

It is normal that the directories . (dot) and .. (dot-dot) can't be removed, since they are also needed in an empty directory to determine the directory's rank in the file system hierarchy.

On Linux, just like on UNIX, there is no garbage can - at least not for the shell, although there are plenty of solutions for graphical use. So once removed, a file is really gone, and there is generally no way to get it back unless you have backups, or you are really fast and have a really good system administrator. To protect the beginning user from this malice, the interactive behavior of the rm, cp and mv commands can be activated using the -i option. In that case the system won't act upon the request immediately. Instead it will ask for confirmation, so it takes an additional press of the Enter key to inflict the damage:

mary:~>rm -ri archive/
rm: descend into directory `archive'? y
rm: descend into directory `archive/reports'? y
rm: remove directory `archive/reports'? y
rm: descend into directory `archive/backup'? y
rm: remove `archive/backup/sysbup200112.tar'? y
rm: remove directory `archive/backup'? y
rm: remove directory `archive'? y

We will discuss how to make this option the default in Chapter 7, which discusses customizing your shell environment.
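The effect of the confirmation can be tried without any risk in a scratch directory. In the sketch below the answers are fed to rm through a pipe instead of being typed, which is only a trick to keep the example non-interactive; report.txt is a made-up file name.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch report.txt

# Answer "n": rm asks for confirmation and leaves the file alone.
echo n | rm -i report.txt
if [ -e report.txt ]; then after_no=kept; else after_no=removed; fi

# Answer "y": this time the file is removed.
echo y | rm -i report.txt
if [ -e report.txt ]; then after_yes=kept; else after_yes=removed; fi

echo "after answering n: $after_no, after answering y: $after_yes"

cd / && rm -rf "$scratch"
```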

3.3.3. Finding files

Using shell features

In the example on moving files we already saw how the shell can manipulate multiple files at once. In that example, the shell finds out automatically what the user means by the requirements between the square brackets "[" and "]". The shell can substitute ranges of numbers and upper or lower case characters alike. It also substitutes as many characters as you want with an asterisk, and only one character with a question mark.

All sorts of substitutions can be used simultaneously; the shell is very logical about it. The Bash shell, for instance, has no problem with expressions like ls dirname/*/*/*[2-3].
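The three kinds of substitution can be compared side by side. The sketch below uses invented file names in a scratch directory; echo shows what the shell makes of each pattern before any command gets to see it, and set -- is used only to count the resulting names.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
touch report1.doc report2.doc report3.doc report4.doc report5.doc notes.txt

# The shell expands the patterns; echo just prints the result.
echo report[1-4].doc    # range: report1.doc up to report4.doc
echo report?.doc        # exactly one character in place of the ?
echo *.doc              # any number of characters in place of the *

# Count the matches by loading them into the positional parameters.
set -- report[1-4].doc; range_count=$#
set -- report?.doc;     single_count=$#
set -- *;               star_count=$#

echo "range: $range_count, question mark: $single_count, asterisk: $star_count"

cd / && rm -rf "$scratch"
```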

In other shells, the asterisk is commonly used to minimize the efforts of typing: people would enter cd dir* instead of cd directory. In Bash however, this is not necessary because the GNU shell has a feature called file name completion. It means that you can type the first few characters of a command (anywhere) or a file (in the current directory) and if no confusion is possible, the shell will find out what you mean. For example in a directory containing many files, you can check if there are any files beginning with the letter A just by typing ls A and pressing the Tab key twice, rather than pressing Enter. If there is only one file starting with "A", this file will be shown as the argument to ls (or any shell command, for that matter) immediately.

Which

A very simple way of looking up files is using the which command, to look in the directories listed in the user's search path for the required file. Of course, since the search path contains only paths to directories containing executable programs, which doesn't work for ordinary files. The which command is useful when troubleshooting "Command not Found" problems. In the example below, user tina can't use the acroread program, while her colleague has no troubles whatsoever on the same system. The problem is similar to the PATH problem in the previous part: Tina's colleague tells her that he can see the required program in /opt/acroread/bin, but this directory is not in her path:

tina:~>which acroread
/usr/bin/which: no acroread in (/bin:/usr/bin:/usr/bin/X11)

The problem can be solved by giving the full path to the command to run, or by re-exporting the content of the PATH variable:

tina:~>export PATH=$PATH:/opt/acroread/bin

tina:~>echo $PATH
/bin:/usr/bin:/usr/bin/X11:/opt/acroread/bin
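The same situation can be reproduced with a program of your own. In the sketch below, hello-tool is a made-up script placed in a scratch directory, and command -v is used for the lookup because it is available in every POSIX shell, also where which is not installed.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/bin"

# hello-tool is a hypothetical program, created only for this test.
printf '#!/bin/sh\necho "hello from hello-tool"\n' > "$scratch/bin/hello-tool"
chmod +x "$scratch/bin/hello-tool"

# The directory is not in the search path yet, so the lookup fails.
before=$(command -v hello-tool || echo "not found")

# Append the directory to the search path and look again.
PATH=$PATH:$scratch/bin
export PATH
after=$(command -v hello-tool)

echo "before: $before"
echo "after:  $after"

rm -rf "$scratch"
```

Note that a change made this way only lasts for the current shell session.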

Using the which command also checks to see if a command is an alias for another command:

gerrit:~>which -a ls
ls is aliased to `ls -F --color=auto'
ls is /bin/ls

gerrit:~>which -a which
which is aliased to `type'
which is /usr/bin/which

gerrit:~>which type
type is a shell builtin

This actually means that on this system which is an alias for the shell built-in type, but that there is also a stand-alone version of which. The alias takes precedence over the which in /usr/bin, which is still there for compatibility with UNIX.

Find and locate

These are the real tools, used when searching other paths beside those listed in the search path. The find tool, known from UNIX, is very powerful, which may be the cause of a somewhat more difficult syntax. GNU find, however, deals with the syntax problems. This command not only allows you to search file names, it can also accept file size, date of last change and other file properties as criteria for a search. The most common use is for finding file names:

find <path> -name <searchstring>

This can be interpreted as "Look in all files and subdirectories contained in a given path, and print the names of the files containing the search string in their name" (not in their content).

Another application of find is for searching files of a certain size, as in the example below, where user peter wants to find all files in the current directory or one of its subdirectories, that are bigger than 5 MB:

peter:~>find . -size +5000k
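Both kinds of search can be tested on a small tree. The following is a sketch with invented names; it assumes a find that accepts the k suffix for -size, as GNU find does, and it creates a file of roughly 6 MB in a scratch directory to have something to find.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Two small files and one big one (names are illustrative only).
mkdir -p docs/old
echo "small" > docs/report.txt
echo "small" > docs/old/report-1999.txt
dd if=/dev/zero of=docs/video.bin bs=1024 count=6000 2>/dev/null

# Search by name: the pattern is quoted so that find, not the shell, expands it.
by_name=$(find . -name "report*" | sort)
echo "$by_name"

# Search by size: everything bigger than 5000 kilobytes.
by_size=$(find . -size +5000k)
echo "$by_size"

cd / && rm -rf "$scratch"
```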

If you dig in the man pages, you will see that find can also perform operations on the found files. A common example is removing files. It is best to first test without the -exec option that the correct files are selected; after that, the command can be rerun to delete the selected files. Below, we search for files ending in .tmp:

peter:~> find . -name "*.tmp" -exec rm {} \;
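The two-step approach, first looking and only then removing, might look like the sketch below. It runs in a scratch directory with made-up file names, so nothing of value is at stake.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
mkdir -p work/cache
touch work/a.tmp work/cache/b.tmp work/keep.txt

# Step 1: dry run, only print what would be selected.
find . -name "*.tmp" -print
before=$(find . -name "*.tmp" | wc -l)

# Step 2: the selection looks right, so run rm on each file found.
find . -name "*.tmp" -exec rm {} \;
after=$(find . -name "*.tmp" | wc -l)

# Files that do not match the pattern are left alone.
if [ -e work/keep.txt ]; then kept=yes; else kept=no; fi

cd / && rm -rf "$scratch"
```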


Later on (1999 according to the man pages, after 20 years of find), locate was developed. This program is easier to use, but more restricted than find, since its output is based on a file index database that is updated only once every day. On the other hand, a search in the locate database is less time- and CPU-consuming than a search with find.

Most Linux distributions use slocate these days, security enhanced locate, the modern version of locate that prevents users from getting output they have no right to read. The files in root's home directory are such an example: these are not normally accessible to the public. A user who wants to find someone who knows about the C shell may issue the command locate .cshrc, to display all users who have a customized configuration file for the C shell. Supposing the users root and jenny are running C shell, then only the file /home/jenny/.cshrc will be displayed, and not the one in root's home directory. On most systems, locate is a symbolic link to the slocate program:

billy:~>ls -l /usr/bin/locate
lrwxrwxrwx 1 root slocate  7 Oct 28 14:18 /usr/bin/locate -> slocate*

User tina could have used locate to find the application she wanted:

tina:~>locate acroread

Directories that don't contain the name bin can't contain the program - they don't contain executable files. There are three possibilities left. The file in /usr/local/bin is the one tina would have wanted: it is a link to the shell script that starts the actual program:

tina:~>file /usr/local/bin/acroread
/usr/local/bin/acroread: symbolic link to ../Acrobat4/bin/acroread

tina:~>file /usr/local/Acrobat4/bin/acroread
/usr/local/Acrobat4/bin/acroread: Bourne shell script text executable

tina:~>file /usr/local/Acrobat4/Reader/intellinux/bin/acroread
/usr/local/Acrobat4/Reader/intellinux/bin/acroread: ELF 32-bit LSB 
executable, Intel 80386, version 1, dynamically linked (uses 
shared libs), not stripped

In order to keep the path as short as possible, so the system doesn't have to search too long every time a user wants to execute a command, we add /usr/local/bin to the path and not the other directories, which only contain the binary files of one specific program, while /usr/local/bin contains other useful programs as well.

Again, a description of the full features of find and locate can be found in the Info pages.

The grep command

General line filtering

A simple but powerful program, grep is used for filtering input lines and returning certain patterns to the output. There are literally thousands of applications for the grep program. In the example below, jerry uses grep to see how he did the thing with find:

jerry:~>grep -a find .bash_history
find . -name userinfo
man find
find ../ -name common.cfg

Note: Search history

Also useful in these cases is the search function in bash, activated by pressing Ctrl and R at the same time, as in the example below, where we want to check how we did that last find again:

thomas ~> 
(reverse-i-search)`find': find `/home/thomas` -name *.xml

Type your search string at the search prompt. The more characters you type, the more restricted the search gets. This reads the command history for this shell session (which is written to .bash_history in your home directory when you quit that session). The most recent occurrence of your search string is shown. If you want to see previous commands containing the same string, type Ctrl+R again.

See the Info pages on bash for more.

All UNIXes with just a little bit of decency have an online dictionary. So does Linux. The dictionary is a list of known words in a file named words, located in /usr/share/dict. To quickly check the correct spelling of a word, no graphical application is needed:

william:~>grep pinguin /usr/share/dict/words

william:~>grep penguin /usr/share/dict/words

Who is the owner of that home directory next to mine? Hey, there's his telephone number!

lisa:~>grep gdbruyne /etc/passwd
gdbruyne:x:981:981:Guy Debruyne, tel 203234:/home/gdbruyne:/bin/bash

And what was the E-mail address of Arno again?

serge:~/mail>grep -i arno *
sent-mail: To: <Arno.Hintjens@celeb.com>
sent-mail: On Mon, 24 Dec 2001, Arno.Hintjens@celeb.com wrote:
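The examples above can be replayed on files of your own making. In the sketch below the history file, the mail folders and the address at example.com are all invented; the point is only to show plain matching, case-insensitive matching with -i, and searching several files at once.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# A stand-in for a command history file.
cat > history.txt <<'EOF'
find . -name userinfo
man find
ls -l
find ../ -name common.cfg
EOF

# Two stand-ins for mail folders.
printf 'To: <Arno@example.com>\n' > sent-mail
printf 'Subject: lunch\n' > inbox

# Print the lines containing "find", then count them with -c.
grep find history.txt
find_lines=$(grep -c find history.txt)

# -i ignores case; with several files grep prefixes each hit with the file name.
grep -i arno *

# -l prints only the names of the files that contain a match.
matches=$(grep -il arno *)

cd / && rm -rf "$scratch"
```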

find and locate are often used in combination with grep to define some serious queries. For more information, see Chapter 5 on I/O redirection.

Special characters

Characters that have a special meaning to the shell have to be escaped. The escape character in Bash is the backslash, as in most shells; it takes away the special meaning of the character that follows. The shell knows quite a few special characters, among the most common /, ., ? and *. A full list can be found in the Info pages and documentation for your shell. The same goes for characters that are special to grep itself. For instance, say that you want to display the lines containing the literal text "searchstring*" (where * is just an asterisk) instead of the lines matched when * keeps its special meaning (in a grep pattern, * repeats the preceding character zero or more times). In that case you issue the command

grep "searchstring\*" file(s)

Searching for the string "e.g." in a file will report all lines containing any character in the second and fourth position of the search string. If you escape the dots, you will only find the occurrences of the abbreviation itself, which stands for "for example":

grep "e\.g\." file

More in the grep Info pages.
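The difference between an escaped and an unescaped dot shows up clearly when the matching lines are counted. The sketch below uses a made-up sample file; the -F option, which makes grep take the whole pattern literally, is shown as an alternative to escaping.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

cat > sample.txt <<'EOF'
Use a pager, e.g. less.
The eagle has landed.
Nothing to see here.
EOF

# Unescaped: each dot matches any character, so "eagl" in "eagle" matches too.
unescaped=$(grep -c "e.g." sample.txt)

# Escaped: only a literal "e.g." matches.
escaped=$(grep -c "e\.g\." sample.txt)

# -F treats the whole pattern as a fixed string, no escaping needed.
fixed=$(grep -cF "e.g." sample.txt)

echo "unescaped: $unescaped, escaped: $escaped, fixed string: $fixed"

cd / && rm -rf "$scratch"
```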

3.3.4. More ways to view file content

General

Apart from cat, which really doesn't do much more than sending files to the standard output, there are other tools to view file content.

The easiest way of course would be to use graphical tools instead of command line tools. In the introduction we already saw a glimpse of an office application, OpenOffice. Other examples are the GIMP (start up with gimp from the command line), the GNU Image Manipulation Program; xpdf to view Portable Document Format files (PDF); GhostView (gv) for viewing PostScript files; the Mozilla Project, links (a text mode browser), Konqueror, Opera and many others for web content; XMMS, CDplay and others for multi-media file content; AbiWord, Gnumeric, KOffice etc. for all kinds of office applications and so on. There are thousands of Linux applications; to list them all would take days.

Instead we keep concentrating on shell- or text-mode applications, which form the basics for all other applications. These commands work best in a text environment on files containing text. When in doubt, check first using the file command.

So let's see what text tools we have that are useful to look inside files.

Note: Font problems

Plain text tools such as the ones we will now be discussing often have problems with "plain" text files because of the font encoding used in those files. Special characters, such as accented alphabetical characters, Chinese characters and other characters from languages that use a different character set than the default en_US encoding, are then displayed the wrong way or replaced by unreadable rubbish. These problems are discussed in Section 7.5.

"less is more"

Undoubtedly you will hear someone say this phrase sooner or later when working in a UNIX environment. A little bit of UNIX history explains this:

  • First there was cat. Output was streamed in an uncontrollable way.

  • Then there was pg, which may still be found on older UNIXes. This command puts text to the output one page at a time.

  • The more program was a revised version of pg. This command is still available on every Linux system.

  • less is the GNU version of more and has extra features allowing highlighting of search strings, scrolling back etc. The syntax is very simple:

    less file

    More information is located in the Info pages.

Head and tail

These two commands display the n first/last lines of a file respectively. To see the last ten commands entered:

tony:~>tail -10 .bash_history 
locate configure | grep bin
man bash
xawtv &
grep usable /usr/share/dict/words 
grep advisable /usr/share/dict/words 
info quota
man quota
echo $PATH

head works similarly. The tail command has a handy feature to continuously show the last n lines of a file that changes all the time. This -f option is often used by system administrators to check on log files. More information is located in the system documentation files.
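Both commands are easily tried on a generated file. The sketch below writes twenty numbered lines to a made-up log file and picks off the beginning and the end with the -n option; tail -f is only mentioned in a comment, because it keeps running until it is interrupted.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Generate a file with twenty numbered lines.
i=1
while [ "$i" -le 20 ]; do
    echo "entry $i"
    i=$((i + 1))
done > app.log

# The first three and the last two lines.
first=$(head -n 3 app.log)
last=$(tail -n 2 app.log)
echo "$first"
echo "$last"

# tail -f app.log would keep following the file as it grows
# (not run here, since it only stops on Ctrl+C).

cd / && rm -rf "$scratch"
```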

3.3.5. Linking files

Link types

Since we know more about files and their representation in the file system, understanding links (or shortcuts) is a piece of cake. A link is nothing more than a way of matching two or more file names to the same set of file data. There are two ways to achieve this:

  • Hard link: Associate two or more file names with the same inode. Hard links share the same data blocks on the hard disk, while they continue to behave as independent files.

    There is an immediate disadvantage: hard links can't span partitions, because inode numbers are only unique within a given partition.

  • Soft link or symbolic link (or for short: symlink): a small file that is a pointer to another file. A symbolic link contains the path to the target file instead of a physical location on the hard disk. Since inodes are not used in this system, soft links can span across partitions.

The two link types behave similarly, but are not the same, as illustrated in the scheme below:

Figure 3-2. Hard and soft link mechanism

Note that removing the target file for a symbolic link makes the link useless.

Each regular file is in principle a hard link. Hard links cannot span partitions, since they refer to inodes, and inode numbers are only unique within a given partition.
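The difference in behavior becomes tangible when the original name is removed. The sketch below, with made-up file names in a scratch directory, creates one link of each type, compares inode numbers and then deletes the original name.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1
echo "original data" > report.txt

ln report.txt hard       # hard link: a second name for the same inode
ln -s report.txt soft    # symbolic link: a small file holding the path

# The hard link has the same inode number as the original name.
inode_file=$(ls -i report.txt | awk '{print $1}')
inode_hard=$(ls -i hard | awk '{print $1}')
ls -li

# Remove the original name.
rm report.txt

# The data is still reachable through the hard link...
hard_content=$(cat hard)

# ...but the symbolic link now points to a name that no longer exists.
if [ -e soft ]; then soft_state=valid; else soft_state=dangling; fi

echo "hard link still reads: $hard_content"
echo "symbolic link is now: $soft_state"

cd / && rm -rf "$scratch"
```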

It may be argued that there is a third kind of link, the user-space link, which is similar to a shortcut in MS Windows. These are files containing meta-data which can only be interpreted by the graphical file manager. To the kernel and the shell these are just normal files. They may end in a .desktop or .lnk suffix; an example can be found in ~/.gnome-desktop:

[dupont@boulot .gnome-desktop]$ cat La\ Maison\ Dupont
[Desktop Entry]
Name=La Maison Dupont

This example is from a KDE desktop:

[lena@venus Desktop]$ cat camera
[Desktop Entry]

Creating this kind of link is easy enough using the features of your graphical environment. Should you need help, your system documentation should be your first resort.

In the next section, we will study the creation of UNIX-style symbolic links using the command line.

Creating symbolic links

Symbolic links are particularly interesting for beginning users: they are fairly obvious to see and you don't need to worry about partitions.

The command to make links is ln. In order to create symlinks, you need to use the -s option:

ln -s targetfile linkname

In the example below, user freddy creates a link in a subdirectory of his home directory to a directory on another part of the system:

freddy:~/music>ln -s /opt/mp3/Queen/ Queen

freddy:~/music>ls -l
lrwxrwxrwx  1 freddy  freddy  17 Jan 22 11:07 Queen -> /opt/mp3/Queen

Symbolic links are always very small files, while hard links have the same size as the original file.
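Why a symbolic link is so small can be seen by asking what it contains. The sketch below rebuilds the example in a scratch directory (the paths under it are stand-ins for /opt/mp3 and ~/music) and uses readlink, which is assumed to be available as it is on most Linux systems, to print the stored path.

```shell
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Stand-ins for /opt/mp3/Queen and the user's music directory.
mkdir -p opt/mp3/Queen music
touch opt/mp3/Queen/track01

# Create the link, pointing at the full path of the target directory.
ln -s "$scratch/opt/mp3/Queen" music/Queen

# The link stores nothing but the path to its target.
target=$(readlink music/Queen)
echo "link points to: $target"
echo "length of that path: ${#target} characters"

# The size column for the link normally shows that same number.
ls -l music

# The content of the target is reachable through the link.
via_link=$(ls music/Queen/)
echo "seen through the link: $via_link"

cd / && rm -rf "$scratch"
```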

Symbolic links are used widely. They are often used to save disk space, or to stand in for a copy of a file in order to satisfy the installation requirements of a new program that expects the file to be in another location. They are also used to fix scripts that suddenly have to run in a new environment, and they can generally save a lot of work. A system admin may decide to move the home directories of the users to a new location, disk2 for instance. If he wants everything that refers to the old location, such as the /etc/passwd file, to keep working as before with a minimum of effort, he will create a symlink from /home to the new location /disk2/home.