5.3. Summary

In this chapter, we've briefly examined how Perl turns source code into a tree data structure suitable for execution; in the next chapter, we'll look more specifically at the nature of the nodes in that tree.

There are two stages to this operation: the tokeniser, toke.c, chops up the incoming program and recognises different token types; the parser, perly.y, then assembles these tokens into phrases and sentences. In reality, the whole task is driven by the parser: Perl calls yyparse to parse a program, and whenever the parser needs the next token, it calls yylex.
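The parser-driven arrangement can be illustrated with a toy example. This is only a minimal sketch, not Perl's code: the real yyparse is generated from the perly.y grammar, whereas the parser below is written by hand for a tiny "number + number" grammar. Every name here (toy_lex, toy_parse, struct toy_lexer, the TOK_ constants) is hypothetical and chosen purely for illustration. The point it shows is that the parser is in control and pulls tokens from the tokeniser one at a time, as and when it needs them.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token types for the toy grammar; Perl's real tokens
 * are declared by its grammar file, not here. */
enum toy_token { TOK_EOF, TOK_NUMBER, TOK_PLUS, TOK_ERROR };

struct toy_lexer {
    const char *pos;  /* next unread character of the input */
    long value;       /* value of the most recent TOK_NUMBER */
    int calls;        /* how many times the parser asked for a token */
};

/* Plays the role of yylex: return the next token on request. */
enum toy_token toy_lex(struct toy_lexer *lx)
{
    lx->calls++;
    while (isspace((unsigned char)*lx->pos))
        lx->pos++;
    if (*lx->pos == '\0')
        return TOK_EOF;
    if (*lx->pos == '+') {
        lx->pos++;
        return TOK_PLUS;
    }
    if (isdigit((unsigned char)*lx->pos)) {
        lx->value = 0;
        while (isdigit((unsigned char)*lx->pos)) {
            lx->value = lx->value * 10 + (*lx->pos - '0');
            lx->pos++;
        }
        return TOK_NUMBER;
    }
    return TOK_ERROR;
}

/* Plays the role of yyparse: drives the whole process, calling the
 * tokeniser whenever it needs to know what comes next.
 * Grammar: NUMBER ('+' NUMBER)* EOF. Returns 0 on success, -1 on error. */
int toy_parse(struct toy_lexer *lx, long *result)
{
    long total;

    assert(lx != NULL && result != NULL);
    if (toy_lex(lx) != TOK_NUMBER)
        return -1;
    total = lx->value;
    for (;;) {
        enum toy_token t = toy_lex(lx);
        if (t == TOK_EOF)
            break;
        if (t != TOK_PLUS || toy_lex(lx) != TOK_NUMBER)
            return -1;
        total += lx->value;
    }
    *result = total;
    return 0;
}
```

A caller would initialise a struct toy_lexer with its pos field pointing at the input string and the other fields zeroed, then call toy_parse once; the tokeniser is never called directly by anything except the parser.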

While the parser is relatively straightforward, the tokeniser is somewhat trickier. The key to understanding it is to divide its operation into four parts: checking tokeniser state, dealing with non-alphanumeric symbols in ordinary program code, dealing with alphanumerics, and dealing with double-quoted strings and other interpolation contexts.

Very few people understand the tokeniser and parser in their entirety, but this chapter should have given you a useful insight into how Perl understands program code, and how to locate the source of particular behaviour inside the parsing system.