
Monthly Archives: October 2014

Software Language Engineering: The Abstract Syntax Tree (Early Edition)

What is an Abstract Syntax Tree?

(Let me preface this by saying that syntax trees are hardly the only way to write a translator; but they’re broadly used because of their simplicity, versatility, and flexibility.)

“Abstract” as in abstraction. An abstract syntax tree (henceforth AST) is the manner in which we will be encoding every command of our source and object programs. Think of it as an internal representation of the semantics, derived from the original syntax, and encodable in the final syntax.

The biggest trick with an AST is to remember not to confuse it for its representation. Syntax and semantics are elemental, in the old-school black-magic Aleister Crowley sense. You can’t even talk about one without comparing it to the other. This is, in my personal analysis, where a lot of the works-like-magic notion about compilers comes from; but there’s nothing magical about it.

To understand this, think about what happens when you’re speaking with another human being. The art of communication is the act of transference of a thought from one mind to another, through some mutually external medium; be it speech, or writing, or hand gestures. When someone begins to speak, and you begin to listen, your brain (operationally) initially considers every single possible thing that the speaker is about to say. As they pronounce the phonemes and syllables and words of their sentence, the possibilities narrow, and different semantic clusters in your mental network go dark. Once they’ve finished their statement, given that it was well-formed, you have a reasonably precise notion of what it was that was in their mind, which they are trying to convey to you. You can duplicate it, within reason. If it was ill-formed, you might only have a vague, or even a holistically incorrect, notion.

This is the concept of an AST; the AST is the thought itself. It is a (reasonable) binary representation of the program data. I used to think of ASTs as being something like a sentence diagram; but this isn’t really the case, and it in fact held me back for some time. My script grammars were limited to only mild variations of the structure of the language they were written in. ASTs are not grammatical structures; they are more implicit than that. This is the greatest reason why you must not confuse what an AST is for what an AST might look like; it’s the difference between using it like one would a first language, and a second language.

I’m assuming that you have already read the chapter on building an efficient Scanner. I’m also encouraging you to keep the Glossary open in another tab, as I may accidentally jump ahead in my definitions in an early edition of this piece. (Promise, I’ll review it with a giant whiteboard and lots of draft-mechanics before I declare the series finished; you are always free to scream at me about it in the comments.)

So, as an example, let’s consider basic English. Actually, let’s consider Bad English, because I’m not willing to jot down every single rule for a language with a lexicon three times larger than any other western language. Bad English will have a limited set of rules, and will generate a set of only a few basic sentences. For further simplicity, they will all be declarative, much like a programming language.

The first step, before you even begin to piece together an AST, is to write down the rules of your language in a formal specification; that is, a specification with a distinct set of rules to its grammar, which often must be known well by the reader, and allows for little chance of misinterpretation. An informal specification is usually presented in natural language, like this article; it is understood by a much wider audience, but is subject to considerable misinterpretation in contrast.

I recommend Backus-Naur Form (BNF). BNF has a number of translators written for it, often involving meta-programming which generates usable source code. I won’t be using any here, but they do exist. The chief reason for writing it out is your own reference, particularly as you are creating a new language.

Primary BNF looks something like this.

Sentence ::= Subject Verb Object .
Subject  ::= I
Subject  ::= a Noun
Subject  ::= the Noun
Object   ::= me
Object   ::= you
Object   ::= a Noun
Object   ::= the Noun
Noun     ::= dog
Noun     ::= stick 
Noun     ::= hog
Verb     ::= ate
Verb     ::= caught
Verb     ::= bought

Before I go further, I should discuss the concept of a terminal, and nonterminal, symbol. All of the words in the above definition are some form of symbol; a terminal symbol can be parsed no further, while a nonterminal symbol can be broken down. Any program syntax consists of a finite set of nonterminal symbols, and a finite set of terminal symbols. In the example above, “Sentence” is a nonterminal symbol, as it can be inspected to retrieve a Subject, a Verb, and an Object. “I” and “dog” are terminal symbols, as they cannot be broken down further.

Note that the BNF structure isn’t ideal yet. The “::=” is pronounced “may consist of”, and means as much. A sentence is a subject, followed by a verb, followed by an object, followed by a terminal period. (Subject-verb without object is a syntax error in Bad English.) It is thus represented as “Sentence ::= Subject Verb Object .” So what is a Subject? A subject may consist of “I”, may consist of “a Noun”, and may consist of “the Noun”. A Noun may consist of “dog”, “stick”, or “hog”.

A simple AST would be a Sentence tree, referencing a Subject tree, a Verb tree, and an Object tree, with an implicit period. You might already see how this could be parsed to create a sequence of generated functions, regular expressions, and switch-case statements, which would work as a parser and produce a Sentence tree. But remember how I said that the BNF was not complete? While the grammar above is correct, and certainly easier to form, its repeated definitions are the biggest obstacle to making it deterministic.
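That derivation from BNF to parser can be sketched directly: each nonterminal becomes one method, and each terminal becomes a token check. The following recognizer is a minimal illustration of the idea, not code from the tutorial’s project; the class and method names are my own.

```java
import java.util.Arrays;
import java.util.List;

// A hand-derived recursive-descent recognizer for Bad English.
// One method per nonterminal in the BNF; illustrative names only.
public class BadEnglishRecognizer {
    private final List<String> tokens;
    private int position = 0;

    public BadEnglishRecognizer(String sentence) {
        this.tokens = Arrays.asList(sentence.trim().split("\\s+"));
    }

    // Sentence ::= Subject Verb Object .
    public boolean parseSentence() {
        return parseSubject() && parseVerb() && parseObject() && accept(".");
    }

    // Subject ::= I | a Noun | the Noun
    private boolean parseSubject() {
        if (accept("I")) return true;
        return (accept("a") || accept("the")) && parseNoun();
    }

    // Object ::= me | you | a Noun | the Noun
    private boolean parseObject() {
        if (accept("me") || accept("you")) return true;
        return (accept("a") || accept("the")) && parseNoun();
    }

    // Noun ::= dog | stick | hog
    private boolean parseNoun() {
        return accept("dog") || accept("stick") || accept("hog");
    }

    // Verb ::= ate | caught | bought
    private boolean parseVerb() {
        return accept("ate") || accept("caught") || accept("bought");
    }

    // Consume the next token if it matches the expected terminal.
    private boolean accept(String terminal) {
        if (position < tokens.size() && tokens.get(position).equals(terminal)) {
            position++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // A well-formed sentence, and a subject-verb syntax error.
        System.out.println(new BadEnglishRecognizer("the dog caught a stick .").parseSentence()); // true
        System.out.println(new BadEnglishRecognizer("the dog caught .").parseSentence());         // false
    }
}
```

A real parser would of course build and return trees rather than booleans; the control flow, however, is the same.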

You need to take something like the four definitions for “Object” and condense them into one. In BNF syntax, this is usually done with a vertical bar “|”, meaning “or”. Condensing the nonterminal definitions, we have the much more palatable:

Sentence ::= Subject Verb Object .
Subject  ::= a Noun | the Noun | I
Object   ::= me | you | a Noun | the Noun
Noun     ::= dog | stick | hog
Verb     ::= ate | caught | bought

Much nicer, yes? But still not ideal. BNF has been around for a while; it goes back as far as ALGOL 58, in 1959, when John Backus was drawing on Noam Chomsky’s work with context-free grammars. The first (arguable) regular expressions were actually developed in 1956 by Stephen Kleene, but they were pretty primitive; much of the format of BNF served as a major influence on them.

The above works fine for such a simple language as Bad English, as read by a human being with our own translating and interpreting internal hardware. For a program, the simpler the better, remembering that the most important factor going into BNF is preservation of syntax. Consider this instead:

Sentence ::= Subject Verb Object .
Subject  ::= (a | the) Noun | I
Object   ::= (a | the) Noun | me | you
Noun     ::= (d|h)og | stick
Verb     ::= ate | (ca|bo)ught

This is perfectly valid, and even more concise. Remove the whitespace, and any one of them can easily be condensed into a working regular expression. More importantly, redundancies become very easy to catch on implementation.
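As a sketch of that claim, each condensed production drops almost verbatim into `java.util.regex` (here I keep a single space as the token separator for readability; the names are mine, not the project’s):

```java
import java.util.regex.Pattern;

// The condensed Bad English productions, expressed as regular expressions.
public class BadEnglishRegex {
    static final String NOUN    = "(?:(?:d|h)og|stick)";
    static final String VERB    = "(?:ate|(?:ca|bo)ught)";
    static final String SUBJECT = "(?:(?:a|the) " + NOUN + "|I)";
    static final String OBJECT  = "(?:(?:a|the) " + NOUN + "|me|you)";

    // Sentence ::= Subject Verb Object .
    static final Pattern SENTENCE =
            Pattern.compile(SUBJECT + " " + VERB + " " + OBJECT + " \\.");

    public static void main(String[] args) {
        System.out.println(SENTENCE.matcher("the dog caught a stick .").matches()); // true
        System.out.println(SENTENCE.matcher("I bought the hog .").matches());       // true
        System.out.println(SENTENCE.matcher("stick ate I .").matches());            // false
    }
}
```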

So How Do I Use It?

When you are constructing a language compiler or interpreter, especially from the ground up, always write these out first. Often, it will be exclusively for your own reference. Everything you’re about to do with the Parser module is encapsulated in that BNF-formatted data.

I have an abbreviated format, which I usually use: the type name, “::=”, and a regular expression for the sequence of elements that the type contains. There are ups and downs to using regular expressions directly; the biggest issue that I run into is that single-member may-consist-ofs (such as “Identifier ::= Numeric | Nominal”) can easily be implemented by simply having the classes for Numeric and Nominal extend a class for Identifier. Then it’s just a question of «expected class».isAssignableFrom(«provided class»). Ultimately, if you can comfortably implement it, then yes, you can do it. I’ll be showing you another way, which I feel is often a cleaner implementation.
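The subclass trick can be sketched like so. Note that Identifier, Numeric, and Nominal here are illustrative stand-ins, not the tutorial’s actual AST classes:

```java
// A sketch of modeling "Identifier ::= Numeric | Nominal" by inheritance,
// so that membership in a production reduces to isAssignableFrom.
public class AssignabilityDemo {
    static class Identifier {}
    static class Numeric extends Identifier {}
    static class Nominal extends Identifier {}

    // Does a token of the provided class satisfy the expected production?
    static boolean satisfies(Class<?> expected, Class<?> provided) {
        return expected.isAssignableFrom(provided);
    }

    public static void main(String[] args) {
        System.out.println(satisfies(Identifier.class, Numeric.class)); // true: a Numeric is an Identifier
        System.out.println(satisfies(Numeric.class, Nominal.class));    // false: a Nominal is not a Numeric
    }
}
```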

Does This Have Something to Do With Your Caveat On the Scanner?

Why yes. Yes it does.

It’s much more convenient for a Parser to simply handle ASTs. Having it handle items which may be either ASTs, or Strings, is extremely difficult in Java. Give me one more chapter to explain the details of “Terminal” and “NonTerminal” ASTs, and then I’ll show you what needs to happen to your Scanner.

I promise it isn’t big.

[To Be Completed]


Software Language Engineering: Acknowledgements

Acknowledgements

I have a lot of people to thank for my knowledge of translator construction, which is always growing, especially lately. My chief interest in putting this tutorial together was not simply helping out other curious minds (as important as that is), but also solidifying my own understanding of the process. As the saying often attributed to Albert Einstein goes, “You do not really understand something until you can explain it to your grandmother.”

My first translators, though I may not have called them that at the time, were built maybe fifteen years ago, when I first attempted to add executable data to a personal file format with a newly constructed grammar. I made a lot of terrible mistakes, and I learned from them. (You will too.) What is the nature of human inquisitiveness and curiosity if not adaptation? Writing translator engines is a notoriously nonlinear and irrational problem (technical term!).

For starters, there is an excellent manual on the subject, first published in the year 2000 by Pearson Education Limited. It’s called “Programming Language Processors in Java”, by David A. Watt and Deryck F. Brown. Java was very different back then, and I spot new things in the example code which I can optimize all the time; but the core material is perfectly communicated. It’s in the realm of a hundred dollars, but if you’re even knee deep in this, I recommend it. (Tragically it is not yet available via Safari Books.)

Additionally, via MIT’s OpenCourseWare, you have an entire course on the subject. It appears to be partially complete; the videos can be found under “Lecture Notes”. I’m still familiarizing myself with the professors, Prof. Martin Rinard and Prof. Saman Amarasinghe, but they have been very helpful. Like any online course, you should make it a point to find extracurricular work in the subject.

Finally, my general education in linguistics, psycholinguistics, and neurolinguistics has provided me with a grasp of the nature of language that goes far beyond natural language. While I must put emphasis on the more directly related text above, there are many great books on these subjects, by equivalently great teachers, which are very helpful alongside them. Joseph O’Connor and John Seymour have played a large part in my comprehension of the human end of any language, and believe me, it is important to you.

I have another book in front of me, as yet unopened, on modern compiler design; but I won’t be mentioning it until I actually use it. That wouldn’t be particularly fair. I’ve also received some help, at least in the form of my own constructive criticism, from the work of Andrew Davison, all of the unnamed writers and editors of Oracle’s JavaScript in Java guides, and a world of programmers who came before me. I have a lot to thank each of them for.

I believe that my work is sufficiently original beyond the bindings of actually working; but should any of them determine that my code is too close to theirs, or that I am in violation of a license of some kind, please contact me so that I may make amends. The future of computing lies on the backs of the teachers as much as it does the engineers.

 


Software Language Engineering: Glossary

Glossary

For the sake of my brain, let’s start with some technical definitions.

For most of you, the “compiler” is the “deific thing” that turns your “words” into “magic computer stuff”. Come on, admit it. I’m about to explain and change all of that with a few short definitions.

Languages

Hex Edit: v. To change the nature of a stretch of binary code by altering its values, typically through the usage of hexadecimal numbers. Total neck pain to do.

Assembly Language: n. The first step away from hex editing of machine code, typically (but not always!) with a one-to-one correspondence between command and processor instruction.

Low-Level Language: n. An assembly language or direct machine code; basically what a processor understands or something very close to it.

High-Level Language: n. An abstraction from assembly language to something more human-directed; such as C, Pascal, Java, Python, and most of what you are familiar with.

Build Tool Types

Translator: n. A program capable of representing the semantic of one language in another. (Frequently used.)

Source Program: n. The program being input into a translator.

Object Program: n. The program output by a translator.

High-Level Translator: n. A translator which translates between representations in one high-level language, and another high-level language. As an example, Java to C.

Assembler: n. Translator with a source language of assembly and a target language of the machine code for a specific processor type, like x86, amd64, PPC, or SPARC.

Disassembler: n. Translator with a source language of machine code, and a target language of assembly.

Compiler: n. (Important!) Translator that converts a high-level language into a low-level language. Most typically (but not always) the high-level language is converted into an assembly code, possibly through a multi-stage process; the assembly is then converted into machine code with an assembler.

Code Generator: n. The late phase of a translator which generates code in the target language.

Implementation Language: n. The language in which the compiler itself is built.

Cross-Compiler: n. A compiler which produces machine code for one architecture, but is implemented and run on another.

Host Machine: n. The machine that a translator is run on.

Target Machine: n. The machine that a compiler’s object code is to be run on.

Bootstrap: v. To implement a translator in the translator’s target language.

Full Bootstrap: v. To compile a new edition of a compiler through nothing but its previous version, beginning with an initial version that is a subset of the language, no larger than is necessary to compile. (People actually do this. It’s pretty amazing.)

Half Bootstrap: v. To compile an edition of a compiler for a separate target platform from the implementation platform, requiring a two-phase process of compiling the translator with an altered target machine implementation language, and then recompiling the original source code through the new edition; generating an implementation of the compiler that runs on the target platform, for the target platform. (Actually even more awesome. Similar methods are used with full bootstrapping to improve the quality of the compiler’s object program.)

Interpreter: n. A program that accepts any program (the source program), expressed in a specific language (the source language), and runs that program immediately, without outputting any translation of it. (This does not mean that no translator is implemented internally!)

Just-In-Time Compiler: n. A compiler designed to translate a source program into an object program just as it is loaded into memory, skipping around the usual penalties for interpretive compilers. (Earliest usage in Java was arguably 1993; but its earliest identifiable use was actually in Lisp in 1960.)

Emulator: n. An interpreter designed to run machine code with one target implementation, on another.

Abstract Machine: n. A theoretical model of a machine that exists, on the current platform, only as an emulator.

Virtual Machine: n. The software interpreter that provides the functionality of an abstract machine. (Technically.)

Real Machine: n. A concrete and tangible machine. Well, not literally concrete, but I’m sure you get the idea.

Language Design

Portability: n. The percentage of code that does not need to change when the underlying architecture changes.

Code: n. An instance of meaning expressed in a specific language.

Token: n. A distinct chunk of text, which generally carries meaning.

Expression: n. Rule for computing values, specified in code.

Data Type: n. Class of program-manipulable data.

Control Structure: n. Code which depends on the status of a memory address to determine the actions to perform.

Declaration: n. Code which produces an identifier for specific values.

Abstraction: n. Mental tool for separating notions of “what” from notions of “how”.

Encapsulation: n. Grouping of related declarations, and selective hiding of them.

Data Abstraction: See “Encapsulation”.

Syntax: n. Literal format of a language. Specifically, which tokens are used, and how phrases are composed from tokens and subphrases.

Phrase: n. Command, expression, declaration, or complete program.

Contextual Constraint: n. Bound on the nature of a well-formed expression.

Static Semantic: See “Contextual Constraint”.

Scope Rule: n. Boundary on the influence of a declaration.

Type Rule: n. Boundary allowing for the inference of the type of provided data.

Semantics: n. Notion of the meaning of a program.

Denotational Semantic: n. Mathematical mapping of a program’s inputs to its outputs.

Operational Semantics: n. Behavior of a program as it is run on a machine.

Formal Specification: n. Precise notation on the structure of a language. Often specialized.

Informal Specification: n. Imprecise, but more commonly readable, notation on the structure of a language. More often misinterpreted than Formal Specification.

Backus-Naur Form (BNF): n. Specific format for declaring the valid syntax order for a language, explained in detail here.

 
 


1: Other Powers

Other Powers

    The café was lit by a warm and sacred sun, the heat seeping into Marc’s skin as the coffee seeped into his heart and bones. He sat across from a man with a golden goatee, dark eyes, and a needle-sharp smile. The occasional glances of the wait staff and the patrons around them always slipped right around his odd company, seeing him, so rarely perceiving him, instead letting their gazes wrap around the scar-like smile lines and glowering confidence of the most treacherous priest in all the world.

“You and I aren’t all that different, you know,” came the serpentine words of his companion. He struggled to keep his espresso steady through a tremoring hand. As slowly as he could, he set the ceramic ounce glass down on its plate, struggling to steady it with his other arm. “Strange as it may be for me to say this.”

Over the overture of finely ground coffee and cinnamon and biscotti was the smell of stagnant rain puddles on scorching hot asphalt, the faintest hint of exhaust, and a tinge of hot rubber. A thousand sneakered people must have passed on the sidewalk that morning, the lifeblood of urbania, each destined to a different faceless concrete tower. Deferred raindrops slid slowly down the canopy overhead, dripping to the ground below.

“I’m sure we have plenty of differences,” said Marc.

“But not enough,” said the man, before he could continue. “You are a turncoat. You’re a back-stabber, in the eyes of many of the gods. You will help them one moment, and scold them the next. You will risk your life to bring them closer to their goal, then cripple them before they get there. You would do the same to me, were the moment to arrive.”

“No,” said Marc. “Any shaman would. The only difference is that I have the knowledge to go through with it, and the dignity to be open about it. Also, the talent to receive such opportunities. But I’ll give you this, I do have intentions of my own. I have goals of my own, representations, things I hold to be sacred. I don’t keep them secret, either.” He stared down into the inky blackness of his coffee cup, pondering how to continue.

“If only anyone ever listened,” said the man. “But then if they listened, you might not have such a reputation. These ‘opportunities’ would never come about. These crimes, well, they would never be committed. They would never need to be committed.”

Marc smiled, and nodded to the man, taking another sip. “Perhaps I’ve underestimated you, just a little. Maybe, at least socially, we do have a thing or two in common. But knowing that, I wonder how much I would trust you with. How much I would tell you about myself, I mean.”

“We are outcasts, you and I. The only difference is your mortal blood. Your,” he said, pausing, tasting his words before speaking them. “Forgettability.”

“Oh, I imagine I’ll be thought of long after I’m dead,” said Marc. “Just, you know, by other names.”

“Death. Reminds me. You know I’ve been responsible for that more than once now.”

“Yeah, you’ve actually done that a lot,” said Marc.

“Most famously Baldr.” The man smiled again, toothily, the kind of facade of happiness one wears when inside, he is being torn apart by wolves and lions; the smile of despondence. “Of course I didn’t kill him directly. Höðr did that, kind, jovial, and drunk as he was. Also blind.”

“I’m sure that helped.” Marc gave him a look somewhere between cynicism and aloofness.

“But contrary to popular, and rather arrogant, belief, I never imagined that I would get away with it. Not once. One does not orchestrate the assassination of a loved god, and then just live it down. I gave them a run, but I was caught, and tortured for a very, very, long time.”

“So the stories go.”

“Also in opposition to popular belief, I didn’t do it for myself. I did it to maintain a sacred balance in the universe, a balance that these other half-wits were aching for the chance to defy and destroy. They tried to make Baldr immortal. Truly immortal. Without opposition. And believe me, before you accuse me of overreacting, that immortality and immutability would become a punishment for him that was much greater than anything I dealt out.

“See, that’s what we have in common,” said the old god. “We’re both turncoats and back-stabbers. We are both opportunists, even war profiteers. We are rust, and ash, and all of the wearing things. We are the stress that makes metals break, and the water that bleeds the words from the page. We are the slip of the tongue and the birth of the rumor.

“That’s the only thing we need to have in common. And do you know why this is? It’s because we know how the world works, we have a rough idea where it came from and where it has to go. You hurt them out of love, not hate; and then you just let it go.”

Marc choked back a laugh. “You aren’t trying to recruit me, are you? Because for a moment there, that’s what it sounded like. It sounded like you were trying to get me to swear allegiance to you.”

The old god smiled back, the one reflexive expression he seemed to have left. His fingers trembled again, his breath quick and heavy; he struggled to still it. He was mildly offended, but let it go. “No, you fool. Those that take oaths are never true followers, they are bound by a few words, maybe a burn or a little blood if they want to look tough, nothing more. You follow your own path, that’s what makes you so beautifully dangerous, and frankly, it’s a path I can get along with. But in the end, after you cross them,”

“—if I cross them,” Marc interjected.

“After you cross them, they will catch you. And maybe they won’t be able to take your life, or your memory, or your legend, or whatever; but they can be surprisingly petty. And they will find what means the most to you and take that. For such petty people, they can be very patient and persistent in doing it.” The god lifted his espresso once more, with a visibly trembling hand, and watched the disturbed rippling of the syrupy liquid within it. “They took more from me than I ever thought that they could.”

Marc observed the symptoms of his anxiety, and made an effort to pay his words at least a little attention. “So that’s the point of this meeting, then? I’ll try and remember it.”

“Do that. But no. Sometimes I just want a little company. I was locked up for a very long time.”

“Well, Loki, feel free to call me up. When you’re feeling sociable.”

“If I’m feeling sociable,” said Loki. His lips twitched, the kind of honest smile that one tries so desperately to hide. He waved his hand to one of the wait staff, slipping like liquid back into the world of the perceived for just a moment, then dropped it back to the table. “Check please?”

 

Posted by on October 23, 2014 in Fiction, Stygian Quartet, The Magus

 


Software Language Engineering: Intro and the Scanner

(Early Edition)

[My project has evolved a little since I started this. Allow me to make a few modifications for clarity.]

Introduction

So, back to Software Language Engineering. Well, not really; I literally just wrote the preface, and I’m a little burnt out. However, I think it’s only fair that I walk as much as I talk, so allow me to begin this by explaining the core concept of a scanner, and providing you with a little example code. This entry will be fair game for revision for a very long time; but that’s for the sake of humility, not inaccuracy. I might even separate it into two distinct parts; one on theory, the other on implementation. If anyone’s voting for that discretion, please mention it in the comments. Also note that I am reserving the right to back-propagate any major change I make to any part of this program later on; thus even after completion this article may never be static. However, I do promise to compile and run the code myself before putting it up. You may quite liberally scream at me in the comments if I still muck something up.

The Scanner

The notion of a build tool is to take code, often but not always human-written, and convert it to a set of native software structures that can be understood and executed by a CPU. The two initial phases of that, one of which I will be addressing here, are scanning, and parsing. Scanning is the earliest phase, an operation which is far too easy to accidentally amalgamate with parsing. Scanning is separating the code into tokens with atomic meaning to the language. They are not organized or checked for semantic sensibility, they are simply separated and cleaned up a little.

For the human end, it’s often easier to allow as much whitespace as we care to use, wherever we should care to put it. It’s easy to do, it’s streamlined, and it isn’t hard to read. For the later stages of a typical build tool, unless otherwise specified, whitespace is noise. It is syntax without semantic, ignorable data that serves no purpose. It stops at the scanner. Likewise, a build tool is not typically concerned with the content of a structure like a text string. It is concerned with where it begins, and where it ends; its literal contents are ultimately irrelevant.

Additionally, comments are treated much like whitespace. Whether your comment character is a double forward slash “//” or a hash sign “#”, you don’t want your program worrying about what’s between it and the end of the line. These are also responsibilities of the scanner. If you’re smart, you’ll be setting up abstract classes to contain these. Most languages (every one of them that I can think of off of the top of my head) can be interpreted in this manner, so you are likely to use them many times over.
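As an illustration of those responsibilities (hypothetical code, not the tutorial’s actual Scanner interface), a scanning pass that treats “#” comments as whitespace and cuts everything else into tokens might look like this:

```java
import java.util.ArrayList;
import java.util.List;

// A stripped-down sketch of the scanner's job: whitespace and line
// comments stop here; everything else becomes a token. Illustrative only.
public class MiniScanner {
    public static List<String> scan(String source) {
        List<String> tokens = new ArrayList<>();
        for (String line : source.split("\n")) {
            // Comments are treated like whitespace: drop from "#" to end of line.
            int comment = line.indexOf('#');
            if (comment >= 0) line = line.substring(0, comment);
            // Whitespace separates tokens but never survives scanning.
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(scan("x  =  1 + 2   # set x\n  y = x"));
        // [x, =, 1, +, 2, y, =, x]
    }
}
```

Note that a real scanner also has to respect token boundaries that aren’t whitespace-delimited (string literals, operators glued to identifiers); this sketch ignores those for brevity.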

We’re going to begin with an example builder for basic algebra, with the end goal of simplifying the equation. I suggest that you create four packages in your project, as I have: oberlin.builder, oberlin.builder.scanner, oberlin.algebra.builder, and oberlin.algebra.builder.scanner. (My own code, actively under development, can be found on Github, but please note that I have pushed this to Github only sporadically of late; given the nature of the tutorial, I am more concerned with pushing working code to it than whatever I’m experimenting with.) In builder, create a launch class called Main. This will be your testing facility.

All code © Michael Eric Oberlin, October 11th 2014; and subject to the terms of the GNU General Public License Agreement.

package oberlin.builder;

/*
 * © Michael Eric Oberlin, October 11, 2014
 * 
 * This program is free for educational usage, and free to be modified in any manner; however
 * its creator cannot be held responsible for any damage done to end-users' computers from it
 * or any alteration of it.
 */
import java.io.*;
import java.util.*;

import oberlin.algebra.builder.*;
import oberlin.algebra.builder.scanner.*;

public class Main {
    
//    private Builder builder = new NullaryBuilder();
    private Builder builder = new AlgebraicBuilder();
    
    public static void main(String...args) throws IOException, BuilderException {
        new Main();
    }
    
    public Main() throws IOException, BuilderException {
        try(BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
                BufferedWriter out = new BufferedWriter(new OutputStreamWriter(System.out));) {
            
            out.write("Type \"!exit\" to exit the program\n");
            String input = "";
            Object built = null;
            while(!input.trim().toLowerCase().equals("!exit")) {
                if(input.length() != 0) {
                    built = builder.build(input);
                    
                    out.write("Builds to: " + built);
                    out.newLine();
                    out.flush();
                }
                
                out.write("Enter text: ");
                out.flush();
                input = in.readLine();
            }
        }
    }
}

The notion here is to have the least likely token that I can think of, bang-exit, be used to exit the loop. You could remove that tag, if you wanted to rely on Control-C or Eclipse’s red square to exit; but generally it’s safe to assume that not every end user has, or knows, an easy way out of an arbitrary program.

I am using BufferedReader and BufferedWriter because they’re easier to work with than System.in and System.out directly; and in all but the most desperate debugging situations, System.out.print* is fairly deprecated code—a one-line anti-pattern, really. It’s 2014, people! Learn to use loggers!
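For the curious, the logger alternative amounts to very little code. This is a minimal java.util.logging sketch with an illustrative class name, not part of the tutorial’s project:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Minimal java.util.logging usage, in place of System.out.print* calls.
public class LoggerDemo {
    private static final Logger LOGGER = Logger.getLogger(LoggerDemo.class.getName());

    public static void main(String[] args) {
        LOGGER.info("scanner initialized");
        // FINE is below the default INFO threshold, so this is filtered out
        // unless the logger is reconfigured; debug noise costs nothing in release.
        LOGGER.log(Level.FINE, "token accepted: {0}", "dog");
        LOGGER.warning("unexpected end of input");
    }
}
```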

You may also note that you have two Builder objects declared, one of which is commented out. I strongly recommend creating a Nullary* version of every interface, in place of using null. Null is more a natural state than an idea, and the inclination to use it to express any other meaning tends to lead to predicaments. (In Java 8, if you must use null, or contend with old code that uses null, consider wrapping each instance in an Optional.)
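A quick sketch of that Optional wrapping, with a hypothetical null-returning lookup standing in for legacy code:

```java
import java.util.Optional;

// Wrapping a legacy null-returning call in Optional, so callers must
// state their fallback explicitly instead of risking NullPointerExceptions.
public class OptionalDemo {
    // Pretend legacy API: returns null when the name is unknown.
    static String legacyLookup(String name) {
        return name.equals("dog") ? "Noun" : null;
    }

    public static void main(String[] args) {
        Optional<String> known   = Optional.ofNullable(legacyLookup("dog"));
        Optional<String> unknown = Optional.ofNullable(legacyLookup("cat"));

        System.out.println(known.orElse("NullaryToken"));   // Noun
        System.out.println(unknown.orElse("NullaryToken")); // NullaryToken
    }
}
```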

So, on running this program, you are continually presented with an inquiry for a line of text. Our build target is that line. On entry, it is sent to a Builder, which returns an object. The program reports that object, and you’re good to go for another line entry. Generally this structure is programmer-directed, but it isn’t a bad basis for something along the lines of an inline interpreter, such as the one Python uses. All of our experiments will stem from this program.

So, what is Builder?

Builder is a class that encapsulates the build process in entirety. It will reference the scanner, the parser, and every other major component. For end users, it is a “black box”; something identified entirely by its function. The programs “javac”, “rhino”, and “g++” could all stand in as a Builder. We’ll begin with an abstraction of it, like so.

package oberlin.builder;

import java.util.List;
import java.util.logging.*;

import javax.naming.OperationNotSupportedException;

import oberlin.algebra.builder.nodes.Program;
import oberlin.builder.scanner.*;
import oberlin.builder.parser.*;
import oberlin.builder.parser.ast.*;

public abstract class Builder {
    private Logger logger = Logger.getLogger("Builder");
    private Scanner<?> scanner = new NullaryScanner();
    private Parser<?> parser;
    
    public Object build(String code) throws BuilderException {
        List<AST> tokens = (List<AST>) scanner.apply(new Terminal(code, new SourcePosition()));
        
        parser = createParser(tokens);
        
        //We now have a list of Terminals, ready to be constructed
        //into an AST via a Parser.
        
        //NOTE: This is technically an exclusive part of the algebra
        //builder. It should be abstracted out.
        AST program = parser.parse(Program.class);
        
        //DEBUG
        analyzeTree(program);
        
        //TODO: Generate Object Program with code generator
        //TODO: Return Object Program
        
        //TODO: At this point, objectProgram should be our completed and
        //runnable binary. The following code is more for
        //embedded testing than anything else, and should be removed 
        //before release.
        
        StringBuilder szBuilder = new StringBuilder();
        for(AST token : tokens) {
            szBuilder.append(token).append("; ");
        }
        return szBuilder.toString();
        
    }
    
    //TODO: MICK: Remove redundancies before release.
    public AST getParseTree(String code) {
        List<AST> tokens = (List<AST>) scanner.apply(
            new Terminal(code, new SourcePosition()));
        
        parser = createParser(tokens);
        
        //We now have a list of Terminals, ready to be constructed
        //into an AST via a Parser.

        //NOTE: This is technically an exclusive part of the algebra
        //builder. It should be abstracted out.
        AST program = parser.parse(Program.class);
        
        return program;
    }
    
    public Parser<?> createParser(List<AST> tokens) {
        return new NullaryParser(tokens);
    }

    public Scanner<?> getScanner() {
        return scanner;
    }

    public void setScanner(Scanner<?> scanner) {
        this.scanner = scanner;
    }

    protected Parser<?> getParser() {
        return parser;
    }

    protected void setParser(Parser<?> parser) {
        this.parser = parser;
    }
    
    //DEBUG
    private void analyzeTree(AST ast) {
        try {
            System.out.println(ast.getContainedNodeNames());
            
            for(AST node : ast.getContainedNodes()) {
                analyzeTree(node);
            }
        } catch(OperationNotSupportedException ex) {
            System.out.println(ex.getMessage());
        }
    }
    
}

[I’m sure you see the skips and bumps characteristic of a work in progress.]

Calling the build(…) method will in turn run each component, beginning with the implemented scanner, and derive a program object from the code. A scanner's job is not to interpret code; keep that in the back of your mind. Its job, succinctly, is to divide the code into a sequence of specialized tokens usually called "lexemes". Nor is its job to determine the meaning of those lexemes; that task belongs to the parser. Given that rough outline of a scanner's function, what follows is my recommendation for your Scanner class.
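To make that division-into-lexemes concrete before we build the real thing, here is a toy example; ToyLexer is a throwaway illustration with a made-up pattern, not the Scanner we're about to write:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToyLexer {
    // A toy lexeme pattern: number, identifier, operator, or parenthesis.
    private static final Pattern LEXEME =
            Pattern.compile("\\d+|[A-Za-z]\\w*|[+\\-*/^]|[()]");

    public static List<String> lex(String code) {
        List<String> lexemes = new ArrayList<>();
        Matcher matcher = LEXEME.matcher(code);
        while (matcher.find()) {
            lexemes.add(matcher.group()); // whitespace is simply skipped over
        }
        return lexemes;
    }

    public static void main(String[] args) {
        System.out.println(lex("3 * (x + 4)")); // [3, *, (, x, +, 4, )]
    }
}
```

Notice that nothing here knows what "*" means; it only knows what "*" looks like. That separation is exactly the scanner/parser boundary.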

Confession

One of the first complete textbooks I ever read on the direct subject of building a build tool was "Programming Language Processors in Java: Compilers and Interpreters", by David A. Watt and Deryck F. Brown (Prentice Hall). It's an excellent introduction to the subject, and I had some familiarity before I first picked it up; but it was written in 2000.

That was back in the Java 2 days. Those were dark times. A lot of wonderful and exciting things have happened in the past ten years or so, and a decent part of my compulsion to write this tutorial is to take my old methods and, well, refurbish them for the modern era. My initial draft of this code had some embarrassing parts. It scanned not only for comments and valid tokens, but for delimiters (separately), whitespace (in spite of its identical treatment to a comment), and a lot of other nonsense.

Please forgive any relic of a less experienced time that slips into my new material; Java 2 was before we even had enumerations, let alone generics, and if you were going to pick up any bad habits, it was a great time for it. I actually wrote a haiku about that. It was a long time ago, and the longer the better. Java 8, or even Java 6, is much better suited to writing such material.

End Confession

Scanners aren’t ultimately that different from Parsers, so if I wanted to, I could jump to Parser, have it back reference a specialization of itself, and write out the rules in Backus-Naur Form. But if I did that, you guys would be really mad at me and no better off, so let’s go through it logically.

Every language differs by grammar, and Scanner's interest in that grammar is the key spellings of the language's lexicon. The best way to represent these spellings is through an enumeration of self-contained checks against regular expressions. It is also possible to store them in a map, but I find that to be extraneous here.
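For the curious, the map alternative might look something like this; the names and patterns here are hypothetical, pared down from the real enumeration below:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class SpellingMapDemo {
    // The map-based alternative: insertion order doubles as match priority.
    static String classify(String code) {
        Map<String, Pattern> spellings = new LinkedHashMap<>();
        spellings.put("NUMERIC", Pattern.compile("^\\d+"));
        spellings.put("NOMINAL", Pattern.compile("^[A-Za-z]\\w*"));
        spellings.put("OPERATOR", Pattern.compile("^[+\\-*/^]"));

        for (Map.Entry<String, Pattern> entry : spellings.entrySet()) {
            if (entry.getValue().matcher(code).find()) {
                return entry.getKey();
            }
        }
        return "UNRECOGNIZED";
    }

    public static void main(String[] args) {
        System.out.println(classify("42")); // NUMERIC
        System.out.println(classify("x2")); // NOMINAL
    }
}
```

It works, but the enumeration version keeps the pattern, the handler, and the name bound together in one declaration, which is why I prefer it.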

Let’s start with the outline of the abstracts.

package oberlin.builder.scanner;

import java.util.*;
import java.util.function.*;
import java.util.logging.*;

import oberlin.builder.*;
import oberlin.builder.parser.ast.*;

/**
 * New design for scanners. Requires an enumeration of type
 * TerminalSpelling to iterate over, and a code string.
 * Returns a list of tokens.
 * 
 * @author © Michael Eric Oberlin Nov 2, 2014
 *
 */
public interface Scanner<E extends Enum<E> & TerminalSpelling>
    extends Function<AST, List<AST>> {
    @Override
    public default List<AST> apply(AST code) {
        Logger logger = Logger.getLogger("Scanner");
        
        //Start with a list of tokens, with the singular pre-token of the bulk of code
        List<AST> tokens = new LinkedList<>();
        tokens.add(code);
        
        //For each item found in the code, parse it, and replace
        //the contents of tokens with the parsed contents
        for(int index = 0; index < tokens.size(); index++) {
            //maintain for check
            AST origToken = tokens.get(index);
            
            List<AST> newTokens = new ArrayList<>();
            for(TerminalSpelling terminalSpelling : getSpelling().getEnumConstants()) {
                
                try {
                    newTokens.addAll(terminalSpelling.matchToken(tokens.get(index)));
                    
                    replaceItemWithCollection(tokens, index, newTokens);
                    break;
                } catch (MismatchException e) {
                    //didn't match, so continue to the next item
                    continue;
                }
            }
            
            //Final defensive check: if one token was received, and one token provided, and
            //the old is not the same as the new, then the only thing that happened was the
            //removal of irrelevant data. Thus, index should be decremented so that this new
            //item may be scanned again.
            if(newTokens.size() == 1 && !newTokens.get(0).equals(origToken)) index--;
        }
        
        //This algorithm will always terminate the token list with an empty token, the one item
        //which cannot have semantic value. So, remove it here.
        tokens.remove(tokens.size() - 1);
        
        return tokens;
    }
    
    /**
     * Internally used method which replaces an entry in a
     * provided list with a collection.
     * 
     * @param list list to be altered
     * @param entry numeric index of replaced entry
     * @param replacement item to insert in place of current data
     */
    static <E> void replaceItemWithCollection(List<E> list,
        int entry, Collection<E> replacement) {
        list.remove(entry);
        list.addAll(entry, replacement);
    }
    
    /**
     * 
     * @return TerminalSpelling allocated to token recognition.
     */
    public Class<E> getSpelling();
}
package oberlin.builder;

import java.util.LinkedList;
import java.util.List;
import java.util.regex.*;

import oberlin.builder.parser.SourcePosition;
import oberlin.builder.parser.ast.*;

public interface TerminalSpelling {
    
    public default List<AST> matchToken(AST ast) throws MismatchException {
        List<AST> returnable = new LinkedList<>();
        
        Pattern pattern = getPattern();
        Matcher matcher = pattern.matcher(ast.toString());
        if(matcher.find()) {
            returnable.addAll(manageToken(matcher, ast.getPosition()));
            returnable.add(new Terminal(ast.toString().substring(matcher.end()), ast.getPosition()));
            return returnable;
        } else throw new MismatchException("String \"" + ast + "\" does not match grammar pattern \"" + pattern + "\"");
    }
    
    public Pattern getPattern();
    
    /**
     * Determines, often through enumeration self-inspection, what should be done with a passed token.
     * 
     * @param matcher the successful matcher holding the token just removed from the complete String by matchToken
     * @param position the token's position within the source
     * @return appropriate value given the circumstances and implementation of TerminalSpelling.
     */
    public List<AST> manageToken(Matcher matcher, SourcePosition position);
}

There is one item here that I might spend a few lines of English on. In Scanner, there is one chunk that you might not recognize, unless you’re some kind of an enumerations ninja, or an experimenter like me.

for(TerminalSpelling terminalSpelling : getSpelling().getEnumConstants())

The important detail is to see to it that, for every implementation, getSpelling() always returns the class of the enumeration. This class has a method, getEnumConstants(), which returns an array of all members of the enumeration (or null, if the class is not an enumeration). Therefore, this line allows for paging through every single type of TerminalSpelling in the enumeration, one by one, via external iteration.
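If you'd like to see that reflection trick in isolation, here's a minimal, self-contained sketch; DemoSpelling is a hypothetical stand-in for a real TerminalSpelling enumeration:

```java
import java.util.regex.Pattern;

public class EnumConstantsDemo {
    // A stand-in for TerminalSpelling, reduced to one method.
    interface Spelling {
        Pattern getPattern();
    }

    enum DemoSpelling implements Spelling {
        NUMBER("\\d+"),
        WORD("[A-Za-z]+");

        private final Pattern pattern;

        DemoSpelling(String regex) {
            this.pattern = Pattern.compile(regex);
        }

        @Override
        public Pattern getPattern() {
            return pattern;
        }
    }

    public static void main(String[] args) {
        // getEnumConstants() works on any Class object representing an enum;
        // for non-enum classes it returns null.
        Class<DemoSpelling> clazz = DemoSpelling.class;
        for (Spelling spelling : clazz.getEnumConstants()) {
            System.out.println(spelling.getPattern());
        }
    }
}
```

This is what lets Scanner stay abstract: it never names the members, it just asks the class object for all of them.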

So what is TerminalSpelling in detail? Each spelling is an example of a valid or ignorable token, and how it should look. In a properly formed program in your target language, every single chunk, regardless of whether it ends up in a token or not, should match one of these TerminalSpelling members. Scanner is easy enough; all you really need to do to override it (in this instance, at least) is set a spelling rule for it to follow. Your language might have a self-modification feature, so making it final may not be a good idea; but the default will work for most languages. As for TerminalSpelling, that's where the interesting stuff happens. Take, as an example, our implementation for algebraic simplification:

package oberlin.algebra.builder;

import java.util.LinkedList;
import java.util.List;
import java.util.logging.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import oberlin.builder.TerminalSpelling;
import oberlin.builder.TerminalSpellingHandler;
import oberlin.builder.parser.SourcePosition;
import oberlin.builder.parser.ast.AST;
import oberlin.algebra.builder.nodes.*;

/*
 * Major changes:
 * For starters, if a chunk of the string matches a pattern, then that group
 * needs to be returned as the spelling of a token.
 * <Specific Atomic Type> ← Token ← Terminal ← AST
 * 
 * All tokens should contain a reference to their regular expression/grammar.
 * 
 */
public enum AlgebraicSpelling implements TerminalSpelling {
        //COMMENTS
        WHITESPACE(Pattern.compile("^[\\s&&[^\\r\\n]]+"), GrammarType.COMMENT, new TerminalSpellingHandler<Whitespace>(){
            //NOTE: We are looking for all whitespace other than newlines! Thus, \r and \n must be ignored.
            //Hence, the strange regex above. (To look for the same item in two places has undefined and
            //generally unpleasant behavior.)
            @Override
            public Whitespace getTerminal(String spelling, SourcePosition position) {
                return new Whitespace(spelling, position);
            }
            
        }),
        BLOCK_COMMENT(Pattern.compile("^/\\*.*?\\*/"), GrammarType.COMMENT, new TerminalSpellingHandler<BlockComment>(){

            @Override
            public BlockComment getTerminal(String spelling, SourcePosition position) {
                return new BlockComment(spelling, position);
            }
            
        }),
        LINE_COMMENT(Pattern.compile("^//.*+$"), GrammarType.COMMENT, new TerminalSpellingHandler<LineComment>(){

            @Override
            public LineComment getTerminal(String spelling, SourcePosition position) {
                return new LineComment(spelling, position);
            }
            
        }),
        
        //VALIDS
        NOMINAL(Pattern.compile("^[\\D&&\\w]\\w+"), GrammarType.KEEP, new TerminalSpellingHandler<Nominal>(){

            @Override
            public Nominal getTerminal(String spelling, SourcePosition position) {
                return new Nominal(spelling, position);
            }
            
        }),
        NUMERIC(Pattern.compile("^\\d+"), GrammarType.KEEP, new TerminalSpellingHandler<Numeric>(){

            @Override
            public Numeric getTerminal(String spelling, SourcePosition position) {
                return new Numeric(spelling, position);
            }
            
        }),
        OPERATOR(Pattern.compile("^[+\\-/\\\\÷\\*×\\^]"), GrammarType.KEEP, new TerminalSpellingHandler<Operator>(){
            //NOTE: The hyphen must be escaped inside the character class;
            //unescaped, [+-/] is a range that also matches ',' and '.'.

            @Override
            public Operator getTerminal(String spelling, SourcePosition position) {
                return new Operator(spelling, position);
            }
            
        }),
        EQUATOR(Pattern.compile("^!?=?[=><]"), GrammarType.KEEP, new TerminalSpellingHandler<Equator>(){

            @Override
            public Equator getTerminal(String spelling, SourcePosition position) {
                return new Equator(spelling, position);
            }
            
        }),
        
        //DELIMITERS
        LPAREN(Pattern.compile("^\\("), GrammarType.KEEP, new TerminalSpellingHandler<LParen>(){

            @Override
            public LParen getTerminal(String spelling, SourcePosition position) {
                return new LParen(spelling, position);
            }
            
        }),
        RPAREN(Pattern.compile("^\\)"), GrammarType.KEEP, new TerminalSpellingHandler<RParen>(){

            @Override
            public RParen getTerminal(String spelling, SourcePosition position) {
                return new RParen(spelling, position);
            }
            
        }),
        //NOTE: "\\r?\\n" also qualifies as whitespace! Remove it from the whitespace search!
        NEWLINE(Pattern.compile("^\\r?\\n"), GrammarType.KEEP, new TerminalSpellingHandler<NewLine>(){
            @Override
            public NewLine getTerminal(String spelling, SourcePosition position) {
                return new NewLine(spelling, position);
            }
        })
    ;
    
    //PRIVATE FIELDS
    private Logger logger = Logger.getLogger("AlgebraicSpelling");
    
    //ENUMERATIONS
    private enum GrammarType {
        KEEP,
        COMMENT;
    }
    
    //FIELDS
    private final Pattern pattern;
    private final GrammarType type;
    private final TerminalSpellingHandler<?> handler;
    
    //CONSTRUCTORS
    /**
     * The two underlying details for any syntax pattern are its regular expression, and its semantic meaning.
     * The pattern is simply a compiled regular expression for matching the item. The type is a clue to its
     * semantic meaning, which is checked in a switch-case statement when called.
     * 
     * @param pattern compiled regular expression identifying the token
     * @param type clue as to what should be done with the token once found
     */
    private AlgebraicSpelling(Pattern pattern, GrammarType type, TerminalSpellingHandler<?> handler) {
        this.pattern = pattern;
        this.type = type;
        this.handler = handler;
    }
    
    //GETTERS/SETTERS
    @Override
    public Pattern getPattern() {
        return pattern;
    }

    @Override
    public List<AST> manageToken(Matcher matcher, SourcePosition position) {
        List<AST> ret = new LinkedList<>();
        switch(this.type) {
        case KEEP:
            ret.add(handler.getTerminal(matcher.group(), position));
            break;
        case COMMENT:
            //Just ignore it
            break;
        }
        return ret;
    }

}

So, every member is initialized to a specific regex pattern, and a notion of whether it is meant to be kept, or thrown out. You may also note that pure whitespace, aside from any new line character, is treated as a comment; this may not be the case for your language, but it is for most. In Java, as an example, whitespace only serves to delimit one token from another before compilation, unless it’s surrounded by other delimiters like quotation marks.

Scanner has a List, initially containing the bulk code. This code is passed, one token at a time, to each member of TerminalSpelling; when one of them matches, it returns a new list of internal tokens. Those new tokens are spliced into the original list where the old token was.

If the recognizing spelling returned a single token that nonetheless differs from the token passed in (such as when whitespace or a comment has been stripped away), the index is held in place and that new token is scanned again, until it settles into a single, stable token; otherwise, the index advances to the next token.
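The splice itself is easy to lose in the loop machinery. In isolation it behaves like this; SpliceDemo is a toy illustration using the same replaceItemWithCollection logic as the Scanner above:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class SpliceDemo {
    // Same splice the Scanner uses: remove one entry,
    // insert a whole collection in its place.
    static <E> void replaceItemWithCollection(List<E> list, int entry,
            Collection<E> replacement) {
        list.remove(entry);
        list.addAll(entry, replacement);
    }

    public static void main(String[] args) {
        List<String> tokens = new ArrayList<>(Arrays.asList("a", "b+c", "d"));
        // The middle pseudo-token matches a rule that splits it in three.
        replaceItemWithCollection(tokens, 1, Arrays.asList("b", "+", "c"));
        System.out.println(tokens); // [a, b, +, c, d]
    }
}
```

Each pass, then, turns one coarse pseudo-token into finer ones in place, which is why a single linear index can walk the whole thing.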

In the end, of course, the last pseudo-token (the remaining bulk of code) will be, or at least should be, an empty string. It serves no further purpose, so it is removed. At this point, we have a stream of tokens, ready to be parsed.

Afterthoughts

See that? Cheesecake. My original code, based on tripe I coughed out long ago, involved a lot of unnecessary steps. Simple code is functional code; code with more effort than strictly necessary results in at least two things: programmer cognitive fatigue, and often-unrepeatable or nigh-untraceable errors. If I have any further mistakes that I need to correct, please inform me in the comments. I’ll be updating the Github repo with the new code soon. Enums are a beautiful thing.

So Am I Finished?

Maybe. If you’re just looking for a text scanner, and no further formatting needs to be done to the token, then yeah, you’re ready to go. If, however, you are looking at attaching this thing to a parser, then there’s one more thing we need to add. It’s not big, but it makes life easier. It’s Terminal AST formatting for each token. I’ll explain that in a later chapter, after I’ve justified the need for it. It’s written into the Scanner class, but it’s conceptually part of the Parser module. Essentially, you’re at least pretty much done.

 

Posted by on October 12, 2014 in Software Language Engineering

 


Software Language Engineering: Preface

The truth is, there are very few things that I can write an original tutorial on when it comes to software, and I have no intention of doing anyone the disservice of insisting on my own tutorial when someone else may have done a perfectly good job, and likely a superior one. But there are some things, things that not every programmer does, that there are excruciatingly few tutorials for on the internet.

If there’s one thing that is eternally true about the internet, it’s that whatever the topic, there are always at least a few people browsing for it. This is for you. This whole article chain is going to be about writing programming languages, be they scripts or natives, and most specifically it will be about writing compilers for them.

[For those interested in downloading my source code for the tutorials (which are being written and tested as I create this), you may do so via the github repository.]

There is, pleasantly, an MIT course available to the public on the subject. Unfortunately, by the nature of a sequence of lectures, it is likely to take you much more time than reading through these articles, and time is not a commodity for us. All the same, I do encourage you to at least browse through it sometime; the video is fairly low-resolution, but the material is great. As mentioned in it, there are two programs commonly used for creating the scanner and parser, respectively JLex and CUP. I will not be using them, but if you’re in a pinch, you might consider it.

I will be using Java 8, as it has a lot of wonderful features for scanning through collections (even vast, nay, gargantuan ones), and pretty good regular expression tools. If you’re concerned with the double-escaping of regular expressions that is inherent to the structure of the Java language, I recommend using this regex parser, as it allows you to work with singly-escaped regular expressions and converts them to a Java-comprehensible string. Most of us don’t have a natural knack for any regular expression syntax, so it comes in handy to have a testing environment. There are a number of other good ones out there, and you’re of course also welcome to produce your own kit if the fancy strikes you.

If you haven’t caught on, regular expressions are invaluable for this stuff.
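As a quick illustration of the double-escaping problem: the string literal consumes one layer of backslashes before the regex engine ever sees the pattern.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    public static void main(String[] args) {
        // The regex \d+ must be written "\\d+" in Java source; the string
        // literal eats one backslash before the regex engine sees it.
        System.out.println(Pattern.matches("\\d+", "2014")); // true

        // To match one literal backslash, the regex is \\ ,
        // which takes four backslashes in Java source.
        System.out.println(Pattern.matches("\\\\", "\\")); // true
    }
}
```

A regex testing tool earns its keep the moment you're three escapes deep and no longer sure what the engine is actually seeing.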

Every build process has a number of standard components, the presence of each of which describes the nature of the build. They share a few things in common, which thankfully can be modularized into their own abstract classes. I’ll be showing you my style as we work through this.

Given that you’re even bothering to read the preface (it’s certainly no insult if you don’t), you may be expecting a little history on compilers. The first one was built in 1952 and simply denoted “A-0“, for “Arithmetic Language Version Zero”. It was created by the late Rear Admiral Grace Hopper of the United States Navy. This is actually a long stride after the first programmable machine, a loom built by Joseph Marie Jacquard introduced to the world in 1801. Programming, before that point, consisted basically of hex-editing and the notion of “coding” was, well, not really a notion at all. If you are a hardcore programmer (and I’m assuming you are), then memorize her name; we lost her in 1992, but the face of computing has forever been changed by her work.

You might also memorize Jacquard, as he was basically treated like Doctor Frankenstein by the local villagers and had the barn in which he kept his programmable looms burnt down by a bunch of Luddites, while he was off at war fighting for his nation. It didn’t work, of course. Torches and pitchforks aside, he still, also, changed the world, and I’m not just talking about the garment industry.

A-0 went through a number of phases, inclusive of A-0, A-1, A-2, A-3 (also known as ARITH-MATIC), AT-3 (also known as MATH-MATIC), and finally B-0 (FLOW-MATIC). I'm not particularly sure where this chain ended, but I am fairly certain that it did end, and I am quite confident that I've never seen a word of any of them in my life. The most bronze-age language still in reasonably common use (C) derived from B, which was created from BCPL (properly "Basic Combined Programming Language", sometimes backronymed "Before C Programming Language"), which was created from ALGOL 60 (which came from ALGOL 58), which came from FORTRAN, which came from a loose collection of macros known as Speedcode, which IBM developed in 1953. (Note that that's still a full year after Rear Admiral Hopper immaculately created the first compiler, with no other compiler to build it with. Let that sink in.) C itself has been through three major standards, C89, C99, and most recently C11, but they're pretty much the same game with keywords and native structures made more appropriate to their time. C led to C++, C++ led to Java, Ruby and Python came out of pretty much nowhere, and then there's all these other guys over there, and quite a few domain-specific little guys, and more languages than any of us will ever personally be fluent in. (I don't care what your resumé says.)

They all have a few things in common, though. As an example, they are textual. (Don’t let this imply to you that there’s no such thing as a non-textual language, though; LabView and even HyperCard made quite a few advances in that direction.) They’re all stored in data structures, which are usually what we would think of as files. They’re all, most importantly, interpreted and encoded by another program. We will be building that program, and if you keep to a certain code of ethics in your construction, you will find it to be a fairly simple process.

Believe it or not, it’s a bit of a tradition to develop a compiler for a language in the language itself. It’s called “bootstrapping“. Obviously this isn’t so readily done with the first one (this isn’t an improbability drive), but the tradition began in 1958 with a language you’ve never heard of called NELIAC (basically another kind of ALGOL, which you may also never have heard of). It was popularized in 1962 by Lisp. Today, it’s commonly used for C, Java, Python, Scheme, and a plethora of others. It offers a few incidental advantages, inclusive of reducing the compiler’s object code, applying compiler output improvements to its own code, and the HR benefit of having no need for the programmers to know more than the compiled language. I’m sure I’ll come back to this, as when you can do it, you should do it. You get to catch the wave of every other programmer’s improvements to the compiler’s code, and add your own two bits to it.

Wrapping up, in the dendrites of the great tree of software language tendencies, we’re moving in a lot of different directions right now. Some of them are contradictory, sometimes both choices have advantages, and I can easily say that more people know how to program right now than at any other point in history. Software and hardware advances, like Google’s QScript and the OpenCL Standard maintained by Khronos, are splitting the once monotonous course of digital electronics into a thousand different rays, and it can go literally anywhere we want it to. Remember that what I am describing to you in this series is the standard method of engineering a computer language; it is hardly the only possibility. It’s an honor to help you follow it, and possibly, for quite a few of you, lead it.


2: “Grey”

Grey

© Michael Eric Oberlin, October 11, 2014

The bottle had a transformative effect on the meal. Something about the ingredients loosening their tongues and their wits, morphing each of them into someone else, someone hidden. Maegon couldn’t remember the last time he had talked so much. He couldn’t quite remember what he had said, either, so at least there was a sense of balance to it.

As he came to sobriety, the first thing he noticed was the feeling of wet sand between his bare toes, the feeling of the cold ocean washing over his feet. His lock was still tied, but a bit tangled. He was at his home, his real home. His house wasn’t much to speak of, a single room shack somewhere up the beach. It was the ocean, the Grey, that brought sobriety to him.

The mists never cleared, in this world of his. They slurred starlight and sunlight into an indistinguishable half-light. Where there was no river or ocean, there were hot springs and faults, the planet’s inner heat forever belching out refined vapor. It drifted out over the ocean, into the unknown, forming an intangible barrier to those that would stray just a little too far.

He dug his feet into the sand once more, and thought. He wasn’t entirely sure how long it had been. He wasn’t tired. The only rational thing to do was that which he knew, so he treaded down the beach, barefoot, to the docks, where his boat was moored.

Maegon could only guess what had happened to his boots. They were good boots, durable, leather; likely somewhere in the ornithomancer’s abode, perhaps scattered, haphazard, on the floor. He skipped over the aging wood of the dock, careful not to get a splinter from its stressed and broken surface, and stepped down into the more polished surface of his watercraft.

Beneath it was a thick pane, made of soda-lime glass; a small lens shape was molded into it, beneath a trapdoor that covered part of the window. Beneath the pane was eternity, the whole of the ocean staring up at his wide pupils, free of the shroud of fog and wall. On the end was a reel of wire, the “chain”, one end permanently attached to the end of the dock. The other had a curious timer on it, consisting of one gear within another within another, each rotating about the edge of its containing gear for a full cycle before the other one moved a tenth of the way about its own container.

Their clocks were based on the only thing available to them that everyone could agree on, and that was the metric, the number ten. Beyond that, it was a question of mainspring temperance and convenience of construction, leveling off at the easiest materials to make and maintain.

This clock was not the best clock; the salt had eroded through its central spring many times now, and the condition of the other gears was questionable at best. Sometimes it ticked a little too slow, others a little too fast; occasionally, not at all. When it reached a third of the outermost gear (colloquially called a "throne"), the clock would trigger a small motor which would reel him back to shore.

Maegon untied the mooring rope, and pushed off. The metal wire slowly unreeled behind him, and gradually, the shore disappeared in the white fog. His only real notion of how far out he was was the emptying of the reel, but that could be deceiving. The wire dropped into the sea, how far down he could only guess, and the docks could be anywhere from ten meters beyond the edge of his sight to a thousand kilometers away.

As far as Deep was concerned, the docks no longer existed.

Maegon raised the trapdoor, exposing the lens. He opened up a small tackle box and pulled out two vials, one full of a white powder, the other a yellow salt. The white powder was emptied in part on the glass lens, and the yellow salt behind it. He tucked them away, then dipped his fingers in the ocean and splashed a little water onto the conglomerate. It began to smoke, just a little, then light up in a brilliant yellow.

The trap door closed behind the alchemical fire, protecting his eyes. Below, the entire ocean seemed to come into focus, the flame pouring its light down on everything beneath the boat. Every fish, every piece of coral, seemed to glow in the phosphor radiance. He supposed that he hadn’t been entirely forthcoming, earlier; he was not entirely alone on the ocean.

A puffer fish bobbed by on the seafloor, its casual burst of hidden spines dissuading the attention of a sea snake. Deep found himself again under the spell of wondering how deep the ocean went, how far out it might go, what lay beyond the beach of infinite shrouding. Were there other continents like this one? Were there other people, beyond the fog, or in some other part of it?

His tail coiled itself nearly into a ring and his fingers formed white-knuckled fists, his natural muscular response to his equally tense curiosity. There were other sailors, other fishermen, who had been lost at sea. Their cables rusted and broke, or their pegging on the pier collapsed, allowing their boats to drift further away, swallowed by the Grey itself. What happened to them, no one was really sure, but there were plenty of fishermen's tales about what might have been.

Deep liked to imagine that they had landed on some other body of land, one that was not accessible by any dry route. He imagined that they found a place so wonderful that they never wanted to go back, or so alien that they never ran out of riddles to solve. The notion of an infinite ocean, one without bounds, was difficult for him to justify. Perhaps some of them did die of thirst on the open sea, but maybe someone reached a new land.

He stretched out his bare feet, and dropped his net over the side. A swarm of sardines was gathering beneath the glass, a potential gold mine of a catch, if he played it right. Calypso was right. The Grey had been good to him, lately. He dreamt sometimes that he was like the fish, and could walk among the coral and the anemone beneath the boat, down at the bottom of the sea, so very far away. He dreamt that he could breathe the water, and meet the fish face to face, not as a predator, but as brethren. He dreamt that he could walk off into the distance, following the submerged sand to the end of its line, until the world ran out of ideas and repeated itself and just maybe, a new continent rose from the waters. He could always dream.

For now, he knew that every time he unreeled his line and slipped off into the mists, he was a bit further away from the land he knew. The beach drifted into eternity, and for a third of a throne, he was just a little closer to those strange places that stood beyond the wall of white. He dropped a fine net over the side, and prepared to grab a group of sardines. Someday, fortune might turn its eyes to him, and for better or for worse, he would be lost at sea. He would have an answer. For now, he had a little work to do.


Posted by on October 12, 2014 in The Trials of Maegon Deep
