RSS

Tag Archives: Java 8

Effective OpenAL with LWJGL 3

Effective OpenAL with LWJGL 3

Jesus Bloody Christ it’s been a while.

So, a lot of you are likely interested in developing on Java with LWJGL3 instead of LWJGL 2.9.*; as you should be. LWJGL3 has support for a lot of modern industry trends with older versions did not; such as multi-monitor support without back flipping through flaming hoops, or basically anything involving GLFW. It’s still in beta, I know, but it’s a solid piece of work and the team on it is dedicated enough to make it a reliable and standing dependency for modern projects.

Except for every now and then, when it happens to be missing some minor things. Or, more importantly, when there’s a dearth of documentation or tutorials on a new trick you’re pulling.

I can contribute, at least in part, to both of those.

OpenAL is the audio world’s equivalent to OpenGL; it’s a sophisticated and sleek interface to sound hardware. Many common effects and utilities, such as 3D sound, are built into it directly; and it interfaces sublimely with code already designed for OpenGL. Additionally, it’s also a very tight interface that does not take long at all to learn.

In the past, I would suggest using JavaSound for Java game audio, which is also a tight API, but it lacks these features. Most major audio filters have to be built into it rather directly and often by your own hand; and there’s no official guarantee of hardware optimization. However, what LWJGL3’s OpenAL interface now lacks can easily be supported by readily-present JavaSound features; such as the audio system’s file loader.

This entry is on, step by step, how one would do such a thing.

Let’s start with a basic framework. I’ve tried to keep a balance between minimal dependencies and staying on-topic, so I’ll suggest that you have both LWJGL3 (most recent version, preferably), and Apache Commons IO, as dependency libraries.

class Lesson {
    public Lesson() throws Exception {
        //Start by acquiring the default device
        long device = ALC10.alcOpenDevice((ByteBuffer)null);

        //Create a handle for the device capabilities, as well.
        ALCCapabilities deviceCaps = ALC.createCapabilities(device);
        // Create context (often already present, but here, necessary)
        IntBuffer contextAttribList = BufferUtils.createIntBuffer(16);

        // Note the manner in which parameters are provided to OpenAL...
        contextAttribList.put(ALC_REFRESH);
        contextAttribList.put(60);

        contextAttribList.put(ALC_SYNC);
        contextAttribList.put(ALC_FALSE);

        // Don't worry about this for now; deals with effects count
        contextAttribList.put(ALC_MAX_AUXILIARY_SENDS);
        contextAttribList.put(2);

        contextAttribList.put(0);
        contextAttribList.flip();
        
        //create the context with the provided attributes
        long newContext = ALC10.alcCreateContext(device, contextAttribList);
        
        if(!ALC10.alcMakeContextCurrent(newContext)) {
            throw new Exception("Failed to make context current");
        }
        
        AL.createCapabilities(deviceCaps);
        
        
        //define listener
        AL10.alListener3f(AL10.AL_VELOCITY, 0f, 0f, 0f);
        AL10.alListener3f(AL10.AL_ORIENTATION, 0f, 0f, -1f);
        
        
        IntBuffer buffer = BufferUtils.createIntBuffer(1);
        AL10.alGenBuffers(buffer);
        
        //We'll get to this next!
        long time = createBufferData(buffer.get(0));
        
        //Define a source
        int source = AL10.alGenSources();
        AL10.alSourcei(source, AL10.AL_BUFFER, buffer.get(0));
        AL10.alSource3f(source, AL10.AL_POSITION, 0f, 0f, 0f);
        AL10.alSource3f(source, AL10.AL_VELOCITY, 0f, 0f, 0f);
        
        //fun stuff
        AL10.alSourcef(source, AL10.AL_PITCH, 1);
        AL10.alSourcef(source, AL10.AL_GAIN, 1f);
        AL10.alSourcei(source, AL10.AL_LOOPING, AL10.AL_FALSE);
        
        //Trigger the source to play its sound
        AL10.alSourcePlay(source);
        
        try {
            Thread.sleep(time); //Wait for the sound to finish
        } catch(InterruptedException ex) {}
        
        AL10.alSourceStop(source); //Demand that the sound stop
        
        //and finally, clean up
        AL10.alDeleteSources(source);
        

    }

}

The beginning is not unlike the creation of an OpenGL interface; you need to define an OpenAL context and make it current for the thread. Passing a null byte buffer to alcOpenDevice will provide you with the default device, which is usually what you’re after. (It is actually possible to interface with, say, multiple sets of speakers selectively, or the headphones instead of the speaker system, if you would like; but that’s another topic.)

Much like graphics devices, every audio device has its own set of capabilities. We’ll want a handle on those, as well. It’s safe to say that if a speaker can do it, OpenAL is capable of it; but not all speakers (or microphones) are created the same.

After this, OpenAL will want to know something of what we’re expecting it to manage. Note that it’s all passed over as a solid int buffer. We’re providing it with a notion of what features it will need to enact, or at least emulate; with a sequence of identifiers followed by parameters, terminated with a null. I haven’t begun to touch all that is possible here, but this attribute list should be enough for most uses.

After that, create the context, make it current, check to see that it didn’t blow up in your face, and register the capabilities. (Feel free to play with this once you’ve got the initial example going.)

So, before I get to the part where JavaSound comes in, let’s start with the nature of how OpenAL views sound. Sound, in its view, has three components: a listener, a source, and an actual buffer.

The listener would be either you or your program user; however, the program would want to know a little about your properties. Are you located something to the left or right? Are you moving (or virtually moving)? I usually set this first as it is likely to be constant across all sounds (kind of like a graphics context).

Next, we have a method of my own creation that builds and registers the audio file. Forgive me for the delay, but that’s where JavaSound’s features (in the core JKD) come in, and I’m deferring it to later in the discussion. You will note that the audio buffers have to be registered with OpenAL; as it needs to prepare for the data. There’s a solid chance that you will have sound-processor-local memory, much like graphics memory, and it will have to be managed accordingly by that processor before you can chuck any data at it.

Let’s look at that audio buffer creator.

     private long createBufferData(int p) throws UnsupportedAudioFileException, IOException {
        //shortcut finals:
        final int MONO = 1, STEREO = 2;
        
        AudioInputStream stream = null;
        stream = AudioSystem.getAudioInputStream(Lesson3.class.getResource("I Can Change — LCD Soundsystem.wav"));
        
        AudioFormat format = stream.getFormat();
        if(format.isBigEndian()) throw new UnsupportedAudioFileException("Can't handle Big Endian formats yet");
        
        //load stream into byte buffer
        int openALFormat = -1;
        switch(format.getChannels()) {
            case MONO:
                switch(format.getSampleSizeInBits()) {
                    case 8:
                        openALFormat = AL10.AL_FORMAT_MONO8;
                        break;
                    case 16:
                        openALFormat = AL10.AL_FORMAT_MONO16;
                        break;
                }
                break;
            case STEREO:
                switch(format.getSampleSizeInBits()) {
                    case 8:
                        openALFormat = AL10.AL_FORMAT_STEREO8;
                        break;
                    case 16:
                        openALFormat = AL10.AL_FORMAT_STEREO16;
                        break;
                }
                break;
        }
        
        //load data into a byte buffer
        //I've elected to use IOUtils from Apache Commons here, but the core
        //notion is to load the entire stream into the byte array--you can
        //do this however you would like.
        byte[] b = IOUtils.toByteArray(stream);
        ByteBuffer data = BufferUtils.createByteBuffer(b.length).put(b);
        data.flip();
        
        //load audio data into appropriate system space....
        AL10.alBufferData(p, openALFormat, data, (int)format.getSampleRate());
        
        //and return the rough notion of length for the audio stream!
        return (long)(1000f * stream.getFrameLength() / format.getFrameRate());
    }

We’re hijacking a lot of the older JavaSound API utilities for this. OpenAL, much like OpenGL, isn’t really “open”, nor is it technically a “library”. So, having something around for handling audio data is helpful, and why bother writing our own when it’s already built into the JDK?

For JavaSound, you work with either Clips, or (more frequently) AudioInputStreams. You can read most audio file formats directly via AudioSystem.getAudioInputStream(…); in this case, I’ve elected to use a WAV format of LCD Soundsystem’s “I Can Change”, because James Murphy is a god damned genius. However, you can use anything you would like; to get it to work with this just drop it in the same source directory.

Next up, grab the format of the sound with AudioStream.getFormat(). This will provide you with a lot of valuable information about the stream. If it’s a big endian stream (which most wave files are not), you might need to convert it to little endian or make proper alterations to OpenAL. I’ve glossed over this, as endian-ness is not really a part of the tutorial and there are plenty of good byte-management tutorials out there.

I’ve elected to use format to check for the mono/stereo status (more are actually possible), and whether the sound is 8-bit or more frequently 16-bit. (Technically 32- or even 64- bit sound is possible; but there is actually a resolution to the cochlea of the ear, and you’re not going to bump into that outside of labs with very funny looking equipment. Even Blu-ray doesn’t go above 24-bit. Seriously, there’s generally just no point in bothering.)

Afterward, we load the stream into a byte array (I’m using IOUtils for this for brevity, but you can do it however you like), and the byte array into a ByteBuffer. Flip the buffer, and punch it over to OpenAL, which will take care of the rest of the work with it. Afterwards, we will eventually need the length of the audio stream, so calculate it as shown and send it back to the calling method.

After the buffer’s been created and the length of it is known, we’ve got to create a source for it! This is where most of the cooler built-in effects show up. alGenSources() creates a framework for the source; alSourcei(source, AL10.AL_BUFFER, buffer.get(0)) ties it to the sound buffer. You’ll also see that I set up AL_GAIN and AL_PITCH, which are fun to play with.

You’re almost done!

To actually play the buffer, you use the source. alSourcePlay(source) starts it. After that, I have the Thread sleep for the calculated length of the sound, just so we have time to hear it. At the end, I call alSourceStop(source) to demand an end to the source.

Lastly, I delete all sources. You might also want to delete devices, if you’ve done anything silly with them; this is very low-level access. You now have everything you need to load audio into your games and programs, and if you happen to bump into an SPI for a preferred format, it will now also be enough to get you going on OpenAL as well.

Advertisements
 
Leave a comment

Posted by on July 4, 2016 in Java, Programming

 

Tags: , , , , , , , , , ,

A Case Against Using Null. (For Almost Anything.)

Java is my usual language, but this goes for everything.

I promise that this is not going to be another rant about NullPointerExceptions and their kin in other languages. This is not to say that such rants are not warranted, and even cheered; but I’m going to be a bit more academic about it. I’m also going to provide solutions, not only the ones available in Java 8, but what I used to do beforehand.

What Does “Null” Actually Mean?

Great question. Null as an adjective, according to the dictionary, means without value, effect, or significance. It also means, lacking, and nonexistent. It also means, empty. And, lastly, also probably most recently, it means zero. This is most likely a linguistic artifact, as everything is ultimately expressed in the binary on a computer. In C, null actually does equate to zero. However, this necessity has led all of us to a lot of abuse, because symbolically it isn’t what null is for. I’ll come back to that.

Null’s etymological origin comes from the latin nullus, meaning none, as in, it-has-not-been-set. While zero is reasonable, zero is an actual number. If you were enumerating the entries of a set of numbers, and you wanted to count the length of that set, you would not skip every entry that was zero, would you? However, you likely would for null, as it denotes no-entry in the set. Therein lies the critical difference.

In Java, objects are initialized to null, before they are set to any value. Object instances are the Java equivalent of C’s pointers; and while they cannot be without a value, they initialize to a language constant that reflects the absence of an intended one. Null is typically represented as zero, but not always, and I am unsure of the case with Java. However, a null pointer is a symbol, it is not reasonably a pointer to the zero position in memory. This position does, actually, exist; but on the reference of such a pointer, the virtual machine (or platform) throws up a red flag.

The nasty habit of using null as a return value when something goes wrong in an operation is almost ubiquitous, but unless this literally represents that no value has been set, it is a dangerous move. I’ve even seen it in the JDK. The response to such an ill-though-out method is usually a few lines of defensive programming, checking to see whether the object is null, and acting accordingly.

Java 8 Solutions

If you aren’t a Java programmer, you may wish to skip this section.

As it turns out, the defensive programming response to returned nulls is so similar, in every instance, that it can be encapsulated into an object itself. This would be Java 8’s Optional. Optional represents a possible value, that is, a value which cannot be guaranteed to exist. However, the Optional itself is never null.

On initialization of an Optional, it is best to set it to Optional.empty(), that is, an Optional with no contents. If a value is being wrapped in an Optional, use Optional.of(). If the presence of the value is unknown, use Optional.ofNullable(), it will do the defensive work for you. The rest of the methods of Optional apply Java 8’s influences from functional programming. What used to require complex if statements is now done primarily through ifPresent(…) and orElse(…).

This might seem like an overreaction to you. However, compared to the work that I used to have to do just to catch every wrench-in-gears value that might pass by, it is a miracle. If you disagree, you need only ask yourself how frequently you have been getting NullPointerExceptions. Adopt Optionals, and you won’t get them anymore.

Older Java Solutions

In previous versions, several further techniques have been added. The biggest problem with “!= null” is that it is an operation, and a mandatory operation, which will slow down code very slightly. This is imperceptible for the vast majority of programs, but if you need something to run searing fast, then it can be unacceptable.

If you are writing an API, I might suggest funnelling all input through a defensive checking method before passing it along to the meat methods; but if you are writing code that only you will access, there is a simpler solution: assertions. This is particularly true for unit testing with programs like JUnit.

This is exclusively functional during development, as in order to enable assertion testing, you need to pass the -ea parameter to the java compiler. Unless you can force this on users, it is exclusively meant to help you identify routes by which null, or any other unacceptable value, can make it to your methods.

The syntax is simple. Given parameter “x”:

assert x != null : "[error message]";

If the provided boolean expression evaluates to false, an AssertionError is thrown, with a message of the toString() value of whatever was passed on the right (for me, most typically an actual string).

I don’t generally like to see assertions making their way into production code today, as I am inclined toward Optionals; but this is quite effective for debugging. Additionally, such statements can be considered an essential part of JUnit tests. If you are in a rush, it is possible to ignore all assertions remaining in a slice of code by removing the “-ea” parameter from the compiler; but on the human end, this is bad practice and worth avoiding.

As an alternative, Apache Spring has a class called Assert which handles more or less the same tasks as the assert keyword.

Broader Solutions for Object Oriented Languages

At last, in the most general sense, there is the Null Object Pattern. This is, still, my ultimate preference when building a set of classes, as there is no need for Optional when null never enters the equation.

A Nullary Object is an object extending the appropriate interface, with defined behavior, denoted as equivalent to null. This has its ups, and its downs. As an example, suppose we had this interface:

public interface Animal {
    public String speak();
}

with these implementations:

public class Dog implements Animal {
    public String speak() { return "bark"; }
}

public class Cat implements Animal {
    public String speak() { return "meow"; }
}

public class Bird implements Animal {
    public String speak() { return "tweet"; }
}

And we had one further class that requires one unknown animal, which will indubitably call “speak()”. Which animal is beyond our control, and we don’t want our program to crash on a NullPointerException simply because no animal was specified. The solution is one further class:

public class NullaryAnimal implements Animal {
    public String speak() { return "…"; }
}

In the case of abstract classes, it is often helpful to have the nullary class be a member of the class itself. This is also particularly helpful when there are multiple behaviors which might, otherwise, be implemented as “null”. The potential down side is for people who were actually looking for an exception to be thrown; in such a case, simply fill speak() with an Apache Commons NotImplementedException or something relatable.

One extension of this pattern is such:

public abstract class Sequence {
    //...
    
    public static final Sequence ANY = new Sequence(…);
    public static final Sequence ALL = new Sequence(…);
    public static final Sequence NONE = new Sequence(…);
}

In this instance, a new Sequence can be initialized to Sequence.NONE, ALL, or ANY, and be replaced if a new value is provided. Additionally, since these are actual objects and constant values, they respond appropriately to equals checks.

There may be a name for this pattern, I’m honestly not sure. I came up with it on my own, but I very much doubt that I’m the first.

Conclusion?

Hardly. However, you now hopefully have a new set of tools to keep unfinished declarations and, even worse, “= null” statements out of your program. I hope I’ve made your life easier!

 
Leave a comment

Posted by on January 10, 2015 in Programming

 

Tags: , , , , , , , , , , , , , , , ,

Visual Feedback on an Abstract Parsing Tree with JavaFX

I honestly didn’t expect to be writing this, but it seems fair.

In the past few editions, I’ve been discussing the AST. It can be overwhelmingly complicated for a complete program; so I’ve been using a simple, single line equation as the sample. Unfortunately, that isn’t very realistic; and it would be very helpful to have a procedurally generated visual tree available. That tree is what this lesson is all about.

At first I considered using a graphical tree style, like javax.swing.JTree; but that can be painfully over-simplistic in itself. I would prefer to outline the material the same way I would draw it on a white board (which, if you’re wondering, I do). The best way to do this? JavaFX.

JavaFX whiteboard abstract syntax tree

Graphical AST tree rendering, through JavaFX/F3

If you aren’t familiar with JavaFX, please do me a favor and tolerate the name. It was originally F3, for Form-Follows-Function. I kind of liked F3, until some marketer decided that “JavaFX” sounded better. Functionally speaking, it’s an excellent revision on how user interfaces are designed in Java. I fully stand behind it. It allows for XML structuring and CSS styling, just like a web page, to more hard-coded controls. This is much, much faster; and it allows for significant beauty in user interfaces. However, it works very differently from things like Swing and AWT; and while I’m certain that it isn’t the first API to do so, it takes some getting used to.

I fully intend to write a true tutorial on all of JavaFX on some point. Do you need to understand it to understand translators? Absolutely not. However, this code does work. It is not part of the Github repository, as it is technically a tangential project; but the same license (GNU GPL) applies to it and you are welcome to copy it token for token. I’ll put it up on Github as I get the chance. I’ll make a few minor comments along the way to help you follow it.

1. The Basic Application

We have exceedingly few needs for our app. It simply reads a program from a stream, parses it, and feeds the parse tree to a custom node, which displays it graphically. Accordingly, the program code is rather small. I’ll begin by displaying it, then I’ll spend a moment piecing it together in English for you.

package oberlin.builder.gui;

import oberlin.algebra.builder.AlgebraicBuilder;
import oberlin.builder.parser.ast.AST;
import javafx.application.*;
import javafx.scene.*;
import javafx.scene.layout.*;
import javafx.stage.*;

public class GUIMain extends Application {

    public static void main(String...args) {
        launch(args);
    }

    @Override
    public void start(Stage primaryStage) throws Exception {
        Pane root = new Pane();
        root.getStyleClass().add("backing");
        
        Scene scene = new Scene(root);
        scene.getStylesheets().add(GUIMain.class.getResource("tree.css").toExternalForm());
        
        primaryStage.setScene(scene);
        root.setMinWidth(640.0);
        root.setMinHeight(480.0);
        
        populate(root);
        
        primaryStage.show();
        primaryStage.centerOnScreen();
    }
    
    private void populate(Pane p) {
        /*
         * This is simply an example, so I've ignored input for now.
         * In theory, you would replace the line below (containing
         * hard code) with an input loop.
         */
        AST program = (new AlgebraicBuilder()).getParseTree("1+2");
        
        p.getChildren().add(new GUITree(program));
    }
}

All JavaFX/F3 programs begin with Application.launch(String…args). JavaFX programs run in what is effectively their own thread, and more so than with Swing-based programs. Launch parses arguments and stores them in their own object, appropriately called Parameters. They can be accessed, at any point later on, via Application.getParameters(). Our available overloads and customizations cut out for a moment, then come back in in the start(Stage) method.

Stage is basically where Frame would be; but it’s a little more complicated than that. Unlike Swing and AWT, which were designed to be platform independent, JavaFX is designed to be hardware context independent. What you are writing here will work equally well on a PC, tablet, and smart phone; as well as anything else built (now or later) that maintains a JavaFX compatibility standard. Thus, what might otherwise be called a frame or window is referred to as a stage, as it might be neither of those things.

You’ll notice that the instantiated Pane is given a style class. If you aren’t familiar with CSS, a style class is what’s used to differentiate between one element and any number of others which, otherwise, would look exactly like it. Thus, it allows CSS to pick and choose which elements of the layout it is styling at a given moment. I’ve chosen “backing” as the name for this element, as it is the backboard of our tree. You will also note that, two lines later, the CSS file itself is loaded.

Next, a Scene is created. Scenes are critically important, and distinct from stages. While a stage represents the context that the layout is drawn in, the scene represents the actual controls and constraints within that space. Thus, while many aspects of Stage are immutable (and unknowable), Scene allows for greater flexibility. JavaFX sees to it that they correspond, so don’t worry about that.

Scene is styled through its root element, which in this case is our pane. You’ll notice that instead of the stricter setWidth() and setHeight() that you might be familiar with from Swing, we are setting a minimum on these bounds. That minimum is not guaranteed, as the display may not be capable of it, but it is treated as a general rule to be followed if at all possible. In this case, I’m going for classic analog low-def TV resolution, 640 width by 480 height. (Looking back, those numbers might be inadequate, but for now they’re quite functional.) If this is too small for you, the frame—if it is a frame, anyway—is easily resizable.

Populate() is a method I wrote to add the paraphernalia to the scene; but note that afterwards we call show(). This is very important, as otherwise our stage will be constructed in memory, but never displayed to the screen. Additionally, there will be no way to kill the JavaFX thread save for a hard interrupt. Once shown, the closing of the primary stage will flag the program to terminate.

1.1. Populate

It’s a generally good habit, but not a necessary one, to populate your frame in a separate and dedicated method. This is what I do here, even though for the moment, I only have one control to add.

The AST method should be old news; it’s a stand-in, for the moment, for an actual code-reading portion. (I’m assuming that you’re looking to compile more than just “1+2”.) GUITree is a custom JavaFX node, which I will explain next. Note that to add a node to a program, you must take some structure (not yet visible) in the scene graph (stemming from your chosen root), and get its Children as a modifiable list. Then, you must add that node to the list.

Note that after a stage is visible, precious little of the scene can be changed save for through the constraints built into it. I’m not going to touch on Expressions and Bindings here, but know that if you pull something that doesn’t play by JavaFX’s rulebook, it will throw an ApplicationException and your program will not launch. Thankfully, while exceedingly picky, that rulebook is small. If you call show() and then try and add a child, you will have problems; it must be the other way around.

If you’re curious, hiding a rendered stage does not count for making it modifiable. You must give it your entire concept first, then make it visible. If you’re familiar with OpenGL, you’ll already understand why.

2. The Tree Itself

The tree is a custom JavaFX node, which I admit is rarely necessary. Still, most of the entities that make it work are core to the API.

package oberlin.builder.gui;

import oberlin.builder.parser.ast.AST;
import javafx.geometry.BoundingBox;
import javafx.geometry.Bounds;
import javafx.geometry.Point2D;
import javafx.scene.control.Tooltip;
import javafx.scene.layout.*;
import javafx.scene.paint.Color;
import javafx.scene.shape.CubicCurve;

import java.util.function.IntSupplier;

public class GUITree extends AnchorPane {
    private Bounds bounds = new BoundingBox(0, 0, 640, 480);
    private AnchorPane framing = new AnchorPane();
    private double edgeSize = 0.10;    //ten percent additional length beyond edges of framing
    
    public GUITree(AST ast) {
        this.setMinWidth(bounds.getWidth() * (1 + edgeSize));
        this.setMinHeight(bounds.getHeight() * (1 + edgeSize));
        
        configureFraming();
        
        addNode(ast);
    }
    
    private void configureFraming() {
        framing.setLayoutX(edgeSize * (bounds.getWidth() / 2.0));
        framing.setLayoutY(edgeSize * (bounds.getHeight() / 2.0));
        framing.setMinWidth(bounds.getWidth());
        framing.setMinHeight(bounds.getHeight());
        
        this.getChildren().add(framing);
    }
    
    private ASTNode addNode(AST ast) {
        return this.addNode(ast, new Marker(0), new Counter(), 0, null);
    }
    
    private ASTNode addNode(AST ast, IntSupplier stepsDown, IntSupplier stepsAcross, int index, ASTNode parent) {
        ASTNode node = new ASTNode(ast, stepsDown.getAsInt(), stepsAcross.getAsInt());
        
        //AnchorPane stuff
        calculateAnchoring(node, parent);
        
        framing.getChildren().add(index ++, node);
        
        final StringBuilder tooltipText = new StringBuilder();
        IntSupplier across = new Counter();
        for(AST kid : ast.getContainedNodes()) {
            tooltipText.append(kid.getClass().getSimpleName()).append(" ");
            ASTNode child = addNode(kid,
                    new Marker(stepsDown.getAsInt() + 1),
                    across,
                    index,
                    node);
            CubicCurve cubic = createCubicCurve(node.getNoodleRoot(), child.getTopCenter());
            framing.getChildren().add(cubic);
        }
        node.getType().setTooltip(new Tooltip(tooltipText.toString()));
        
        return node;
    }
    
    private CubicCurve createCubicCurve(Point2D p1, Point2D p2) {
        CubicCurve curve = new CubicCurve();
        
        curve.setStartX(p1.getX());
        curve.setStartY(p1.getY());
        
        curve.setEndX(p2.getX());
        curve.setEndY(p2.getY());
        
        curve.setControlX1(p1.getX());
        curve.setControlY1(p2.getY());
        
        curve.setControlX2(p2.getX());
        curve.setControlY2(p1.getY());
        
        curve.setStroke(Color.BLACK);
        curve.setStrokeWidth(2.0);
        curve.setFill(Color.TRANSPARENT);
        
        return curve;
    }
    
    private void calculateAnchoring(ASTNode node, ASTNode parent) {
        node.setOrigin(new Point2D(parent == null ? (bounds.getWidth() - node.getBounds().getWidth())/2.0 :
            justifyX(node, parent), justifyY(node)));
        AnchorPane.setTopAnchor(node, node.getOrigin().getY());
        AnchorPane.setLeftAnchor(node, node.getOrigin().getX());
    }
    
    private Double justifyX(ASTNode node, ASTNode parent) {
        final double parentCenter = (parent.getOrigin().getX() + (parent.getBounds().getWidth() / 2.0)
                + parent.getNoodleRoot().getX()) / 2.0;
        final double center = parentCenter
                - node.getBounds().getWidth()
                        * (parent.getAST().getElementCount()) / 2.0; 
        return center + node.getOrigin().getX();
    }
    
    private Double justifyY(ASTNode node) {
        return node.getOrigin().getY();
    }
}

That was a bit much at once, I know. The central pane, called “framing”, is 640 by 480. Framing is offset in each direction by a 5% inset, via the convenient features of AnchorPane.

AnchorPane is one of the few prepared ways to control where a node is rendered, with precision, in JavaFX. You may often need to keep your own tabs on where it is rendered, as getMinX() and getMaxX() will return zero more often than you will believe. However, through direct layout control, you can still manage them.

The method addNode(…) adds a custom object called ASTNode. I’ll cite it for you here.

package oberlin.builder.gui;

import javafx.collections.FXCollections;
import javafx.collections.ObservableList;
import javafx.geometry.BoundingBox;
import javafx.geometry.Bounds;
import javafx.geometry.Point2D;
import javafx.geometry.Pos;
import javafx.scene.control.Label;
import javafx.scene.layout.StackPane;
import javafx.scene.layout.VBox;
import javafx.scene.text.TextAlignment;
import oberlin.builder.parser.ast.AST;

class ASTNode extends VBox {
    private Bounds bounds = new BoundingBox(0, 0, 100, 40);
    private Point2D origin = new Point2D(0, 0);
    private final double expanse = 1.10;
    
    private final AST ast;
    
    private Label type;
    private Label hash;
    
    private ObservableList<ASTNode> kids = FXCollections.observableArrayList();
    
    public ASTNode(AST ast) {
        this.ast = ast;
        type = new Label(ast.getClass().getSimpleName().toString());
        type.setTextAlignment(TextAlignment.CENTER);
        type.setAlignment(Pos.CENTER);
        
        hash = new Label(Long.toHexString(ast.hashCode()).toUpperCase());
        hash.setTextAlignment(TextAlignment.CENTER);
        hash.setAlignment(Pos.CENTER);
        
        VBox vbox = new VBox(new StackPane(type), new StackPane(hash));
        vbox.getStyleClass().add("node");
        vbox.setMinWidth(bounds.getWidth());
        vbox.setMinHeight(bounds.getHeight());
        this.getChildren().add(vbox);
        
        for(AST kid : ast.getContainedNodes()) {
            addKid(new ASTNode(kid));
        }
        
    }
    
    public Point2D getNoodleRoot() {
        return new Point2D(getOrigin().getX() + (getBounds().getWidth() / 2),
                getOrigin().getY() + getBounds().getHeight());
    }

    public ASTNode(AST ast, int level) {
        this(ast);
        
        origin = new Point2D(0, level * bounds.getHeight() * expanse);
    }
    
    public ASTNode(AST ast, int levelDown, int levelAcross) {
        this(ast);
        
        origin = new Point2D(getStepAcrossSize(levelAcross), getStepDownSize(levelDown));
    }
    
    public double getStepDownSize(int steps) {
        return steps * bounds.getHeight() * expanse;
    }
    
    public double getStepAcrossSize(int steps) {
        return steps * bounds.getWidth() * expanse;
    }
    public void addKid(ASTNode astNode) {
        this.kids.add(astNode);
    }
    
    public ObservableList<ASTNode> getKids() {
        return kids;
    }
    
    public Bounds getBounds() {
        return bounds;
    }
    
    public Point2D getOrigin() {
        return origin;
    }
    
    public Point2D getTopCenter() {
        return new Point2D(
                getOrigin().getX() + (getBounds().getWidth()/2),
                getOrigin().getY());
    }
    
    public Label getType() {
        return type;
    }
    
    public void setOrigin(Point2D p) {
        this.origin = p;
    }
    
    public AST getAST() {
        return this.ast;
    }
}

ASTNode is a JavaFX Node as well. It simply maintains a reference to the AST itself, and the general presentation of that AST on the tree. There isn’t a lot here. If you’re wondering what VBox is, it’s an abbreviation for “vertical box”. (Naming a class after an abbreviation is bad practice, but it’s long since done by powers above me; I tolerate it as much as I do “AST”.)

Speaking of bad practice, this would ideally actually use Bindings, but I wrote this in a bit of a rush today and will have to correct that in the future. It is also bad practice to repeat data, which is exactly what this program is doing by re-storing the label text in a separate field. All the same…

I’m going to gloss over a lot of the configuration of the labels, as it’s relatively standard. Know that like any other pane in JavaFX, a VBox can be initialized with a list of its bounded nodes; also, a StackPane has the default behavior of centering its own bounded nodes.

The last thing done in the constructor is the creation of additional ASTNodes for each child node of the abstract syntax tree.  Each of them, in turn, renders their own children. This is not perfect, there is a substantial chance that two lists of nodes will overlap one another; however, it is already excellent for debugging visitor pattern based content. In the end, the GUITree renders each node in an assigned place, with a curved cubic line (technically called a “noodle”) connecting it to its parent and its children.

How does it do that? With IntSuppliers.

3. The IntSuppliers

There are only two of these.

package oberlin.builder.gui;

import java.util.function.IntSupplier;

/**
 * For downward counts; always returns provided number.
 * 
 * @author © Michael Eric Oberlin Dec 23, 2014
 *
 */
class Marker implements IntSupplier {
    private int fix;
    
    public Marker(int fix) {
        this.fix = fix;
    }
    
    @Override
    public int getAsInt() {
        return fix;
    }
}


package oberlin.builder.gui;

import java.util.function.IntSupplier;

/**
 * For counts across; always returns next consecutive number.
 * 
 * @author © Michael Eric Oberlin Dec 23, 2014
 *
 */
class Counter implements IntSupplier {
    private int count;
    
    @Override
    public int getAsInt() {
        return count++;
    }
    
}

IntSuppliers (and really all Suppliers) are part of the java.util.function package, new to Java 8. The great advantage of this package is that a function, or any functional interface, allows you to specify a method that serves as a primitive with conditionally defined values. I know that’s a leap, but I’ve been doing it since long before it was formally adopted into the language and it’s a central totem of functional languages.

We could, in theory and practice, use incrementing and decrementing integers in place of either of these. The problem is that the code gets a lot longer and a lot more cluttered when you do. I prefer the sublime simplicity of packing such behavior into an interface.

Of course, these are not everything. There is one, final, issue.

4. What was that that you said about “CSS”?

The CSS is specific to JavaFX; a complete listing of all of the properties is available here. If you are unfamiliar with the syntax of CSS, you can find an excellent tutorial on it (for HTML, at least) at W3Schools. It isn’t as versatile as Java or C, but its creators pulled many of its properties from C-like languages.

tree.css:

.backing {
    -fx-background-color: lightyellow;
    -fx-insets: 0;
    -fx-padding: 15;
    -fx-spacing: 10;
}

.node {
    -fx-background-color: lightblue;
    -fx-background-radius: 5.0;
    -fx-border-color: black;
    -fx-border-radius: 5.0; 
}

Keep this in the same folder as GUIMain, and it will find it as written.

The CSS styling of JavaFX controls is capable of everything HTML 5 is and then some. It’s an excellent fusion of programming and markup. I encourage you to play with the layout of GUIMain’s scene, and the actual program fed to the builder.

 

Tags: , , , , , , , , , , , , , , , ,

Software Language Engineering: Terminals, Nonterminals, and Bears (Early Edition)

The Notion of a Terminal AST

So, in the last chapter, I explained what an AST was structurally. There are formally two kinds of extensions to it. I usually implement them in their own classes, extending the base class of AST.

The first is the terminal. If you’ve programmed in Java for even a month, you know that having a method which accepts two different kinds of unrelated classes in the stead of one another is a bad idea for all kinds of reasons.

It is, actually, possible.

That just doesn’t mean that you should do it.

ASTs are formally called trees, but what they are is nodes on a tree. A program is a single AST, with an extension typically called “Program” as the overarching root node. The branch nodes, or nodes with child nodes, are called non-terminal nodes; the end points are called terminal nodes.

Each of those tokens that your Scanner kicked out? Yeah, that’s a terminal node, disguised as a String.

Let me offer you some of my code for ASTs, Terminals, and NonTerminals. (As before, there are major issues that I’m leaving out until later on. See if you can catch them.)

package oberlin.builder.parser.ast;

import oberlin.builder.codegenerator.RuntimeEntity;
import oberlin.builder.visitor.*;

/**
 * Abstract Syntax Tree, capable of representing any sequence of 
 * statements or the entire program.
 * 
 * @author © Michael Eric Oberlin Nov 3, 2014
 *
 */
public interface AST {
    /**
     * @return number of sub-elements contained in this tree node.
     */
    public int getElementCount();
}
package oberlin.builder;

import oberlin.builder.parser.ast.AST;

/**
 * Basis of all complete abstract syntax trees. Terminals are basically isolated-tokens known only by their spellings.
 * 
 * @author © Michael Eric Oberlin Nov 5, 2014
 *
 */
public class Terminal implements AST {
    private final String spelling;
    
    public Terminal(String spelling) {
        this.spelling = spelling;
    }
    
    public final String getSpelling() {
        return this.spelling;
    }

    @Override
    public String toString() {
        return getSpelling();
    }

    @Override
    public final int getElementCount() {
        return 1;
    }
}

Let those soak in a little. Note that each Terminal has a length of one, meaning that it is the only member of its immediate tree. That will be important when we develop our Parser.

A terminal is an instance of an AST, and can be created by simply passing its token to it. The token is stored in the field “spelling”. Terminal is also fully extensible, as even though their token is consistently their only members, there is a significant difference between an equals operator, a binary operator, and numerical data; and nonterminal nodes take that difference quite seriously.

The Notion of a Nonterminal AST

A nonterminal AST is an AST built not from characters in a String, but from a sequence of other ASTs. The constituent ASTs can be terminals, or nonterminals. Remember BNF? Ever listed item before may-consist-of (::=) was a nonterminal. In instances of single-member representation, such as:

Noun ::= ProperNoun

“Noun” is still a nonterminal, as the implication is that (at least in theory) multiple types of items can be its constituents.

The “Program” node is of course always a nonterminal. I’ve written a nice slice of code for them, too.

package oberlin.builder;

import java.util.*;

import oberlin.builder.parser.ast.AST;

public abstract class NonTerminal implements AST {
    private final List<AST> astList;
    
    public NonTerminal(AST... astList) throws MismatchException {
        if(!checkTypes(astList)) throw new MismatchException("Nonterminal class " + this.getClass() + " does not match " +
                "expression.");
        
        List<AST> list = new ArrayList();
        for(AST ast : astList) {
            list.add(ast);
        }
        this.astList = list; 
    }
    
    public NonTerminal(List<AST> astList) throws MismatchException {
        try {
            this.astList = resolveTypes(astList);
        } catch(BuilderException ex) {
            throw new MismatchException(ex);
        }
    }
    
    public abstract List<Class<? extends AST>> getExpectedASTTypes();
    
    /**
     * Check to see that all provided ASTs are some extension of the expected class of AST,
     * and create the internal list of ASTs from it if possible.
     * 
     * @param astList current list of program ASTs 
     * @return true if the first ASTs match the expected ones, false otherwise
     */
    private List<AST> resolveTypes(List<AST> astList) throws BuilderException {
        List<AST> ownASTs = new ArrayList<>();
        List<Class<? extends AST>> astTypes = getExpectedASTTypes();
        
        for(int i = 0; i < astTypes.size(); i++) {
            Class<? extends AST> provided, expected;
            provided = astList.get(i).getClass();
            expected = astTypes.get(i);
            if(!expected.isAssignableFrom(provided)) {
                throw new BuilderException("Cannot get " + expected + " from " + provided.getClass());
            }
            ownASTs.add(astList.get(i));
        }
        return ownASTs;
    }
    
    /**
     * Check to see that all provided ASTs are some extension of the expected class of AST.
     * 
     * @param astList current array of program ASTs 
     * @return true if the first ASTs match the expected ones, false otherwise
     */
    private boolean checkTypes(AST... astList) {
        List<Class<? extends AST>> astTypes = getExpectedASTTypes();
        
        for(int i = 0; i < astList.length; i++) {
            Class<? extends AST> provided, expected;
            provided = astList[i].getClass();
            expected = astTypes.get(i);
            if(!provided.equals(expected)) {
                return false;
            }
        }
        return true;
    }
    
    @Override
    public int getElementCount() {
        return astList.size();
    }
}

This is where the critical detail is left out of my code. If you already see it, then congratulations! In any case, I’ll let you all in on it at the end of the chapter. It goes back to the advantage of regular expressions over manually checking lists.

In “Programming Language Processors in Java”, David A. Watt and Deryck F. Brown use a visitor pattern to scan for these. That’s a perfectly valid approach. I get the same advantages through my constructor-or-exception pattern, which you have seen before. In fact, if you’re careful, you may notice that it’s another format of the same general pattern; without the traditional Visitor and Element classes. In my github repo, you might notice that I have the outline of a Visitor pattern implemented on these. Pay it no mind, it is proving unnecessary.

Still a good idea. Just unnecessary.

What Do They Mean to a Parser?

A Parser’s job is thus. Iterate through a list of Terminal syntax tree nodes. Compare the beginning of the list to established members of a list of nonterminal nodes. Find a match.

Truncate the matched beginning of the list of ASTs; this is, remove every element that is a member of the new NonTerminal. Now that they are removed, append that NonTerminal to the beginning of the list, in their place. Repeat this process until the list of ASTs is of size one, and return that singular AST.

And of course, if the program cannot be parsed down to a singular AST, throw a Parser Error.

Caveats

These are meant to be identified through members of an enumeration; just as tokens were. However, you might remember BNF items such as:

Command ::= Command+

That is, a variable number of Command nodes can be condensed into the immediate tree of a single Command node. This is an issue for a simple list of ASTs, as in its provided implementation, NonTerminal simply attempts to match a fixed sequence of AST types to a provided list of ASTs, one by one. This is almost always inadequate for a programming language.

I’m going to step around writing a whole new series on how regular expressions work. (For now.) The important detail is that a minimum and maximum number of items, extending a specific AST class, need to be matched. I was tempted to call these “Clauses”, but that just barely relates to the actual definition of “clause”; so instead, we’ll borrow from regular expression terminology and call them Groups.

The down side? I’m still implementing Groups. They will have their own chapter though.

Now, we learn how to build a Scanner. (Again. Sort of.)

 

Tags: , , , , , , , , , , , , ,

Software Language Engineering: Intro and the Scanner

(Early Edition)

[My project has evolved a little since I started this. Allow me to make a few modifications for clarity.]

Introduction

So, back to Software Language Engineering. Well, not really; I literally just wrote the preface, and I’m a little burnt out. However, I think it’s only fair that I walk as much as I talk, so allow me to begin this by explaining the core concept of a scanner, and providing you with a little example code. This entry will be fair game for revision for a very long time; but that’s for the sake of humility, not inaccuracy. I might even separate it into two distinct parts; one on theory, the other on implementation. If anyone’s voting for that discretion, please mention it in the comments. Also note that I am reserving the right to back-propagate any major change I make to any part of this program later on; thus even after completion this article may never be static. However, I do promise to compile and run the code myself before putting it up. You may quite liberally scream at me in the comments if I still muck something up.

The Scanner

The notion of a build tool is to take code, often but not always human-written, and convert it to a set of native software structures that can be understood and executed by a CPU. The two initial phases of that, one of which I will be addressing here, are scanning, and parsing. Scanning is the earliest phase, an operation which is far too easy to accidentally amalgamate with parsing. Scanning is separating the code into tokens with atomic meaning to the language. They are not organized or checked for semantic sensibility, they are simply separated and cleaned up a little.

For the human end, it’s often easier to allow as much whitespace as we care to use, wherever we should care to put it. It’s easy to do, it’s streamlined, and it isn’t hard to read. For the later stages of a typical build tool, unless otherwise specified, whitespace is noise. It is syntax without semantic, ignorable data that serves no purpose. It stops at the scanner. Likewise, a build tool is not typically concerned with the content of a structure like a text string. It is concerned with where it begins, and where it ends; its literal contents are ultimately irrelevant.

Additionally, comments are treated much like whitespace. Whether your comment character is a double forward slash “//” or a hash sign “#”, you don’t want your program worrying about what’s between it and the end of the line. These are also responsibilities of the scanner. If you’re smart, you’ll be setting up abstract classes to contain these. Most languages (every one of them that I can think of off of the top of my head) can be interpreted in this manner, so you are likely to use them many times over.

We’re going to begin with an example builder for basic algebra, with the end goal of simplifying the equation. I suggest that you create four packages in your project, as I have: oberlin.builder, oberlin.builder.scanner, oberlin.algebra.builder, and oberlin.algebra.builder.scanner. (My own code, actively under development, can be found on Github, but please note that I have pushed this to Github only sporadically of late; given the nature of the tutorial, I am more concerned with pushing working code to it than whatever I’m experimenting with.) In builder, create a launch class called Main. This will be your testing facility.

All code © Michael Eric Oberlin, October 11th 2014; and subject to the terms of the GNU General Public License Agreement.

package oberlin.builder;

/*
 * © Michael Eric Oberlin, October 11, 2014
 * 
 * This program is free for educational usage, and free to be modified in any manner; however
 * its creator cannot be held responsible for any damage done to end-users' computers from it
 * or any alteration of it.
 */
import java.io.*;
import java.util.*;

import oberlin.algebra.builder.*;
import oberlin.algebra.builder.scanner.*;

public class Main {
    
//    private Builder builder = new NullaryBuilder();
    private Builder builder = new AlgebraicBuilder();
    
    public static void main(String...args) throws IOException, BuilderException {
        new Main();
    }
    
    public Main() throws IOException, BuilderException {
        try(BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
                BufferedWriter out = new BufferedWriter(new OutputStreamWriter(System.out));) {
            
            out.write("Type \"!exit\" to exit the program\n");
            String input = "";
            Object built = null;
            while(!input.trim().toLowerCase().equals("!exit")) {
                if(input.length() != 0) {
                    built = builder.build(input);
                    
                    out.write("Builds to: " + built);
                    out.newLine();
                    out.flush();
                }
                
                out.write("Enter text: ");
                out.flush();
                input = in.readLine();
            }
        }
    }
}

The notion here is to have the least likely token that I can think of, bang-exit, be used to exit the loop. You could remove that tag, if you wanted to rely on Control-C or Eclipse’s red square to exit; but generally it’s safe to assume that not every end user has, or knows, an easy way out of an arbitrary program.

I am using BufferedReader and BufferedWriter because they’re easier to work with than System.in and System.out directly; and in all but the most desperate debugging situations, System.out.print* is fairly deprecated code—a one-line anti-pattern, really. It’s 2014, people! Learn to use loggers!

You may also note that you have two Scanner objects declared, one of which is commented out. I strongly recommend creating a Nullary* version of every interface, in place of using null. Null is more a natural state than an idea, and the inclination to use it to express any other meaning tends to lead to predicaments. (In Java 8, if you must use null, or contend with old code that uses null, consider wrapping each instance in Optionals.)

So, on running this program, you are continually presented with an inquiry for a line of text. Our build target is that line. On entry, it is sent to a Builder, which returns an object. The program reports that object, and you’re good to go for another line entry. Generally this structure is programmer-directed, but it isn’t a bad basis for something along the lines of an inline interpreter, such as the one Python uses. All of our experiments will stem from this program.

So, what is Builder?

Builder is a class that encapsulates the build process in entirety. It will reference the scanner, the parser, and every other major component. For end users, it is a “black box”; something identified entirely by its function. The programs “javac”, “rhino”, and “g++” could all stand in as a Builder. We’ll begin with an abstraction of it, like so.

package oberlin.builder;

import java.util.List;
import java.util.logging.*;

import javax.naming.OperationNotSupportedException;

import oberlin.algebra.builder.nodes.Program;
import oberlin.builder.scanner.*;
import oberlin.builder.parser.*;
import oberlin.builder.parser.ast.*;

public abstract class Builder {
    private Logger logger = Logger.getLogger("Builder");
    private Scanner<?> scanner = new NullaryScanner();
    private Parser<?> parser;
    
    public Object build(String code) throws BuilderException {
        List<AST> tokens = (List<AST>) scanner.apply(new Terminal(code, new SourcePosition()));
        
        parser = createParser(tokens);
        
        //We now have a list of Terminals, ready to be constructed
        //into an AST via a Parser.
        
        //NOTE: This is technically an exclusive part of the algebra
        //builder. It should be abstracted out.
        AST program = parser.parse(Program.class);
        
        //DEBUG
        analyzeTree(program);
        
        //TODO: Generate Object Program with code generator
        //TODO: Return Object Program
        
        //TODO: At this point, objectProgram should be our completed and
        //runnable binary. The following code is more for
        //embedded testing than anything else, and should be removed 
        //before release.
        
        StringBuilder szBuilder = new StringBuilder();
        for(AST token : tokens) {
            szBuilder.append(token).append("; ");
        }
        return szBuilder.toString();
        
    }
    
    //TODO: MICK: Remove redundancies before release.
    public AST getParseTree(String code) {
        List<AST> tokens = (List<AST>) scanner.apply(
            new Terminal(code, new SourcePosition()));
        
        parser = createParser(tokens);
        
        //We now have a list of Terminals, ready to be constructed
        //into an AST via a Parser.

        //NOTE: This is technically an exclusive part of the algebra
        //builder. It should be abstracted out.
        AST program = parser.parse(Program.class);
        
        return program;
    }
    
    public Parser<?> createParser(List<AST> tokens) {
        return new NullaryParser(tokens);
    }

    public Scanner<?> getScanner() {
        return scanner;
    }

    public void setScanner(Scanner<?> scanner) {
        this.scanner = scanner;
    }

    protected Parser<?> getParser() {
        return parser;
    }

    protected void setParser(Parser<?> parser) {
        this.parser = parser;
    }
    
    //DEBUG
    private void analyzeTree(AST ast) {
        try {
            System.out.println(ast.getContainedNodeNames());
            
            for(AST node : ast.getContainedNodes()) {
                analyzeTree(node);
            }
        } catch(OperationNotSupportedException ex) {
            System.out.println(ex.getMessage());
        }
    }
    
}

[I’m sure you see the skips and bumps characteristic of a work in progress.]

Calling the build(…) method will in turn run each component, beginning with the implemented scanner, and derive a program object from the code. So, given the rough outline of a scanner’s function, this is my recommendation for your Scanner class. A scanner’s job is not to interpret code. Keep that in the back of your mind. It is, succinctly, to divide it into a sequence of specialized tokens usually called “lexemes”. Its job is also not to determine the meaning of those lexemes, that task belongs to a parser.

Confession

One of the first complete textbooks I ever read, on the direct subject of building a build tool, was “Programming Language Processors in Java — Compilers and Interpreters”, by David A. Watt and Deryck F. Brown, Prentice Hall. It’s an excellent introduction to the subject, and I had some familiarity before I first picked it up; but it was written in 2000.

That was back in the Java 2 days. Those were dark times. A lot of wonderful and exciting things have happened in the past ten years or so, and a decent part of my compulsion to write this tutorial is to take my old methods and, well, refurbish them for the modern era. My initial draft of this code had some embarrassing parts. It scanned not only for comments and valid tokens, but for delimiters (separately), whitespace (in spite of its identical treatment to a comment), and a lot of other nonsense.

Please forgive any relic of a less experienced time that slips into my new material; Java 2 was before we even had enumerations, let alone generics, and if you were going to pick up any bad habits, it was a great time for it. I actually wrote a haiku about that. It was a long time ago, and the longer the better. Java 8, or even Java 6, is much better suited to writing such material.

End Confession

Scanners aren’t ultimately that different from Parsers, so if I wanted to, I could jump to Parser, have it back reference a specialization of itself, and write out the rules in Backus-Naur Form. But if I did that, you guys would be really mad at me and no better off, so let’s go through it logically.

Every language differs by grammar, and Scanner’s interest in that grammar is key spellings for the language lexicon. The best way to represent these spellings is through an enumeration of self-contained checks against a regular expression. It is also possible to store them in a map, but I find that to be extraneous here.

Let’s start with the outline of the abstracts.

package oberlin.builder.scanner;

import java.util.*;
import java.util.function.*;
import java.util.logging.*;

import oberlin.builder.*;
import oberlin.builder.parser.ast.*;

/**
 * New design for scanners. Requires an enumeration of type
 * TerminalSpelling to iterate over, and a code string.
 * Returns a list of tokens.
 * 
 * @author © Michael Eric Oberlin Nov 2, 2014
 *
 */
public interface Scanner<E extends Enum<E> & TerminalSpelling>
    extends Function<AST, List<AST>> {
    @Override
    public default List<AST> apply(AST code) {
        Logger logger = Logger.getLogger("Scanner");
        
        //Start with a list of tokens, with the singular pre-token of the bulk of code
        List<AST> tokens = new LinkedList<>();
        tokens.add(code);
        
        //For each item found in the code, parse it, and replace
        //the contents of tokens with the parsed contents
        for(int index = 0; index < tokens.size(); index++) {
            //maintain for check
            AST origToken = tokens.get(index);
            
            List<AST> newTokens = new ArrayList<>();
            for(TerminalSpelling terminalSpelling : getSpelling().getEnumConstants()) {
                
                try {
                    newTokens.addAll(terminalSpelling.matchToken(tokens.get(index)));
                    
                    replaceItemWithCollection(tokens, index, newTokens);
                    break;
                } catch (MismatchException e) {
                    //didn't match, so continue to the next item
                    continue;
                }
            }
            
            //Final defensive check: if one token was received, and one token provided, and
            //the old is not the same as the new, then the only thing that happened was the
            //removal of irrelevant data. Thus, index should be decremented so that this new
            //item may be scanned again.
            if(newTokens.size() == 1 && !newTokens.get(0).equals(origToken)) index--;
        }
        
        //This algorithm will always terminate the token list with an empty token, the one item
        //which cannot have semantic value. So, remove it here.
        tokens.remove(tokens.size() - 1);
        
        return tokens;
    }
    
    /**
     * Internally used method which replaces an entry in a
     * provided list with a collection.
     * 
     * @param list list to be altered
     * @param entry numeric index of replaced entry
     * @param replacement item to insert in place of current data
     */
    static <E> void replaceItemWithCollection(List<E> list,
        int entry, Collection<E> replacement) {
        list.remove(entry);
        list.addAll(entry, replacement);
    }
    
    /**
     * 
     * @return TerminalSpelling allocated to token recognition.
     */
    public Class<E> getSpelling();
}
package oberlin.builder;

import java.util.LinkedList;
import java.util.List;
import java.util.regex.*;

import oberlin.builder.parser.SourcePosition;
import oberlin.builder.parser.ast.*;

public interface TerminalSpelling {
    
    public default List<AST> matchToken(AST ast) throws MismatchException {
        List<AST> returnable = new LinkedList<>();
        
        Pattern pattern = getPattern();
        Matcher matcher = pattern.matcher(ast.toString());
        if(matcher.find()) {
            returnable.addAll(manageToken(matcher, ast.getPosition()));
            returnable.add(new Terminal(ast.toString().substring(matcher.end()), ast.getPosition()));
            return returnable;
        } else throw new MismatchException("String \"" + ast + "\" does not match grammar pattern + \"" + pattern + "\"");
    }
    
    public Pattern getPattern();
    
    /**
     * Determines, often through enumeration self-inspection, what should be done with a passed token.
     * 
     * @param token the original token removed from the complete String by matchToken
     * @return appropriate value given the circumstances and implementation of TerminalSpelling.
     */
    public List<AST> manageToken(Matcher matcher, SourcePosition position);
}

There is one item here that I might spend a few lines of English on. In Scanner, there is one chunk that you might not recognize, unless you’re some kind of an enumerations ninja, or an experimenter like me.

for(TerminalSpelling terminalSpelling : getSpelling().getEnumConstants())

The important detail is to see to it that for every implementation, getSpelling() always returns the class of the enumeration. This class has a method, getEnumConstants(), which returns a list of all members of the enumeration (provided that the class is an enumeration). Therefore, this line allows for the paging through of every single type of TerminalSpelling in the enumeration, one by one, via external iteration.

So what is TerminalSpelling in detail? Each spelling is an example of a valid or ignorable token, and how it should look. In a properly formed program in your target language, every single chunk, regardless of whether it ends up in a token or not, should match these TerminalSpelling members. Scanner is easy enough, all you really need to do to override it (in this instance, at least) is set a spelling rule for it to follow. Your language might have a self-modification feature, so making it final may not be a good idea, but the default will work for most languages. As for TerminalSpelling, that’s where the interesting stuff happens. Take, as an example, our implementation for algebraic simplification:

package oberlin.algebra.builder;

import java.util.LinkedList;
import java.util.List;
import java.util.logging.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import oberlin.builder.TerminalSpelling;
import oberlin.builder.TerminalSpellingHandler;
import oberlin.builder.parser.SourcePosition;
import oberlin.builder.parser.ast.AST;
import oberlin.algebra.builder.nodes.*;

/*
 * Major changes:
 * For starters, if a chunk of the string matches a pattern, then that group
 * needs to be returned as the spelling of a token.
 * <Specific Atomic Type> ← Token ← Terminal ← AST
 * 
 * All tokens should contain a reference to their regular expression/grammar.
 * 
 */
public enum AlgebraicSpelling implements TerminalSpelling {
        //COMMENTS
        WHITESPACE(Pattern.compile("[\\s&&[^\\r\\n]]+"/*"^\\s+"*/), GrammarType.COMMENT, new TerminalSpellingHandler<Whitespace>(){
            //NOTE: We are looking for all whitespace other than newlines! Thus, \r and \n must be ignored.
            //Hence, the strange regex above. (To look for the same item in two places has undefined and
            //generally unpleasant behavior.)
            @Override
            public Whitespace getTerminal(String spelling, SourcePosition position) {
                return new Whitespace(spelling, position);
            }
            
        }),
        BLOCK_COMMENT(Pattern.compile("^/\\*.*?\\*/"), GrammarType.COMMENT, new TerminalSpellingHandler<BlockComment>(){

            @Override
            public BlockComment getTerminal(String spelling, SourcePosition position) {
                return new BlockComment(spelling, position);
            }
            
        }),
        LINE_COMMENT(Pattern.compile("^//.*+$"), GrammarType.COMMENT, new TerminalSpellingHandler<LineComment>(){

            @Override
            public LineComment getTerminal(String spelling, SourcePosition position) {
                return new LineComment(spelling, position);
            }
            
        }),
        
        //VALIDS
        NOMINAL(Pattern.compile("^[\\D&&\\w]\\w+"), GrammarType.KEEP, new TerminalSpellingHandler<Nominal>(){

            @Override
            public Nominal getTerminal(String spelling, SourcePosition position) {
                return new Nominal(spelling, position);
            }
            
        }),
        NUMERIC(Pattern.compile("^\\d+"), GrammarType.KEEP, new TerminalSpellingHandler<Numeric>(){

            @Override
            public Numeric getTerminal(String spelling, SourcePosition position) {
                return new Numeric(spelling, position);
            }
            
        }),
        OPERATOR(Pattern.compile("^[+-/\\\\÷\\*×\\^]"), GrammarType.KEEP, new TerminalSpellingHandler<Operator>(){

            @Override
            public Operator getTerminal(String spelling, SourcePosition position) {
                return new Operator(spelling, position);
            }
            
        }),
        EQUATOR(Pattern.compile("^!?=?[=><]"), GrammarType.KEEP, new TerminalSpellingHandler<Equator>(){

            @Override
            public Equator getTerminal(String spelling, SourcePosition position) {
                return new Equator(spelling, position);
            }
            
        }),
        
        //DELIMITERS
        LPAREN(Pattern.compile("^\\("), GrammarType.KEEP, new TerminalSpellingHandler<LParen>(){

            @Override
            public LParen getTerminal(String spelling, SourcePosition position) {
                return new LParen(spelling, position);
            }
            
        }),
        RPAREN(Pattern.compile("^\\)"), GrammarType.KEEP, new TerminalSpellingHandler<RParen>(){

            @Override
            public RParen getTerminal(String spelling, SourcePosition position) {
                return new RParen(spelling, position);
            }
            
        }),
        //NOTE: "\\r?\\n" also qualifies as whitespace! Remove it from the whitespace search!
        NEWLINE(Pattern.compile("\\n"), GrammarType.KEEP, new TerminalSpellingHandler<NewLine>(){
            @Override
            public NewLine getTerminal(String spelling, SourcePosition position) {
                return new NewLine(spelling, position);
            }
        })
    ;
    
    //PRIVATE FIELDS
    private Logger logger = Logger.getLogger("AlgebraicSpelling");
    
    //ENUMERATIONS
    private enum GrammarType {
        KEEP,
        COMMENT;
    }
    
    //FIELDS
    private final Pattern pattern;
    private final GrammarType type;
    private final TerminalSpellingHandler<?> handler;
    
    //CONSTRUCTORS
    /**
     * The two underlying details for any syntax pattern are its regular expression, and its semantic meaning.
     * The pattern is simply a compiled regular expression for matching the item. The type is a clue to its
     * semantic meaning, which is checked in a switch-case statement when called.
     * 
     * @param pattern compiled regular expression identifying the token
     * @param type clue as to what should be done with the token once found
     */
    private AlgebraicSpelling(Pattern pattern, GrammarType type, TerminalSpellingHandler<?> handler) {
        this.pattern = pattern;
        this.type = type;
        this.handler = handler;
    }
    
    //GETTERS/SETTERS
    @Override
    public Pattern getPattern() {
        return pattern;
    }

    @Override
    public List<AST> manageToken(Matcher matcher, SourcePosition position) {
        List<AST> ret = new LinkedList<>();
        switch(this.type) {
        case KEEP:
            ret.add(handler.getTerminal(matcher.group(), position));
            break;
        case COMMENT:
            //Just ignore it
            break;
        }
        return ret;
    }

}

So, every member is initialized to a specific regex pattern, and a notion of whether it is meant to be kept, or thrown out. You may also note that pure whitespace, aside from any new line character, is treated as a comment; this may not be the case for your language, but it is for most. In Java, as an example, whitespace only serves to delimit one token from another before compilation, unless it’s surrounded by other delimiters like quotation marks.

Scanner has a List, initially containing the bulk code. This code is passed, one token at a time, to the each member of TerminalSpelling; when one of them matches it, they return a new list of internal tokens. Those new tokens are spliced into the original list where the old token was.

Unless the recognizing spelling returned a single token that is, somehow, different from the token passed in (such as when whitespace or a comment is removed), index is incremented; otherwise, measures are in place to keep the index the same, and the token is scanned again until it is a single token in and of itself.

In the end, of course, the last psuedo-token (the remaining bulk of code) will be (or should be) an empty string. It serves no further purpose, so it is removed. At this point, we have a stream of tokens, ready to be parsed.

Afterthoughts

See that? Cheesecake. My original code, based on tripe I coughed out long ago, involved a lot of unnecessary steps. Simple code is functional code; code with more effort than strictly necessary results in at least two things: programmer cognitive fatigue, and often-unrepeatable or nigh-untraceable errors. If I have any further mistakes that I need to correct, please inform me in the comments. I’ll be updating the Github repo with the new code soon. Enums are a beautiful thing.

So Am I Finished?

Maybe. If you’re just looking for a text scanner, and no further formatting needs to be done to the token, then yeah, you’re ready to go. If, however, you are looking at attaching this thing to a parser, then there’s one more thing we need to add. It’s not big, but it makes life easier. It’s Terminal AST formatting for each token. I’ll explain that in a later chapter, after I’ve justified the need for it. It’s written into the Scanner class, but it’s conceptually part of the Parser module. Essentially, you’re at least pretty much done.

 
5 Comments

Posted by on October 12, 2014 in Software Language Engineering

 

Tags: , , , , , , , , ,

Software Language Engineering: Preface

The truth is, there are very few things that I can write an original tutorial on when it comes to software, and I have no intention of doing anyone the disservice of insisting on my own tutorial when someone else may have done a perfectly good job, and likely a superior one. But there are some things, things that not every programmer does, that there are excruciatingly few tutorials for on the internet.

If there’s one thing that is eternally true about the internet, it’s that whatever the topic, there are always at least a few people browsing for it. This is for you. This whole article chain is going to be about writing programming languages, be they scripts or natives, and most specifically it will be about writing compilers for them.

[For those interested in downloading my source code for the tutorials (which are being written and tested as I create this), you may do so via the github repository.]

There is, pleasantly, an MIT course available to the public on the subject. Unfortunately, by the nature of a sequence of lectures, it is likely to take you much more time than reading through these articles, and time is not a commodity for us. All the same, I do encourage you to at least browse through it sometime; the video is fairly low-resolution, but the material is great. As mentioned in it, there are two programs commonly used for creating the scanner and parser, respectively JLex and CUP. I will not be using them, but if you’re in a pinch, you might consider it.

I will be using Java 8, as it has a lot of wonderful features for scanning through collections (even vast, nay, gargantuan ones), and pretty good regular expression tools. If you’re concerned with the double-escaping of regular expressions that is inherent to the structure of the Java language, I recommend using this regex parser, as it allows you to work with singly-escaped regular expressions and converts them to a Java-comprehensible string. Most of us don’t have a natural knack for any regular expression syntax, so it comes in handy to have a testing environment. There are a number of other good ones out there, and you’re of course also welcome to produce your own kit if the fancy strikes you.

If you haven’t caught on, regular expressions are invaluable for this stuff.

Every build process has a number of standard components, the presence of each of which describes the nature of the build. They share a few things in common, which thankfully can be modularized into their own abstract classes. I’ll be showing you my style as we work through this.

Given that you’re even bothering to read the preface (it’s certainly no insult if you don’t), you may be expecting a little history on compilers. The first one was built in 1952 and simply denoted “A-0“, for “Arithmetic Language Version Zero”. It was created by the late Rear Admiral Grace Hopper of the United States Navy. This is actually a long stride after the first programmable machine, a loom built by Joseph Marie Jacquard introduced to the world in 1801. Programming, before that point, consisted basically of hex-editing and the notion of “coding” was, well, not really a notion at all. If you are a hardcore programmer (and I’m assuming you are), then memorize her name; we lost her in 1992, but the face of computing has forever been changed by her work.

You might also memorize Jacquard, as he was basically treated like Doctor Frankenstein by the local villagers and had the barn in which he kept his programmable looms burnt down by a bunch of Luddites, while he was off at war fighting for his nation. It didn’t work, of course. Torches and pitchforks aside, he still, also, changed the world, and I’m not just talking about the garment industry.

A-0 went through a number of phases, inclusive of A-0, A-1, A-2, A-3 (also known as ARITH-MATIC), AT-3 (also known as MATH-MATIC), and finally B-0 (FLOW-MATIC). I’m not particularly sure where this chain ended, but I am fairly certain that it did end and I am quite confident that I’ve never seen a word of any of them in my life. The most bronze-age language which is still in reasonably common use (C) derived from B, which was created from BCPL (backronymed “Before C Programming Language”, which was created from ALGOL 60 (which came from ALGOL 58), which came from FORTRAN, which came from a loose collection of macros known as Speedcode which IBM developed in 1953. (Note that it’s still a full year after Read Admiral Hopper immaculately created the first compiler without any other compiler to build through. Let that sink in.) C itself has been through three ANSI standards, C89, C99, and most recently C11, but they’re pretty much the same game with keywords and native structures made more appropriate to their time. C led to C++, C++ led to Java, Ruby and Python came out of pretty much nowhere, and then there’s all these other guys over there, and quite a few domain-specific little guys, and more languages than any of us will ever personally be fluent in. (I don’t care what your resumé says.)

They all have a few things in common, though. As an example, they are textual. (Don’t let this imply to you that there’s no such thing as a non-textual language, though; LabView and even HyperCard made quite a few advances in that direction.) They’re all stored in data structures, which are usually what we would think of as files. They’re all, most importantly, interpreted and encoded by another program. We will be building that program, and if you keep to a certain code of ethics in your construction, you will find it to be a fairly simple process.

Believe it or not, it’s a bit of a tradition to develop a compiler for a language in the language itself. It’s called “bootstrapping“. Obviously this isn’t so readily done with the first one (this isn’t an improbability drive), but the tradition began in 1958 with a language you’ve never heard of called NELIAC (basically another kind of ALGOL, which you may also never have heard of). It was popularized in 1962 by Lisp. Today, it’s commonly used for C, Java, Python, Scheme, and a plethora of others. It offers a few incidental advantages, inclusive of reducing the compiler’s object code, applying compiler output improvements to its own code, and the HR benefit of having no need for the programmers to know more than the compiled language. I’m sure I’ll come back to this, as when you can do it, you should do it. You get to catch the wave of every other programmer’s improvements to the compiler’s code, and add your own two bits to it.

Wrapping up, in the dendrites of the great tree of software language tendencies, we’re moving in a lot of different directions right now. Some of them are contradictory, sometimes both choices have advantages, and I can easily say that more people know how to program right now than at any other point in history. Software and hardware advances, like Google’s QScript and the OpenCL Standard maintained by Khronos, are splitting the once monotonous course of digital electronics into a thousand different rays, and it can go literally anywhere we want it to. Remember that what I am describing to you in this series is the standard method of engineering a computer language; it is hardly the only possibility. It’s an honor to help you follow it, and possibly, for quite a few of you, lead it.

 
 

Tags: , , , , , , , , , , , , , , , , , , ,