The foundation of the Ensō programming model is the representation and manipulation of data. We are borrowing ideas from database and information modeling, and turning that into a data structuring mechanism that makes sense in a programming language. When you think of data structures as mini databases some interesting things can happen. Most importantly, it encourages thinking of data holistically rather than as individual objects or individual data types. It also enables us to rethink the idea of “types” in programming languages.
Programming languages can be divided into two camps: those with static type checking and those with dynamic type checking. The static ones allow programmers to describe the structure of their data and then create and manipulate values that have the specified structure. Since these type systems are checked statically, they traditionally only describe the structure of values, not semantic constraints. It is also possible to define abstract data types, which wrap a primitive data structure with operations that enforce semantic constraints. More recently there has been extensive work on adding more semantic constraints into static types, although there is significant tension due to the need for static verification of the types. Existing dynamically typed languages have a built-in set of predefined data structures, which the programmer uses to create larger structures. In many cases these higher-level structures are never described explicitly. Dynamically-typed object-oriented languages allow programmers to wrap dynamic data with methods to enforce semantic constraints. Objects implement the constraints locally and operationally. There typically is no declarative global view of the semantic constraints that the programmer intends on the data.
Ensō adopt the descriptive and declarative view typical of static type systems, but uses dynamic checking so that arbitrary behavioral constraints can be included in the descriptions. Because we are exploring new territory, we avoid using the word “type” in talking about Ensō data. We use the term “schema” instead, borrowed from the database literature.
This leads to the first principle of Ensō, which is that all data is described by schemas. A schema specifies the properties and constraints on a kind of data. For example, if we want to describe 2D points, we might say “A point has an x and a y component, which are integers.” This is corresponds to the following schema:
class Point x: int y: int
This defines a class named Point with two fields, named x and y. We use the generic term field to describe a named component of a structure. In the next post I’ll talk about how to write a grammar for points, but for now we can assume that points can be written in the normal way, e.g. (3, 6). Also, given a point P, we can access the fields in the normal way, as P.x and P.y. We will also talk about constraints later. For now I want to talk about some more complex structures and then circularity of descriptions. Continuing with larger structures, we can define rectangles and polygons:
class Rectangle top_left: Point bottom_right: Point class Polygon points: Point*
The * means that a polygon has a list of (zero or more) points. This example also shows that the type of a field can either by a primitive type, int in the case of x and y, or a complex type Point the case of top_left and points. Note that some of the typical data structuring ideas, like List<Point>, which are usually explicit in a programming language, are implicit in Ensō. We will also talk about how to create and manipulate rectangles and polygons later. There are many things to talk about, and I’m sorry I can’t do it all at once. For now I have a specific agenda, as you will see in a minute.
The schema above is called the Point Schema or Geometry Schema. This is because it is a schema and it is about points, or geometry. We can create many kinds of schemas in Ensō, to represent many kinds of data. For example, a Genealogy Schema could describe the structure of genealogical data, a Mail Schema could describe the emails in an email program, a Diagram Schema could describe the structure of visual diagrams, and a File System Schema could describe the structure of files and directories in a file system. I will talk about all these examples, and more, some other time. Buts its not rocket science. If you know Entity Relationship (ER) modeling, or Class Diagrams in UML, then this process of creating schemas will be familiar. What might be a little strange is writing these schemas down and calling them a “programming language”. I mean, I haven’t even told you how to write “Hello World!” yet. But trust me, these schemas are programs, in the sense that they can be used to create/manipulate data. (Oh, and I never will tell you how to write Hello World… it’s not interesting). For now I want to focus on something that might be more interesting, which is circular descriptions.
These descriptions are also data. From this and principle #1, it immediately follows that schemas are described by a schema. This means we need to write down a schema the describes the structures we’ve be writing above. In programming terms, what we need is a type of types. This meta-schema needs to say something like “a class has a name and a list of fields, which specify their name and type”.
class Class name: string fields: Field* class Field name: string type: ???
This is a good start at a schema that describes schemas. We had a little problem writing it down, because the type of a field can either be a class or a primitive type (Point or int in the examples above). The structure we need is familiar: a generic category called Type that has two specific realizations Primitive and Class. This is a class hierarchy. One thing that Primitive and Class have in common is that they both have names, so we can put that field into the base class. Another problem with the schema is that it has two classes, which work together to describe classes. Thus it would be good to be able to organize collections of classes. Lets introduce this into our schema:
class Schema types: Type* class Type name: string class Primitive < Type class Class < Type fields: Field* class Field name: string type: Type
The notation < Type means that Primitive and Class are subclasses of Type. This is now a very nice description, and it is capable of describing the Geometry Schema given above. Using YAML notation, the explanation of the Point Schema in terms of the meta-schema would look something like this:
class Point x: int y: int
Schema: - Class: name: "Point" fields: - Field: name: "x" type: *int - Field: name: "x" type: *int - Primitive &int name: "int"
On the left is the point schema, and on the right is an encoding of the Point Schema where its schema structure is made explicit. In our use of YAML, the top level is the class name, and then the fields are listed below. The fields values can either be primitive data, or more complex fields. The label &int identifies the primitive type named “int” which is referred to via *int for the type of the x and y fields. (Technically the label should come first, but this is pseudo-YAML, not real YAML). The left is quite nice, but the explanation on the right is not a pretty thing to look at. But it is useful to understand the mapping between them. As another simple example, a point would look like this:
Point x: 3 y: 7
The above example shows that our proposed Schema Schema is able to describe the Point schema. Note how the complexity goes up rapidly as we move from points to descriptions of points to descriptions of descriptions of points.
If you think about the corollary above, that schemas are described by a schema, it should be clear that there is potential for infinite regress. If everything is described by a schema (principle #1), and schemas are described by schemas (corollary #1) then the only way to avoid regress is if some schemas describe themselves. This is what the Schema Schema must do. It is a schema that describes the structures of schemas, including itself. As it stands, there are two features that are used in the proposed Schema Schema that it is not able to explain: subtyping and collection fields. These are represented by the < and * symbols above. Its easy to explain collection fields, by simply adding a boolean flag to each field that specifies whether the field is single-valued or multi-valued. A super class can also be listed on a class definition:
class Schema types: Type* class Type name: string class Primitive < Type class Class < Type super: Class fields: Field* class Field name: string type: Type single: bool primitive string primitive bool
Ooops! We have a problem: not every class has a superclass. Thus we need to allow the superclass field to be optional. If we do this, then we have to have an explanation for fields whose value is optional as well. We will use the symbol ? to indicate optional values, and then add an optional attribute to the description of fields:
class Schema types: Type* class Type name: string class Primitive < Type class Class < Type super: Class? fields: Field* class Field name: string type: Type single: bool optional: bool primitive string primitive bool
For completeness, we also added descriptions for the primitive types used in the schema.
Now this is a schema that describes itself. It is a slightly simplified version of the Schema Schema in Ensō. The explanation of how the Schema Schema describes itself is ugly, but it is interesting to see how the structural description contains data that is analogous to the structure:
Schema: types: - Class: name: "Schema" fields: - Field: name: "types" type: *Type single: false optional: false - Class &Type name: "Type" fields: - Field: name: "name" type: *string single: true optional: false - Class name: "Primitive" super: *Type - Class &Class name: "Class" super: *Type fields: - Field: name: "super" type: *Class single: true optional: true - Field: name: "fields" type: *Field single: false optional: false - Class &Field name: "Field" fields: - Field: name: "name" type: *string single: true optional: false - Field: name: "type" type: *Type single: true optional: false - Field: name: "single" type: *bool single: true optional: false - Primitive &string name: "string" - Primitive &bool name: "bool"
Explanations are often more complex than the simple facts. The original scheme is highly circular, and this explanation makes the circularity explicit. In particular, the definition of the class Class is labeled &Class, and this is used in the type of the super field that is inside the class Class. Remember that Ensō means circle 🙂
This kind of meta-circular description is well known in the literature. It is used in Smalltalk and the the Meta Object Facility (MOF) of UML. It has also been studied in programming languages. Some languages have explored the idea of allowing types to have types, and the possibility of a type of types. There might be theoretical issues with having a type of types, but we are having too much fun programming with Ensō to worry about it. Also, in Ensō we merely say that the schema describes itself. We don’t think of schemas as “sets of values” so there is no issue of a set that contains itself.
The Schema Schema given above is not exactly the one we use in Ensō right now, but it is close. I’ll talk about some other features over the next few weeks. We are implementing Ensō in Ruby, but last week I played with encoding the Schema Schema in Haskell. I’m not sure what we could do with it yet, but it is fun to look at.