Ensō on MOdeling LAnguages blog

Jordi Cabot invited us to introduce Ensō on his MOdeling LAnguages blog, and the result was this manifesto introducing Ensō’s philosophy and goals. The idea of a manifesto is to state what you believe strongly, as a guide to future action. The result is bound to be somewhat controversial.

Presentations on Ensō

Slides based on two recent presentations about Ensō. The first presentation was recorded, so I may be able to put it online sometime in the future.
  • Microsoft Research in Summer 2012. This was the first public talk about Ensō. Previous talks have been mostly demos.
  • Viewpoints Research in Fall 2011. These slides were used for a small group discussion.

Object Grammars: Compositional & Bidirectional Mapping Between Text and Graphs (Ensō Papers 2 of 6)

Ok, we have completed the second Ensō paper:

Object Grammars: Compositional & Bidirectional Mapping Between Text and Graphs (Ensō Papers 2 of 6)
Tijs van der Storm, William R. Cook, Alex Loh

Comments welcome!

Managed Data: Modular Strategies for Data Abstraction (Ensō Papers 1 of 6)

We have finally released the first Ensō paper:

Managed Data: Modular Strategies for Data Abstraction (Ensō Papers 1 of 6)
Alex Loh, Tijs van der Storm, and William R. Cook

This is just the first of 6 or so papers we are working on. Next up will be grammars, then interpreters, GUIs (diagrammatic and text-oriented), web applications, and case studies. We are focusing the papers on functional features of the system, but there are other concepts that are woven throughout, including bootstrapping, self-description, security, optimization, analysis, composition and modularity, etc.

Ensō presentations

I gave a demo of Ensō at the IFIP WG 2.16 Working Group on Language Design in London.

I’m giving a public demo at Lang.NEXT, a free workshop hosted by Microsoft.

Alex and Tijs are presenting our solution to the 2012 Language Workbench Challenge on March 27th in London.


Why I don’t consider Programs to be Models

I attended Zef Hemel’s PhD defense in Delft earlier this year. Congratulations, Zef! Zef defended his thesis on Methods and Techniques for the Design and Implementation of Domain-Specific Languages. It was my first experience with a traditional European-style thesis defense, complete with robes, a large silver staff, and a formalized interrogation of exactly one hour. I had a great time. You can’t see me in his photo of the event, because I am directly behind Zef and Eelco. Unfortunately, when it came time to interrogate Zef, my first question was “What is a model?” It seemed like fair game, given that the word appears in the thesis title. On second thought, I realize that anyone would struggle to define the word.

Zef also has an active blog, I am dr. Zef. I seem to remember that it used to be called “I am Zef”, but I’m not sure. One of his posts is Programs are Models. This idea is consistent with the views of many people in the model-driven community. The mantra is that “everything is a model”: there are high-level models, which describe the system clearly, and low-level models (aka code), which can be executed. You get from high-level models to low-level code models by applying a transformation, and transformations are also models. Since everything is a model, including Java source code and the transformations between models, it all works out neatly.

Zef points out that high to low level transformations have typically been called “compilers”. He also argues that internal/embedded DSLs are favored in industry and are a better way to go. One small point is that embedded DSLs are quite popular in academia too, especially for those working with Haskell.

The debate about the merits of internal versus external DSLs is far from over. Both have strong advantages and significant disadvantages. For example, internal DSLs tend to have very poor error messages and debugging abstractions, and they can be difficult to analyze because they are mixed with general-purpose code. External DSLs, as he points out, require a lot more tooling.

My main point here, however, is that I prefer to not think of programming languages as modeling languages. The reason is that, for me, a modeling language must be about what behavior is desired, not how to implement that behavior. This is the difference between a regular expression that concisely describes a pattern and code that implements the steps to recognize the pattern. As I have said before, models are descriptions written in an executable specification language. Programming languages do not operate at the level of “specification” so they cannot be modeling languages.
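The regular-expression contrast can be sketched in a few lines of Python (this is an illustration, not Ensō code; the identifier pattern is just an example):

```python
import re

# "What": a regular expression concisely describes the pattern --
# an identifier is a letter followed by letters or digits.
IDENT = re.compile(r"[A-Za-z][A-Za-z0-9]*\Z")

def is_ident_spec(s):
    return IDENT.match(s) is not None

# "How": code that spells out the steps to recognize the same pattern.
def is_ident_impl(s):
    if not s or not ("a" <= s[0].lower() <= "z"):
        return False
    return all(("a" <= ch.lower() <= "z") or ("0" <= ch <= "9")
               for ch in s[1:])

# Both compute the same function, but only the first states what the
# pattern *is*; the second prescribes how to check it, step by step.
for s in ["x1", "foo", "1x", ""]:
    assert is_ident_spec(s) == is_ident_impl(s)
```

The regex is the model: an engine is free to recognize it by backtracking, by a compiled DFA, or any other strategy, because the description commits only to the desired behavior.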

One consequence of this position is that a model-to-programming-language transformation is not really a model-to-model transformation, because programs are not models. I believe that transformations between high-level modeling languages are a fine idea, but using transformations to generate code is a bad idea. The Ensō team is investigating a view of model-driven development that is completely based on interpretation, with no explicit code generation at all. This is a good discussion of some of the issues, but I don’t think it touched on some of the more fundamental questions. For example, my working hypothesis is that it is easier to compose, modify, and extend interpreters than to compose, modify, and extend compilers, when combining multiple languages.

Ensō is built on the following principles and strategies:

  • External DSLs, not internal/embedded
  • Transformations are essential, but not for generating code
    • Grammars are models that define bi-directional transformations between models and text
    • GUIs are models that define bi-directional transformations between models and presentations
  • Interpretation, not code generation
    • Interpreters are written using code. Code is good!
    • The interpreter language must be able to access/modify models easily, as if they were the native data of the interpretation language
  • It is never the case that “Everything is an X”
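As a rough illustration of the “models as native data” bullet, here is a minimal sketch of managed data in Python (Ensō itself is written in Ruby, and `ManagedRecord` and the `Point` schema are hypothetical names, not Ensō APIs). The point is that the record’s structure is not hard-coded into a class: the schema is ordinary data, and a data manager interprets it to control every field access.

```python
# A minimal sketch of "managed data": records have no hand-written
# fields; a schema (plain data) describes their structure, and the
# manager interprets the schema on every access. Hypothetical names.

class ManagedRecord:
    def __init__(self, schema, **fields):
        object.__setattr__(self, "_schema", schema)
        object.__setattr__(self, "_fields", {})
        for name, value in fields.items():
            setattr(self, name, value)

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for schema fields.
        schema = object.__getattribute__(self, "_schema")
        if name not in schema:
            raise AttributeError(f"no field {name!r} in schema")
        return object.__getattribute__(self, "_fields").get(name)

    def __setattr__(self, name, value):
        # Every write is checked against the schema, which is just data.
        if name not in self._schema:
            raise AttributeError(f"no field {name!r} in schema")
        expected = self._schema[name]
        if not isinstance(value, expected):
            raise TypeError(f"{name} must be {expected.__name__}")
        self._fields[name] = value

# The schema is ordinary data; the manager interprets it.
Point = {"x": int, "y": int}
p = ManagedRecord(Point, x=1, y=2)
p.x = 10          # allowed: "x" is in the schema and 10 is an int
# p.z = 3         # would raise AttributeError: "z" is not in the schema
```

Because the manager is one generic piece of code parameterized by a schema, concerns like validation, logging, or persistence can be woven in once, for all record types, rather than hand-coded per class.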

In conclusion, I have to say that I agree with Zef that the most interesting work on modeling is being done in projects like Ruby on Rails, Play, jQuery, etc. Note that even these systems use a blend of internal and external DSLs. One other thing they have in common is that most of them don’t spend a lot of time generating code, but interpret models directly. Industry people are making great progress using the tools that are at hand (especially dynamic languages), but that doesn’t mean there can’t be a better way to do things.


Viewpoints Research Trip

I visited Viewpoints Research Institute last week and talked with Alan Kay and his team. I presented an overview of Ensō, including the concepts of managed data, schemas, web applications, security, diagrams, and stencils. One point of confusion was my frequent reference to “data”. The VPRI people do not talk about data much. My impression is that they take a more pure object-oriented viewpoint and think only about behavior. I teach exactly this concept in my classes: one way to understand data is via its behavior, and this is a very flexible way of thinking about computation, data included.

This led to a fairly long discussion about browser/internet architecture and mobile code. My understanding is that Alan prefers a model in which browsers are simply mini operating systems that download code and run it within a “page” with constrained resources. This would allow any kind of page to be rendered, and would in some ways be more open and heterogeneous than our current model. On the other hand, we would have to agree on a code representation (byte code) and also on APIs. I pointed out that this would make it more difficult for blind people to work with pages than the current model. They countered that HTML is not perfect either, so blind people cannot always access pages properly. Alan believes that content must come with its interpretation (the behavior/code) or else it will not be able to run in the future. I think the opposite: code is fragile, while declarative languages like HTML are more likely to remain interpretable in the future. It depends in part on whether you want to retain the creator’s pixel-perfect interpretation, or whether you want to be able to re-interpret the information on some future platform. Alan favors the former, while I favor the latter.

There is also the question of why mobile code hasn’t been widely successful to date. As Alan points out, the technology to make it work exists. I think that mobile code is not so much a technical problem as a social and economic problem. So far nobody has figured out how to get it accepted and make it work broadly across many platforms with wide adoption. He pointed to the example of PostScript, which allows very flexible printing. But I countered that PostScript is not good for many things, which is why PDF was invented. PostScript is mobile code, while PDF is more like HTML. In the end, many of the Viewpoints people said that the mobile code issue is not central to their approach. But Alan was out of the room at this point.

The Viewpoints people demoed Nile, their POL (problem oriented language) for 2D graphics rendering and other kinds of media processing. It is very nice. I think it resembles Matlab in some ways. There are multiple dimensions of concurrency built in, and the runtime is able to adjust the granularity of concurrency to the hardware.

They also demoed their office application, for making integrated documents/presentations. I appreciated and agree with their idea of eliminating unnecessary distinctions in the application at this level. They showed a 2D layout model based on a form of iterated property evaluation, with some form of change propagation. It was not exactly clear to me whether this was most like constraint programming, reactive programming, or attribute grammars. However, my overall impression is that they are not following their “problem oriented language” philosophy as strongly at this level of their design. The system seemed to be mostly object-oriented, but they are experimenting with mechanisms for global constraint/property evaluation. The work is at a fairly low level, as they are currently tackling algorithmic issues like layout and animation. It was at this level that the behavioral approach became strongest, and any notion of “data” vanished. We had some good discussion, but I am not sure I got a clear picture of the overall system. My suggestion was to focus on extracting “Frank” from the Smalltalk system, and also to think about how to extract “Frank” documents so that they can be sent and interpreted on non-Frank machines.

It is very interesting that the issue of “behavior” versus “data” permeated much of our conversation. While Ensō uses OO programs to define the interpretation of data, I do not think that data should be stored “as objects with methods”. I believe that information must have a representation other than code. It must have meta-data that describes its structure and constraints (and this meta-data is also ordinary data, not code!). I believe that “problem oriented languages” are well-suited to this task. Examples include HTML, PDF, SVG, PNG, CSV, MathML, etc. Much of the work of the last 15 years has been creating representations that can be shared. It may be that all this work is misguided, but if anyone wants to present an alternative it has to solve the whole problem of data representation, interpretation, transmission, and re-interpretation, not just part of the problem.

I left the office stimulated to try to come up with an “Ensō Office” that would support Word, Excel, and PowerPoint style documents in one unified model. It is not an easy thing to do! But I’m going to try. One thing that the Ensō version will have is a “document format” that represents documents as data, not as code. One thing that will be tricky is that this format will include both data and meta-data.


Channel 9 video

I talked with Channel 9 about Ensō, Batches, partial evaluation, and Orc while I was at SPLASH.


Domain-Specific Languages

Freddy Mallet asks what is the difference between an Executable Specification Language and a Domain-Specific Language. They are clearly closely related, but I think there are some differences that are significant.

The main problem, from a purely technical viewpoint, is that being “domain specific” is, by itself, not sufficient to ensure that a language is defined at the level of specifications (what), rather than implementations (how). For example, we might say assembly language is domain-specific. Matlab is a domain-specific language, but it is essentially a programming language. One might even argue that all languages are domain-specific. This includes C, FORTRAN, COBOL, Lisp, PHP, Perl, Bash, and even Java. It is difficult to think of them without immediately thinking of their domain of application. I have often felt that many languages created by programming language researchers, including ML and Haskell, tend to be best at writing compilers, interpreters, and type systems… because these are the kinds of programs that programming language researchers are interested in writing. The danger is when you believe that your own narrow application domain is representative of the kind of programs that everyone wants to write.

There is also a lot of uncertainty about what a “domain” is. To some people domains are defined by a kind of product/service, e.g. insurance, real estate, banking, heavy manufacturing, retail, transportation, hospitals, and government. These are sometimes called verticals, or vertical markets. Horizontal domains are functional areas that cut across industries, including human resources, logistics, legal, etc. In programming we distinguish between problem domains and solution domains. Most, but not all, domain-specific languages are tied to a particular horizontal problem domain, often linked to a widely-used but specialized solution or specific kind of problem.

Finally, there can be domain-specific languages that are not directly executable. One might say that UML fits into that category. Or specialized logics. That last issue is annoying, because it is usually the limitation to a specific domain that enables a specification language to be executable. This means that being domain-specific is often necessary, but alone is not sufficient, to ensure that a language is an executable specification language.

Thus I find that the characterization “domain specific” is not a precise term for the kind of language that I am focusing on.

On the other hand, the phrase “Executable Specification Language” combines two terms that tend to be at odds with each other. When we say that language X has the property that X is executable (you can run it!), and that X is a specification language (more at the level of “what” than “how”… that is, a “what-oriented” language), then you end up with a very precise description of a set of interesting languages. If you compare the sets of languages defined in this way:

L1 = what-oriented  and  runnable

L2 = domain-specific

You will find that L1 is a subset of L2. But I believe that L1 contains exactly the languages I’m interested in, while L2 includes many languages that are not interesting to me. That is one reason why I used the unconventional phrase “Executable Specification Language”. The other reason is that I wanted to make an analogy with verification and lightweight verification, to cast domain-specific languages into a new light.

But in practice, current common usage for the phrase “Domain-Specific Language” seems to be fairly close to what I have in mind when I wrote “Executable Specification Language”. To add to the confusion, there is also a common usage of the term Executable Specification in the context of automated testing and agile development. In this context, the executable specification is really a test case that can be run to check if a system meets its specification. The key point is that these are specifications, not languages.

In conclusion, I apologize for introducing yet another term. However, it is sometimes useful to think about well-known ideas in new ways.

Executable Specification Languages

We believe that a new software development paradigm is struggling to be born. It appears in many guises and contexts, under many names, and is as yet still unformed and incomplete. We are not surprised at the struggles, for we do not believe that paradigms arise from a single blinding flash of insight, but rather they come about by careful assembly of many good ideas into a package that makes sense and works, as a whole, in some profound way. We believe we see initial attempts at creating such an assemblage. On the other hand, we suspect that some of the birth pains are self-inflicted, because of confusion about the goal and narrow-mindedness in approaches to achieving it.

To us, the essence of this new paradigm is a shift toward descriptions of what software should do and away from how it should do it, while still providing enough operational information so that the desired behavior can be generated automatically.

A description of “what a system should do” is normally called a specification. Thus it would be natural to assume we are talking about general-purpose specification languages, for example CASL or B/Z, as the foundation for this new paradigm. But this is not the case, because of the requirement that desired behaviors can be generated automatically. By this we mean that the specifications must be executable, so that the generated behaviors are sufficiently efficient and reliable to be usable in practice. There is as yet no general way to convert a specification in a general-purpose specification language into an executable program. This kind of synthesis is an important research area (my student Srinivas Nedunuri is working with Doug Smith on it, but that is another topic), but it is not yet ready for widespread adoption.

One way to solve this problem in the short term is to limit the expressiveness of the languages to a particular kind of problem. We believe that executable special-purpose specification languages will be the foundation of the next widely-used programming paradigm. Here is one way to see how this approach fits into the landscape of computer science research:

There has been great success exploring lightweight verification techniques. The key point is that these approaches narrow the expressiveness of the properties that can be specified, then apply powerful automation technology to verify the limited theorems. We believe that equal progress can be made by narrowing the expressiveness of the specification languages used for synthesis, and applying powerful optimization techniques to the problem.

There are already many successful examples of this approach, including grammars (Yacc), databases (SQL), spreadsheets (Excel), security (XACML), state machines (Statecharts), network protocols (NDlog, Frenetic), user interfaces (XUL), and many more. These systems all allow users to specify what a computation should do, without saying exactly how to do it. Most of them have deep theory and optimization techniques that make them expressive and efficient. But they are also specialized to a particular task. There are many more examples in practical use or experimental design.
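SQL, one of the examples above, shows the pattern directly: the query states what result is wanted, and the engine decides how to compute it (join order, index use, and so on). A small sketch using Python’s built-in sqlite3, with made-up data:

```python
import sqlite3

# A toy table of orders, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("ann", 10), ("bob", 25), ("ann", 5)])

# "What": the total per customer, for customers whose total exceeds 10.
# No loops, no accumulators, no commitment to an evaluation strategy --
# the engine chooses the "how".
rows = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING total > 10
    ORDER BY customer
""").fetchall()
print(rows)  # [('ann', 15), ('bob', 25)]
```

The imperative equivalent would iterate over rows, maintain a dictionary of running totals, and filter at the end; the declarative query leaves all of those implementation decisions to the engine, which is exactly the narrowing-plus-optimization bargain described above.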

There is a subtle point in our initial statement. We emphasize that behavior is generated automatically, but this does not necessarily mean that code is generated. The behavior may be generated by an interpreter. The examples mentioned above span the spectrum between interpretation and compilation.

The key question, for the development of a new paradigm, is whether this approach can be generalized so that it can be used to build complete (or nearly complete) systems. Currently specialized specification languages are used as small components of systems, but the majority of the system is created using general-purpose code. The question is whether this emphasis can be inverted, so that most of a system is built with specialized languages, with general-purpose code used for the underlying language engines, or as application-specific code plugged in where necessary. The success of this approach will depend in part on the amount of regularity in the systems we want to build.

Given the goal, to move from how towards what while still supporting executability, it is not surprising that there is a lot of experimentation necessary. There are many problems that must be solved before this vision is realized. In this blog we will elaborate on these themes and also discuss our specific work on realizing this vision in Ensō.

We apologize for continuing to blog at the level of polemic, rather than giving more examples of code in Ensō. The reason is that blogs are well-suited to polemic, and our technical results are going into academic papers. When we have the papers ready (and the first one should be done soon) we will post a summary here. We were giving demos at SPLASH, so if you want to know more you should come next year.
