Demystifying DSLs

Introduction

To design software effectively, engineers need to understand and consider many possible techniques. After all, if all you have is a hammer, everything looks like a nail. Realistically, the field is so diverse that no individual can be expected to be familiar with everything, so engineers always work from a limited toolbox. Unfortunately, many regard DSLs as an academic pursuit, so most engineers never learn much about them. While this post doesn’t delve deeply into implementation details, it is aimed at a technical audience and is written in the hope that more people will add DSLs to their toolbox of techniques.

 

What is a DSL?

DSL stands for domain-specific language: a computer language designed to meet the needs of a narrow domain. Because of that narrow focus, a DSL can typically meet its flexibility requirements while remaining declarative and extremely high-level, which makes it easy to express concepts that would be difficult in a general-purpose programming language. Below are some examples of DSL use, all of which would be far more complex to write in a general-purpose programming language. It is not important that you understand these examples specifically; they’re just here to give you some idea of what DSLs can look like and what they can do.

 

DSL Usage Examples

Spreadsheet Formulas

Spreadsheet formulas are probably the best-known DSL, especially outside of software engineering. A formula calculates the value of a cell.

=SUM(A1:A4)

This cell should be the sum of the cells A1 to A4.

 

=IF(A1 < 10, "", "Invalid")

A basic validation check: this cell should be empty if A1 is less than 10, otherwise it should be “Invalid”.

 

Regular Expressions

Regular expressions express the structure of strings.

\(\d{2}\) \d{4} \d{4}

This example represents the pattern of an Australian home phone number.
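
As a quick check, a pattern like this can be tried out with Python's `re` module. This is only a sketch: it uses a slightly relaxed variant of the pattern in which the space between the two groups of four digits is optional (an assumption made for this example), and the sample numbers are made up.

```python
import re

# Relaxed variant of the pattern above: the space between the two
# four-digit groups is optional (an assumption for this sketch).
PHONE = re.compile(r"\(\d{2}\) \d{4} ?\d{4}")

def is_home_phone(text: str) -> bool:
    """Return True if the whole string fits the pattern."""
    return PHONE.fullmatch(text) is not None

# Made-up sample inputs, for illustration only.
print(is_home_phone("(02) 5550 1234"))  # True
print(is_home_phone("(02) 55501234"))   # True
print(is_home_phone("02 5550 1234"))    # False: area code not in brackets
```

Using `fullmatch` rather than `search` means the whole string has to fit the pattern, not just part of it.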

 

SQL

SQL is a widely used language for interacting with relational databases.

SELECT city, AVG(rain)
FROM Weather
GROUP BY city;

Given a `Weather` table with suitable data, this query calculates the average rainfall for each city in the table.
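
To try the query out, here is a minimal sketch using Python's built-in `sqlite3` module. The table contents are made-up illustrative data, and an `ORDER BY` clause (not part of the query above) is added only so the output order is predictable.

```python
import sqlite3

# In-memory database with a made-up Weather table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Weather (city TEXT, rain REAL)")
conn.executemany(
    "INSERT INTO Weather (city, rain) VALUES (?, ?)",
    [("Melbourne", 2.0), ("Melbourne", 4.0), ("Sydney", 10.0)],
)

# The example query, plus ORDER BY so the row order is predictable.
rows = conn.execute(
    "SELECT city, AVG(rain) FROM Weather GROUP BY city ORDER BY city"
).fetchall()

for city, avg_rain in rows:
    print(city, avg_rain)

conn.close()
```

Note that the Python code only moves data in and out; the grouping and averaging are stated entirely in the SQL.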

 

Unix Tools

The Unix design philosophy of making small tools that each do a narrow task well resulted in many domain-specific languages. At the time most of these were invented, they were referred to as ‘little languages’.

awk 'NR % 2 == 0' data

Print the even-numbered lines of the file named data.
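
For comparison, here is a rough Python equivalent of that awk one-liner. It is only a sketch to show how much of the work the little language does for you; the function name and sample data are made up for illustration.

```python
from typing import Iterable, List

def even_numbered_lines(lines: Iterable[str]) -> List[str]:
    """Keep lines whose 1-based line number is even, like awk's NR % 2 == 0."""
    return [line for number, line in enumerate(lines, start=1) if number % 2 == 0]

# Made-up sample input standing in for the contents of a file.
sample = ["first", "second", "third", "fourth"]
print(even_numbered_lines(sample))  # ['second', 'fourth']
```

Even this version leaves out opening the file, reading it line by line and printing the result, all of which awk handles implicitly.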

 

sed '/pattern/d' file.txt

Print file.txt with the lines matching “pattern” removed.

 

find ./ -name '*.png' -or -name '*.jpg'

Find .png and .jpg files in the working directory and its subdirectories.

 

Why Use a DSL?

DSLs are designed to express requirements in such a way that they’re separated from implementation details. They’re higher-level than even the most abstract general-purpose programming languages because their design does not carry the burden of being applicable to many different types of problems. As shown in the examples above, it is easier to solve a problem using a DSL because the language’s model fits the problem space, so you only need to state the requirement in the DSL’s formal notation.

Code written in a high-level, declarative language such as a DSL is a powerful asset that maintains value over time. This is because the code represents only the high-level logic, the business rules, distilled into a clear and unambiguous form that isn’t polluted by implementation details. It is important to preserve this logic even though the implementation details may be fleeting, which is why it is so useful that a DSL separates those two concerns.

For example, as part of a research project in collaboration with an industry partner, I proposed and developed a DSL for specifying the scenarios in which various pieces of information are displayed in their real-time enterprise system. With a suitable DSL in place and the business rules encoded formally in it, each rule now has clear and unambiguous documentation. As technologies and implementation details change, the DSL compiler can be modified to keep up while the business rules remain untouched.

Figure 1: Requirements flow – Separating requirements from implementation details means that requirements can be verified by domain experts and remain persistent over time regardless of technology changes.

 

Another important benefit of encoding business rules in a DSL is that domain experts may be able to verify them. Unfortunately that is a big “may”: domain experts vary in technical ability, and DSLs vary in readability. In this particular project we had access to some domain experts with a mathematical background who were able to sanity-check the business rules encoded in the DSL. Verifying that requirements have been understood and expressed correctly is a much more efficient way to eliminate the confusion of requirements gathering than testing only the end system.

 

How Does a DSL Work?

As with many software engineering problems, there are many different approaches to implementing a DSL. A DSL is simply a programming language with different design goals, so DSLs and general-purpose languages share similar implementation strategies.

Figure 2: High-level model of a compiler – The boxes represent artefacts while the arrows represent the transformation steps. First, a parser transforms the input text into a model. Then the model can optionally be processed, for example by semantic error checking or optimisation. Finally, an emitter transforms the model into an executable.
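
To make the pipeline concrete, here is a minimal sketch of all three steps for a toy external DSL. Everything in it is hypothetical and invented for illustration: the one-rule-per-line language (for example `age >= 18`), the names `parse`, `check` and `emit`, and the choice to emit Python source code rather than a native executable.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical toy language: one rule per line, e.g. "age >= 18".
RULE = re.compile(r"\s*([A-Za-z_]\w*)\s*(<=|>=|==|<|>)\s*(\d+)\s*$")

@dataclass
class Rule:
    field: str
    op: str
    value: int

def parse(text: str) -> List[Rule]:
    """Parser: input text -> model (a list of Rule objects)."""
    rules = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        match = RULE.match(line)
        if match is None:
            raise SyntaxError(f"line {line_no}: cannot parse {line!r}")
        field, op, value = match.groups()
        rules.append(Rule(field, op, int(value)))
    return rules

def check(rules: List[Rule], known_fields: set) -> None:
    """Semantic check: every rule must refer to a known field."""
    for rule in rules:
        if rule.field not in known_fields:
            raise NameError(f"unknown field {rule.field!r}")

def emit(rules: List[Rule]) -> str:
    """Emitter: model -> Python source for a validate(record) function."""
    conditions = [f"record[{r.field!r}] {r.op} {r.value}" for r in rules]
    body = " and ".join(conditions) or "True"
    return f"def validate(record):\n    return {body}\n"

source = "age >= 18\nscore < 100\n"
model = parse(source)
check(model, known_fields={"age", "score"})
generated = emit(model)
print(generated)

namespace = {}
exec(generated, namespace)  # load the generated code
validate = namespace["validate"]
print(validate({"age": 30, "score": 50}))  # True
print(validate({"age": 12, "score": 50}))  # False
```

A real DSL would need a proper grammar and better error reporting, but the shape stays the same: text in, model in the middle, something executable out.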

 

As in Figure 2, to create a DSL you need to implement the following:

  1. Parser: Maps the input text to a model. Here two broad categories of DSLs come into play: external DSLs and internal (or embedded) DSLs. For an external DSL, this means writing a parser that transforms the input into an abstract syntax tree. An internal DSL is hosted in an existing general-purpose programming language, so the mechanism that captures the model from the text may be simpler. For example, a basic approach is to write functions that create objects and have the user of the DSL write code by combining calls to those functions; executing the DSL definition and the DSL code in the host language then generates the model. When making an internal DSL you’re constrained by the host language, so you have limited control over the DSL’s syntax. Metaprogramming techniques are often used to get around this to some degree.
  2. Semantic error checks and/or optimiser: Once you have a model, you can optionally process it before using it, for example with semantic error checks or optimisation.
  3. Emitter: Now you need to execute the model. The diagram shows an emitter that generates an executable, which is a feature of a compiler, but it is equally valid to execute the model directly using an interpreter. An interpreter is often easier to write, and it allows you to integrate the interpretation of the model with the rest of the application.
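
The internal DSL approach described in step 1, combined with a semantic check and an interpreter, might look something like the following sketch. The rule vocabulary (`field`, `less_than`, `all_of`) and the record format are assumptions made up for this example, not taken from any real project.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Model classes: calling the DSL functions below builds a tree of these.
@dataclass(frozen=True)
class Field:
    name: str

@dataclass(frozen=True)
class LessThan:
    field: Field
    limit: float

@dataclass(frozen=True)
class AllOf:
    parts: Tuple["Node", ...]

Node = Union[LessThan, AllOf]

# DSL vocabulary: plain host-language functions that create model objects.
def field(name: str) -> Field:
    return Field(name)

def less_than(f: Field, limit: float) -> LessThan:
    return LessThan(f, limit)

def all_of(*parts: Node) -> AllOf:
    return AllOf(parts)

def check(node: Node, known_fields: set) -> None:
    """Semantic check: reject references to unknown fields."""
    if isinstance(node, LessThan):
        if node.field.name not in known_fields:
            raise NameError(f"unknown field {node.field.name!r}")
    else:
        for part in node.parts:
            check(part, known_fields)

def interpret(node: Node, record: dict) -> bool:
    """Interpreter: evaluate the model directly against a record."""
    if isinstance(node, LessThan):
        return record[node.field.name] < node.limit
    return all(interpret(part, record) for part in node.parts)

# "DSL code": ordinary function calls; running them yields the model.
rule = all_of(less_than(field("age"), 65), less_than(field("score"), 100))
check(rule, known_fields={"age", "score"})
print(interpret(rule, {"age": 30, "score": 50}))  # True
print(interpret(rule, {"age": 70, "score": 50}))  # False
```

No parser is needed here because the host language does the parsing, which is the trade-off mentioned above: less work, but the syntax is limited to what Python function calls can express.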

 

Conclusion

Hopefully that clears up what DSLs are and what they’re used for, and gives you a general idea of how they’re implemented. DSLs can be quite a powerful architectural tool for separating concerns. A future post in the series will go through the implementation of a basic DSL so you can get a more concrete idea of how they work. For now you at least have DSLs in your toolbox of techniques, so you can judge whether they’re worth pursuing for a project.

 

Cover image courtesy of Gus Ruballo.


Thanks to Shannon Pace, Tanya Frank, Simon Vajda and Antonio Giardina for proofreading and providing suggestions.