Friday, November 10, 2017

Write a DSL in fewer than 20 lines of Python code.



Imagine a customer (Nick Fury) asks you to develop a system that handles the directories of different organizations and lets people navigate through them. But they want to be able to describe the organizations themselves (probably for security reasons).
What are the options for letting end users do that?

Let's ask Google what the good configuration formats are.

The first result leads you to INI files.

INI is known as a good and simple format, easily read and understood by humans. OK, let's try this one!
Let's write the INI file for a member of the organization:
File bruce_wayne.ini
[user]
lastname=Wayne
firstname=Bruce
organization=DCC

Okay, looks readable. Let's write the second one.
File dick_grayson.ini
[user]
lastname=Grayson
firstname=Dick
organization=DCC
managed_by=bruce_wayne

Hmmm, the managed_by key, which declares a relation to an external file, looks like a warning sign: how do we ensure that the referenced file actually exists in the system? Moreover, it will soon become a nightmare when we face non-Latin names (and if a person has 2 managers, INI will die!).
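
To make the problem concrete, here is a minimal sketch that reads the two files above with the standard library's configparser (the files are inlined as strings; the names files, load_users and dangling_managers are hypothetical and only exist for this illustration). The parser accepts any managed_by value, so checking that the reference exists is entirely up to us:

```python
import configparser

# The two INI files from above, inlined as strings so the sketch is self-contained.
files = {
    "bruce_wayne": "[user]\nlastname=Wayne\nfirstname=Bruce\norganization=DCC\n",
    "dick_grayson": (
        "[user]\nlastname=Grayson\nfirstname=Dick\n"
        "organization=DCC\nmanaged_by=bruce_wayne\n"
    ),
}


def load_users(sources):
    """Parse every INI source and return {file_stem: content of its [user] section}."""
    users = {}
    for stem, text in sources.items():
        parser = configparser.ConfigParser()
        parser.read_string(text)
        users[stem] = dict(parser["user"])
    return users


def dangling_managers(users):
    """Return the file stems whose managed_by points at a file we do not know."""
    return [
        stem
        for stem, user in users.items()
        if "managed_by" in user and user["managed_by"] not in users
    ]


users = load_users(files)
# Both files are present, so no reference is dangling here.
print(dangling_managers(users))
# Forget bruce_wayne.ini and the reference silently points at nothing.
print(dangling_managers(load_users({"dick_grayson": files["dick_grayson"]})))
```

Nothing in the INI format itself carries this check: it has to be written, and maintained, next to the parser.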

Okay, maybe INI files won't be the best solution, what else could we do?

The next result in Google is XML.

The data seems hierarchical, with relations, so we could use one XML file to store everything. Let's try:
<organizations>
 <organization>
  <id>1</id>
  <name>DCC</name>
 </organization>
 <organization>
  <id>2</id>
  <name>MCU</name>
 </organization>
</organizations>
<user>
 <id>1</id>
 <lastname>Wayne</lastname>
 <firstname>Bruce</firstname>
 <organization_rel>1</organization_rel>
</user>
<user>
 <id>2</id>
 <lastname>Grayson</lastname>
 <firstname>Dick</firstname>
 <organization_rel>1</organization_rel>
 <managed_by>1</managed_by>
</user>
... arf, I'm already tired of writing those <tags>
 and of having to manage internal stuff like ids.

So, letting users write XML is not a solution either.
#ProTip: XML is good for machine-to-machine communication, written by a machine and read by a machine. It is definitely not a human-friendly language.

What else could we do?

Finally, a solution appeared: use a Domain Specific Language (DSL).

We could imagine a description language that is easier for end users to write and read.
After some discussion with the customer, you have agreed on this proposal:
robin=User(firstname="Dick", lastname="Grayson")
batman=User(firstname="Bruce", lastname="Wayne", subordinates=[robin])
Organization(name="DCC", employees=[robin, batman])

ironman=User(firstname="Tony", lastname="Stark")
warmachine=User(firstname="James", lastname="Rhodes")
pepper=User(firstname="Pepper", lastname="Potts", subordinates=[ironman, warmachine])
Organization(name="MCU", employees=[ironman, warmachine, pepper])

Pretty straightforward, isn't it?

Now, let's do this with some Python code.

First, to handle our organizations we'll need some classes:
class Organization:
    def __init__(self, name=None, employees=None):
        self.name = name
        self.employees = employees or []

class User:
    def __init__(self, firstname=None, lastname=None, subordinates=None):
        self.firstname = firstname
        self.lastname = lastname
        self.subordinates = subordinates or []

Of course, the real behavior of those items is missing, but it is enough for our example.
Then, how do we load, read and interpret the file and turn it into Python objects? Here is what I propose:
class OrganizationConfigurator(object):
    def __init__(self):
        self.__symbols = {
             "User" : self._create_user,
             "Organization" : self._create_organization}
        self.organizations = []
        self.users = []

    def __read_file(self, filename):
        with open(filename, "r") as the_file:
            return the_file.read()

    def read_configuration_from_file(self, filename):
        exec(
            compile(self.__read_file(filename), filename, "exec"), 
            self.__symbols
        )

    def _create_user(self, **kwargs):
        new_user = User(**kwargs)
        self.users.append(new_user)
        return new_user

    def _create_organization(self, **kwargs):
        new_organization=Organization(**kwargs)
        self.organizations.append(new_organization)
        return new_organization


And that's all! You can count the lines needed to create the DSL: 10 lines for the core (open, read and interpret the DSL) and 8 lines for the stuff related to user/organization creation. Can you do it in fewer?

So, let's try to explain a little bit what is done here.
First, the __init__ part:
class OrganizationConfigurator(object):
    def __init__(self):
        self.__symbols = {
             "User" : self._create_user,
             "Organization" : self._create_organization}
        self.organizations = []
        self.users = []
We populate a dictionary of symbols that will be used to evaluate the configuration. Here, we declare the keywords "User" and "Organization", which become available in the DSL.

Then, the main method is read_configuration_from_file:
    def read_configuration_from_file(self, filename):
        exec(
            compile(self.__read_file(filename), filename, "exec"),
            self.__symbols
        )

This function uses 2 Python built-ins, compile and exec.
compile transforms the content of the file into a Python code object, that is, compiled bytecode (not an AST, unless you explicitly ask compile for one; if you want to dig into this wonderful world, the standard dis and ast modules are a good place to start). Then, exec will (beware of the surprise...) execute the code!
By giving the symbols dictionary to exec as its global namespace, we populate the DSL world with our keywords, so User and Organization are known, callable objects there.
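
A minimal sketch of that mechanism in isolation may help. The one-line program and the double keyword below are hypothetical, chosen only to show how names travel in and out of the dictionary passed to exec:

```python
import types

# A one-line "DSL" program; `double` is a hypothetical keyword for this sketch.
source = "answer = double(21)"

# compile() turns the source text into a code object (compiled bytecode).
code = compile(source, "<dsl>", "exec")
print(isinstance(code, types.CodeType))  # True

# The dictionary given to exec() is the global namespace of the executed code:
# names we put in are visible to the DSL, names the DSL assigns end up in it.
namespace = {"double": lambda n: n * 2}
exec(code, namespace)
print(namespace["answer"])  # 42
```

Note that exec also adds a __builtins__ entry to the dictionary when it is missing, which is why the DSL can still call regular Python built-ins by default.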

Now, to launch this parser, use:
configurator = OrganizationConfigurator()
configurator.read_configuration_from_file("name_of_the_file_containing_the_configuration")

You'll then find the list of organizations in the organizations attribute, and the list of users in users.
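
For an end-to-end run, here is a sketch that can be pasted into a single file. It uses a condensed copy of the classes above so that it runs on its own; the temporary file, its organizations.dsl name and the DSL_TEXT constant are assumptions made for the demo:

```python
import os
import tempfile


class User:
    def __init__(self, firstname=None, lastname=None, subordinates=None):
        self.firstname = firstname
        self.lastname = lastname
        self.subordinates = subordinates or []


class Organization:
    def __init__(self, name=None, employees=None):
        self.name = name
        self.employees = employees or []


class OrganizationConfigurator:
    def __init__(self):
        self.organizations = []
        self.users = []
        # Keywords made available to the DSL.
        self._symbols = {"User": self._create_user,
                         "Organization": self._create_organization}

    def read_configuration_from_file(self, filename):
        with open(filename, "r") as the_file:
            exec(compile(the_file.read(), filename, "exec"), self._symbols)

    def _create_user(self, **kwargs):
        self.users.append(User(**kwargs))
        return self.users[-1]

    def _create_organization(self, **kwargs):
        self.organizations.append(Organization(**kwargs))
        return self.organizations[-1]


# Hypothetical configuration, written to a temporary file for the demo.
DSL_TEXT = '''
robin = User(firstname="Dick", lastname="Grayson")
batman = User(firstname="Bruce", lastname="Wayne", subordinates=[robin])
Organization(name="DCC", employees=[robin, batman])
'''

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "organizations.dsl")
    with open(path, "w") as handle:
        handle.write(DSL_TEXT)
    configurator = OrganizationConfigurator()
    configurator.read_configuration_from_file(path)

for organization in configurator.organizations:
    names = [user.firstname + " " + user.lastname for user in organization.employees]
    print(organization.name, names)  # DCC ['Dick Grayson', 'Bruce Wayne']
```

The objects created by the DSL are ordinary Python objects, so batman.subordinates really contains the robin instance, with no id to manage.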

SECURITY WARNING:
Of course, if you don't trust your customer (or the config writers), you should restrict this world: manually set, under the __builtins__ key of the symbols dictionary, a dictionary from which the dangerous built-ins (exec, compile...) have been removed, and remove or override other dangerous functions (__import__, for example). Keep in mind that this only reduces the attack surface; it does not turn exec into a real sandbox.
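
As an illustration, here is a minimal sketch of such a restriction, assuming the most drastic option: an empty __builtins__ dictionary. The run_restricted helper and the simplified User keyword are hypothetical names for this example:

```python
def run_restricted(source, symbols):
    """Execute DSL source with only `symbols` visible and no built-ins at all."""
    namespace = dict(symbols)
    # Empty built-ins: no open, no exec, no compile, no __import__ ...
    namespace["__builtins__"] = {}
    exec(compile(source, "<dsl>", "exec"), namespace)
    return namespace


created = []


def create_user(**kwargs):
    """Simplified stand-in for the User keyword: just record the arguments."""
    created.append(kwargs)
    return kwargs


symbols = {"User": create_user}

# Legitimate DSL code still works.
run_restricted('User(firstname="Bruce", lastname="Wayne")', symbols)
print(created)

# Attempts to reach the outside world fail before doing anything.
for attempt in ('open("some_file")', "import os"):
    try:
        run_restricted(attempt, symbols)
    except (NameError, ImportError) as error:
        print(type(error).__name__, "for:", attempt)
```

This blocks the obvious attacks, but a determined writer can still reach dangerous objects through attribute access on ordinary values, so truly untrusted input calls for a real parser or an isolated process rather than exec.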
