Dhall, gradual typing and AWS CloudFormation

We've had a problem at work: the CloudFormation template that deploys our infrastructure has grown into a giant sheet full of repetitive parts. To add a new resource (an SNS topic), we have to edit three lists every time: the topics themselves, their permissions, and the triggers that fire on changes.

Ideally, we'd list the topics in one place and generate the full template automatically. Writing such a generator is easy, but it would be nicer to use a ready-made solution, so that we don't have to onboard current and new team members onto a home-grown tool.

There are several CloudFormation generators, for example CloudFormation Template Generator (Scala) and troposphere (Python). We tried CloudFormation Template Generator, but it doesn't support cyclic references between resources, so we put the problem aside for a while.

This time we decided to try Dhall instead of a specialised generator. Its syntax is similar to JSON, it supports functions, it is strongly typed, and it can generate YAML and JSON (it has many other benefits not directly related to CloudFormation, which I'm omitting here).

The advantage of Dhall over dedicated generators is that, despite full static type checking, it supports gradual typing: you don't have to annotate every value with its type, because the type can be inferred. This lets us skip defining all the possible resource types up front. With a dedicated generator, on the other hand, if it doesn't support a particular AWS resource type, the generator itself has to be modified.

We took the existing CloudFormation template as a base and started translating it to Dhall directly. The top level only required cosmetic changes:

let mkParam = \(desc: Text) -> { `Type` = "String", Description = desc } in
{ AWSTemplateFormatVersion = "2010-09-09"
, Description = "ACME bucket"
, Parameters = { BucketName = mkParam "Name of the ACME bucket"
               , LogsBucketName = mkParam "Name of the ACME logs bucket"
               , DeploymentsBucketName = mkParam "Name of the ACME deployments bucket"
               }
-- , Resources = ...
}

The mkParam function already saves a few lines, but the most interesting part is ahead: the Resources object.

The resource list in the existing template looks like this:

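Resources:
  # Two S3 buckets (Type: AWS::S3::Bucket), details omitted
  ACMEFooCreatedTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: ACMEFooCreated
  ACMEBarCreatedTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: ACMEBarCreated
  ACMEBazCreatedTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: ACMEBazCreated
  ACMEQuuxCreatedTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: ACMEQuuxCreated
  # And tens of similar topics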

As far as YAML (and Dhall's YAML converter) is concerned, Resources is an object, that is, a record type with the fields ACMEFooCreatedTopic, ACMEBarCreatedTopic and so on. That's no good if we want to generate many similar resources: the field names of a record are fixed by its type, so they can't be produced by mapping over a list.

Fortunately, the YAML converter automatically turns lists of records with mapKey and mapValue fields into objects with whatever keys we need. For example:

[ { mapKey = "one"
  , mapValue = 1
  }
, { mapKey = "two"
  , mapValue = 2
  }
]

gets converted to:

two: 2
one: 1
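Applied to our template, the topic entries from the YAML above could be written in this form. This is a sketch covering only two of the topics, and it assumes a dhall-to-yaml version that converts mapKey/mapValue lists as described above:

```dhall
-- Two of the topic resources from the YAML above, written as a list of
-- mapKey/mapValue records instead of a record with fixed field names.
let resources =
      [ { mapKey = "ACMEFooCreatedTopic"
        , mapValue =
          { `Type` = "AWS::SNS::Topic"
          , Properties = { TopicName = "ACMEFooCreated" }
          }
        }
      , { mapKey = "ACMEBarCreatedTopic"
        , mapValue =
          { `Type` = "AWS::SNS::Topic"
          , Properties = { TopicName = "ACMEBarCreated" }
          }
        }
      ] in

{ Resources = resources }
```

Since the keys are now ordinary Text values rather than field names, nothing stops us from computing them.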

But it turns out this isn't the main problem.

The end of gradual typing

So we have lots of similar topics, each of which we can represent as:

{ name: Text
, resourceName: Text
}

and put into the resource list using a trivial function that converts this representation into an SNS resource.

And this is where gradual typing ends and the problems begin:

In Dhall, function argument types must be specified explicitly.

Therefore, we must write:

let Topic = { name: Text
            , resourceName: Text
            } in
let topicResource = \(topic: Topic) -> { mapKey = topic.resourceName
                                       , mapValue = { `Type` = "AWS::SNS::Topic"
                                                    , Properties = { TopicName = topic.name }
                                                    }
                                       } in

But at least the Topic type isn't large, right? We just need to define the topics and convert them into a resource list using map?

In Dhall, polymorphic functions take their type parameters as explicit arguments.

Therefore, map always takes four arguments (the input element type, the output element type, the function and the list), and all of them must be given explicitly.

let Resource = { mapKey: Text
               , mapValue: { `Type`: Text
                           , Properties: { TopicName: Text }
                           }
               } in
map Topic Resource topicResource topics
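Putting these pieces together gives a complete file that could be fed to dhall-to-yaml. It is a sketch: the two topics are illustrative, and instead of importing map from the Prelude it defines an equivalent map locally with the List/fold builtin, so the file has no remote imports:

```dhall
-- Each topic is described once; the resource entries are generated.
let Topic = { name : Text, resourceName : Text } in

let Resource =
      { mapKey : Text
      , mapValue : { `Type` : Text, Properties : { TopicName : Text } }
      } in

-- Local stand-in for the Prelude's List/map: a, b and f are all explicit.
let map =
      \(a : Type) ->
      \(b : Type) ->
      \(f : a -> b) ->
      \(xs : List a) ->
        List/fold
          a
          xs
          (List b)
          (\(x : a) -> \(acc : List b) -> [ f x ] # acc)
          ([] : List b) in

let topicResource =
      \(topic : Topic) ->
        { mapKey = topic.resourceName
        , mapValue =
          { `Type` = "AWS::SNS::Topic"
          , Properties = { TopicName = topic.name }
          }
        } in

-- Illustrative topic list: adding a topic means adding one line here.
let topics : List Topic =
      [ { name = "ACMEFooCreated", resourceName = "ACMEFooCreatedTopic" }
      , { name = "ACMEBarCreated", resourceName = "ACMEBarCreatedTopic" }
      ] in

{ Resources = map Topic Resource topicResource topics }
```

List/fold folds from the right, so consing each converted element onto the accumulator keeps the topics in their original order.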

Okay, but at least the resource type for topics isn't too complex?

Unfortunately, the resource list contains things other than topics, and all of their possible properties have to be included in Resource, which, moreover, has to become a sum type:

let Resource = < Topic: { `Type`: Text
                        , Properties: { TopicName: Text }
                        }
               | Bucket: { `Type`: Text
                         , DependsOn: List Text
                         , Properties: { AccessControl: Text
                                       , BucketName: Text
                                       , LoggingConfiguration: { DestinationBucketName: Text
                                                               , LogFilePrefix: Text
                                                               }
                                       , VersioningConfiguration: Optional { Status: Text }
                                       , LifecycleConfiguration: Optional { Rules: List LifecycleRule }
                                       -- ...
                                       }
                         }
               > in
let topicResource = \(topic: Topic) -> Resource.Topic { ... }
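To show what the sum type costs at use sites, here is a cut-down sketch with only two alternatives and one property each (the real Bucket alternative needs all the fields above). The bucket resource name and the BucketName value are hypothetical. Every list element has to be wrapped in its constructor; dhall-to-yaml by default emits only the wrapped value, so the constructors don't show up in the output:

```dhall
-- Cut-down Resource union: just enough fields to show the wrapping.
let Resource =
      < Topic : { `Type` : Text, Properties : { TopicName : Text } }
      | Bucket : { `Type` : Text, Properties : { BucketName : Text } }
      > in

-- Both elements now share the type { mapKey : Text, mapValue : Resource }.
let resources =
      [ { mapKey = "ACMEFooCreatedTopic"
        , mapValue =
            Resource.Topic
              { `Type` = "AWS::SNS::Topic"
              , Properties = { TopicName = "ACMEFooCreated" }
              }
        }
      , { mapKey = "ACMEBucket" -- hypothetical resource name
        , mapValue =
            Resource.Bucket
              { `Type` = "AWS::S3::Bucket"
              , Properties = { BucketName = "acme-bucket" } -- hypothetical
              }
        }
      ] in

{ Resources = resources }
```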

Note that in real CloudFormation, !Ref SomeOtherResource, !Sub and other intrinsic functions can appear almost anywhere a value is expected. Because our template uses them, we had to replace Text with AWSValue, also declared as a sum type, in all those places, and write AWSValue.Ref and AWSValue.Sub where needed (and AWSValue.Text everywhere else).
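Our exact definition isn't reproduced here, but a minimal sketch of such an AWSValue might look like this. It relies on the long (JSON) form of CloudFormation's intrinsic functions, where !Ref X is written as { "Ref": "X" } and !Sub s as { "Fn::Sub": s }; because dhall-to-yaml unwraps union alternatives, each alternative simply holds the matching record. The field names and values below are illustrative:

```dhall
-- Hypothetical AWSValue: each alternative wraps the shape CloudFormation expects.
let AWSValue =
      < Text : Text
      | Ref : { Ref : Text }
      | Sub : { `Fn::Sub` : Text }
      > in

{ BucketName = AWSValue.Ref { Ref = "BucketName" }
, LogFilePrefix = AWSValue.Text "logs/"
-- \${ escapes Dhall interpolation, so CloudFormation receives ${AWS::StackName}
, Description = AWSValue.Sub { `Fn::Sub` = "\${AWS::StackName} bucket" }
}
```

The price is visible even in this tiny example: plain strings now need AWSValue.Text around them as well.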

This is where my attempt to translate the whole template to Dhall ended. Defining types for all the CloudFormation properties our template uses, and adding new ones every time one is needed, is too costly.

There are also two other, less critical problems:

All record fields are mandatory.

To simulate an optional field, you declare it as, for example, Optional Text in the record type; every concrete value must still define it, and only the YAML converter removes it (with the --omitNull option).

let Foo = { Always: Text
          , Sometimes: Optional Text } in
{ Always = "there"
, Sometimes = None Text } : Foo

(Since None has to know the type of the missing value, Text must be specified explicitly.)

The official recommendation when there are many optional fields is to define an "empty" default record and override fields in it:

let defaultFoo = { Always = ""
                 , Sometimes = None Text
                 } in
defaultFoo // { Always = "There" }
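Combining the two techniques, here is a sketch using the Foo type from above: both list elements start from the same default record and are updated with the right-biased // merge operator, and only the second one fills in the optional field with Some. The values are illustrative:

```dhall
let Foo = { Always : Text, Sometimes : Optional Text } in

-- Every field gets a value here, so the overrides below only list changes.
let defaultFoo : Foo = { Always = "", Sometimes = None Text } in

-- // is right-biased: fields on the right replace those on the left.
[ defaultFoo // { Always = "There" }
, defaultFoo // { Always = "Here", Sometimes = Some "too" }
] : List Foo
```

With the --omitNull option mentioned above, the first element's Sometimes field should be left out of the YAML.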

Each let introduces only one variable, and dhall-format turns several nested lets into a ladder that JavaScript would be proud of.

Here is the start of the file describing our template after running it through dhall-format:

let map = https://prelude.dhall-lang.org/List/map

in  let Notification =

	in  let Export =

		in  let Resource =

			in  let exports =

				in  let wholeBucket =

					in  let LifecycleRule =

						in  let mkParam =
									λ(desc : Text)
								  → { `Type` = "String", Description = desc }

							in  { AWSTemplateFormatVersion =
                                -- ...

What to do next?

If you need to generate CloudFormation with Dhall, you'll most likely have to take Amazonka or a similar library as a base and generate all the CloudFormation types automatically, specifying precise types wherever needed.

How could Dhall be changed to support gradual typing?

If we take it as given that type parameters have to be passed explicitly, maybe we could allow passing _ as a type, or as part of one, and infer the missing type during unification, something like:

let Resource = { `Type`: Text
               , Properties: _ }

A more radical but less elegant solution is to introduce an explicit Dump type that any value can be converted to, but that supports no operations at all. The only thing you could do with a Dump value is convert it to YAML/JSON, and the result would be the conversion of the original value. The CloudFormation helper functions would then return Dump, which we could combine but not modify; for example, the list of resources would look like:

List { mapKey: Text
     , mapValue: Dump }

This approach doesn't guarantee that the values are correct, but it doesn't require specifying the Resource type in full. Where more guarantees are wanted, the type could be refined, for example by writing this instead of Dump:

let BucketProperties = { BucketName: Text } //\\ Dump

which gives a type with one guaranteed field and arbitrary others.
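For comparison, //\\ already exists in Dhall, but only for combining record types whose fields are all known; the open-ended part is exactly what the hypothetical Dump would add. A small sketch of today's behaviour, with illustrative values:

```dhall
-- //\\ combines record types: BucketProperties requires both fields.
let BucketProperties = { BucketName : Text } //\\ { AccessControl : Text } in

{ BucketName = "acme-bucket", AccessControl = "Private" } : BucketProperties
```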