The Kubernetes Effect
In my many years of creating Open Source software and talking about it at conferences, some of the most productive development times are often those leading up to the presentation.
In the last post, I mentioned that I was going to present a 90 minute YAMLScript tutorial at KubeCon (November 15th in Salt Lake City).
The conference was amazing and the YAMLScript tutorial was a huge success. I came away with the feeling that YAML and YAMLScript had found their community. KubeCon felt like YAMLCon!
But today's post is about the work leading up to the conference and the new data oriented features that were added to YAMLScript as a result.
At the start of October we realized there were a few things we wanted to add to the language to make it great for defining dynamic data in YAML files.
YAMLScript is a complete programming language and while you could already do almost anything with it, we knew that it had blend smoothly into the existing YAML files that people use for Kubernetes and other uses. We started by focusing on Helm charts, and seeing how well YAMLScript could fit along side the Go templating system (or replace it entirely). In the end it all worked out very well, but a few things needed to be added before tutorial time.
TL;DR for Helm Users: https://yamlscript.org/doc/helmys
In this post I want to cover some of the new (and old) features of YAMLScript that you can use to do cool things in your YAML files that are not possible with YAML alone. And remember, your YAML files are already valid YAMLScript files; but if you load them (or preprocess them) with YAMLScript, you can do a lot more in there.
What YAML Can Do Now
YAML 1.2 has a few things that let you do things a bit fancier than with a data only format like JSON:
- YAML has anchors and aliases for reusing specific nodes by naming them, and then later referring to them by name.
- YAML has a special
<<
"merge key" that you can use to merge mappings. This was actually removed from the 1.2 spec but many popular implementations still support it (albeit with inconsistent behavior between implementations). - YAML allows you to tag nodes and this can sometimes be used creatively, though not consistently across different YAML implementations.
- YAML supports multiple "documents" in a single file or stream, although unfortunately the spec doesn't allow you to alias nodes across documents.
YAMLScript supports all of these features too. It needs to since it claims to support the current YAML files of the world. But YAMLScript being a complete programming language lets you go so much further!
Let's take a closer look…
External (and Internal) Data Sources
Even though YAML lets you reuse nodes by name, those nodes need to be part of your YAML document. Say you have a section at the top of your YAML file that defines some default values and names them with anchors, to be aliased later throughout the file.
This is problematic because the node of defaults is also going to be a part of your data when you load it. It would be nice if you could have 2 documents in a YAML file where you define the data sources to be referred to in the first document, and then refer to them in the second document. If the loader returned the final (second) document then you could get the data you wanted without also getting the data that you don't.
Let's try it out:
# file.yaml
---
- &map1
name: Bobbi Datamon
- &list1
- fun
- games
- more: stuff
--- !yamlscript/v0:
person:
<<: *map1
likes: *list1
Now we can load it with the YAMLScript command line interpreter, ys
:
$ ys --load file.yaml
{"person":{"name":"Bobbi Datamon","likes":["fun","games"]}}
Looks like it worked, but ys --load
prints the result in compact JSON. Before we discuss what happened, let's show the result in YAML:
$ ys --load --yaml file.yaml
person:
name: Bobbi Datamon
likes:
- fun
- games
Nice! ys --load
gave us the data from the final document that included data from the first document.
More about loading YAMLScript
You can just use ys --yaml
(or even ys -Y
) instead of ys --load --yaml
. Use -J
(for --json
) to format --load
output as a prettier JSON.
YAMLScript's "load" operation defaults to JSON because YAMLScript's load operation is designed to output data in an interoperable form. JSON is very interoperable and the JSON data model is a subset of the YAML data model.
Instead of using ys
to load YAML/YAMLScript files, you can use a YAMLScript library to replace other YAML loaders in 10 (and counting) common programming languages, including Python, Go, Ruby, Rust, Java and JavaScript.
For example, in Python you could do this:
from yamlscript import YAMLScript
ys = yamlscript.YAMLScript()
text = open("db-config.yaml").read()
data = ys.load(text)
and similar in any other language that has a YAMLScript binding library.
The careful reader will have noticed that we broke the rules of YAML. We aliased nodes that were anchored in a different document. What's up?
Well, the two YAMLScript documents each have a different YAMLScript mode and that makes things work differently.
The first document has no !yamlscript/v0
tag and thus is in "bare mode". In bare mode all the rules are the same as YAML 1.2 (using the Core schema).
The second document has the !yamlscript/v0:
tag. The !yamlscript/v0
tag tells YAMLScript that the document is in "code mode" (the content starts as code but can switch to data mode at any time). The :
in !yamlscript/v0:
tells YAMLScript to switch to data mode right away.
In code mode and data mode, aliases are very different than in bare mode. They can access anchored nodes in the same document or any previous document. Not only that, you can access parts of the anchored node with a path syntax. For instance *map1.name
would produce the string "Bobbi Datamon" and *list1.1
would produce the string "games".
That means that this works the same way as the previous example:
--- &data
map1:
name: Bobbi Datamon
list1:
- fun
- games
more: stuff
--- !yamlscript/v0:
person:
<<:: -*data.map1
likes:: -*data.list1
Here we only anchored the entire first document and then used the path syntax to access the parts we wanted. Note that to do this we used ::
to switch to code mode and we also needed to used -
to escape the values and have them be treated as YAMLScript expressions.
YAMLScript has special +++
symbol that evaluates to an array of all the prior documents in the stream. That means we don't need anchors at all:
map1:
name: Bobbi Datamon
list1:
- fun
- games
more: stuff
--- !yamlscript/v0:
person:
<<:: +++.last().map1
likes:: +++.$.list1
The +++.last()
function returns the last document in the stream so far (the first document in this case) The +++.$
is a shorthand for +++.last()
.
Another approach to take here is to make the first document use YAMLScript in code mode and define variables to use in the second document:
--- !yamlscript/v0
map1 =::
name: Bobbi Datamon
list1 =::
- fun
- games
--- !yamlscript/v0:
person:
<<:: map1
likes:: list1
We use =:
for assignment expressions in YAMLScript. And =::
does the same thing but toggles the mode of the value.
What if list1
was a huge list and you really wanted to keep it in a separate file?
No problem:
# big-list.yaml
- fun
- games
# ...
Now we just change the first document to load the list from the file:
--- !yamlscript/v0
map1 =::
name: Bobbi Datamon
list1 =: load('big-list.yaml')
Not only can we access external data from a file, YAMLScript supports fetching data from the web with the curl
function and also getting data from databases!
Inline Code in Data Mode
We can do the same things in a single data mode document. The trick is that we need to have a way to evaluate code in a way that doesn't affect the data.
The ::
syntax is our new friend here.
This lets us do code things like define variables and even define new functions in a way that doesn't affect the data we are defining.
--- !yamlscript/v0:
::
defn flip(array):
reverse: array
map1 =::
name: Bobbi Datamon
list1 =: load("big-list.yaml")
person:
<<:: map1
likes:: list1:flip
We defined a new function called flip
which is a bit contrived since we could have called reverse
directly; but it proves the point.
We also defined our data variables. We can actually define variables without ::
:
--- !yamlscript/v0:
map1 =::
name: Bobbi Datamon
list1 =: load("big-list.yaml")
person:
<<:: map1
likes:: list1
In a big document, it's sometimes nice to define the data variables closer to where they are used.
--- !yamlscript/v0:
map1 =::
name: Bobbi Datamon
person:
<<:: map1
list1 =: load("big-list.yaml")
likes:: list1
In fact, since we only use map1
and list1
once, we could have just inlined them:
--- !yamlscript/v0:
person:
<<:
name: Bobbi Datamon
likes:: load("big-list.yaml")
How Do I merge
Thee?
Let me count the ways :)
We are still using the <<
merge key to merge mappings, but YAMLScript has a a standard merge
function (among 100s of others).
--- !yamlscript/v0:
map1 =::
name: Bobbi Datamon
list1 =: load("big-list.yaml")
person::
merge::
- ! map1
- likes:: list1
Note the !
in front of map1
. It toggles from data mode to code mode. We need to use !
for that purpose in data mode sequences. For mappings we can use key:: variable
but it is just a shorthand for key: ! variable
.
Another way to write that is:
person::
merge map1::
likes:: list1
Since the merge
key is already in code mode we can just put the map1
variable there.
Sometimes you want to use a function like merge
without needing to further indent the data you are applying the function to.
YAMLScript lets you put a function in a tag if you prefix it with a:
:
--- !yamlscript/v0:
map1 =::
name: Bobbi Datamon
list1 =: load("big-list.yaml")
person: !:merge
- ! map1
- likes:: list1
Conditional Insertions in Mappings and Sequences
A big missing feature (and one definitely needed for Helm charts) was the ability to conditionally insert key/value pairs into mappings depending on some value being true or false.
Functionally that's a bit weird for a data language. You can always apply any function to any data node to change its value, but how do you make it control whether or not it exists at all?
You really need to apply a function to its parent mapping to make that happen…
…or do you?
YAMLScript ended up solving this using the ::
syntax, with a special rule.
!yamlscript/v0:
foo: 111
::
when a > b::
bar: 222
baz: 333
The rule is that if the code under ::
evaluates to nil then ignore it. If it evaluates to a mapping then merge it into the parent mapping.
The
when
function returns nil if the condition is false, and the body evaluation value if it is true.
To best understand this we can simply compile this YAMLScript to Clojure code.
$ ys -U --compile file.yaml # or -c
(merge {"foo" 111} (merge (when (> a b) {"bar" 222}) {"baz" 333}))
You should know by now that every YAMLScript program is compiled to Clojure code and then evaluated. Well, data files that use YAMLScript are no different!
Given that merge
ignores nil values, this is exactly what we want.
As we worked through the standard Helm templates we found that while this worked just fine, it was a bit verbose. We "fixed" that by letting you put the condition test "inside" the ::
key:
!yamlscript/v0:
foo: 111
:when rand(100) > 50::
bar: 222
baz: 333
I've changed a > b
here to something you could actually run yourself. Note that before we never defined a
or b
, so that would have failed.
We can even get this into a single line by using YAML's flow style:
!yamlscript/v0:
foo: 111
:when rand(100) > 50:: {bar: 222}
baz: 333
Nice!
Now that you are up to speed, take a look at this page that shows how to completely convert a stock Helm chart to use YAMLScript instead of Go templating: https://yamlscript.org/doc/helmys
After the KubeCon we realized that this was also needed for sequences. You should be able to conditionally insert items into a sequence at any point.
All you need to do is use all of the above on a - ...
sequence entry (returning a sequence or nil):
!yamlscript/v0:
- aaa
- :when rand(100) > 50::
- foo
- bar
- zzz
Again let's compile this to Clojure code to see exactly what it does:
$ ys -c file.yaml
(concat ["aaa"] (concat (when (> (rand 100) 50) ["foo" "bar"]) ["zzz"]))
Similar to the mapping case, but we get concat
instead of merge
to do the right thing with sequences.
Now we run it a couple times:
$ ys -Y file.yaml
- aaa
- zzz
$ ys -Y file.yaml
- aaa
- foo
- bar
- zzz
and Voilà!
Conclusion
I hope this post gave you some good ideas about how cleanly you can extend your YAML data files with YAMLScript. And also how this is applicable today in major YAML consumers like Helm.
Please let us know where YAMLScript can be made even better.
That's our goal!