YS Mapping Key Uniqueness

I've said it before, but YS code is 100% valid YAML.

Here's a YS program:

!YS-v0
say: 'You say Goodbye'
say: 'I say Hello'

YAML says that mapping keys must be unique.

That program seems to be a YAML mapping with two keys that are the same.

How can this be valid YAML?

YAML's Implicit Tagging

I'm pretty sure you know what YAML tags are by now.

They're the annotation syntax that looks like: !foo or !bar/baz. Words prefixed with a !.

In YAML, you can put a tag on any node in the document. Here's an example:

--- !tag1
!tag2 foo: !tag3 bar
!tag4 bar:
- !tag5 baz
- !tag6 [!tag7 aaa, !tag8 bbb]

In real world YAML, you almost never see tags. Even in YS, except for !YS-v0 and !, you almost never see tags.

Here are the other YS tags I've shown you so far:

!code
!data
!bare
!clj

I also showed you function calling tags like !:merge and !:reverse.

It seems like tags aren't used much in YAML or YS.

That's not the case!!

YAML has the concept of implicit tagging. When a YAML loader (the YS compiler is a YAML loader) loads a YAML stream, it assigns a tag to every node that doesn't already have an explicit one.

All YAML loaders work this way.

Explicit Tagging

Consider this plain YAML:

foo: one
bar:
- 2
- 3.14159
- true

It is equivalent to this explicitly tagged YAML:

--- !!map
!!str 'foo': !!str 'one'
!!str 'bar': !!seq
- !!int '2'
- !!float '3.14159'
- !!bool 'true'

Let's test it by loading it with YS:

$ ys -J file.yaml
{"foo":"one", "bar":[2, 3.14159, true]}

You would never write YAML that way, but inside a YAML loader, that's what happens.

Let's try it with a different YAML loader.

My YAML core team colleague Tina wrote a YAML loader module for Perl called YAML::PP. When you install it, it also installs a CLI called yamlpp-load-dump.

Let's try it:

$ yamlpp-load-dump file.yaml
---
'foo': 'one'
'bar':
- 2
- 3.14159
- true

Bingo!
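
To make the implicit tagging step concrete, here is a rough Python sketch of how a loader might pick a tag for each node. The function name implicit_tag and the regular expressions are my own assumptions, loosely modeled on the YAML 1.2 core schema. They are not taken from the YS compiler or from YAML::PP.

```python
import re

# Simplified scalar patterns, loosely modeled on the YAML 1.2 core schema.
# These are illustrative assumptions, not any real loader's resolver table.
CORE_SCALAR_PATTERNS = [
    ("!!null", re.compile(r"^(?:~|null|Null|NULL)?$")),
    ("!!bool", re.compile(r"^(?:true|True|TRUE|false|False|FALSE)$")),
    ("!!int", re.compile(r"^[-+]?[0-9]+$")),
    ("!!float", re.compile(
        r"^[-+]?(?:\.[0-9]+|[0-9]+(?:\.[0-9]*)?)(?:[eE][-+]?[0-9]+)?$")),
]


def implicit_tag(kind, text="", explicit_tag=None, quoted=False):
    """Return the tag a loader would end up with for one node.

    kind is "mapping", "sequence" or "scalar" (hypothetical labels).
    """
    if explicit_tag is not None:
        return explicit_tag      # an explicit tag is never overridden
    if kind == "mapping":
        return "!!map"
    if kind == "sequence":
        return "!!seq"
    if quoted:
        return "!!str"           # quoted scalars always resolve to strings
    for tag, pattern in CORE_SCALAR_PATTERNS:
        if pattern.match(text):
            return tag
    return "!!str"               # fallback for every other plain scalar


if __name__ == "__main__":
    for text in ["one", "2", "3.14159", "true"]:
        print(f"{text} -> {implicit_tag('scalar', text)}")
```

Run on the scalars from the example above, this resolves one to !!str, 2 to !!int, 3.14159 to !!float and true to !!bool, which matches the explicitly tagged version. A node that already carries an explicit tag keeps it.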

Why is the YS program valid YAML?

Here it is again:

!YS-v0
say: 'You say Goodbye'
say: 'I say Hello'

In short, we've tagged the top level mapping with !YS-v0.

YAML's unique key constraint applies to nodes tagged with !!map, which tells the loader to turn the node into "an unordered association of unique keys to values". Our top level node is tagged !YS-v0, not !!map, so that constraint doesn't apply to it.

The YS compiler is loading this YAML into a YS AST, which is something very different from a plain mapping data structure.

Let's look at something a bit more complex:

!YS-v0
mapping =::
  foo: bar

defn main(*args):
  say: "Here is a merged mapping:"
  say: merge(mapping hash-map(args*))

Let's run it like so:

$ ys -U program.ys aaa bbb ccc ddd
Here is a merged mapping:
{foo bar, ccc ddd, aaa bbb}

We have the same deal here, except there's no explicit tag on the node that appears to be a mapping with duplicate keys.

We need to see what implicit tags were assigned to the nodes during the YS compilation (loading) process.

Luckily, that's easy to do with the ys -d or -D flags.

$ ys -cU -Dbuild program.ys
*** build     *** 0.199973 ms

{:xmap
 [[{:Sym def} {:Sym mapping}]
  {:Map [{:Str "foo"} {:Str "bar"}]}
  [{:Sym defn} {:Sym main} nil {:Vec [{:Sym &} {:Sym args}]}]
  {:xmap
   [{:Sym say}
    {:Str "Here is a merged mapping:"}
    {:Sym say}
    {:Lst
     [{:Sym merge}
      {:Sym mapping}
      {:Lst [{:Sym hash-map} {:Sym args*}]}]}]}]}

(def mapping {"foo" "bar"})
(defn
 main
 [& args]
 (say "Here is a merged mapping:")
 (say (merge mapping (apply hash-map args))))
(apply main ARGS)

This shows us the data structure created by the build phase of the YS compiler stack, and then shows us the final Clojure code result. Use -d to see all 7 stages!

The important thing is that we defined an actual data mapping {foo: bar}, and that got tagged with :Map, which means !!map internally in YS. The mapping with the 2 say keys got tagged with :xmap, which tells the compiler that this is a code AST node.
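
Here is a small Python sketch of why the tag matters so much. It is only an illustration: the construct function and the CODE_TAGS set are hypothetical names of mine, and the real YS compiler works on its own AST, not on Python values.

```python
# Assumption for illustration: tags that mark a mapping node as code.
CODE_TAGS = {"!YS-v0"}


def construct(tag, pairs):
    """Turn a mapping node's (key, value) pairs into a native value."""
    if tag == "!!map":
        # A data mapping: keys are unique, so a repeated key overwrites
        # the earlier pair and only one entry survives.
        return dict(pairs)
    if tag in CODE_TAGS:
        # A code mapping: keep every pair, in order, as a list of steps.
        return [(key, value) for key, value in pairs]
    raise ValueError(f"unsupported tag: {tag}")


if __name__ == "__main__":
    pairs = [("say", "You say Goodbye"), ("say", "I say Hello")]
    print(construct("!!map", pairs))   # one entry survives
    print(construct("!YS-v0", pairs))  # both pairs are kept, in order
```

The same two pairs collapse to a single entry when the node is treated as a !!map, but both survive in order when the node is treated as code.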

Duplicate Keys in Real World YAML

Even though YAML doesn't allow duplicate keys in a mapping, not every YAML loader throws an error when it finds them.

Often they will just use the last pair. Sometimes they use the first pair. Sometimes they will throw an error. Some of them let you configure the behavior.

I think that makes sense. It's practical to support different behaviors for duplicate keys in YAML.

Currently YS will use the last pair specified in the YAML (as long as it's in data mode or bare mode, of course).
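
The different duplicate key behaviors are easy to picture in code. This is a minimal Python sketch, assuming a loader hands us the mapping as a list of key/value pairs. The build_mapping function and its on_duplicate option are names I made up for the example, not options of YS or of any particular loader.

```python
def build_mapping(pairs, on_duplicate="last"):
    """Build a dict from (key, value) pairs under a duplicate-key policy.

    on_duplicate is a hypothetical option: "last", "first" or "error".
    """
    if on_duplicate not in ("last", "first", "error"):
        raise ValueError(f"unknown policy: {on_duplicate!r}")
    result = {}
    for key, value in pairs:
        if key in result:
            if on_duplicate == "error":
                raise ValueError(f"duplicate mapping key: {key!r}")
            if on_duplicate == "first":
                continue         # keep the pair we already stored
        result[key] = value      # new key, or "last" overwriting
    return result


if __name__ == "__main__":
    pairs = [("say", "You say Goodbye"), ("say", "I say Hello")]
    print(build_mapping(pairs, "last"))
    print(build_mapping(pairs, "first"))
```

With the "last" policy the result holds 'I say Hello', with "first" it holds 'You say Goodbye', and with "error" the call raises a ValueError.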

This was a fairly deep post, but I hope I explained things well enough.

If you have any questions, please ask in the comments below!!
