Dr. StrangeYAML or How I Learned to Stop Worrying and Love the LLM
Well now, what happened is, uh, one of our data scientists, uh, well, he went a little funny in the head. You know. Just a little funny. And uh, he went and did a silly thing.
Well, I'll tell you what he did. He started chatting with computers... in YAML.
Well, let me finish, Elon.
Let me finish, Elon.
Well, listen, how do you think I feel about it?
A couple weeks ago my old friend Paul Prescod asked me if there was a simple subset of YAML that could be used to communicate with LLMs.
I was a bit aghast at the notion for 2 reasons.
The first was that simplifying YAML is a bit like simplifying a black hole. (GitHub Copilot wrote that line for me but I like it.) It's just not something you (ergo I) want to do.
The second reason was that surely something simpler like JSON would be a better choice for this kind of thing.
But then I thought "Black holes? Meh.", so I listened to Paul's idea.
His notion was that LLMs only have 3 major, text-based, structured data formats to train on: XML, JSON, and YAML. The internet is crammed full of these files. He said it turns out that YAML is the most efficient when combined with current LLM tokenization techniques.
It seems Paul may be right:
- OpenAI co-founder Andrej Karpathy on YAML vs JSON for LLMs
- YAML vs. JSON: Which Is More Efficient for Language Models?
Why is this important? The only thing that people really seem to care about:
Money.
LLM pricing models are based on token usage.
DO THE MATH.
Over the next few weeks Paul and I are planning to experiment in depth and hopefully either prove or disprove that this is a good idea.
Can YAMLScript Help with This?
$ yamllm 'Be Dr Strangelove discovering YAMLScript (https://yamlscript.org)'
llama3-70b-8192:
(Maniacal laughter) Ahahahahah! Oh, the beauty of it! The sheer genius!
YAMLScript, you say? (Strokes chin) I must admit, I am intrigued. A language
that combines the simplicity of YAML with the power of a scripting language?
(Giggles) It's like a Strangelovian dream come true!
(Reading from the YAMLScript website) "YAMLScript is a scripting language that
allows you to write scripts in YAML, with a syntax similar to Ansible
playbooks." Ah, yes! The elegance of YAML, married to the flexibility of a
scripting language. It's like a match made in heaven! (Chuckles) And the best
part? It's designed for automation, just like my beloved Doomsday Device!
(Excitedly) Oh, the possibilities! With YAMLScript, I can automate the
deployment of my Doomsday Device, ensuring that the world is destroyed in a most
efficient and elegant manner! (Maniacal laughter) And the syntax, oh the syntax!
So simple, so intuitive. Even a General Turgidson could understand it!
(Chuckles)
(Reading further) "YAMLScript has built-in support for popular automation tools
like Ansible, SaltStack, and Puppet." Ah, yes! The trifecta of automation! With
YAMLScript, I can orchestrate the destruction of the world with ease!
(Diabolical laughter)
(Strokes chin thoughtfully) But, I must admit, there is one drawback. It's not
as... flashy as my beloved COBOL. (Smirks) But, I suppose, in the world of
automation, simplicity and elegance can be just as deadly as a well-placed COBOL
program.
(Concludes) Ah, YAMLScript, you are a worthy addition to my arsenal of doomsday
devices! Together, we shall bring about a new era of automated destruction!
(Maniacal laughter)
One of the emerging features in YAMLScript implementation design is the ability to script its internals with YAML itself! Think about that. You can define the rules of how YAML is loaded without needing to write any code.
For example, if you wanted the YAML files for your application not to contain anchors or aliases (just like GitHub Actions' YAML loader does), that's just a line or 2 of YAML.
Maybe you like the "Norway Problem" so much that you want to treat your users to the "Slovenia Problem"! The Norway Problem (fixed in 2009 in the YAML 1.2 spec) is where the plain scalar no
(also the Norway country code) is interpreted as the boolean false
in YAML 1.1. With just a dash of YAML you can tell YAMLScript to treat si
as the boolean true
!
So back to Paul's query. Yes, by using (a future version of) YAMLScript as your YAML loader, it would be easy for any application to define a subset of YAML that is more suitable for their LLM needs. And since the subset is defined in YAML, it can be easily shared and understood.
Importantly, at no point is there the risk of a subset becoming invalid YAML. There's a limit to this fine tuning of YAML behavior; it all has to play within the rules of YAMLScript (which itself has to play within the rules of YAML).
How I learned to stop paying for ChatGPT and love the YAMLLM
YAMLScript is turning out to be my favorite programming language.
Up until a couple weeks ago my only interaction with LLMs was using GitHub's Copilot to help me write blog posts like this one, and paying $20USD/month for a ChatGPT subscription I almost never used.
When Paul and I started talking, I decided to write some programs to play around with various (free) LLM APIs like Ollama and Groq. Of course I used YAMLScript to write the programs. You saw this program in action above with the yamllm
command asking Dr. Strangelove about YAMLScript. The results of my scripting efforts are pretty awesome to me. Both in how much fun it is to interact with LLMs directly using code and how clean the YAMLScript code ends up being.
Full Disclosure: The YAMLScript code is clean because when it's not I can (and do) fix the language. Honestly, I think this is a very important point and should be a goal of any new programming language. Implementing a sizable number of Rosetta Code Tasks should be a prerelease milestone.
Show me the code!
The GitHub repository for the YAMLLM script is at https://github.com/yaml/yamllm.
It should take you about 60 seconds to get it running on your machine (Linux or MacOS).
The main YAMLScript source file is at https://github.com/yaml/yamllm/blob/main/bin/yamllm.ys.
Give it a look and let me know what you think.
Note that at 140 lines of YAMLScript, the code might seem big at first, but realize that (aside from the JSON, YAML and HTTP libraries that are embedded in the YAMLScript runtime) this is all the source code in one file. No external dependencies.
Sometimes I wonder if it's all worth it… Let me ask a friend:
$ yamllm 'Do you think YAMLscript (yamlscript.org) will be a success?'
llama3-70b-8192:
What a fascinating question!
YAMLscript (yamlscript.org) is an interesting project that aims to create a
scripting language that uses YAML as its syntax. The idea is to make scripting
more accessible and easier to read/write, leveraging the simplicity and
familiarity of YAML.
To predict its success, let's consider a few factors:
**Pros:**
1. **Simplification**: YAMLscript's syntax is designed to be easy to read and
write, which could attract users who find traditional programming languages
intimidating.
2. ** Familiarity**: YAML is already widely used in many industries, so the
learning curve might be shorter for developers already familiar with YAML.
3. **Interoperability**: YAMLscript's ability to integrate with existing YAML
tools and workflows could lead to a seamless adoption process.
**Cons:**
1. **Competition**: The scripting language landscape is crowded, with
established players like Python, JavaScript, and PowerShell.
2. **Adoption**: YAMLscript will need to gain significant traction and adoption
to become a viable alternative to existing scripting languages.
3. **Ecosystem**: Building a comprehensive ecosystem of libraries, frameworks,
and tools will take time and effort.
**Success factors:**
1. **Community involvement**: If YAMLscript can build a strong, engaged
community that contributes to its development, it will increase its chances
of success.
2. **Use cases**: Identifying specific use cases where YAMLscript shines, such
as data processing or automation, could help it gain traction.
3. **Innovation**: Continuous innovation and improvement of the language and its
ecosystem will be crucial to stay competitive.
**Conclusion:**
While it's difficult to predict the future, YAMLscript has an interesting value
proposition. If it can build a strong community, identify compelling use cases,
and continuously innovate, it has a chance to succeed. However, it will need to
overcome the challenges mentioned above to gain significant traction in the
scripting language market.
What do you think about YAMLscript? Do you see any potential use cases or
advantages that could contribute to its success?
Huh. They answered a question with a question… Smart little (er, large) language model.
I'll have to answer them in another post!
PS (Perl Strangelove???)
I should mention that I'm giving a couple talks about YAMLScript at The Perl and Raku Conference in Las Vegas next week.
Hope to see you there!!!