Departure from previous API

With version 0.15.0, ruamel.yaml starts to depart from the previous (PyYAML) way of loading and dumping. During a transition period the original load() and dump() functions, in their various forms, will still be supported, but this is not guaranteed to remain so with the transition to 1.0.

At the latest with 1.0, but possibly earlier, transition error and warning messages will be issued, so any packages depending on ruamel.yaml should pin the version with which they are testing.

Up to 0.15.0, the loaders (load(), safe_load(), round_trip_load(), load_all(), etc.) took, apart from the input stream, a version argument to allow downgrading to YAML 1.1, which is sometimes needed for documents without a version directive. When round-tripping, there was an option to preserve quotes.

Up to 0.15.0, the dumpers (dump(), safe_dump(), round_trip_dump(), dump_all(), etc.) had a plethora of arguments, some inherited from PyYAML, some added in ruamel.yaml. The only required argument is the data to be dumped. If the stream argument is not provided to the dumper, then a string representation is built up in memory and returned to the caller.

Starting with 0.15.0, load() and dump() are methods on a YAML instance and only take the stream, resp. the data and stream arguments. All other parameters are set on the instance of YAML before calling load() or dump().

Before 0.15.0:

from pathlib import Path
from ruamel import yaml

data = yaml.safe_load("abc: 1")
out = Path('/tmp/out.yaml')
with out.open('w') as fp:
    yaml.safe_dump(data, fp, default_flow_style=False)

Starting with 0.15.0:

from pathlib import Path
from ruamel.yaml import YAML

yaml = YAML(typ='safe')
yaml.default_flow_style = False
data = yaml.load("abc: 1")
out = Path('/tmp/out.yaml')
yaml.dump(data, out)

If you previously used a keyword argument explicit_start=True, you now set yaml.explicit_start = True before calling dump(). The Loader and Dumper keyword arguments are not supported that way. Instead, you can set the typ keyword to rt (default), safe, unsafe or base (for round-trip load/dump, safe_load/dump, load/dump, resp. using the BaseLoader / BaseDumper). Finer control is possible by setting the attributes .Parser, .Constructor, .Emitter, etc., to the class of the type to create for that stage (typically a subclass of an existing class implementing that).

The default loader (typ='rt') is a direct derivative of the safe loader, without the methods to construct arbitrary Python objects that make the unsafe loader unsafe, but with the changes needed for round-trip preservation of comments, etc. For trusted Python classes a constructor can of course be added to the round-trip or safe loader, but this has to be done explicitly (add_constructor).

All data is dumped (not just in round-trip mode) with .allow_unicode = True.

You can of course have multiple YAML instances active at the same time, with different load and/or dump behaviour.

Initially only the typical operations are supported, but in principle all functionality of the old interface will be available via YAML instances (if you are using something that isn’t, let me know).


Duplicate keys

In JSON mapping keys should be unique; in YAML they must be unique. PyYAML never enforced this, although the YAML 1.1 specification already required it.

In the new API (starting 0.15.1) duplicate keys in mappings are no longer allowed by default. To allow duplicate keys in mappings:

yaml = ruamel.yaml.YAML()
yaml.allow_duplicate_keys = True

In the old API this is a warning starting with 0.15.2 and an error in 0.16.0.



On your YAML() instance you can set attributes, e.g. with:

yaml = YAML(typ='safe', pure=True)
yaml.allow_unicode = False

Available attributes include:

Defaults to True if Python’s Unicode size is larger than 2 bytes. Set to False to enforce output of the form \U0001f601 (ignored if allow_unicode is False)

Transparent usage of new and old API

If you have multiple packages depending on ruamel.yaml, or install your utility together with other packages not under your control, then fixing your install_requires might not be so easy.

Depending on your usage you might be able to “version” your usage to be compatible with both the old and the new. The following are some examples, all assuming from ruamel import yaml somewhere at the top of your file and some istream and ostream appropriately opened for reading resp. writing.

Loading and dumping using the SafeLoader:

# "from ruamel import yaml" binds only the name yaml, so use that name here
if yaml.version_info < (0, 15):
    data = yaml.safe_load(istream)
    yaml.safe_dump(data, ostream)
else:
    yml = yaml.YAML(typ='safe', pure=True)  # 'safe' load and dump
    data = yml.load(istream)
    yml.dump(data, ostream)

Loading with the CSafeLoader, dumping with the RoundTripDumper. You need two YAML instances, but each of them can be reused:

if yaml.version_info < (0, 15):
    data = yaml.load(istream, Loader=yaml.CSafeLoader)
    yaml.round_trip_dump(data, ostream, width=1000, explicit_start=True)
else:
    yml = yaml.YAML(typ='safe')  # instance used for loading
    data = yml.load(istream)
    ymlo = yaml.YAML()   # or yaml.YAML(typ='rt'), instance used for dumping
    ymlo.width = 1000
    ymlo.explicit_start = True
    ymlo.dump(data, ostream)

Loading and dumping from pathlib.Path instances using the round-trip-loader:

# in myyaml.py
from ruamel import yaml

if yaml.version_info >= (0, 15):  # yaml.YAML only exists from 0.15 on
    class MyYAML(yaml.YAML):
        def __init__(self):
            yaml.YAML.__init__(self)  # initialise the base class first
            self.preserve_quotes = True
            self.indent = 4
            self.block_seq_indent = 2

# in your code
try:
    from myyaml import MyYAML
except (ModuleNotFoundError, ImportError):
    # with the old API there is no MyYAML to import, which is fine
    if yaml.version_info >= (0, 15):
        raise

# some pathlib.Path
from pathlib import Path
inf = Path('/tmp/in.yaml')
outf = Path('/tmp/out.yaml')

if yaml.version_info < (0, 15):
    with inf.open() as ifp:
        data = yaml.round_trip_load(ifp, preserve_quotes=True)
    with outf.open('w') as ofp:
        yaml.round_trip_dump(data, ofp, indent=4, block_seq_indent=2)
else:
    yml = MyYAML()
    # no need for with statement when using pathlib.Path instances
    data = yml.load(inf)
    yml.dump(data, outf)

Reason for API change

ruamel.yaml inherited its way of doing things from PyYAML. In particular, when calling the function load() or dump(), temporary instances of Loader() resp. Dumper() were created that were discarded on termination of the function.

This way of doing things leads to several problems:

  • it is virtually impossible to return information to the caller apart from the constructed data structure. E.g. if you get a YAML document version number from a directive, there is no way to let the caller know apart from handing back special data structures. The same problem exists when trying to do on-the-fly analysis of a document for indentation width.

  • these instances were composites of the various load/dump steps, and if you wanted to enhance one of the steps, you needed to, e.g., subclass the emitter and make a new composite (dumper) as well, providing all of the parameters (i.e. copy paste)

    Alternatives, like making a class that returns a Dumper when called and sets attributes before doing so, are cumbersome for day-to-day use.

  • many routines (like add_representer()) have a direct global impact on all of the following calls to dump(), and those are difficult if not impossible to turn back. This forces the need to subclass Loaders and Dumpers, a long-standing problem in PyYAML, as some attributes were not deep-copied although a bug report (and fix) had been available for a long time.

  • if you want to set an attribute, e.g. to control whether literal block style scalars are allowed to have trailing spaces on a line instead of being dumped as double quoted scalars, you have to change the dump() family of routines, all of the Dumpers() as well as the actual functionality change in emitter.Emitter(). The functionality change takes changing 4 (four!) lines in one file, and being able to enable it takes another 50+ line changes (non-contiguous) in 3 more files, resulting in a diff that is far over 200 lines long.

  • replacing libyaml with something that doesn’t support both 0o52 and 052 for the integer 42 (instead of 52 as per YAML 1.2) is difficult

With ruamel.yaml>=0.15.0 the various steps “know” about the YAML instance and can pick up settings, as well as report back information via that instance. Representers, etc., are added to a reusable instance, and different YAML instances can co-exist.

This change eases development and helps prevent regressions.
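
The idea of steps that know about their owning instance can be sketched in plain Python. This is an illustration of the design only, not ruamel.yaml's internals: MiniYAML, Scanner, strip_comments and detected_indent are all hypothetical names.

```python
class Scanner:
    """One stage of a hypothetical load pipeline (not a ruamel.yaml class)."""

    def __init__(self, owner):
        self.owner = owner  # back-reference to the owning instance

    def scan(self, text):
        lines = text.splitlines()
        # Pick up a setting from the owning instance.
        if self.owner.strip_comments:
            lines = [ln for ln in lines if not ln.lstrip().startswith('#')]
        # Report the first indentation width seen back via the instance.
        for ln in lines:
            indent = len(ln) - len(ln.lstrip(' '))
            if ln.strip() and indent > 0:
                self.owner.detected_indent = indent
                break
        return lines


class MiniYAML:
    """Hypothetical stand-in for an instance that owns its stages."""

    def __init__(self):
        self.strip_comments = False   # setting, read by the stage
        self.detected_indent = None   # information, written by the stage
        self.scanner = Scanner(self)

    def load(self, text):
        return self.scanner.scan(text)


doc = "# header\na:\n    b: 1\n"
plain = MiniYAML()
stripping = MiniYAML()
stripping.strip_comments = True   # affects only this instance

plain_lines = plain.load(doc)
stripped_lines = stripping.load(doc)
print(plain.detected_indent, len(plain_lines), len(stripped_lines))
```

Because the setting and the reported indentation both live on the instance, nothing global has to be changed or undone, and the two instances do not interfere with each other.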