> An alternative to the + operator is the pipe | operator, which is used for set...

dan-robertson · on March 6, 2019

The most compelling reason to not do this is that (I claim) it’s not super obvious what to do when the keys are equal. In:

  { 'a' : 1 } | { 'a' : 2 }

Should the result be:

  { 'a' : 1 }

(prioritise the left hand side), or

  { 'a' : 2 }

(prioritise the right hand side), or should it raise an error? Maybe a fourth option would be to downgrade to sets of keys and give:

  { 'a' }

A fifth option is to magically merge values:

  { 'a' : 3 } or { 'a' : (1,2) }

For the first two choices one loses commutativity which means that code then suddenly has to have previously cared about it (or it will do the wrong thing), even though it didn’t previously matter, and one is always potentially losing data. The third choice is safe but could cause unforeseen problems later if shared keys only happen rarely. The fourth choice also forgets a bunch of information held in the dict.

In a language like Haskell, one can use type classes to specify how to merge values (Monoid), but without type classes (and a way to choose which instance to use) I think some kind of magic merge is not great.
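
For what it's worth, Python's standard library already has one mapping type with a baked-in merge of this kind: collections.Counter sums the values of shared keys when two counters are added. A small illustration (the variable names are made up for the example):

```python
from collections import Counter

left = Counter({'a': 1})
right = Counter({'a': 2, 'b': 5})

# Counter.__add__ sums the counts of shared keys.
merged = left + right
print(merged)

# Because integer addition is commutative, so is this particular merge.
print(left + right == right + left)
```

That only works because counts have an obvious way to combine; a general dict has no such default for its values.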

I claim the operations one should really want with dicts are not set operations but rather more relational ones, ie {inner,outer,left,right} joins on the keys followed by some mapping to decide how to merge values.

zwegner · on March 6, 2019

While I agree with you, I will note that even set union in Python is not commutative. a | b should equal b | a in the sense of __eq__, but the actual objects in the result set depend on the order of the arguments (and in the opposite way from dict + dict). This happens with objects that are distinct but compare/hash equally (x is not y and x == y). Whether that actually matters for any useful program is another story...

Dumb program to illustrate this point:

    class Dummy:
        def __init__(self, value):  self.value = value
        def __repr__(self):         return 'Dummy(%s)' % self.value
        def __hash__(self):         return 0
        def __eq__(self, other):    return True

    a = {Dummy(0)}
    b = {Dummy(1)}
    print(a | b)
    print(b | a)
    print(a | b == b | a)
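
A shorter way to see the same effect without a custom class, relying on the fact that 1 and 1.0 are distinct objects that compare and hash equal. The dict half is an addition that uses the existing {**x, **y} spelling as a stand-in for the proposed dict + dict; the behaviour described in the comments is what CPython does:

```python
left_set, right_set = {1}, {1.0}

union_lr = left_set | right_set   # the surviving element comes from the left operand
union_rl = right_set | left_set

print(union_lr == union_rl)       # equal in the sense of __eq__
print(type(next(iter(union_lr))), type(next(iter(union_rl))))

# Dict merge: the key object stays from the left, the value comes from the right.
merged = {**{1: 'left'}, **{1.0: 'right'}}
print(merged, type(next(iter(merged))))
```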

endgame · on March 6, 2019

Unfortunately, even in Haskell Data.Map.Map's monoid instance is left-biased. There is the monoidal-containers package which newtype-wraps Data.Map.Map to have instance Monoid m => Monoid (MonoidalMap k m), which I think is much more sensible.

dan-robertson · on March 6, 2019

I think I wasn’t even sure that Haskell had a Monoid instance for Data.Map, I knew it wasn’t the interface which I would naturally expect though. I agree that the interface for MonoidalMap is more natural.

sametmax · on March 6, 2019

Besides, anytime somebody compares Python to Haskell, the battle is over. They have completely different use cases and philosophies. If you want something in Haskell, you probably want the opposite in Python.

mjburgess · on March 6, 2019

It's not clear what you're saying here.

The comparison was to say, "this decision is difficult everywhere" -- which lang seems beside the point.

zimablue · on March 6, 2019

Great post, sets have nice properties that dictionaries don't have. Making them act similarly seems like a trap

kbd · on March 6, 2019

> it’s not super obvious what to do when the keys are equal

    d1 | d2 | d3 | ...

is equivalent to:

    {**d1, **d2, **d3, ...}

dan-robertson · on March 6, 2019

Now read the above but instead of “it’s not super obvious what

  d1 | d2

should be because losing information/desirable properties/weird errors”, read “it’s not super obvious what

  {**d1, **d2}

should be because losing information/desirable properties/weird errors”.

Except I guess one could throw in something about TOOWTDI too.

kbd · on March 6, 2019

I actually think it is obvious what a dictionary merge should do (overwrite keys on the left with keys on the right), but this is beside the point because it's already been determined for

    {**d1, **d2}

In other words, there are no new semantics to discuss here. I'm just saying the two syntaxes should be equivalent.

bocklund · on March 6, 2019

> For the first two choices one loses commutativity which means that code then suddenly has to have previously cared about it (or it will do the wrong thing)

Since this is a new operator, that shouldn’t be an issue.

I think losing commutativity is okay. After all, d1.update(d2) and d2.update(d1) leave different results if keys conflict.
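
A quick check of that order dependence, using copies so neither input is mutated. Note that dict.update itself returns None, so the comparison has to be made on the resulting dicts:

```python
d1 = {'a': 1, 'b': 2}
d2 = {'a': 10, 'c': 3}

d1_then_d2 = dict(d1)
d1_then_d2.update(d2)   # d2 wins on the shared key 'a'

d2_then_d1 = dict(d2)
d2_then_d1.update(d1)   # d1 wins on the shared key 'a'

print(d1_then_d2)
print(d2_then_d1)
print(d1_then_d2 == d2_then_d1)
```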

dan-robertson · on March 6, 2019

What you have written doesn’t look at all symmetrical but d1 | d2 looks very symmetrical. Operators being symmetrical around a vertical axis tends to imply being commutative (although there are many exceptions e.g. a divide symbol (but note fractions aren’t symmetrical) or a minus sign or using ^ for exponentiation (but superscripting is not symmetrical) or matrix multiplication (but maybe one could argue this is an abbreviation of function application))

Secondly I claim that the issue with using | is that it is not a new operator. It is a new, incompatible meaning for an old operator. Old code might not bother checking that its arg is a set because if it weren’t a set then | or in would fail. New programmers might see dicts as being basically sets and wrongly assume functions for sets would correctly work on dicts.

dstola · on March 6, 2019

In case the keys collide you could supply a collision callback to define what to do, e.g. to add the values,

  d1 = {'a': 1}
  d2 = {'a': 2}

  d3 = {**d1, **d2, add_func}

  def add_func(a, b):
      return a+b

Or something along those lines
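
Since the dict display can't actually take a callback, one way to get this today is a small helper function. A minimal sketch; merge_with and its signature are hypothetical names for illustration, not an existing API:

```python
def merge_with(combine, left, right):
    """Return a new dict; combine(left_value, right_value) resolves shared keys."""
    merged = dict(left)
    for key, value in right.items():
        merged[key] = combine(merged[key], value) if key in merged else value
    return merged


def add_func(a, b):
    return a + b


d1 = {'a': 1, 'x': 0}
d2 = {'a': 2, 'y': 5}
d3 = merge_with(add_func, d1, d2)
print(d3)
```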

rbanffy · on March 6, 2019

Why not raise a ValueError and let the programmer figure out what The Right Thing To Do is when you add two dicts that have the same key with a different value?

I assume the same key with the same value would be OK, but I'm not really sure it's a good idea for it to be OK.
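
A rough sketch of that policy, assuming a hypothetical helper named strict_merge: identical key/value pairs are tolerated, conflicting values raise ValueError. It relies on != between values, which is itself a design decision:

```python
def strict_merge(left, right):
    """Merge two dicts; raise ValueError if a shared key has different values."""
    merged = dict(left)
    for key, value in right.items():
        if key in merged and merged[key] != value:
            raise ValueError(
                f"conflicting values for key {key!r}: {merged[key]!r} vs {value!r}"
            )
        merged[key] = value
    return merged


print(strict_merge({'a': 1}, {'a': 1, 'b': 2}))

try:
    strict_merge({'a': 1}, {'a': 2})
except ValueError as exc:
    print('rejected:', exc)
```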

zimablue · on March 6, 2019

You can't do value comparison without making dict item comparison a passed-in function or making dict values immutable. If you're doing something that really looks like a mathematical union that will raise if there's any overlap then it's a really confusing abuse of notation. I don't think there's a way out.

dan-robertson · on March 6, 2019

That is one thing you could do to merge dicts. To expand on my last paragraph above, I think I would imagine the following operations (stupid syntax):

  a & b = { k: (a[k], b[k]) for k in a.keys() & b.keys() }
  a | b = { k: (a.get(k, None), b.get(k, None)) for k in a.keys() | b.keys() }
  a |& b = { k: (a.get(k,None), v) for k, v in b.items() }
  a &| b = { k: (v, b.get(k,None)) for k, v in a.items() }
  a |_| b = { k: only(a,b,k) for k in a.keys() | b.keys() }
  def only(a,b,k):
    if k in a and k in b:
      raise DuplicateKey(a,b,k)
    elif k not in a and k not in b:
      assert False
    elif k in a:
      return a[k]
    else:
      return b[k]

This doesn’t work well if values can be None so maybe instead of pairs there should be objects Left(x), Right(y), and Both(x,y)
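
In runnable Python, the join operations above might look something like this. The function names and the MISSING sentinel are invented for the sketch; the sentinel is a lightweight stand-in for the Left/Right/Both idea, so that None remains a legal value:

```python
MISSING = object()  # sentinel marking "key absent on this side"


def inner_join(a, b):
    return {k: (a[k], b[k]) for k in a.keys() & b.keys()}


def outer_join(a, b):
    return {k: (a.get(k, MISSING), b.get(k, MISSING)) for k in a.keys() | b.keys()}


def left_join(a, b):
    return {k: (v, b.get(k, MISSING)) for k, v in a.items()}


def right_join(a, b):
    return {k: (a.get(k, MISSING), v) for k, v in b.items()}


a = {'x': 1, 'y': None}
b = {'y': 2, 'z': 3}
print(inner_join(a, b))
print(left_join(a, b)['x'][1] is MISSING)
```

A value-merging function can then be mapped over the resulting pairs to decide what the final dict holds.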

mturmon · on March 6, 2019

That syntax doesn't make sense. The

  {**d1, **d2}

idiom is just a clever mashup of Python's dictionary construction literal {} and ** unpacking. (It is the older dict(d1, **d2) spelling that only works with string keys, which is a major limitation; the literal form accepts any hashable key.)

Adding a third item to the dictionary literal would require special-casing the {} dictionary construction literal.

jessaustin · on March 6, 2019

  >>> { 'a' : 1 } | { 'a' : 2 }

ISTM the most logical result would be:

  { 'a' : { 1, 2 } }

...but I could certainly understand throwing an exception.

xiao_haozi · on March 6, 2019

While I see your point, I don't think this makes sense historically. Dictionaries never supported such behavior before so you'd be introducing a new core concept to a dictionary. But moreover, you'd be changing the type of the value only on duplicated keys, and what about if you were to add another value of 2 to a? Are you making this a set, and why? I think it would come with too many caveats and assumptions in the PEP.

I'm not saying you have a bad idea/logic here, just that I'm not sure it's the best thing for the dict.

dan-robertson · on March 6, 2019

Note that this forgets the order of the arguments, which may not be desirable

kqr · on March 6, 2019

If the property we want to achieve is "a | b == b | a" we necessarily have to forget the order of the arguments.

eesmith · on March 6, 2019

"dicts would subclass sets, as dicts are essentially sets with values attached"

Such a derivation would violate the Liskov substitution principle. Consider the following with set:

  x = {"one", "two"}
  y = set()
  y.update(x)
  y

It results in y being {'two', 'one'}.

Now, do the same with dict:

  y = dict()
  y.update(x)

This gives the exception: "ValueError: dictionary update sequence element #0 has length 3; 2 is required"

This means that dict cannot be used anywhere that a set can be used, which means it violates the Liskov substitution principle (see https://en.wikipedia.org/wiki/Liskov_substitution_principle ) which means that if covariant methods are needed for good design then dict cannot be a subclass of set.

kbd · on March 6, 2019

If dicts did subclass sets, then sets would be dicts whose values are all None. In other words, your last example would be defined as:

    >>> s = {'one', 'two'}
    >>> d = {}
    >>> d.update(s)
    >>> d
    {'two': None, 'one': None}
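
That hypothetical result is already reachable in current Python via dict.fromkeys, which builds a dict whose values all default to None:

```python
s = {'one', 'two'}
d = {}

# dict.fromkeys(iterable) maps every element to None by default.
d.update(dict.fromkeys(s))
print(d == {'one': None, 'two': None})
```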

pdpi · on March 6, 2019

If sets are dicts with values of None, then they're dicts, not a superclass of dict.

eesmith · on March 6, 2019

Would s["one"] = 1 raise an exception? Or convert the set into a dict? Or change the sentinel value for all the set elements?

None of these seems like a good design, since it means either that the instance changes its class on the fly (which Python does support) or that a dict does not act like its parent set object, breaking the is-a relationship most people expect from an OO design.

It seems like the circle/ellipse problem, and the current implementation is the "drop all inheritance relationships" solution to that problem. https://en.wikipedia.org/wiki/Circle-ellipse_problem#Drop_al...

kbd · on March 7, 2019

> Would s["one"] = 1 raise an exception?

Sets don't support indexing, so it would still raise an exception. Dicts do, which is an example of them supporting more operations than sets, which is an example of why (if there is to be any subclass relation) dicts are subclasses of sets.

Edit: I suppose there's some confusion about my language above. "then sets would be dicts whose values are all None" could more helpfully read "sets would be equivalent to dicts whose...".

rocqua · on March 6, 2019

Liskov substitution and meaningful method-mutability forbid any kind of sub-typing relationship.

Sean1708 · on March 6, 2019

> as dicts are essentially sets with values attached.

Interestingly enough some languages actually do the opposite. In Rust for example a set is literally just a dictionary with unit as the value[0] and unit is essentially a way of expressing the absence of a value (it takes up no space in memory, and you just can't do anything with it).

[0]: https://doc.rust-lang.org/stable/src/std/collections/hash/se...

For posterity, the above link shows:

  pub struct HashSet<T, S = RandomState> {
      map: HashMap<T, (), S>,
  }

cube2222 · on March 6, 2019

Same as Go, where everybody just uses map[Key]struct{} as sets.

bpicolo · on March 6, 2019

To be fair, this is because the language does not support a type safe set type. I would use one frequently if it did

majewsky · on March 6, 2019

map[T]struct{} is a type-safe set type. It's just not an ergonomic one.

FWIW, I'm usually using map[T]bool and only ever inserting `true` values. It uses a bit more space, but membership checks read like

  if set[key] {

instead of

  if _, ok := set[key]; ok {

zephyrfalcon · on March 6, 2019

Before Python grew a set type, it was common to implement them in a similar way, i.e. as a dict with some kind of default value (0, 1, None, etc).

kevin_thibedeau · on March 6, 2019

Python sets are internally just a dict hash table for the keys with no associated values.

sqd · on March 6, 2019

I think internally C++ STL set<T> is similarly just map<T, void>

kbd · on March 6, 2019

> Interestingly enough some languages actually do the opposite.

Of course you can represent sets as dictionaries with empty values (ask anyone who programmed Perl). You're supporting my point that dicts logically subclass sets, because they can represent sets where the values can be other things as well.

You're also getting at what the behavior should be if you union a dictionary and a set. Hypothetical Python:

    >>> s = {'a','b','c'}
    >>> d = {'d': 'D'}
    >>> d | s
    >>> {'d': 'D', 'a': None, 'b': None, 'c': None}

meatball-farmer · on March 6, 2019

I think he's flipping your point: sets are a subclass of dicts/maps, not vice versa. Thinking of a map as a set whose values map to something else sounds backwards, because values in sets don't map to values arbitrarily (or at all in some cases); maps are maps.

dtech · on March 6, 2019

> It's always been on my Python wish list that dicts would subclass sets, as dicts are essentially sets with values attached.

> There's no reason you shouldn't have always been able to say d | d2, same as sets.

I don't agree with this view, mainly because merging dicts is not commutative, while taking the union of sets is.

The actual operation for "a + b" is "add everything from b to a", and + more closely resembles that than |.

guitarbill · on March 6, 2019

dict.keys() pretty much does just that:

    >>> a = {"foo": 1}
    >>> b = {"bar"}
    >>> a.keys() | b
    {'bar', 'foo'}
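
The keys view supports the other set operators as well, and two views can be combined the same way. A short illustration with made-up dicts:

```python
a = {"foo": 1, "shared": 2}
b = {"bar": 3, "shared": 4}

print(a.keys() & b.keys())   # keys present in both
print(a.keys() - b.keys())   # keys only in a
print(a.keys() ^ b.keys())   # keys in exactly one of the two
print(a.keys() | b.keys())   # all keys
```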

As an aside, I like the plus operator. Being able to merge two dictionaries in one line and have the result be a new dict is something I've needed often enough.

    {**d, **d2}

works, but is pretty recent and still feels weird to me (not coming from a language that makes use of destructuring a lot, like Javascript).

kbd · on March 6, 2019

> dict.keys() pretty much does just that... a.keys() | b

You're illustrating my point:

    >>> set(a) | b
    {'bar', 'foo'}

guitarbill · on March 6, 2019

Yeah, I largely agree. Is that unusual these days? :D

And I'd be sort of okay if dicts implemented set operators, although I don't think using `set()` or `.keys()` is a big ask. But using the pipe operator for the operation in this PEP would be a bad idea IMO: For sets, a | b == b | a. For dicts, not necessarily. So if they used the pipe operator, that could lead to surprising or unintuitive results.

On the plus side, not using pipe still leaves it open for future usage, so you might get your wish yet.

kbd · on March 6, 2019

Your exact point that for sets, union operator is commutative while for dictionaries it wouldn't be is one of the main objections I foresaw if I ever wrote this up as a PEP. Counterpoint: for numbers, + is commutative but for lists it's not, so it's normal for the same operator to have different commutativity depending on the type of the operands. IMO it's worth using the set operators because of the subclass relation of sets and dicts despite some small (but really, predictable) changes in behavior, but I can see how someone could have a different opinion.
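
The counterpoint is easy to check in the interpreter. The variable names here are only for illustration:

```python
numbers_commute = (1 + 2) == (2 + 1)
lists_commute = ([1] + [2]) == ([2] + [1])
strings_commute = ('ab' + 'cd') == ('cd' + 'ab')

# Same operator, different commutativity depending on the operand types.
print(numbers_commute, lists_commute, strings_commute)
```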

makapuf · on March 6, 2019

The extension operator still resembles a set union more than an addition. The fact that it is not commutative is not, IMO, an argument, since sum is even more frequently used with commutative semantics.

Someone · on March 6, 2019

”that dicts would subclass sets, as dicts are essentially sets with values attached.”

I think a variant of https://en.wikipedia.org/wiki/Composition_over_inheritance applies here. Inheritance, in general, is only a good idea if there is a strong is-a relation. A dictionary isn’t a set of keys, it has a set of keys.

If you want to see a dictionary as a set, I think the better view would be to see it as a set of (key,value) pairs where equality of pairs is defined as equality of the key parts, ignoring the ‘value’ parts.

I think it makes sense to require that such a set should behave identically to a dictionary, and that providing a ‘real’ dictionary is just an optimization, plus the addition of convenience functions, e.g. to get the set of keys.

If one sees things that way, one could even define the dictionary interface as taking an equality operation on the keys and a ‘value combiner’ function that combines values, and will be used in the cases you outline (that function could add integers, concatenate strings, keep the larger value, or whatever the programmer specifies)

103e · on March 6, 2019

If you treat dictionaries as sets of tuples, union doesn’t work as expected: {('a', 1)} | {('a', 2)} = {('a', 1), ('a', 2)}. The same key maps to two values.

coldtea · on March 6, 2019

That's not what the parent says though.

He says that sets are like dictionaries without values, e.g. a set is akin to the dictionary keys.

So, in your example the ('a', 1) and ('a', 2) are the keys (in how the parent argues about) -- it's not 'a' that's the key.

Just as you can do today:

  d = {}
  d[('a', 1)] = 6
  d[('a', 2)] = 89

We can still express that dictionary as a set of tuples, it's just:

  {(('a', 1), 6), (('a', 2), 89)}

103e · on March 6, 2019

Meant to respond to top level parent who said:

> dicts would subclass sets

Was trying to make the case as to why treating dictionaries as a child of <generic collection> with an extend/merge operation using ‘+’ rather than a child of set with a union operation ‘|’ makes more sense (to me).

Changing the behaviour of a well-defined operation like union seems bad, although my case is somewhat undermined by Python’s overloading of ‘+’ to mean extend.

sametmax · on March 6, 2019

Practically, what would your wish accomplish? Will it make most people more productive? Produce fewer bugs? Learn faster?

Most Python coders don't even use sets more than once a year. Hell, I use collections.deque more than sets.

But dicts? We use them all the time. In fact, a failing {} + {} is a recurring disappointment in all my classrooms.

Plus, in PHP and JS, arrays/objects are the "do-it-all" data structure. And it's horrible. You see the same data structure everywhere. You have to read in detail what something is, and what it's for.

It's very nice that dict and set are very distinct, and that they have distinct sets of operators. This way it's super easy to scan the code and know what the data structure is, and what it's used for. That's why I always teach set([1, 2]) and not {1, 2} first. It helps people make a clear distinction in their mind.

zimablue · on March 6, 2019

I use sets a lot, sample size of one. I do data science/engineering stuff, sets of columns/sets of keys. I'm very sceptical of your claim that most developers don't use sets.

sametmax · on March 6, 2019

Actually, a lot of Python users don't even know sets exist. Or they forgot, and use them randomly after they google "remove duplicates". Even among the set users, a lot of them don't even know you can use ^, | and - with them.

Sample size: a few hundred students and colleagues over 15 years.

It makes sense: for columns, it's quite common to use dicts, or a pandas DataFrame. The set of keys is just the dict's keys view.

A web dev rarely needs sets. Neither does a GUI coder, a sysadmin or a geographer.

It's not that sets are not useful, it's just that in the huge number of things you need to do in programming, across all the fields that Python attends to, they are pretty niche.

rocqua · on March 6, 2019

> It very nice that dict and set are very distincts, and that they have a distint set of operators. This way it's super easy to scan the code and know what the data structure is, and what it's used for.

This is exactly why I want explicit typing in Python.

sametmax · on March 6, 2019

You do with type annotations.

But type declaration may not be in the view port.

Or you may read a script or snippet, which is a type of code that won't bother with typing.

klibertp · on March 6, 2019

> But type declaration may not be in the view port.

Type declaration doesn't have to be in the view; the very point of type annotations is to support static type systems, which you can query about any type. For example, if you use mypy, you can add `reveal_type( any_kind_of_expr )` anywhere in your code and you'll get the most precise type available, be it inferred or declared manually, for that expression at given point in code.
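
A small sketch of what that looks like in practice. The function and variable names are made up; reveal_type is only understood by the type checker, so it is guarded with TYPE_CHECKING here to keep the script runnable under a plain interpreter:

```python
from typing import TYPE_CHECKING, Dict


def count_words(text: str) -> Dict[str, int]:
    counts: Dict[str, int] = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts


result = count_words("a b a")

if TYPE_CHECKING:
    # Never executed at runtime; when mypy checks the file it reports
    # the type it has for this expression as a note.
    reveal_type(result)

print(result)
```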

benj111 · on March 6, 2019

Yes, it's not so bad for the + case, but the - case seems non-obvious at best.