
The most compelling reason to not do this is that (I claim) it’s not super obvious what to do when the keys are equal. In:

  { 'a' : 1 } | { 'a' : 2 }
Should the result be:

  { 'a' : 1 }
(prioritise the left hand side), or

  { 'a' : 2 }
(prioritise the right hand side), or should it raise an error? Maybe a fourth option would be to downgrade to sets of keys and give:

  { 'a' }
A fifth option is to magically merge values:

  { 'a' : 3 } or { 'a' : (1,2) }
For the first two choices one loses commutativity which means that code then suddenly has to have previously cared about it (or it will do the wrong thing), even though it didn’t previously matter, and one is always potentially losing data. The third choice is safe but could cause unforeseen problems later if shared keys only happen rarely. The fourth choice also forgets a bunch of information held in the dict.

In a language like Haskell, one can use type classes to specify how to merge values (Monoid), but without type classes (and a way to choose which instance to use) I think some kind of magic merge is not great.

I claim the operations one should really want with dicts are not set operations but rather more relational ones, i.e. {inner,outer,left,right} joins on the keys followed by some mapping to decide how to merge values.



While I agree with you, I will note that even set union in Python is not commutative. a | b should equal b | a in the sense of __eq__, but the actual objects in the result set depend on the order of the arguments (and in the opposite way from dict + dict). This happens with objects that are distinct but compare/hash equally (x is not y and x == y). Whether that actually matters for any useful program is another story...

Dumb program to illustrate this point:

    class Dummy:
        def __init__(self, value):  self.value = value
        def __repr__(self):         return 'Dummy(%s)' % self.value
        def __hash__(self):         return 0
        def __eq__(self, other):    return True

    a = {Dummy(0)}
    b = {Dummy(1)}
    print(a | b)
    print(b | a)
    print(a | b == b | a)
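For comparison, here is a sketch of the same experiment extended to dicts. The `Key` class is a hypothetical renamed copy of `Dummy`, and the behaviour checked is what CPython does: a set union keeps the element that was already there, while a dict merge keeps the first key object but takes the later value.

```python
class Key:
    """Hypothetical key type: every instance hashes and compares equal."""
    def __init__(self, value):
        self.value = value
    def __repr__(self):
        return 'Key(%s)' % self.value
    def __hash__(self):
        return 0
    def __eq__(self, other):
        return True

k0, k1 = Key(0), Key(1)

# Set union keeps the element that was already present (the left one).
union = {k0} | {k1}
set_survivor = next(iter(union))

# Dict merging keeps the key object that was inserted first,
# but takes the value from the later (right-hand) dict.
merged = {**{k0: 'left'}, **{k1: 'right'}}
dict_key, dict_value = next(iter(merged.items()))

print(set_survivor, dict_key, dict_value)
```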


Unfortunately, even in Haskell Data.Map.Map's monoid instance is left-biased. There is the monoidal-containers package which newtype-wraps Data.Map.Map to have instance Monoid m => Monoid (MonoidalMap k m), which I think is much more sensible.


I think I wasn’t even sure that Haskell had a Monoid instance for Data.Map, I knew it wasn’t the interface which I would naturally expect though. I agree that the interface for MonoidalMap is more natural.


Besides, anytime somebody compares Python to Haskell, the battle is over. They have completely different use cases and philosophies. If you want something in Haskell, you probably want the opposite in Python.


It's not clear what you're saying here.

The comparison was to say, "this decision is difficult everywhere" -- which lang seems beside the point.


Great post. Sets have nice properties that dictionaries don't have. Making them act similarly seems like a trap.


> it’s not super obvious what to do when the keys are equal

    d1 | d2 | d3 | ...
is equivalent to:

    {**d1, **d2, **d3, ...}
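A minimal sketch of that equivalence, written so it runs without relying on a dict `|` operator. The helper name `merge_all` is hypothetical; it folds the dicts left to right, so later dicts win on shared keys, the same as in the unpacking form.

```python
from functools import reduce

def merge_all(*dicts):
    """Fold dicts left to right; on a shared key the later dict wins."""
    return reduce(lambda acc, d: {**acc, **d}, dicts, {})

d1 = {'a': 1, 'b': 1}
d2 = {'a': 2, 'c': 2}
d3 = {'a': 3}

chained = merge_all(d1, d2, d3)
unpacked = {**d1, **d2, **d3}
print(chained == unpacked, chained)
```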


Now read the above but instead of “it’s not super obvious what

  d1 | d2
should be because losing information/desirable properties/weird errors”, read “it’s not super obvious what

  {**d1, **d2}
should be because losing information/desirable properties/weird errors”.

Except I guess one could throw in something about TOOWTDI too.


I actually think it is obvious what a dictionary merge should do (overwrite keys on the left with keys on the right), but this is beside the point because it's already been determined for

    {**d1, **d2}
In other words, there are no new semantics to discuss here. I'm just saying the two syntaxes should be equivalent.


> For the first two choices one loses commutativity which means that code then suddenly has to have previously cared about it (or it will do the wrong thing)

Since this is a new operator, that shouldn’t be an issue.

I think losing commutativity is okay. After all, d1.update(d2) and d2.update(d1) already give different results if keys conflict.
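To make that concrete: `update` mutates in place and returns `None`, so the comparison has to be made on the resulting dicts. A small sketch, using copies so the originals stay untouched:

```python
d1 = {'a': 1, 'x': 10}
d2 = {'a': 2, 'y': 20}

left = dict(d1)      # copy d1, then let d2 overwrite shared keys
left.update(d2)

right = dict(d2)     # copy d2, then let d1 overwrite shared keys
right.update(d1)

print(left)
print(right)
print(left == right)
```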


What you have written doesn’t look at all symmetrical but d1 | d2 looks very symmetrical. Operators being symmetrical around a vertical axis tends to imply being commutative (although there are many exceptions, e.g. a divide symbol (but note fractions aren’t symmetrical) or a minus sign or using ^ for exponentiation (but superscripting is not symmetrical) or matrix multiplication (but maybe one could argue this is an abbreviation of function application)).

Secondly I claim that the issue with using | is that it is not a new operator. It is a new, incompatible meaning for an old operator. Old code might not bother checking that its arg is a set because if it weren’t a set then | or in would fail. New programmers might see dicts as being basically sets and wrongly assume functions for sets would correctly work on dicts.


In case the keys collide, you could supply a collision callback to define what to do, e.g. to add the values:

  d1 = {'a': 1}
  d2 = {'a': 2}

  d3 = {**d1, **d2, add_func}

  def add_func(a, b):
      return a+b
Or something along those lines
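Since the literal syntax above is hypothetical, the same idea can be sketched as an ordinary function. `merge_with` and its `on_collision` parameter are illustrative names, not an existing Python API:

```python
def merge_with(on_collision, *dicts):
    """Merge dicts; when a key repeats, combine old and new values with on_collision."""
    result = {}
    for d in dicts:
        for key, value in d.items():
            if key in result:
                result[key] = on_collision(result[key], value)
            else:
                result[key] = value
    return result

def add_func(a, b):
    return a + b

d1 = {'a': 1}
d2 = {'a': 2, 'b': 5}
d3 = merge_with(add_func, d1, d2)
print(d3)
```

Passing `lambda old, new: new` as the callback would give the usual right-biased merge.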


Why not raise a ValueError and let the programmer figure out what The Right Thing To Do is when you add two dicts that have the same key with a different value?

I assume the same key with the same value would be OK, but I'm not really sure it's a good idea for it to be OK.
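A sketch of what such a strict merge might look like. `strict_merge` is a hypothetical helper; it assumes values can be compared with `==`, and it tolerates a shared key only when both values compare equal, which is one possible reading of the question above.

```python
def strict_merge(a, b):
    """Merge two dicts, raising ValueError if a shared key maps to different values."""
    result = dict(a)
    for key, value in b.items():
        if key in result and result[key] != value:
            raise ValueError('conflicting values for key %r: %r vs %r'
                             % (key, result[key], value))
        result[key] = value
    return result

print(strict_merge({'a': 1}, {'a': 1, 'b': 2}))

try:
    strict_merge({'a': 1}, {'a': 2})
except ValueError as exc:
    print('error:', exc)
```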


You can't do value comparison without making dict item comparison a passed-in function or making dict values immutable. If you're doing something that really looks like a mathematical Union that will raise if there's any overlap then it's a really confusing abuse of notation. I don't think there's a way out.


That is one thing you could do to merge dicts. To expand on my last paragraph above, I think I would imagine the following operations (stupid syntax):

  a & b = { k: (a[k], b[k]) for k in a.keys() & b.keys() }
  a | b = { k: (a.get(k, None), b.get(k, None)) for k in a.keys() | b.keys() }
  a |& b = { k: (a.get(k, None), v) for k, v in b.items() }
  a &| b = { k: (v, b.get(k, None)) for k, v in a.items() }
  a |_| b = { k: only(a, b, k) for k in a.keys() | b.keys() }
  def only(a, b, k):
    if k in a and k in b:
      raise DuplicateKey(a, b, k)
    elif k not in a and k not in b:
      assert False
    elif k in a:
      return a[k]
    else:
      return b[k]
This doesn’t work well if values can be None, so maybe instead of pairs there should be objects Left(x), Right(y), and Both(x,y).
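One way to sketch that last suggestion in runnable Python. `Left`, `Right`, `Both`, and `outer_join` are hypothetical names; the tagged wrappers avoid the ambiguity of using None for a missing side.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Left:
    value: Any

@dataclass(frozen=True)
class Right:
    value: Any

@dataclass(frozen=True)
class Both:
    left: Any
    right: Any

def outer_join(a, b):
    """Outer join on keys, tagging each value with the side(s) it came from."""
    joined = {}
    for k in a.keys() | b.keys():
        if k in a and k in b:
            joined[k] = Both(a[k], b[k])
        elif k in a:
            joined[k] = Left(a[k])
        else:
            joined[k] = Right(b[k])
    return joined

result = outer_join({'x': None, 'y': 1}, {'y': 2, 'z': 3})
print(result)
```

The other joins fall out by filtering: an inner join keeps only the `Both` entries, and a left join drops the `Right` entries.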


That syntax doesn't make sense. The

  {**d1, **d2}
idiom is just a clever mashup of Python's dictionary construction literal {} and ** unpacking. (The restriction to string-valued keys, which is a major limitation, applies to the older dict(d1, **d2) idiom rather than to this one.)

Adding a third item to the dictionary literal would require special-casing the {} dictionary construction literal.


  >>> { 'a' : 1 } | { 'a' : 2 }
ISTM the most logical result would be:

  { 'a' : { 1, 2 } }
...but I could certainly understand throwing an exception.


While I see your point, I don't think this makes sense historically. Dictionaries never supported such behavior before so you'd be introducing a new core concept to a dictionary. But moreover, you'd be changing the type of the value only on duplicated keys, and what about if you were to add another value of 2 to a? Are you making this a set, and why? I think it would come with too many caveats and assumptions in the PEP.

I'm not saying you have a bad idea/logic here, just that I'm not sure it's the best thing for the dict.


Note that this forgets the order of the arguments, which may not be desirable.


If the property we want to achieve is "a | b == b | a" we necessarily have to forget the order of the arguments.
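A sketch of a merge with that property. `merge_to_sets` is a hypothetical helper that wraps every value in a set (not only the duplicated ones) and assumes the values are hashable:

```python
def merge_to_sets(a, b):
    """Map each key to the set of values it had in either dict."""
    merged = {}
    for d in (a, b):
        for key, value in d.items():
            merged.setdefault(key, set()).add(value)
    return merged

a = {'a': 1, 'b': 5}
b = {'a': 2}
print(merge_to_sets(a, b) == merge_to_sets(b, a))
print(merge_to_sets(a, b))
```

Note that the result has a different value type than the inputs, so this sketch is not directly chainable.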



