> An alternative to the + operator is the pipe | operator, which is used for set union. This suggestion did not receive much support on Python-Ideas.
That's disappointing. It's always been on my Python wish list that dicts would subclass sets, as dicts are essentially sets with values attached. Pretty much everywhere you can use a set you can use a dict and it acts like the set of its keys. For example:
>>> s = {'a','b','c'}
>>> d = {i: i.upper() for i in s}
>>> list(d) == list(s)
True
Dictionaries have been moving in this more ergonomic direction for a while. Originally, to union two dictionaries you had to copy one and call update() on the copy. Nowadays, as the PEP points out, you can just say {**d, **d2}.
There's no reason you shouldn't have always been able to say d | d2, same as sets. Now I finally get my wish that dictionaries will behave more similarly to sets, and they use the wrong set of operators.
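For anyone who wants the existing spellings side by side, here is a minimal sketch. The dict contents are made up for illustration, and it only uses features Python already has (copy/update, ** unpacking from PEP 448, and set operators on keys views).

```python
d = {'a': 'A', 'b': 'B'}
d2 = {'b': 'beta', 'c': 'C'}

# Older spelling: copy first so that d is not mutated, then update the copy.
merged_old = d.copy()
merged_old.update(d2)

# Unpacking spelling (Python 3.5+): builds a new dict in one expression.
merged_new = {**d, **d2}

# Keys views already support the set operators.
all_keys = d.keys() | d2.keys()
```

Both spellings let the right-hand dict win on the shared key 'b'.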
The most compelling reason to not do this is that (I claim) it’s not super obvious what to do when the keys are equal. In:
{ 'a' : 1 } | { 'a' : 2 }
Should the result be:
{ 'a' : 1 }
(prioritise the left hand side), or
{ 'a' : 2 }
(prioritise the right hand side), or should it raise an error? Maybe a fourth option would be to downgrade to sets of keys and give:
{ 'a' }
A fifth option is to magically merge values:
{ 'a' : 3 } or { 'a' : (1,2) }
For the first two choices one loses commutativity which means that code then suddenly has to have previously cared about it (or it will do the wrong thing), even though it didn’t previously matter, and one is always potentially losing data. The third choice is safe but could cause unforeseen problems later if shared keys only happen rarely. The fourth choice also forgets a bunch of information held in the dict.
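To make the five options concrete, here is a rough sketch of how each could be spelled in Python today. The helper name strict_merge is hypothetical, and Counter is only one possible stand-in for a "magic merge".

```python
from collections import Counter

left = {'a': 1}
right = {'a': 2}

# Option 1: prioritise the left-hand side (unpack it last so it wins).
left_wins = {**right, **left}

# Option 2: prioritise the right-hand side.
right_wins = {**left, **right}

# Option 3: raise an error on any shared key (hypothetical helper).
def strict_merge(x, y):
    shared = x.keys() & y.keys()
    if shared:
        raise KeyError(f"duplicate keys: {shared}")
    return {**x, **y}

# Option 4: downgrade to the set of keys.
keys_only = left.keys() | right.keys()

# Option 5: combine the values; Counter addition sums the counts
# (and drops any result that is not positive).
summed = Counter(left) + Counter(right)
```

Only options 3, 4 and 5 give the same answer when left and right are swapped.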
In a language like Haskell, one can use type classes to specify how to merge values (Monoid), but without type classes (and a way to choose which instance to use) I think some kind of magic merge is not great.
I claim the operations one should really want with dicts are not set operations but rather more relational ones, i.e. {inner,outer,left,right} joins on the keys followed by some mapping to decide how to merge values.
While I agree with you, I will note that even set union in Python is not commutative. a | b should equal b | a in the sense of __eq__, but the actual objects in the result set depend on the order of the arguments (and in the opposite way from dict + dict). This happens with objects that are distinct but compare/hash equally (x is not y and x == y). Whether that actually matters for any useful program is another story...
Dumb program to illustrate this point:
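class Dummy:
    def __init__(self, value): self.value = value
    def __repr__(self): return 'Dummy(%s)' % self.value
    def __hash__(self): return 0                # every instance hashes alike
    def __eq__(self, other): return True        # and compares equal to anything
a = {Dummy(0)}
b = {Dummy(1)}
print(a | b)            # {Dummy(0)}: the left operand's object survives
print(b | a)            # {Dummy(1)}
print(a | b == b | a)   # True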
Unfortunately, even in Haskell Data.Map.Map's monoid instance is left-biased. There is the monoidal-containers package which newtype-wraps Data.Map.Map to have instance Monoid m => Monoid (MonoidalMap k m), which I think is much more sensible.
I think I wasn’t even sure that Haskell had a Monoid instance for Data.Map, I knew it wasn’t the interface which I would naturally expect though. I agree that the interface for MonoidalMap is more natural.
Besides, anytime somebody compares Python to Haskell, the battle is over. They have completely different use cases and philosophies. If you want something in Haskell, you probably want the opposite in Python.
I actually think it is obvious what a dictionary merge should do (overwrite keys on the left with keys on the right), but this is beside the point because it's already been determined for
{**d1, **d2}
In other words, there are no new semantics to discuss here. I'm just saying the two syntaxes should be equivalent.
> For the first two choices one loses commutativity which means that code then suddenly has to have previously cared about it (or it will do the wrong thing)
Since this is a new operator, that shouldn’t be an issue.
I think losing commutativity is okay. After all, d1.update(d2) and d2.update(d1) leave you with different results if keys conflict.
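Since dict.update mutates in place and returns None, the asymmetry shows up in the resulting dicts rather than in the return values. A small sketch with made-up values:

```python
d1 = {'a': 1, 'b': 2}
d2 = {'a': 10, 'c': 3}

x = d1.copy()
x.update(d2)   # d2's value wins for the shared key 'a'

y = d2.copy()
y.update(d1)   # d1's value wins for the shared key 'a'

print(x)       # {'a': 10, 'b': 2, 'c': 3}
print(y)       # {'a': 1, 'c': 3, 'b': 2}
print(x == y)  # False
```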
What you have written doesn’t look at all symmetrical, but d1 | d2 looks very symmetrical. Operators that are symmetrical about a vertical axis tend to imply commutativity (although there are many exceptions, e.g. a divide symbol (but note fractions aren’t symmetrical), a minus sign, ^ for exponentiation (but superscripting is not symmetrical), or matrix multiplication (but maybe one could argue this is an abbreviation of function application)).
Secondly, I claim that the issue with using | is that it is not a new operator. It is a new, incompatible meaning for an old operator. Old code might not bother checking that its arg is a set, because if it weren’t a set then | or in would fail. New programmers might see dicts as being basically sets and wrongly assume functions for sets would correctly work on dicts.
Why not raise a ValueError and let the programmer figure out what The Right Thing To Do is when you add two dicts that have the same key with a different value?
I assume the same key with the same value would be OK, but I'm not really sure it's a good idea for it to be OK.
You can't do value comparison without making dict item comparison a passed-in function or making dict values immutable. If you're doing something that really looks like a mathematical union but will raise if there's any overlap, then it's a really confusing abuse of notation. I don't think there's a way out.
That is one thing you could do to merge dicts. To expand on my last paragraph above, I think I would imagine the following operations (stupid syntax):
a & b = { k: (a[k], b[k]) for k in a.keys() & b.keys() }
a | b = { k: (a.get(k, None), b.get(k, None)) for k in a.keys() | b.keys() }
a |& b = { k: (a.get(k, None), v) for k, v in b.items() }
a &| b = { k: (v, b.get(k, None)) for k, v in a.items() }
a |_| b = { k: only(a, b, k) for k in a.keys() | b.keys() }
class DuplicateKey(Exception):
    pass
def only(a, b, k):
    # Return the value for k from whichever dict has it; refuse to choose if both do.
    if k in a and k in b:
        raise DuplicateKey(a, b, k)
    elif k not in a and k not in b:
        raise AssertionError("unreachable: k comes from a.keys() | b.keys()")
    elif k in a:
        return a[k]
    else:
        return b[k]
This doesn’t work well if values can be None, so maybe instead of pairs there should be objects Left(x), Right(y), and Both(x, y).
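As a rough illustration of the operations above, here they are as plain functions with tagged values instead of None. The function names (inner_join, outer_join, left_join) and the dataclass layout are assumptions made for this sketch, not anything proposed in the PEP.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class Left:
    value: Any

@dataclass(frozen=True)
class Right:
    value: Any

@dataclass(frozen=True)
class Both:
    left: Any
    right: Any

def inner_join(a, b):
    """Keys present in both dicts, with the values paired up."""
    return {k: (a[k], b[k]) for k in a.keys() & b.keys()}

def outer_join(a, b):
    """Every key from either dict, tagged by where its value came from."""
    out = {}
    for k in a.keys() | b.keys():
        if k in a and k in b:
            out[k] = Both(a[k], b[k])
        elif k in a:
            out[k] = Left(a[k])
        else:
            out[k] = Right(b[k])
    return out

def left_join(a, b):
    """Every key of a; the tag records whether b also had the key."""
    return {k: Both(v, b[k]) if k in b else Left(v) for k, v in a.items()}

a = {'x': 1, 'y': None}
b = {'y': 2, 'z': 3}
```

With the tags, a stored None (as in a['y']) stays distinguishable from a missing key, which the tuple-with-None version cannot do.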
The {**d1, **d2} idiom is just a clever mashup of Python's dictionary display {} and ** unpacking. (The string-keys-only restriction, which is a major limitation, applies to the dict(**d1, **d2) call form, not to the display.)
Adding a third item to the dictionary literal would require special-casing the {} dictionary construction literal.
While I see your point, I don't think this makes sense historically. Dictionaries never supported such behavior before so you'd be introducing a new core concept to a dictionary. But moreover, you'd be changing the type of the value only on duplicated keys, and what about if you were to add another value of 2 to a? Are you making this a set, and why? I think it would come with too many caveats and assumptions in the PEP.
I'm not saying you have a bad idea/logic here, just that I'm not sure it's the best thing for the dict.
"dicts would subclass sets, as dicts are essentially sets with values attached"
Such a derivation would violate the Liskov substitution principle. Consider the following with set:
x = {"one", "two"}
y = set()
y.update(x)
y
This results in y being {'two', 'one'}.
Now, do the same with dict:
y = dict()
y.update(x)
This gives the exception: "ValueError: dictionary update sequence element #0 has length 3; 2 is required"
This means that dict cannot be used everywhere that a set can be used, which means it violates the Liskov substitution principle (see https://en.wikipedia.org/wiki/Liskov_substitution_principle ). So if covariant methods are needed for good design, dict cannot be a subclass of set.
Would s["one"] = 1 raise an exception? Or convert the set into a dict? Or change the sentinel value for all the set elements?
None of these seems like good design, since it means either that the instance changes its class on the fly (which Python does support) or that a dict does not act like its parent set object, breaking the is-a relationship most people expect from an OO design.
Sets don't support indexing, so it would still raise an exception. Dicts do, which is an example of them supporting more operations than sets, which is an example of why (if there is to be any subclass relation) dicts are subclasses of sets.
Edit: I suppose there's some confusion about my language above. "then sets would be dicts whose values are all None" could more helpfully read "sets would be equivalent to dicts whose...".
> as dicts are essentially sets with values attached.
Interestingly enough some languages actually do the opposite. In Rust for example a set is literally just a dictionary with unit as the value[0] and unit is essentially a way of expressing the absence of a value (it takes up no space in memory, and you just can't do anything with it).
> Interestingly enough some languages actually do the opposite.
Of course you can represent sets as dictionaries with empty values (ask anyone who programmed Perl). You're supporting my point that dicts logically subclass sets, because they can represent sets where the values can be other things as well.
You're also getting at what the behavior should be if you union a dictionary and a set. Hypothetical Python:
>>> s = {'a','b','c'}
>>> d = {'d': 'D'}
>>> d | s
{'d': 'D', 'a': None, 'b': None, 'c': None}
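The hypothetical result above can be approximated with existing tools. This is only a sketch of one possible spelling, using dict.fromkeys to turn the set into a dict whose values are all None:

```python
s = {'a', 'b', 'c'}
d = {'d': 'D'}

# dict.fromkeys(s) gives {'a': None, 'b': None, 'c': None} (in set iteration order).
merged = {**d, **dict.fromkeys(s)}
```

The order of the keys taken from s is not guaranteed, since sets are unordered.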
I think he's flipping your point: sets are a subclass of dicts/maps, not vice versa. Thinking of a map as a set whose values map to something else sounds backwards, because values in sets don't map to values arbitrarily (or at all in some cases); maps are maps.
>>> a = {"foo": 1}
>>> b = {"bar"}
>>> a.keys() | b
{'bar', 'foo'}
As an aside, I like the plus operator. Being able to merge two dictionaries in one line and have the result be a new dict is something I've needed often enough.
{**d, **d2}
works, but is pretty recent and still feels weird to me (not coming from a language that makes use of destructuring a lot, like Javascript).
Yeah, I largely agree. Is that unusual these days? :D
And I'd be sort of okay if dicts implemented set operators, although I don't think using `set()` or `.keys()` is a big ask. But using the pipe operator for the operation in this PEP would be a bad idea IMO: For sets, a | b == b | a. For dicts, not necessarily. So if they used the pipe operator, that could lead to surprising or unintuitive results.
On the plus side, not using pipe still leaves it open for future usage, so you might get your wish yet.
Your exact point that for sets, union operator is commutative while for dictionaries it wouldn't be is one of the main objections I foresaw if I ever wrote this up as a PEP. Counterpoint: for numbers, + is commutative but for lists it's not, so it's normal for the same operator to have different commutativity depending on the type of the operands. IMO it's worth using the set operators because of the subclass relation of sets and dicts despite some small (but really, predictable) changes in behavior, but I can see how someone could have a different opinion.
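A quick check of the commutativity claims, using only built-in types (the variable names are just for this sketch):

```python
nums_commute = (1 + 2 == 2 + 1)
lists_commute = ([1] + [2] == [2] + [1])
sets_commute = ({1} | {2} == {2} | {1})
dicts_commute = ({**{'a': 1}, **{'a': 2}} == {**{'a': 2}, **{'a': 1}})

print(nums_commute, lists_commute, sets_commute, dicts_commute)
# True False True False
```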
The extension operator still resembles a set union more than an addition. The fact that it is not commutative is not, IMO, an argument, since sum is even more frequently used with commutative semantics.
If you want to see a dictionary as a set, I think the better view would be to see it as a set of (key,value) pairs where equality of pairs is defined as equality of the key parts, ignoring the ‘value’ parts.
I think it makes sense to require that such a set should behave identically to a dictionary, and that providing a ‘real’ dictionary is just an optimization, plus the addition of convenience functions, e.g. to get the set of keys.
If one sees things that way, one could even define the dictionary interface as taking an equality operation on the keys and a ‘value combiner’ function that combines values and is used in the cases you outline (that function could add integers, concatenate strings, keep the larger value, or whatever the programmer specifies).
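A minimal sketch of the ‘value combiner’ idea, assuming a hypothetical helper called merge_with. It leaves key equality to the normal hash/== rules rather than taking a custom equality operation.

```python
import operator

def merge_with(combine, a, b):
    """Merge two dicts; combine(a_value, b_value) resolves shared keys."""
    out = dict(a)
    for k, v in b.items():
        out[k] = combine(out[k], v) if k in out else v
    return out

a = {'x': 1, 'y': 2}
b = {'y': 40, 'z': 5}

added = merge_with(operator.add, a, b)               # add integers
largest = merge_with(max, a, b)                      # keep the larger value
right_wins = merge_with(lambda old, new: new, a, b)  # plain overwrite
joined = merge_with(operator.add, {'k': 'ab'}, {'k': 'cd'})  # concatenate strings
```

With a commutative combiner such as operator.add on integers, the merge itself becomes order-independent.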
Was trying to make the case as to why treating dictionaries as a child of <generic collection> with an extend/merge operation using ‘+’ rather than a child of set with a union operation ‘|’ makes more sense (to me).
Changing the behaviour of a well-defined operation like union seems bad, although my case is somewhat undermined by Python’s overloading of ‘+’ to mean extend.
Practically, what would your wish accomplish? Would it make most people more productive? Produce fewer bugs? Help them learn faster?
Most Python coders don't even use sets more than once a year. Hell, I use collections.deque more than sets.
But dicts? We use them all the time. In fact, {} + {} failing is a recurring disappointment in all my classrooms.
Plus, in PHP and JS, arrays/objects are the "do-it-all" data structure. And it's horrible. You see the same data structure everywhere. You have to read in detail what something is, and what it's for.
It's very nice that dict and set are very distinct, and that they have distinct sets of operators. This way it's super easy to scan the code and know what the data structure is, and what it's used for. That's why I always teach set([1, 2]) and not {1, 2} first. It helps people make a clear distinction in their mind.
I use sets a lot, sample size of one. I do data science/engineering stuff, sets of columns/sets of keys. I'm very sceptical of your claim that most developers don't use sets.
Actually, a lot of Python users don't even know sets exist. Or they forgot, and use them randomly after they google "remove duplicates". Even among set users, a lot of them don't even know you can use ^, | and & with them.
Sample size: a few hundred students and colleagues over 15 years.
It makes sense: for columns, it's quite common to use dicts or a pandas DataFrame. The set of keys is just the dict's keys view.
A web dev rarely needs sets. Neither does a GUI coder, a sysadmin or a geographer.
It's not that sets are not useful, it's just that among the huge number of things you need to do in programming, across all the fields that Python serves, they are pretty niche.
> It's very nice that dict and set are very distinct, and that they have distinct sets of operators. This way it's super easy to scan the code and know what the data structure is, and what it's used for.
This is exactly why I want explicit typing in Python.
> But the type declaration may not be in the viewport.
The type declaration doesn't have to be in view; the very point of type annotations is to support static type systems, which you can query about any type. For example, if you use mypy, you can add `reveal_type( any_kind_of_expr )` anywhere in your code and you'll get the most precise type available, be it inferred or declared manually, for that expression at a given point in the code.
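A small sketch of what that looks like in practice. The TYPE_CHECKING guard is an addition for this example: it keeps the file runnable, since the bare reveal_type name is only understood by the type checker and is not defined at runtime.

```python
from typing import TYPE_CHECKING

d = {"a": 1}
keys = d.keys() | {"b"}

if TYPE_CHECKING:
    # mypy reports the inferred type of `keys` at this point;
    # the branch is skipped when the script is actually run.
    reveal_type(keys)
```

Running mypy on the file prints a note with the inferred type for that line, without any declaration needing to be on screen.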