
I'm pretty sure that the situation you're describing was not thread safe in Python 2 either.

Sure, once you're in the body of the for loop, the dictionary must have been copied to the list so you're safe. But while d.items() is being evaluated at the start of the for loop, there is an internal iteration that could be preempted by the other thread. The GIL doesn't save you because Python operations aren't guaranteed to be atomic, and I doubt something that complex would be (it would be a serious problem if iterating over a large dictionary in one thread held up all other threads for an arbitrarily long time). Even if it is GIL-atomic, you're risking breakage if you move to another implementation (e.g. PyPy) or if Python changes its atomicity in the future.

In general, if you want to modify an object in one thread and read it in another thread, you should add locking to prevent this happening simultaneously.
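
A minimal sketch of that locking advice, assuming a plain dict shared between one writer thread and one reader. The names `shared`, `shared_lock`, `writer` and `snapshot_items` are hypothetical, chosen only for illustration:

```python
import threading

# Hypothetical shared state; the names are illustrative only.
shared = {}
shared_lock = threading.Lock()

def writer(n):
    for i in range(n):
        with shared_lock:      # hold the lock for every mutation
            shared[i] = i * i

def snapshot_items():
    with shared_lock:          # hold the lock while copying
        return list(shared.items())

t = threading.Thread(target=writer, args=(1000,))
t.start()
partial = snapshot_items()     # may run while the writer is still going
t.join()
final = snapshot_items()
```

The reader only ever iterates over its own copy, so it does not depend on whether the copy itself happens to be atomic under the GIL.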

It is however true that the Python 2 items() method allows you to modify the dictionary in the body of the same for loop. But this is a surprising exception compared to iterating over a list or other container, so it makes sense overall to demand you explicitly make a copy if that's what you want.
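
A small single-threaded sketch of the explicit copy in Python 3. The dict contents are made up; the second loop shows what CPython 3 does when the copy is left out:

```python
d = {"a": 1, "b": 2, "c": 3}

# list(...) takes a snapshot first, so the loop body may modify d.
for key, value in list(d.items()):
    if value % 2:
        del d[key]

# Without the copy, CPython 3 notices the size change and raises.
e = {"a": 1, "b": 2, "c": 3}
try:
    for key, value in e.items():
        if value % 2:
            del e[key]
    raised = False
except RuntimeError:
    raised = True
```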



In Python 2, items() returned a list, and access to the dictionary was blocked by the GIL, so while the array was being prepared, the dict couldn't be modified. So it is thread safe in Python 2. In Python 3 you need to lock, but that's not always obvious until it bites you. You may think you need threads only for parallel processing, where it's easy and managed, but there are much more common cases where you end up using threads: UIs or third-party toolkits like Qt, which often run callbacks in their own threads. And there is no way to protect items() other than locking: even if you try to build an array from the iterator to make it faster, any parallel thread could break it by modifying the dict.

For myself I found only one good solution: subclass dict and create a thread-safe version of it with locks around all critical operations, both modifications and reads. If you want to make it more efficient, you need separate read and write locks.
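
A rough sketch of such a subclass, not a complete implementation. `LockedDict` is a hypothetical name, only a handful of methods are wrapped, and a single `threading.RLock` stands in for the separate read and write locks mentioned above (the standard library has no reader-writer lock):

```python
import threading

class LockedDict(dict):
    """Hypothetical dict subclass; only a few methods are wrapped here."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._lock = threading.RLock()

    def __setitem__(self, key, value):
        with self._lock:
            super().__setitem__(key, value)

    def __delitem__(self, key):
        with self._lock:
            super().__delitem__(key)

    def __getitem__(self, key):
        with self._lock:
            return super().__getitem__(key)

    def items(self):
        # Return a snapshot list, as Python 2 did, copied under the lock.
        with self._lock:
            return list(super().items())

stats = LockedDict(a=1)
stats["b"] = 2
del stats["a"]
```

Methods such as update(), pop() and setdefault() are left unwrapped here and would need the same treatment before this could be relied on.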


> while array is prepared, dict couldn't be modified

I mentioned this exact situation in my comment. In fact that's what most of my comment is about.

To repeat:

* I don't believe it actually is atomic (but I haven't checked ... have you?)

* Even if it is, it wouldn't be guaranteed to be atomic in future versions of Python (ignoring the fact that future versions of Python no longer have items() with the same semantics).

* It won't be safe in other implementations of Python, e.g. PyPy

* It doesn't match other collections that you can iterate over that don't need an items() e.g. list

* (This one is new) It won't be safe in user-defined dict-like classes that define their own items() method, even if that method is supposed to have the same semantics.

Modifying an object in one thread while reading it in another is a bug, even if it seems to work for now. Don't blame Python for making it slightly more likely to break. Just use a flipping mutex!


Let's assume that evaluating a Python 2 items() call or list isn't atomic, and that it would break some multi-threaded code. Even so, there is a huge difference from iterators, which can be passed around left and right and be executed long after they are created.

With a non-threadsafe list, race conditions and other problems will likely crop up in CPU-bound applications. However, with iterators, which may be lazily evaluated long after they are created, race conditions are far more likely to occur.

As an example, consider the following program:

  for value in x.items():
    do_shared_network_or_disk_call(value)

If x.items() returns a list, there is definitely the possibility of race conditions cropping up. But if it returns an iterator, that possibility increases dramatically. In a multi-threaded or multi-process environment both are bad, but why would Python 3 try to make the situation worse?
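
To make the eager versus lazy distinction concrete, here is a single-threaded sketch in which an ordinary assignment stands in for a modification from another thread. The dict contents are made up, and the RuntimeError is what CPython 3 raises in this situation:

```python
d = {"a": 1, "b": 2}

snapshot = list(d.items())  # eager: copied right now, like Python 2 items()
lazy = iter(d.items())      # lazy: nothing has been read from d yet

d["c"] = 3                  # a later modification, e.g. from another thread

try:
    next(lazy)              # the iterator sees that d changed size
    lazy_ok = True
except RuntimeError:
    lazy_ok = False
```

The snapshot is unaffected by the later write, while the iterator stays exposed for as long as it goes unconsumed.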


It's atomic in CPython and protected by the GIL. It will be safe in user-defined dict classes if you make it safe and take care of this. And everything above is a matter of implementation. What you write is pure and correct in theory, but it's not practical. If you have a thread-safe data structure that takes care of its own state consistency, why not use it without locks and make things simpler? I'm not talking about syncing the state of several data structures etc. I'm talking about very simple use cases where it becomes very handy.


What do you mean by "threadsafe" here? Could dict.items() actually break in Python 2? I've never seen that happen.


As I admitted in my comment, I'm not 100% sure that it's not protected by the GIL. If it's not, I wouldn't expect a hard crash if you mutate from another thread while iterating, but more like e.g. an item doesn't appear in the list even though a different one had been removed by the other thread. But as I said in my comment, even if it does happen to be protected by the GIL, I think it's unsafe and fragile to rely on it.


Unexpected/undefined behavior?


There are a lot of cases where you don't need strict consistency and the current state is enough for processing. For example, say you want to save request stats from web servers. Would you stop all operations while you count and write to the DB, just to be precise? Of course not. Whatever current number you have is good enough. Of course, you need to be aware of the side effects.



