Jan 6 2015

The Subtle Corners of Python Threads

A few things to watch out for when using multithreading in Python.

thread.stack_size([size])

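This is an interface provided by the thread module. Every call to this function sets the stack size: called without an argument, it sets the stack size to 0, which means the system default is used. The return value is the previous stack size. Do not assume that calling it without an argument merely returns the current stack size.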
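
The behavior described above can be checked with a short sketch. It uses threading.stack_size, which is the same function re-exported by threading, so the snippet runs on both Python 2.7 and Python 3. The 256 KiB value is an assumption chosen for illustration; some platforms may reject it with ValueError or not support changing the stack size at all.

```python
import threading

# Every call below SETS the stack size; the return value is the previous setting.
initial = threading.stack_size()                    # no argument: resets the size to 0
returned_by_set = threading.stack_size(256 * 1024)  # previous value was 0
returned_by_query = threading.stack_size()          # previous value, but resets to 0 again
after_query = threading.stack_size()                # confirms the reset

print(returned_by_set, returned_by_query, after_query)
```

So a "query" written as stack_size() silently discards a custom stack size. To read the value without losing it, the returned value has to be passed back in a second call.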

thread._local, _threading_local, threading.local

thread._local and _threading_local are the original implementations; threading.local is just an alias. Manual page: link.

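Every call to _local() produces a brand-new localobject. The localobject wraps a dict, and every attribute lookup on the localobject is served from that dict. The dict itself is stored in the per-thread state (the PyThreadState dict for thread._local, the __dict__ of the threading.Thread object for _threading_local) under a key derived from the localobject, so each thread sees its own dict. It is not of much use.
The user has to hold a reference to the localobject: once the reference is released, the localobject is destroyed along with the data stored in it. So it cannot be used as in the following code: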

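import thread

def func_a():
    tss = thread._local()    # a brand-new localobject on every call
    tss.var = tss.var + 1    # AttributeError: the new object has no 'var'
def func_b():
    tss = thread._local()    # yet another new object, unrelated to the one in func_a
    print tss.var
def thread_proc():
    while True:
        func_a()
        func_b()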

In other words, it cannot provide a thread-level namespace in the style of __builtins__. What thread._local offers can be implemented by users themselves in a clearer way.

On Linux, threads created by the thread module are detached.

Whether other systems have a notion of detach, I do not know.
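
A practical consequence is that a thread started through the thread module offers nothing to join, so waiting for it needs explicit synchronization. The sketch below shows one common way to do that with a lock. The worker function and variable names are made up for the example, and the import fallback is only there so it runs on Python 3, where the module is called _thread.

```python
try:
    import thread             # Python 2
except ImportError:
    import _thread as thread  # Python 3 name of the same low-level module


def worker(done, out):
    out.append(sum(range(10)))
    done.release()            # signal completion to whoever is waiting


done = thread.allocate_lock()
done.acquire()                # taken now, released by the worker when it finishes
out = []
thread.start_new_thread(worker, (done, out))
done.acquire()                # blocks until the worker calls release()
print(out)
```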

Thread safety of threading.

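I can't believe I am discussing the thread safety of a threading library (囧)
Every threading.Thread() has a thread name. If none is passed at initialization, it is generated by the following function: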

# file: threading.py
_counter = 0
def _newname(template="Thread-%d"):
    global _counter
    _counter = _counter + 1
    return template % _counter

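threading is implemented in Python, and this code is not thread-safe: the increment of _counter is a read-modify-write that spans several bytecodes, so a thread switch in between can make two threads compute the same value. Generated thread names may therefore be duplicated, and it is better to set thread names yourself.
You can run the script below to check; the printed thread names contain duplicates. I tested with CPython 2.7.5.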

import threading
import time
import sys

sys.setcheckinterval(1)

class MyThread(threading.Thread):
    def __init__(self, level):
        threading.Thread.__init__(self)
        self.level = level
    def run(self):
        print self.getName()
        if self.level > 8:
            return
        thread_a = MyThread(self.level+1)
        thread_b = MyThread(self.level+1)
        thread_a.start()
        thread_b.start()

thread = MyThread(1)
thread.start()

time.sleep(2)
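
For setting names yourself, one possible approach is a counter protected by a lock. This is a minimal sketch: new_thread_name, the "Worker-%d" template and names_from_many_threads are hypothetical names for this example and are not part of threading.

```python
import itertools
import threading

_name_lock = threading.Lock()
_name_counter = itertools.count(1)


def new_thread_name(template="Worker-%d"):
    # The lock turns "read the counter, advance the counter" into a single step.
    with _name_lock:
        return template % next(_name_counter)


def names_from_many_threads(n_threads=8, per_thread=200):
    # Generate names concurrently; each thread appends only to its own bucket.
    buckets = [[] for _ in range(n_threads)]

    def work(bucket):
        for _ in range(per_thread):
            bucket.append(new_thread_name())

    threads = [threading.Thread(target=work, args=(b,)) for b in buckets]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [name for bucket in buckets for name in bucket]


names = names_from_many_threads()
print(len(names), len(set(names)))
```

A thread would then be created as threading.Thread(target=f, name=new_thread_name()), where f is whatever function the thread should run.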

Importing in threaded code
From the manual: link.

Firstly, other than in the main module, an import should not have the side effect of spawning a new thread and then waiting for that thread in any way. Failing to abide by this restriction can lead to a deadlock if the spawned thread directly or indirectly attempts to import a module.

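This says that if a module being imported starts another thread and waits for that thread to do something, and that thread itself performs an import, the result is a deadlock. This holds even when the import happens in the main thread.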

// file: import.c
PyObject *
PyImport_ImportModuleLevel(char *name, PyObject *globals, PyObject *locals,
             PyObject *fromlist, int level)
{
    PyObject *result;
    lock_import();
    result = import_module_level(name, globals, locals, fromlist, level);
    if (unlock_import() < 0) {
        Py_XDECREF(result);
        PyErr_SetString(PyExc_RuntimeError,
                "not holding the import lock");
        return NULL;
    }
    return result;
}

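This is CPython's implementation of import; after a series of calls the virtual machine ends up in this function, where the import really begins. lock_import() and unlock_import() implement a reentrant lock. The cause of the deadlock is clear: the virtual machine switches threads at bytecode granularity, and this function implements the IMPORT_NAME opcode, so the import can be regarded as atomic. The lock taken inside the function is not released when the GIL is released, so any other thread that attempts an import blocks until this one returns.
Deadlock simulation code: link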
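
The mechanism can also be modeled without hanging the interpreter. In the sketch below a threading.RLock stands in for the import lock; fake_import and module_body are hypothetical names, and nothing here touches the real import machinery. The child thread uses a non-blocking acquire so that the example terminates. A real import would block at that point while the parent waits in join(), which is the deadlock.

```python
import threading

import_lock = threading.RLock()  # stand-in for the interpreter-wide import lock


def fake_import(body):
    # Model of lock_import() / unlock_import() around the module body.
    with import_lock:
        return body()


def module_body():
    # Reentrancy: the thread that holds the lock may "import" again.
    nested = fake_import(lambda: "nested ok")

    outcome = []

    def child():
        got = import_lock.acquire(False)  # non-blocking attempt from another thread
        outcome.append(got)
        if got:
            import_lock.release()

    t = threading.Thread(target=child)
    t.start()
    t.join()  # with a blocking acquire in child, this join would never return
    return nested, outcome[0]


result = fake_import(module_body)
print(result)
```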

Secondly, all import attempts must be completed before the interpreter starts shutting itself down. This can be most easily achieved by only performing imports from non-daemon threads created through the threading module. Daemon threads and threads created directly with the thread module will require some other form of synchronization to ensure they do not attempt imports after system shutdown has commenced. Failure to abide by this restriction will lead to intermittent exceptions and crashes during interpreter shutdown (as the late imports attempt to access machinery which is no longer in a valid state).

This one is fairly self-explanatory.
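
As a small illustration of the recommended pattern, the sketch below performs an import inside a non-daemon thread created through threading and joins it before the program ends. The function name late_import and the choice of the json module are assumptions made for the example.

```python
import threading


def late_import(out):
    import json  # import performed inside the worker thread
    out.append(json.dumps({"ok": True}))


out = []
worker = threading.Thread(target=late_import, args=(out,))
worker.daemon = False  # the default, spelled out: shutdown waits for this thread
worker.start()
worker.join()          # the import is finished before the main thread moves on
print(out)
```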

kmiku7's blog
