500 Lines of Python: Small Projects --- A Simple Object Model



A Simple Object Model

Carl Friedrich Bolz

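Carl Friedrich Bolz is a researcher at King's College London and is broadly interested in the implementation and optimization of all kinds of dynamic languages. He is one of the core authors of PyPy/RPython and has worked on implementations of Prolog, Racket, Smalltalk, PHP and Ruby. He's @cfbolz on Twitter.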

Introduction

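Object-oriented programming is one of the major programming paradigms in use today, with a lot of languages providing some form of object orientation. While on the surface the mechanisms that different object-oriented programming languages provide to the programmer are very similar, the details can vary a lot. Commonalities of most languages are the presence of objects and some kind of inheritance mechanism. Classes, however, are a feature that not every language supports directly. For example, in prototype-based languages like Self or JavaScript, the concept of class does not exist and objects instead inherit directly from each other.

Understanding the differences between different object models can be interesting. They often reveal the family resemblance between different languages. It can be useful to put the model of a new language into the context of the models of other languages, both to quickly understand the new model and to get a better feeling for the programming language design space.

This chapter explores the implementation of a series of very simple object models. It starts out with simple instances and classes, and the ability to call methods on instances. This is the "classical" object-oriented approach that was established in early object-oriented languages such as Simula 67 and Smalltalk. This model is then extended step by step: the next two steps explore different language design choices, and the last step improves the efficiency of the object model. The final model is not that of a real language, but an idealized, simplified version of Python's object model.

The object models presented in this chapter will be implemented in Python. The code works on both Python 2.7 and 3.4. To understand the behaviour and the design choices better, the chapter will also present tests for the object model. The tests can be run with either py.test or nose.

The choice of Python as an implementation language is quite unrealistic. A "real" VM is typically implemented in a low-level language like C/C++ and needs a lot of attention to engineering detail to make it efficient. However, the simpler implementation language makes it easier to focus on actual behaviour differences instead of getting bogged down by implementation details.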

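Method-Based Model

The object model we will start out with is an extremely simplified version of that of Smalltalk. Smalltalk was an object-oriented programming language designed by Alan Kay's group at Xerox PARC in the 1970s. It popularized object-oriented programming, and is the source of many features found in today's programming languages. One of the core tenets of Smalltalk's language design was "everything is an object". Smalltalk's most immediate successor in use today is Ruby, which uses a more C-like syntax but retains most of Smalltalk's object model.

The object model in this section will have classes and instances of them, the ability to read and write attributes into objects, the ability to call methods on objects, and the ability for a class to be a subclass of another class. Right from the beginning, classes will be completely ordinary objects that can themselves have attributes and methods.

A note on terminology: in this chapter I will use the word "instance" to mean "an object that is not a class".

A good approach to start with is to write a test to specify what the to-be-implemented behaviour should be. All tests presented in this chapter will consist of two parts. First, a bit of regular Python code defining and using a few classes, and making use of increasingly advanced features of the Python object model. Second, the corresponding test using the object model we will implement in this chapter, instead of normal Python classes.

The mapping between using normal Python classes and using our object model will be done manually in the tests. For example, instead of writing obj.attribute in Python, in the object model we would use a method obj.read_attr("attribute"). In a real language implementation, this mapping would be done by the interpreter of the language, or by a compiler.

A further simplification in this chapter is that we make no sharp distinction between the code that implements the object model and the code that is used to write the methods used in the objects. In a real system, the two would often be implemented in different programming languages.

Let us start with a simple test for reading and writing object fields.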

def test_read_write_field():
    # Python code
    class A(object):
        pass
    obj = A()
    obj.a = 1
    assert obj.a == 1

    obj.b = 5
    assert obj.a == 1
    assert obj.b == 5

    obj.a = 2
    assert obj.a == 2
    assert obj.b == 5

    # Object model code
    A = Class(name="A", base_class=OBJECT, fields={}, metaclass=TYPE)
    obj = Instance(A)
    obj.write_attr("a", 1)
    assert obj.read_attr("a") == 1

    obj.write_attr("b", 5)
    assert obj.read_attr("a") == 1
    assert obj.read_attr("b") == 5

    obj.write_attr("a", 2)
    assert obj.read_attr("a") == 2
    assert obj.read_attr("b") == 5

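The test uses three things that we have to implement. The classes Class and Instance represent classes and instances of our object model, respectively. There are two special instances of Class: OBJECT and TYPE. OBJECT corresponds to object in Python and is the ultimate base class of the inheritance hierarchy. TYPE corresponds to type in Python and is the type of all classes.

To do anything with instances of Class and Instance, they implement a shared interface by inheriting from a shared base class Base that exposes a number of methods: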

class Base(object):
    """ The base class that all of the object model classes inherit from. """

    def __init__(self, cls, fields):
        """ Every object has a class. """
        self.cls = cls
        self._fields = fields

    def read_attr(self, fieldname):
        """ read field 'fieldname' out of the object """
        return self._read_dict(fieldname)

    def write_attr(self, fieldname, value):
        """ write field 'fieldname' into the object """
        self._write_dict(fieldname, value)

    def isinstance(self, cls):
        """ return True if the object is an instance of class cls """
        return self.cls.issubclass(cls)

    def callmethod(self, methname, *args):
        """ call method 'methname' with arguments 'args' on object """
        meth = self.cls._read_from_class(methname)
        return meth(self, *args)

    def _read_dict(self, fieldname):
        """ read a field 'fieldname' out of the object's dict """
        return self._fields.get(fieldname, MISSING)

    def _write_dict(self, fieldname, value):
        """ write a field 'fieldname' into the object's dict """
        self._fields[fieldname] = value

MISSING = object()

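The Base class implements storing the class of an object, and a dictionary containing the field values of the object. Now we need to implement Class and Instance. The constructor of Instance takes the class to be instantiated and initializes the fields dict as an empty dictionary. Otherwise, Instance is just a very thin subclass around Base that does not add any extra functionality.

The constructor of Class takes the name of the class, the base class, the dictionary of the class and the metaclass. For classes, the fields are passed into the constructor by the user of the object model. The class constructor also takes a base class, which the tests so far don't need but which we will make use of in the next section.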

class Instance(Base):
    """Instance of a user-defined class. """

    def __init__(self, cls):
        assert isinstance(cls, Class)
        Base.__init__(self, cls, {})


class Class(Base):
    """ A User-defined class. """

    def __init__(self, name, base_class, fields, metaclass):
        Base.__init__(self, metaclass, fields)
        self.name = name
        self.base_class = base_class

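Since classes are also a kind of object, they (indirectly) inherit from Base. Thus, the class needs to be an instance of another class: its metaclass.

Now our first test almost passes. The only missing bit is the definition of the base classes TYPE and OBJECT, which are both instances of Class. For these we will make a major departure from the Smalltalk model, which has a fairly complex metaclass system. Instead we will use the model introduced in ObjVlisp[1], which Python adopted.

In the ObjVlisp model, OBJECT and TYPE are intertwined. OBJECT is the base class of all classes, meaning it has no base class. TYPE is a subclass of OBJECT. By default, every class is an instance of TYPE. In particular, both TYPE and OBJECT are instances of TYPE. However, the programmer can also subclass TYPE to make a new metaclass: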

# set up the base hierarchy as in Python (the ObjVLisp model)
# the ultimate base class is OBJECT
OBJECT = Class(name="object", base_class=None, fields={}, metaclass=None)
# TYPE is a subclass of OBJECT
TYPE = Class(name="type", base_class=OBJECT, fields={}, metaclass=None)
# TYPE is an instance of itself
TYPE.cls = TYPE
# OBJECT is an instance of TYPE
OBJECT.cls = TYPE
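The bootstrap can be checked in isolation. The sketch below is self-contained and assumes a stripped-down, hypothetical Class that only records a name, a base class and a metaclass; it closes the cycle after construction exactly as above, and the last four assertions confirm that Python's own type and object have the same shape:

```python
class Class(object):
    """Hypothetical stripped-down stand-in for the chapter's Class."""
    def __init__(self, name, base_class, metaclass):
        self.name = name
        self.base_class = base_class
        self.cls = metaclass  # the class of this class, i.e. its metaclass

# Neither TYPE nor OBJECT exists yet, so both start without a metaclass ...
OBJECT = Class("object", base_class=None, metaclass=None)
TYPE = Class("type", base_class=OBJECT, metaclass=None)
# ... and the cycle is closed afterwards.
TYPE.cls = TYPE
OBJECT.cls = TYPE

# Python's real object model has the same structure.
assert type(type) is type
assert type(object) is type
assert type.__bases__ == (object,)
assert object.__bases__ == ()
```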

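To define new metaclasses, it is enough to subclass TYPE. However, in the rest of this chapter we won't do that; we'll simply always use TYPE as the metaclass of every class.

Figure 14.1 - Inheritance

Now the first test passes. The second test checks that reading and writing attributes works on classes as well. It's easy to write, and passes immediately.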

def test_read_write_field_class():
    # classes are objects too
    # Python code
    class A(object):
        pass
    A.a = 1
    assert A.a == 1
    A.a = 6
    assert A.a == 6

    # Object model code
    A = Class(name="A", base_class=OBJECT, fields={"a": 1}, metaclass=TYPE)
    assert A.read_attr("a") == 1
    A.write_attr("a", 5)
    assert A.read_attr("a") == 5

isinstance Checking

So far we haven't taken advantage of the fact that objects have classes. The next test implements the isinstance machinery:

def test_isinstance():
    # Python code
    class A(object):
        pass
    class B(A):
        pass
    b = B()
    assert isinstance(b, B)
    assert isinstance(b, A)
    assert isinstance(b, object)
    assert not isinstance(b, type)

    # Object model code
    A = Class(name="A", base_class=OBJECT, fields={}, metaclass=TYPE)
    B = Class(name="B", base_class=A, fields={}, metaclass=TYPE)
    b = Instance(B)
    assert b.isinstance(B)
    assert b.isinstance(A)
    assert b.isinstance(OBJECT)
    assert not b.isinstance(TYPE)

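To check whether an object obj is an instance of a certain class cls, it is enough to check whether cls is a superclass of the class of obj, or the class itself. To check whether a class is a superclass of another class, the chain of superclasses of that class is walked. If and only if the other class is found in that chain, it is a superclass. The chain of superclasses of a class, including the class itself, is called the "method resolution order" of that class. It can easily be computed recursively: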

class Class(Base):
    ...

    def method_resolution_order(self):
        """ compute the method resolution order of the class """
        if self.base_class is None:
            return [self]
        else:
            return [self] + self.base_class.method_resolution_order()

    def issubclass(self, cls):
        """ is self a subclass of cls? """
        return cls in self.method_resolution_order()

With that code, the test passes.
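The recursion can equally be written as a loop that follows the base_class links until it reaches None. A self-contained sketch, using a hypothetical minimal Class with only a name and a single base class, and checking the same relationships as test_isinstance:

```python
class Class(object):
    """Hypothetical minimal class: a name and a single base class."""
    def __init__(self, name, base_class):
        self.name = name
        self.base_class = base_class

    def method_resolution_order(self):
        # Walk the base_class chain iteratively instead of recursively.
        result = []
        cls = self
        while cls is not None:
            result.append(cls)
            cls = cls.base_class
        return result

    def issubclass(self, cls):
        return cls in self.method_resolution_order()

OBJECT = Class("object", None)
TYPE = Class("type", OBJECT)
A = Class("A", OBJECT)
B = Class("B", A)

print([c.name for c in B.method_resolution_order()])  # ['B', 'A', 'object']
```

Both versions compute the same list; the loop merely avoids growing the Python call stack for deep hierarchies.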

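Calling Methods

The remaining missing feature for this first version of the object model is the ability to call methods on objects. In this chapter we will implement a simple single inheritance model.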

def test_callmethod_simple():
    # Python code
    class A(object):
        def f(self):
            return self.x + 1
    obj = A()
    obj.x = 1
    assert obj.f() == 2

    class B(A):
        pass
    obj = B()
    obj.x = 1
    assert obj.f() == 2 # works on subclass too

    # Object model code
    def f_A(self):
        return self.read_attr("x") + 1
    A = Class(name="A", base_class=OBJECT, fields={"f": f_A}, metaclass=TYPE)
    obj = Instance(A)
    obj.write_attr("x", 1)
    assert obj.callmethod("f") == 2

    B = Class(name="B", base_class=A, fields={}, metaclass=TYPE)
    obj = Instance(B)
    obj.write_attr("x", 2)
    assert obj.callmethod("f") == 3

To find the correct implementation of a method that is sent to an object, we walk the method resolution order of the class of the object. The first method found in the dictionary of one of the classes in the method resolution order is called:

class Class(Base):
    ...

    def _read_from_class(self, methname):
        for cls in self.method_resolution_order():
            if methname in cls._fields:
                return cls._fields[methname]
        return MISSING

Together with the code for callmethod in the Base implementation, this passes the test.
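Because the search stops at the first class whose dictionary contains the name, a method defined in a subclass shadows one with the same name further up the chain. A self-contained sketch of just that lookup; the three-argument Class and the lambdas standing in for methods are assumptions made for brevity:

```python
MISSING = object()

class Class(object):
    """Hypothetical minimal class: a name, one base class and a field dict."""
    def __init__(self, name, base_class, fields):
        self.name = name
        self.base_class = base_class
        self._fields = fields

    def method_resolution_order(self):
        if self.base_class is None:
            return [self]
        return [self] + self.base_class.method_resolution_order()

    def _read_from_class(self, methname):
        # Return the first definition found along the MRO.
        for cls in self.method_resolution_order():
            if methname in cls._fields:
                return cls._fields[methname]
        return MISSING

A = Class("A", None, {"g": lambda self, arg: "A.g"})
B = Class("B", A, {"g": lambda self, arg: "B.g"})
C = Class("C", B, {})

print(C._read_from_class("g")(None, 0))    # B.g (B shadows A)
print(C._read_from_class("h") is MISSING)  # True
```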

To make sure that methods with arguments work as well, and that overriding of methods is implemented correctly, we can use the following slightly more complex test, which already passes:

def test_callmethod_subclassing_and_arguments():
    # Python code
    class A(object):
        def g(self, arg):
            return self.x + arg
    obj = A()
    obj.x = 1
    assert obj.g(4) == 5

    class B(A):
        def g(self, arg):
            return self.x + arg * 2
    obj = B()
    obj.x = 4
    assert obj.g(4) == 12

    # Object model code
    def g_A(self, arg):
        return self.read_attr("x") + arg
    A = Class(name="A", base_class=OBJECT, fields={"g": g_A}, metaclass=TYPE)
    obj = Instance(A)
    obj.write_attr("x", 1)
    assert obj.callmethod("g", 4) == 5

    def g_B(self, arg):
        return self.read_attr("x") + arg * 2
    B = Class(name="B", base_class=A, fields={"g": g_B}, metaclass=TYPE)
    obj = Instance(B)
    obj.write_attr("x", 4)
    assert obj.callmethod("g", 4) == 12

Attribute-Based Model

Now that the simplest version of our object model is working, we can think of ways to change it. This section will introduce the distinction between a method-based model and an attribute-based model. This is one of the core differences between Smalltalk, Ruby, and JavaScript on the one hand and Python and Lua on the other hand.

The method-based model has method calling as the primitive of program execution:

result = obj.f(arg1, arg2)

The attribute-based model splits up method calling into two steps: looking up an attribute and calling the result:

method = obj.f
result = method(arg1, arg2)
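Plain Python follows the attribute-based model, and the two steps can be observed directly: reading obj.f already yields a bound-method object that remembers the instance (its __self__ attribute) and the plain function from the class (its __func__ attribute). A short demonstration:

```python
class A(object):
    def f(self, a):
        return self.x + a + 1

obj = A()
obj.x = 2

method = obj.f                             # step 1: attribute lookup
assert method.__self__ is obj              # remembers the instance ...
assert method.__func__ is A.__dict__["f"]  # ... and the function from the class
result = method(4)                         # step 2: call; self is passed implicitly
print(result)  # 7
```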

This difference can be shown in the following test:

def test_bound_method():
    # Python code
    class A(object):
        def f(self, a):
            return self.x + a + 1
    obj = A()
    obj.x = 2
    m = obj.f
    assert m(4) == 7

    class B(A):
        pass
    obj = B()
    obj.x = 1
    m = obj.f
    assert m(10) == 12 # works on subclass too

    # Object model code
    def f_A(self, a):
        return self.read_attr("x") + a + 1
    A = Class(name="A", base_class=OBJECT, fields={"f": f_A}, metaclass=TYPE)
    obj = Instance(A)
    obj.write_attr("x", 2)
    m = obj.read_attr("f")
    assert m(4) == 7

    B = Class(name="B", base_class=A, fields={}, metaclass=TYPE)
    obj = Instance(B)
    obj.write_attr("x", 1)
    m = obj.read_attr("f")
    assert m(10) == 12

While the setup is the same as the corresponding test for method calls, the way that the methods are called is different. First, the attribute with the name of the method is looked up on the object. The result of that lookup operation is a bound method, an object that encapsulates both the object as well as the function found in the class. Next, that bound method is called with a call operation[2].

To implement this behaviour, we need to change the Base.read_attr implementation. If the attribute is not found in the dictionary, it is looked for in the class. If it is found in the class, and the attribute is a callable, it needs to be turned into a bound method. To emulate a bound method we simply use a closure. In addition to changing Base.read_attr we can also change Base.callmethod to use the new approach to calling methods to make sure all the tests still pass.

class Base(object):
    ...
    def read_attr(self, fieldname):
        """ read field 'fieldname' out of the object """
        result = self._read_dict(fieldname)
        if result is not MISSING:
            return result
        result = self.cls._read_from_class(fieldname)
        if _is_bindable(result):
            return _make_boundmethod(result, self)
        if result is not MISSING:
            return result
        raise AttributeError(fieldname)

    def callmethod(self, methname, *args):
        """ call method 'methname' with arguments 'args' on object """
        meth = self.read_attr(methname)
        return meth(*args)

def _is_bindable(meth):
    return callable(meth)

def _make_boundmethod(meth, self):
    def bound(*args):
        return meth(self, *args)
    return bound

The rest of the code does not need to be changed at all.

Meta-Object Protocols

In addition to "normal" methods that are called directly by the program, many dynamic languages support special methods. These are methods that aren't meant to be called directly but will be called by the object system. In Python those special methods usually have names that start and end with two underscores; e.g., __init__. Special methods can be used to override primitive operations and provide custom behaviour for them instead. Thus, they are hooks that tell the object model machinery exactly how to do certain things. Python's object model has dozens of special methods.

Meta-object protocols were introduced by Smalltalk, but were used even more by the object systems for Common Lisp, such as CLOS. That is also where the name meta-object protocol, for collections of special methods, was coined[3].

In this chapter we will add three such meta-hooks to our object model. They are used to fine-tune what exactly happens when reading and writing attributes. The special methods we will add first are __getattr__ and __setattr__, which closely follow the behaviour of Python's namesakes.

Customizing Reading and Writing of Attributes

The method __getattr__ is called by the object model when the attribute that is being looked up is not found by normal means; i.e., neither on the instance nor on the class. It gets the name of the attribute being looked up as an argument. An equivalent of the __getattr__ special method was part of early Smalltalk[4] systems under the name doesNotUnderstand:.

The case of __setattr__ is a bit different. Since setting an attribute always creates it, __setattr__ is always called when setting an attribute. To make sure that a __setattr__ method always exists, the OBJECT class has a definition of __setattr__. This base implementation simply does what setting an attribute did so far, which is write the attribute into the object's dictionary. This also makes it possible for a user-defined __setattr__ to delegate to the base OBJECT.__setattr__ in some cases.

A test for these two special methods is the following:

def test_getattr():
    # Python code
    class A(object):
        def __getattr__(self, name):
            if name == "fahrenheit":
                return self.celsius * 9. / 5. + 32
            raise AttributeError(name)

        def __setattr__(self, name, value):
            if name == "fahrenheit":
                self.celsius = (value - 32) * 5. / 9.
            else:
                # call the base implementation
                object.__setattr__(self, name, value)
    obj = A()
    obj.celsius = 30
    assert obj.fahrenheit == 86 # test __getattr__
    obj.celsius = 40
    assert obj.fahrenheit == 104

    obj.fahrenheit = 86 # test __setattr__
    assert obj.celsius == 30
    assert obj.fahrenheit == 86

    # Object model code
    def __getattr__(self, name):
        if name == "fahrenheit":
            return self.read_attr("celsius") * 9. / 5. + 32
        raise AttributeError(name)
    def __setattr__(self, name, value):
        if name == "fahrenheit":
            self.write_attr("celsius", (value - 32) * 5. / 9.)
        else:
            # call the base implementation
            OBJECT.read_attr("__setattr__")(self, name, value)

    A = Class(name="A", base_class=OBJECT,
              fields={"__getattr__": __getattr__, "__setattr__": __setattr__},
              metaclass=TYPE)
    obj = Instance(A)
    obj.write_attr("celsius", 30)
    assert obj.read_attr("fahrenheit") == 86 # test __getattr__
    obj.write_attr("celsius", 40)
    assert obj.read_attr("fahrenheit") == 104
    obj.write_attr("fahrenheit", 86) # test __setattr__
    assert obj.read_attr("celsius") == 30
    assert obj.read_attr("fahrenheit") == 86

To pass these tests, the Base.read_attr and Base.write_attr methods need to be changed:

class Base(object):
    ...

    def read_attr(self, fieldname):
        """ read field 'fieldname' out of the object """
        result = self._read_dict(fieldname)
        if result is not MISSING:
            return result
        result = self.cls._read_from_class(fieldname)
        if _is_bindable(result):
            return _make_boundmethod(result, self)
        if result is not MISSING:
            return result
        meth = self.cls._read_from_class("__getattr__")
        if meth is not MISSING:
            return meth(self, fieldname)
        raise AttributeError(fieldname)

    def write_attr(self, fieldname, value):
        """ write field 'fieldname' into the object """
        meth = self.cls._read_from_class("__setattr__")
        return meth(self, fieldname, value)

The procedure for reading an attribute is changed to call the __getattr__ method with the fieldname as an argument, if the method exists, instead of raising an error. Note that __getattr__ (and indeed all special methods in Python) is looked up on the class only, instead of recursively calling self.read_attr("__getattr__"). That is because the latter would lead to an infinite recursion of read_attr if __getattr__ were not defined on the object.
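CPython makes the same choice for its own special methods: a __getattr__ stored on an instance is not consulted, only one found on the class. A quick check in plain Python (new-style classes, which are the only kind in Python 3):

```python
class A(object):
    pass

obj = A()
# Put a __getattr__ into the *instance* dictionary only.
obj.__getattr__ = lambda name: 42

try:
    obj.anything
    found = True
except AttributeError:
    found = False
print(found)  # False: the instance-level hook is ignored

# Defined on the class, the hook takes effect.
A.__getattr__ = lambda self, name: 42
print(obj.anything)  # 42
```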

Writing of attributes is fully deferred to the __setattr__ method. To make this work, OBJECT needs to have a __setattr__ method that calls the default behaviour, as follows:

def OBJECT__setattr__(self, fieldname, value):
    self._write_dict(fieldname, value)
OBJECT = Class("object", None, {"__setattr__": OBJECT__setattr__}, None)

The behaviour of OBJECT__setattr__ is like the previous behaviour of write_attr. With these modifications, the new test passes.

Descriptor Protocol

The above test to provide automatic conversion between different temperature scales worked but was annoying to write, as the attribute name needed to be checked explicitly in the __getattr__ and __setattr__ methods. To get around this, the descriptor protocol was introduced in Python.

While __getattr__ and __setattr__ are called on the object whose attribute is being read or written, the descriptor protocol calls a special method on the result of getting an attribute from an object. It can be seen as the generalization of binding a method to an object – and indeed, binding a method to an object is done using the descriptor protocol. In addition to bound methods, the most important use case for the descriptor protocol in Python is the implementation of staticmethod, classmethod and property.
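As an illustration of how staticmethod and classmethod fit the same scheme, here are simplified pure-Python stand-ins; the names my_staticmethod and my_classmethod are hypothetical, and the real built-ins are implemented in C and handle more cases:

```python
class my_staticmethod(object):
    """Simplified stand-in for staticmethod (hypothetical name)."""
    def __init__(self, func):
        self.func = func

    def __get__(self, inst, cls=None):
        # No binding at all: hand back the plain function.
        return self.func


class my_classmethod(object):
    """Simplified stand-in for classmethod (hypothetical name)."""
    def __init__(self, func):
        self.func = func

    def __get__(self, inst, cls=None):
        if cls is None:
            cls = type(inst)
        # Bind the class instead of the instance.
        def bound(*args):
            return self.func(cls, *args)
        return bound


class C(object):
    double = my_staticmethod(lambda x: x * 2)
    describe = my_classmethod(lambda cls, x: (cls.__name__, x))

print(C.double(3), C().double(3))  # 6 6
print(C.describe(1))               # ('C', 1)
print(C().describe(1))             # ('C', 1)
```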

In this subsection we will introduce the subset of the descriptor protocol which deals with binding objects. This is done using the special method __get__, and is best explained with an example test:

def test_get():
    # Python code
    class FahrenheitGetter(object):
        def __get__(self, inst, cls):
            return inst.celsius * 9. / 5. + 32

    class A(object):
        fahrenheit = FahrenheitGetter()
    obj = A()
    obj.celsius = 30
    assert obj.fahrenheit == 86

    # Object model code
    class FahrenheitGetter(object):
        def __get__(self, inst, cls):
            return inst.read_attr("celsius") * 9. / 5. + 32

    A = Class(name="A", base_class=OBJECT,
              fields={"fahrenheit": FahrenheitGetter()},
              metaclass=TYPE)
    obj = Instance(A)
    obj.write_attr("celsius", 30)
    assert obj.read_attr("fahrenheit") == 86

The __get__ method is called on the FahrenheitGetter instance after it has been looked up in the class of obj. The arguments to __get__ are the instance where the lookup was done[5].
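Our object model passes None as the second argument. To see what plain CPython passes, the hypothetical Recorder descriptor below logs its arguments: the second one is the class of the instance through which the lookup happened (B, even though the descriptor lives in A), and the first one is None when the attribute is read from the class itself:

```python
class Recorder(object):
    """Hypothetical descriptor that records every (inst, cls) pair it gets."""
    def __init__(self):
        self.calls = []

    def __get__(self, inst, cls):
        self.calls.append((inst, cls))
        return 42

rec = Recorder()

class A(object):
    attr = rec

class B(A):
    pass

a, b = A(), B()
a.attr  # lookup through an instance of A
b.attr  # found in A, but looked up through an instance of B
A.attr  # lookup through the class itself
print(rec.calls == [(a, A), (b, B), (None, A)])  # True
```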

Implementing this behaviour is easy. We simply need to change _is_bindable and _make_boundmethod:

def _is_bindable(meth):
    return hasattr(meth, "__get__")

def _make_boundmethod(meth, self):
    return meth.__get__(self, None)

This makes the test pass. The previous tests about bound methods also still pass, as Python's functions have a __get__ method that returns a bound method object.
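The earlier tests keep passing because Python functions are themselves descriptors. Calling their __get__ by hand, as _make_boundmethod now does, yields an ordinary bound method:

```python
def f(self, a):
    return self.x + a


class A(object):
    pass


obj = A()
obj.x = 1

print(hasattr(f, "__get__"))  # True: plain functions are descriptors
bound = f.__get__(obj, None)  # the same call _make_boundmethod makes
print(bound(4))               # 5
print(bound.__self__ is obj)  # True
```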

In practice, the descriptor protocol is quite a lot more complex. It also supports __set__ to override what setting an attribute means on a per-attribute basis. Also, the current implementation is cutting a few corners. Note that _make_boundmethod calls the method __get__ on the implementation level, instead of using meth.read_attr("__get__"). This is necessary since our object model borrows functions and thus methods from Python, instead of having a representation for them that uses the object model. A more complete object model would have to solve this problem.
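In plain Python, a descriptor that also defines __set__ (a "data descriptor") intercepts assignments as well, so the temperature example from the previous section needs neither __getattr__ nor __setattr__. A sketch outside our object model; the Fahrenheit class is a hypothetical example:

```python
class Fahrenheit(object):
    """Data descriptor: it defines both __get__ and __set__."""
    def __get__(self, inst, cls):
        if inst is None:  # looked up on the class itself
            return self
        return inst.celsius * 9. / 5. + 32

    def __set__(self, inst, value):
        inst.celsius = (value - 32) * 5. / 9.


class A(object):
    fahrenheit = Fahrenheit()


obj = A()
obj.fahrenheit = 86                  # routed to Fahrenheit.__set__
print(obj.celsius)                   # 30.0
print("fahrenheit" in obj.__dict__)  # False: nothing stored on the instance

obj.__dict__["fahrenheit"] = "ignored"
print(obj.fahrenheit)                # 86.0: the data descriptor wins
```

Data descriptors take precedence over the instance dictionary, which is exactly the extra lookup rule our simplified read_attr does not model.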

Instance Optimization

While the first three variants of the object model were concerned with behavioural variation, in this last section we will look at an optimization without any behavioural impact. This optimization is called maps and was pioneered in the VM for the Self programming language[6]. It is still one of the most important object model optimizations: it's used in PyPy and all modern JavaScript VMs, such as V8 (where the optimization is called hidden classes).

The optimization starts from the following observation: In the object model as implemented so far all instances use a full dictionary to store their attributes. A dictionary is implemented using a hash map, which takes a lot of memory. In addition, the dictionaries of instances of the same class typically have the same keys as well. For example, given a class Point, the keys of all its instances' dictionaries are likely "x" and "y".

The maps optimization exploits this fact. It effectively splits the dictionary of every instance into two parts. The keys go into a map, which can be shared between all instances with the same set of attribute names; the instance itself only stores a reference to that shared map and the values of the attributes in a list (which is a lot more compact in memory than a dictionary). The map stores a mapping from attribute names to indexes into that list.
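A rough way to see the size difference in CPython is sys.getsizeof, which reports the size of the container object itself (not of the keys and values it refers to). Exact numbers differ between Python versions and platforms, so treat the output as indicative only:

```python
import sys

# One instance's attributes stored as a dict vs. as a compact list.
as_dict = {"x": 1, "y": 2}
as_list = [1, 2]

print(sys.getsizeof(as_dict), sys.getsizeof(as_list))
```

With maps, only the list is paid for per instance; the key-to-index dictionary lives in the map and is shared.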

A simple test of that behaviour looks like this:

def test_maps():
    # white box test inspecting the implementation
    Point = Class(name="Point", base_class=OBJECT, fields={}, metaclass=TYPE)
    p1 = Instance(Point)
    p1.write_attr("x", 1)
    p1.write_attr("y", 2)
    assert p1.storage == [1, 2]
    assert p1.map.attrs == {"x": 0, "y": 1}

    p2 = Instance(Point)
    p2.write_attr("x", 5)
    p2.write_attr("y", 6)
    assert p1.map is p2.map
    assert p2.storage == [5, 6]

    p1.write_attr("x", -1)
    p1.write_attr("y", -2)
    assert p1.map is p2.map
    assert p1.storage == [-1, -2]

    p3 = Instance(Point)
    p3.write_attr("x", 100)
    p3.write_attr("z", -343)
    assert p3.map is not p1.map
    assert p3.map.attrs == {"x": 0, "z": 1}

Note that this is a different flavour of test than the ones we've written before. All previous tests just tested the behaviour of the classes via the exposed interfaces. This test instead checks the implementation details of the Instance class by reading internal attributes and comparing them to predefined values. Therefore this test can be called a white-box test.

The attrs attribute of the map of p1 describes the layout of the instance as having two attributes "x" and "y" which are stored at position 0 and 1 of the storage of p1. Making a second instance p2 and adding to it the same attributes in the same order will make it end up with the same map. If, on the other hand, a different attribute is added, the map can of course not be shared.

The Map class looks like this:

class Map(object):
    def __init__(self, attrs):
        self.attrs = attrs
        self.next_maps = {}

    def get_index(self, fieldname):
        return self.attrs.get(fieldname, -1)

    def next_map(self, fieldname):
        assert fieldname not in self.attrs
        if fieldname in self.next_maps:
            return self.next_maps[fieldname]
        attrs = self.attrs.copy()
        attrs[fieldname] = len(attrs)
        result = self.next_maps[fieldname] = Map(attrs)
        return result

EMPTY_MAP = Map({})

Maps have two methods, get_index and next_map. The former is used to find the index of an attribute name in the object's storage. The latter is used when a new attribute is added to an object. In that case the object needs to use a different map, which next_map computes. The method uses the next_maps dictionary to cache already created maps. That way, objects that have the same layout also end up using the same Map object.


Figure 14.2 - Map transitions

The Instance implementation that uses maps looks like this:

class Instance(Base):
    """Instance of a user-defined class. """

    def __init__(self, cls):
        assert isinstance(cls, Class)
        Base.__init__(self, cls, None)
        self.map = EMPTY_MAP
        self.storage = []

    def _read_dict(self, fieldname):
        index = self.map.get_index(fieldname)
        if index == -1:
            return MISSING
        return self.storage[index]

    def _write_dict(self, fieldname, value):
        index = self.map.get_index(fieldname)
        if index != -1:
            self.storage[index] = value
        else:
            new_map = self.map.next_map(fieldname)
            self.storage.append(value)
            self.map = new_map

The class now passes None as the fields dictionary to Base, as Instance will store the content of the dictionary in another way. Therefore it needs to override the _read_dict and _write_dict methods. In a real implementation, we would refactor the Base class so that it is no longer responsible for storing the fields dictionary, but for now having instances store None there is good enough.

A newly created instance starts out using the EMPTY_MAP, which has no attributes, and empty storage. To implement _read_dict, the instance's map is asked for the index of the attribute name. Then the corresponding entry of the storage list is returned.

Writing into the fields dictionary has two cases. On the one hand the value of an existing attribute can be changed. This is done by simply changing the storage at the corresponding index. On the other hand, if the attribute does not exist yet, a map transition (Figure 14.2) is needed using the next_map method. The value of the new attribute is appended to the storage list.
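The following self-contained sketch repeats the Map class (with the caching written slightly differently) and wraps it in a hypothetical Storage class instead of the full Instance. It also shows a consequence the test above does not cover: transitions are keyed by insertion order, so two objects with the same attribute names added in a different order end up with different maps:

```python
class Map(object):
    def __init__(self, attrs):
        self.attrs = attrs   # attribute name -> index into storage
        self.next_maps = {}  # cached transitions: name -> Map

    def get_index(self, fieldname):
        return self.attrs.get(fieldname, -1)

    def next_map(self, fieldname):
        assert fieldname not in self.attrs
        if fieldname not in self.next_maps:
            attrs = self.attrs.copy()
            attrs[fieldname] = len(attrs)
            self.next_maps[fieldname] = Map(attrs)
        return self.next_maps[fieldname]

EMPTY_MAP = Map({})

class Storage(object):
    """Hypothetical stand-in for the map-based Instance (no class, no Base)."""
    def __init__(self):
        self.map = EMPTY_MAP
        self.storage = []

    def write(self, name, value):
        index = self.map.get_index(name)
        if index != -1:
            self.storage[index] = value         # existing attribute: overwrite
        else:
            self.map = self.map.next_map(name)  # map transition
            self.storage.append(value)

    def read(self, name):
        # simplification: None instead of the chapter's MISSING sentinel
        index = self.map.get_index(name)
        return self.storage[index] if index != -1 else None

a, b, c = Storage(), Storage(), Storage()
a.write("x", 1)
a.write("y", 2)
b.write("x", 3)
b.write("y", 4)
c.write("y", 5)  # same names as a and b ...
c.write("x", 6)  # ... but added in a different order

print(a.map is b.map)  # True: same transitions, shared map
print(a.map is c.map)  # False: insertion order matters
print(c.map.attrs)     # {'y': 0, 'x': 1}
```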

What does this optimization achieve? It optimizes use of memory in the common case where there are many instances with the same layout. It is not a universal optimization: code that creates instances with wildly different sets of attributes will have a larger memory footprint than if we just use dictionaries.

This is a common problem when optimizing dynamic languages. It is often not possible to find optimizations that are faster or use less memory in all cases. In practice, the optimizations chosen apply to how the language is typically used, while potentially making behaviour worse for programs that use extremely dynamic features.

Another interesting aspect of maps is that, while here they only optimize for memory use, in actual VMs that use a just-in-time (JIT) compiler they also improve the performance of the program. To achieve that, the JIT uses the maps to compile attribute lookups to a lookup in the objects' storage at a fixed offset, getting rid of all dictionary lookups completely[7].
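The JIT machinery is out of scope, but the core idea can be imitated in plain Python with a per-call-site cache keyed on map identity, commonly called an inline cache. This is a hypothetical illustration, not how PyPy or V8 implement it: once a map has been seen, reading the attribute costs only an identity check and an index into the storage list.

```python
class Map(object):
    """Minimal map: attribute name -> index into an object's storage list."""
    def __init__(self, attrs):
        self.attrs = attrs

    def get_index(self, fieldname):
        return self.attrs.get(fieldname, -1)


class Obj(object):
    """Hypothetical map-based object: a shared map plus a list of values."""
    def __init__(self, map_, storage):
        self.map = map_
        self.storage = storage


class CachedLookup(object):
    """Hypothetical monomorphic inline cache for one attribute-read site."""
    def __init__(self, fieldname):
        self.fieldname = fieldname
        self.cached_map = None
        self.cached_index = -1
        self.misses = 0

    def read(self, obj):
        if obj.map is self.cached_map:
            # Fast path: one identity check, then a load at a fixed offset.
            return obj.storage[self.cached_index]
        # Slow path: ask the map (a dictionary lookup) and remember the answer.
        self.misses += 1
        index = obj.map.get_index(self.fieldname)
        if index == -1:
            raise AttributeError(self.fieldname)
        self.cached_map, self.cached_index = obj.map, index
        return obj.storage[index]


point_map = Map({"x": 0, "y": 1})
site = CachedLookup("y")
points = [Obj(point_map, [i, i * 10]) for i in range(5)]
print([site.read(p) for p in points])  # [0, 10, 20, 30, 40]
print(site.misses)                     # 1
```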

Potential Extensions

It is easy to extend our object model and experiment with various language design choices. Here are some possibilities:

  • The easiest thing to do is to add further special methods. Some easy and interesting ones to add are __init__, __getattribute__ and __set__.

  • The model can be very easily extended to support multiple inheritance. To do this, every class would get a list of base classes. Then the Class.method_resolution_order method would need to be changed to compute a single linear order over all of those bases, so that method lookup can keep walking one list. A simple method resolution order could be computed using a depth-first search with removal of duplicates. A more complicated but better one is the C3 algorithm, which handles the shared bases of diamond-shaped multiple inheritance hierarchies better and rejects inheritance patterns that make no sense.

  • A more radical change is to switch to a prototype model, which involves the removal of the distinction between classes and instances.
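Both orders mentioned in the list can be sketched in a few lines. The representation below (class names mapped to lists of base names) is an assumption made to keep the example self-contained; it is not part of the chapter's object model. For the classic diamond, the naive depth-first order puts the shared base before the second branch, while C3, which Python has used since version 2.3, matches Python's own __mro__:

```python
def dfs_mro(cls, bases):
    """Depth-first, left-to-right search; keep only the first occurrence."""
    order = [cls]
    for base in bases[cls]:
        for c in dfs_mro(base, bases):
            if c not in order:
                order.append(c)
    return order


def c3_mro(cls, bases):
    """C3 linearization: merge the bases' MROs with the list of bases."""
    seqs = [c3_mro(b, bases) for b in bases[cls]] + [list(bases[cls])]
    result = [cls]
    while True:
        seqs = [s for s in seqs if s]  # drop exhausted sequences
        if not seqs:
            return result
        for seq in seqs:
            head = seq[0]
            # A valid head must not appear in the tail of any sequence.
            if not any(head in s[1:] for s in seqs):
                break
        else:
            raise TypeError("no consistent C3 linearization for %s" % cls)
        result.append(head)
        for s in seqs:
            if s[0] == head:
                del s[0]


# Classic diamond: C inherits from A and B, which both inherit from O.
diamond = {"O": [], "A": ["O"], "B": ["O"], "C": ["A", "B"]}
print(dfs_mro("C", diamond))  # ['C', 'A', 'O', 'B']: O comes before B
print(c3_mro("C", diamond))   # ['C', 'A', 'B', 'O']


# Python's own MRO for the same shape agrees with C3.
class A(object):
    pass

class B(object):
    pass

class C(A, B):
    pass

print([k.__name__ for k in C.__mro__])  # ['C', 'A', 'B', 'object']
```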

Conclusions

Some of the core aspects of the design of an object-oriented programming language are the details of its object model. Writing small object model prototypes is an easy and fun way to understand the inner workings of existing languages better and to get insights into the design space of object-oriented languages. Playing with object models is a good way to experiment with different language design ideas without having to worry about the more boring parts of language implementation, such as parsing and executing code.

Such object models can also be useful in practice, not just as vehicles for experimentation. They can be embedded in and used from other languages. Examples of this approach are common: the GObject object model, written in C, that's used in GLib and other Gnome libraries; or the various class system implementations in JavaScript.


  1. P. Cointe, “Metaclasses are first class: The ObjVlisp Model,” SIGPLAN Not, vol. 22, no. 12, pp. 156–162, 1987.

  2. It seems that the attribute-based model is conceptually more complex, because it needs both method lookup and call. In practice, calling something is defined by looking up and calling a special attribute __call__, so conceptual simplicity is regained. This won't be implemented in this chapter, however.

  3. G. Kiczales, J. des Rivieres, and D. G. Bobrow, The Art of the Metaobject Protocol. Cambridge, Mass: The MIT Press, 1991.

  4. A. Goldberg, Smalltalk-80: The Language and its Implementation. Addison-Wesley, 1983, page 61.

  5. In Python the second argument is the class of the instance through which the attribute was looked up (or the class itself, for lookups on a class), though we will ignore that here.

  6. C. Chambers, D. Ungar, and E. Lee, “An efficient implementation of SELF, a dynamically-typed object-oriented language based on prototypes,” in OOPSLA, 1989, vol. 24.

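  7. How that works is beyond the scope of this chapter. I tried to give a reasonably readable account of it in a paper I wrote a few years ago. It uses an object model that is basically a variant of the one in this chapter: C. F. Bolz, A. Cuni, M. Fijałkowski, M. Leuschel, S. Pedroni, and A. Rigo, “Runtime feedback in a meta-tracing JIT for efficient dynamic languages,” in Proceedings of the 6th Workshop on Implementation, Compilation, Optimization of Object-Oriented Languages, Programs and Systems, New York, NY, USA, 2011, pp. 9:1–9:8.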


Reprinted from blog.csdn.net/zhxlx/article/details/79302294