
frame

related file

  • cpython/Objects/frameobject.c
  • cpython/Include/frameobject.h

memory layout

The PyFrameObject is the stack frame in the Python virtual machine. It contains space for the currently executing code object, parameters, variables in different scopes, try block info, and more.

For more information, please refer to stack frame strategy.

layout

example

Every time you make a function call, a PyFrameObject is created (or recycled) for that call.

It's hard to inspect a frame object in the middle of an ordinary function call, so I will use a generator object for the explanation: a suspended generator keeps its frame alive between next() calls.

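You can get the frame of the currently executing function with sys._getframe() or inspect.currentframe(). sys._current_frames() returns a dict that maps each thread's identifier to that thread's topmost frame.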
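Here is a small sketch of both calls. They are CPython-specific, and the leading underscore marks them as implementation details. The functions where_am_i and current_thread_frame are hypothetical helpers written for illustration:

```python
import sys
import threading

def where_am_i():
    # sys._getframe() returns the frame of the function that calls it
    frame = sys._getframe()
    return frame.f_code.co_name

def current_thread_frame():
    # sys._current_frames() maps each thread's identifier to its topmost frame;
    # for the calling thread that is this function's own frame
    frames = sys._current_frames()
    return frames[threading.get_ident()]

print(where_am_i())                            # where_am_i
print(current_thread_frame().f_code.co_name)   # current_thread_frame
```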

If you need the meaning of each field, please refer to Junnplus' blog or read the source code directly.

f_valuestack/f_stacktop/f_localsplus

PyFrameObject is a variable-sized object: it can be cast to PyVarObject, and its actual ob_size is determined by the code object.

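Py_ssize_t extras, ncells, nfrees;
ncells = PyTuple_GET_SIZE(code->co_cellvars);
nfrees = PyTuple_GET_SIZE(code->co_freevars);
/* slots needed: value stack + locals + cell variables + free variables */
extras = code->co_stacksize + code->co_nlocals + ncells + nfrees;
/* omit */
if (free_list == NULL) { /* omit */
    /* allocate a new frame with `extras` trailing PyObject * slots */
    f = PyObject_GC_NewVar(PyFrameObject, &PyFrame_Type, extras);
}
else { /* omit */
    /* pop a frame off free_list and grow it if it's too small */
    --numfree;
    f = free_list;
    free_list = free_list->f_back;
    if (Py_SIZE(f) < extras) {
        PyFrameObject *new_f = PyObject_GC_Resize(PyFrameObject, f, extras);
        /* omit */
        f = new_f;
    }
}
/* the value stack starts right after locals, cell variables and free variables */
extras = code->co_nlocals + ncells + nfrees;
f->f_valuestack = f->f_localsplus + extras;
/* all local/cell/free slots start out empty (NULL) */
for (i=0; i<extras; i++)
    f->f_localsplus[i] = NULL;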

ob_size is the sum of code->co_stacksize, code->co_nlocals, and the lengths of the code->co_cellvars and code->co_freevars tuples.

code->co_stacksize: the maximum depth the value stack can reach while the code runs. The compiler computes it when the code object is generated.

code->co_nlocals: the number of local variables, including parameters.

code->co_cellvars: a tuple containing the names of all variables in the function that are also used in a nested function.

code->co_freevars: a tuple containing the names of all variables used in the function that are defined in an enclosing function scope.
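This minimal sketch applies the same formula from Python. It assumes the CPython 3.8 frame layout from the C snippet above. frame_slots, outer, and inner are hypothetical names. Starting with CPython 3.11, frames are laid out differently, so on newer interpreters the number is only illustrative. The four code object attributes still exist there:

```python
def frame_slots(code):
    """Slots a CPython 3.8 frame reserves for `code`:
    value stack + locals + cell variables + free variables."""
    ncells = len(code.co_cellvars)
    nfrees = len(code.co_freevars)
    return code.co_stacksize + code.co_nlocals + ncells + nfrees

def outer(x):
    y = 1
    def inner():
        return x + y  # x and y are free variables of inner, cell variables of outer
    return inner

inner = outer(10)
for code in (outer.__code__, inner.__code__):
    print(code.co_name, code.co_stacksize, code.co_nlocals,
          code.co_cellvars, code.co_freevars, frame_slots(code))
```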

For more information about PyCodeObject, please refer to What is a code object in Python? and code object.

Let's see an example:

def g2(a, b=1, c=2):
    yield a
    c = str(b + c)
    yield c
    new_g = range(3)
    yield from new_g

The dis result

  # ./python.exe -m dis frame_dis.py
  1           0 LOAD_CONST               5 ((1, 2))
              2 LOAD_CONST               2 (<code object g2 at 0x10c495030, file "frame_dis.py", line 1>)
              4 LOAD_CONST               3 ('g2')
              6 MAKE_FUNCTION            1 (defaults)
              8 STORE_NAME               0 (g2)
             10 LOAD_CONST               4 (None)
             12 RETURN_VALUE

Disassembly of <code object g2 at 0x10c495030, file "frame_dis.py", line 1>:
  2           0 LOAD_FAST                0 (a)
              2 YIELD_VALUE
              4 POP_TOP

  3           6 LOAD_GLOBAL              0 (str)
              8 LOAD_FAST                1 (b)
             10 LOAD_FAST                2 (c)
             12 BINARY_ADD
             14 CALL_FUNCTION            1
             16 STORE_FAST               2 (c)

  4          18 LOAD_FAST                2 (c)
             20 YIELD_VALUE
             22 POP_TOP

  5          24 LOAD_GLOBAL              1 (range)
             26 LOAD_CONST               1 (3)
             28 CALL_FUNCTION            1
             30 STORE_FAST               3 (new_g)

  6          32 LOAD_FAST                3 (new_g)
             34 GET_YIELD_FROM_ITER
             36 LOAD_CONST               0 (None)
             38 YIELD_FROM
             40 POP_TOP
             42 LOAD_CONST               0 (None)
             44 RETURN_VALUE
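The index argument of LOAD_FAST / STORE_FAST selects a slot in f_localsplus. For a function without cell or free variables, those slots follow the order of co_varnames. Here is a quick check with the same g2. The exact dis output varies across CPython versions, but these attributes should not for this function:

```python
def g2(a, b=1, c=2):
    yield a
    c = str(b + c)
    yield c
    new_g = range(3)
    yield from new_g

code = g2.__code__
# index i in co_varnames is the i in "LOAD_FAST i" / "STORE_FAST i"
for index, name in enumerate(code.co_varnames):
    print(index, name)
print("co_nlocals:", code.co_nlocals)
print("cell/free vars:", code.co_cellvars, code.co_freevars)
```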

Let's iterate through the generator

>>> gg = g2("param a")

example0

After the first next(gg) returns, the first opcode, 0 LOAD_FAST 0 (a), has been executed, and the frame is suspended in the middle of the second opcode, 2 YIELD_VALUE.

The field f_lasti is 2, indicating that the virtual program counter is at 2 YIELD_VALUE.

LOAD_FAST pushes the parameter onto f_valuestack, and YIELD_VALUE pops the top element off f_valuestack. The pop macro is defined as #define BASIC_POP() (*--stack_pointer).

The value of f_valuestack (address 0x100a5b538) is the same as in the previous step (previous picture), but the slot at that address now holds something different: a pointer to the PyUnicodeObject 'param a'. BASIC_POP only moves stack_pointer back and doesn't clear the slot. The pointer therefore stays there, and it becomes dangling if the PyUnicodeObject is deallocated.
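Here is a toy Python model of the value stack. It assumes only what the two ceval.c macros do: BASIC_PUSH(v) is (*stack_pointer++ = (v)) and BASIC_POP() is (*--stack_pointer). The ValueStack class is hypothetical; it shows why the popped slot still holds the old pointer:

```python
class ValueStack:
    """Toy model of f_valuestack plus the stack_pointer used in ceval.c."""

    def __init__(self, size):
        self.slots = [None] * size   # stands in for the PyObject * slots
        self.sp = 0                  # index of the next free slot

    def push(self, value):
        # BASIC_PUSH(v): *stack_pointer++ = v
        self.slots[self.sp] = value
        self.sp += 1

    def pop(self):
        # BASIC_POP(): *--stack_pointer  (the slot itself is left untouched)
        self.sp -= 1
        return self.slots[self.sp]

stack = ValueStack(size=4)
stack.push("param a")    # 0 LOAD_FAST 0 (a)
yielded = stack.pop()    # 2 YIELD_VALUE
print(yielded, stack.sp, stack.slots)   # param a 0 ['param a', None, None, None]
```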

>>> next(gg)
'param a'

example1

>>> next(gg)
'3'

The opcodes 6 LOAD_GLOBAL 0 (str), 8 LOAD_FAST 1 (b), and 10 LOAD_FAST 2 (c) on line 3 push str (the name str is stored in frame->f_code->co_names), b (int 1), and c (int 2) onto f_valuestack. Opcode 12 BINARY_ADD pops the top 2 elements off f_valuestack (b and c), adds them, and pushes the result (int 3) onto f_valuestack. This is what f_valuestack looks like after 12 BINARY_ADD:

example1_2

The opcode 14 CALL_FUNCTION 1 pops the argument and the function off the stack and performs the actual function call.

After the call, the result '3' is pushed onto the stack.

example1_2_1

Opcode 16 STORE_FAST 2 (c) pops the top element off f_valuestack and stores it in f_localsplus[2], the slot for c.

example1_2_2

Opcode 18 LOAD_FAST 2 (c) pushes f_localsplus[2] onto f_valuestack, and 20 YIELD_VALUE pops it and sends it to the caller.

The field f_lasti is 20, indicating that the frame is suspended at opcode 20 YIELD_VALUE.
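To tie the steps of lines 3 and 4 together, here is a toy interpreter for just these opcodes. It's a simplification: run and the tuple-based opcode list are hypothetical, a Python list stands in for f_valuestack, and LOAD_GLOBAL looks in a single dict instead of f_globals followed by f_builtins:

```python
def run(ops, localsplus, names, global_ns):
    """Toy interpreter for the opcodes on lines 3-4 of g2.
    Returns the value handed to the caller by YIELD_VALUE."""
    stack = []
    for op, arg in ops:
        if op == "LOAD_GLOBAL":
            stack.append(global_ns[names[arg]])  # the name comes from co_names
        elif op == "LOAD_FAST":
            stack.append(localsplus[arg])
        elif op == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "CALL_FUNCTION":
            args = [stack.pop() for _ in range(arg)][::-1]  # pop arg arguments
            func = stack.pop()                               # then the callable
            stack.append(func(*args))
        elif op == "STORE_FAST":
            localsplus[arg] = stack.pop()
        elif op == "YIELD_VALUE":
            return stack.pop()
        print(f"{op:<14} stack={stack}")
    raise RuntimeError("no YIELD_VALUE reached")

ops = [("LOAD_GLOBAL", 0), ("LOAD_FAST", 1), ("LOAD_FAST", 2),
       ("BINARY_ADD", None), ("CALL_FUNCTION", 1), ("STORE_FAST", 2),
       ("LOAD_FAST", 2), ("YIELD_VALUE", None)]
localsplus = ["param a", 1, 2, None]   # a, b, c, new_g
result = run(ops, localsplus, ("str", "range"), {"str": str, "range": range})
print(repr(result), localsplus)        # '3' ['param a', 1, '3', None]
```

localsplus holds a, b, c, new_g in co_varnames order, so index 2 is c.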

example2

after 24 LOAD_GLOBAL 1 (range) and 26 LOAD_CONST 1 (3)

example1_3_1

after 28 CALL_FUNCTION 1

example1_3_2

after 30 STORE_FAST 3 (new_g)

example1_3_3

after 32 LOAD_FAST 3 (new_g)

example1_3_4

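The opcode 34 GET_YIELD_FROM_ITER makes sure the top of the stack is an iterator. Unless the object is already a generator (or a coroutine), it calls iter() on it, so the range object is replaced by a range iterator.

36 LOAD_CONST 0 (None) pushes None onto the stack. This is the value YIELD_FROM sends into the sub-iterator.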

>>> next(gg)
0

The field f_lasti is 36, even though the frame is suspended inside 38 YIELD_FROM.

At the end of YIELD_FROM, the statement f->f_lasti -= sizeof(_Py_CODEUNIT); moves f_lasti back by one code unit. The next time the generator resumes, it therefore executes YIELD_FROM again. Thanks to @RyanHe123
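Here is a toy model of that rewind. It assumes the CPython 3.8 rules: on resume, execution continues at the instruction after f_lasti, and YIELD_FROM subtracts one code unit (2 bytes) before yielding. resume_yield_from is a hypothetical name, and 36/38 are the offsets from the g2 disassembly:

```python
CODE_UNIT = 2            # sizeof(_Py_CODEUNIT) in CPython 3.8
YIELD_FROM_OFFSET = 38   # offset of YIELD_FROM in g2's bytecode

def resume_yield_from(sub, f_lasti):
    """Toy model of a frame repeatedly resumed inside YIELD_FROM.
    Returns the (value, f_lasti) pairs seen by the caller and the final f_lasti."""
    seen = []
    while True:
        # on resume, ceval continues with the instruction after f_lasti
        f_lasti += CODE_UNIT
        assert f_lasti == YIELD_FROM_OFFSET
        try:
            value = next(sub)
        except StopIteration:
            return seen, f_lasti   # falls through to 40 POP_TOP
        # rewind so that the next resume executes YIELD_FROM again
        f_lasti -= CODE_UNIT
        seen.append((value, f_lasti))

seen, final_lasti = resume_yield_from(iter(range(3)), f_lasti=36)
print(seen)          # [(0, 36), (1, 36), (2, 36)]
print(final_lasti)   # 38
```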

example3

When the sub-iterator is exhausted, the remaining opcodes run up to 44 RETURN_VALUE, the generator raises StopIteration, and its frame object is deallocated (gi_frame becomes None).

>>> next(gg)
1
>>> next(gg)
2
>>> next(gg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
>>> repr(gg.gi_frame)
'None'

f_blockstack

f_blockstack is an array. The element type is PyTryBlock and the size is CO_MAXBLOCKS (20).

The definition of PyTryBlock

typedef struct {
    int b_type;                 /* what kind of block this is */
    int b_handler;              /* where to jump to find handler */
    int b_level;                /* value stack level to pop to */
} PyTryBlock;
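To get a feel for the memory layout, the struct can be mirrored with ctypes. This is only a sketch of the C declaration (plus CO_MAXBLOCKS = 20 from the text); it doesn't read any live frame:

```python
import ctypes

CO_MAXBLOCKS = 20  # value used by CPython 3.8

class PyTryBlock(ctypes.Structure):
    # same field order and types as the C struct above
    _fields_ = [
        ("b_type", ctypes.c_int),     # what kind of block this is
        ("b_handler", ctypes.c_int),  # where to jump to find handler
        ("b_level", ctypes.c_int),    # value stack level to pop to
    ]

BlockStack = PyTryBlock * CO_MAXBLOCKS   # like PyTryBlock f_blockstack[CO_MAXBLOCKS]

blocks = BlockStack()   # ctypes arrays start zero-filled
print(ctypes.sizeof(PyTryBlock), ctypes.sizeof(BlockStack))
print(blocks[0].b_type, blocks[0].b_handler, blocks[0].b_level)   # 0 0 0
```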

Let's define a generator with some blocks

def g3():
    try:
        yield 1
        1 / 0
    except ZeroDivisionError:
        yield 2
        try:
            yield 3
            import no
        except ModuleNotFoundError:
            for i in range(3):
                yield i + 4
            yield 4
        finally:
            yield 100


>>> gg = g3()

blockstack0

When the first yield statement is reached, the first try block has already been set up.

f_iblock is 1, indicating that there's currently one block.

b_type 122 is the opcode SETUP_FINALLY, b_handler 20 is the bytecode offset of the except ZeroDivisionError: handler, and b_level 0 is the value stack level to pop back to when an exception is raised.

>>> next(gg)
1

blockstack1

b_type 257 is EXCEPT_HANDLER, which is not a real opcode but a block type with a special meaning:

/* EXCEPT_HANDLER is a special, implicit block type which is created when
   entering an except handler. It is not an opcode but we define it here
   as we want it to be available to both frameobject.c and ceval.c, while
   remaining private.*/
#define EXCEPT_HANDLER 257

b_handler is set to -1, since the exception is already being handled and there's no handler left to jump to.

b_level doesn't change.
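The transition from blockstack0 to blockstack1 can be modeled in Python. This simplified sketch assumes CPython 3.8's PyFrame_BlockSetup / PyFrame_BlockPop and the exception-unwinding path in ceval.c. The BlockStack class and enter_except_handler are hypothetical names. Errors are raised as Python exceptions where CPython calls Py_FatalError, and the exception objects that ceval.c also pushes onto the value stack are left out:

```python
CO_MAXBLOCKS = 20
SETUP_FINALLY = 122    # opcode number in CPython 3.8
EXCEPT_HANDLER = 257   # pseudo block type, not a real opcode

class BlockStack:
    """Toy model of f_blockstack / f_iblock."""

    def __init__(self):
        self.blocks = [None] * CO_MAXBLOCKS
        self.f_iblock = 0

    def setup(self, b_type, b_handler, b_level):
        # like PyFrame_BlockSetup
        if self.f_iblock >= CO_MAXBLOCKS:
            raise OverflowError("block stack overflow")
        self.blocks[self.f_iblock] = (b_type, b_handler, b_level)
        self.f_iblock += 1

    def pop(self):
        # like PyFrame_BlockPop
        if self.f_iblock <= 0:
            raise IndexError("block stack underflow")
        self.f_iblock -= 1
        return self.blocks[self.f_iblock]

def enter_except_handler(bs):
    """Exception inside a SETUP_FINALLY block: pop it, push an
    EXCEPT_HANDLER block at the same stack level, return the jump target."""
    b_type, b_handler, b_level = bs.pop()
    assert b_type == SETUP_FINALLY
    bs.setup(EXCEPT_HANDLER, -1, b_level)
    return b_handler

bs = BlockStack()
bs.setup(SETUP_FINALLY, 20, 0)              # the outer try: in g3
print(bs.f_iblock, bs.blocks[0])            # 1 (122, 20, 0)
jump_to = enter_except_handler(bs)          # 1 / 0 raises ZeroDivisionError
print(jump_to, bs.f_iblock, bs.blocks[0])   # 20 1 (257, -1, 0)
```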

>>> next(gg)
2

blockstack2

f_iblock is 3. The second block comes from the finally: clause (handler at bytecode offset 116), and the third block comes from except ModuleNotFoundError: (handler at bytecode offset 62).

>>> next(gg)
3

blockstack3

>>> next(gg)
4

The b_type of the third block becomes 257 and its b_handler becomes -1, which means the exception from this block is currently being handled.

blockstack4

The remaining blocks are handled in the same way as execution continues:

>>> next(gg)
5
>>> next(gg)
6
>>> next(gg)
4
>>> next(gg)
100

blockstack5

On the next call, the generator finishes and the frame object is deallocated:

>>> next(gg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

f_back

f_back points to the previous (calling) frame, so the frames of a call chain form a singly linked list.
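The same list can be walked from Python by following f_back until it's None. This sketch uses the hypothetical functions a, b, c, and call_chain. inspect.currentframe() may return None on Python implementations without stack frame support, so this assumes CPython:

```python
import inspect

def call_chain():
    """Follow f_back from the caller's frame and collect function names."""
    names = []
    frame = inspect.currentframe().f_back   # skip call_chain's own frame
    while frame is not None:
        names.append(frame.f_code.co_name)
        frame = frame.f_back
    return names

def a():
    return b()

def b():
    return c()

def c():
    return call_chain()

chain = a()
print(chain)   # starts with ['c', 'b', 'a']
```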

import inspect

def g4(depth):
    print("depth", depth)
    print(repr(inspect.currentframe()), inspect.currentframe().f_back)
    if depth > 0:
        g4(depth-1)


g4(3)

Output

depth 3
<frame at 0x7fedc2f2e9a8, file '<input>', line 3, code g4> <frame at 0x7fedc2cab468, file '<input>', line 1, code <module>>
depth 2
<frame at 0x7fedc2de54a8, file '<input>', line 3, code g4> <frame at 0x7fedc2f2e9a8, file '<input>', line 5, code g4>
depth 1
<frame at 0x7fedc2ca6348, file '<input>', line 3, code g4> <frame at 0x7fedc2de54a8, file '<input>', line 5, code g4>
depth 0
<frame at 0x10c2c9930, file '<input>', line 3, code g4> <frame at 0x7fedc2ca6348, file '<input>', line 5, code g4>

f_back

free_list mechanism

zombie frame

When a frame object is deallocated and its code object doesn't hold a frame yet, the frame is not freed. Instead, it's stored in the code object (co_zombieframe) as a "zombie" frame. The next time that code object executes, the zombie frame is reused.

This strategy saves malloc/realloc overhead and some field initialization.
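Here is a toy model of the zombie-frame rule. It assumes CPython 3.8's behavior, in which the frame is parked in co_zombieframe only if that slot is empty. Code, Frame, frame_new, and frame_dealloc are hypothetical stand-ins for the C structures and functions, and the free_list path is omitted:

```python
class Code:
    def __init__(self, name):
        self.name = name
        self.zombieframe = None   # stands in for co_zombieframe

class Frame:
    def __init__(self, code):
        self.code = code

def frame_new(code):
    # reuse the code object's zombie frame if there is one
    if code.zombieframe is not None:
        f = code.zombieframe
        code.zombieframe = None
        return f
    return Frame(code)   # otherwise allocate a new frame

def frame_dealloc(f):
    # park the frame on its code object instead of freeing it
    if f.code.zombieframe is None:
        f.code.zombieframe = f
    # else: CPython would push it onto free_list or free it

g5_code = Code("g5")
first = frame_new(g5_code)    # gg = g5()
frame_dealloc(first)          # gg is exhausted
second = frame_new(g5_code)   # gg3 = g5()
print(second is first)        # True
```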

def g5():
    yield 1

>>> gg = g5()
>>> gg.gi_frame
<frame at 0x10224c970, file '<stdin>', line 1, code g5>
>>> next(gg)
1
>>> next(gg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

>>> gg3 = g5()
>>> gg3.gi_frame # id same as previous one, the same frame object in the same code block is reused
<frame at 0x10224c970, file '<stdin>', line 1, code g5>

free_list sub

free_list is a singly linked list (chained through f_back) that stores deallocated frame objects for later reuse. It saves malloc/free overhead.

static PyFrameObject *free_list = NULL;
static int numfree = 0;         /* number of frames currently in free_list */
/* max value for numfree */
#define PyFrame_MAXFREELIST 200

When a PyFrameObject is on the free list, only the following members have meaning

ob_type             == &Frametype
f_back              next item on free list, or NULL
f_stacksize         size of value stack
ob_size             size of localsplus

When a frame is taken from free_list, the creation process checks whether it has enough slots and resizes it if not:

if (Py_SIZE(f) < extras) {
    PyFrameObject *new_f = PyObject_GC_Resize(PyFrameObject, f, extras);
    /* omit: error handling */
    f = new_f;
}
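Here is a toy model of the free_list push/pop together with the size check. It assumes the CPython 3.8 logic shown above. The names are hypothetical and zombie frames are ignored. Also, in C PyObject_GC_Resize may move the frame to a new address, while the model keeps the same Python object:

```python
PyFrame_MAXFREELIST = 200

class Frame:
    def __init__(self, size):
        self.ob_size = size   # number of slots (localsplus + value stack)
        self.f_back = None    # doubles as the "next" link while on free_list

free_list = None
numfree = 0

def frame_alloc(extras):
    """Take a frame from free_list if possible, growing it when too small."""
    global free_list, numfree
    if free_list is None:
        return Frame(extras)
    numfree -= 1
    f = free_list
    free_list = free_list.f_back
    if f.ob_size < extras:
        f.ob_size = extras    # stands in for PyObject_GC_Resize
    f.f_back = None           # the real code later points f_back at the caller
    return f

def frame_release(f):
    """Push a frame onto free_list, or drop it when the list is full."""
    global free_list, numfree
    if numfree < PyFrame_MAXFREELIST:
        numfree += 1
        f.f_back = free_list
        free_list = f
    # else: CPython calls PyObject_GC_Del(f)

small = Frame(5)
frame_release(small)
reused = frame_alloc(8)   # needs 8 slots, the cached frame only has 5
print(reused is small, reused.ob_size, numfree)   # True 8 0
```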

Let's see an example

import inspect

def g6():
    yield repr(inspect.currentframe()), inspect.currentframe().f_back

>>> gg = g6()
>>> gg1 = g6()
>>> gg2 = g6()

free_list0

When gg is exhausted, the frame attached to it is deallocated. The code object doesn't hold a zombie frame yet, so this frame becomes the "zombie" frame of the code object.

Because the code object still points to the frame object (co_zombieframe), the frame object is neither put on free_list nor freed by the GC.

>>> next(gg)
("<frame at 0x1052d83a0, file '<stdin>', line 2, code g6>", <frame at 0x105225e50, file '<stdin>', line 1, code <module>>)
>>> next(gg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

free_list1

>>> next(gg1)
("<frame at 0x105620040, file '<stdin>', line 2, code g6>", <frame at 0x105474cc0, file '<stdin>', line 1, code <module>>)
>>> next(gg1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

free_list2

>>> next(gg2)
("<frame at 0x105482d00, file '<stdin>', line 2, code g6>", <frame at 0x105225e50, file '<stdin>', line 1, code <module>>)
>>> next(gg2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration

free_list3