BEAM 簡介

2020 年 10 月 20 日 · 作者:John Högberg

這篇文章是關於 BEAM 的簡要入門,BEAM 是在 Erlang 執行時系統(ERTS)中執行使用者程式碼的虛擬機器。它的目的是幫助那些剛接觸 BEAM 的人,能夠理解即將發布的關於 OTP 24 中 JIT 的系列文章,實作細節將在之後討論。

BEAM 經常與 ERTS 混淆,因此區分兩者很重要;BEAM 只是虛擬機器,它沒有進程、埠、ETS 表格等概念。它僅執行指令,儘管 ERTS 影響了它們的設計,但它不會影響程式碼執行時的操作,因此你不需要理解 ERTS 也能理解 BEAM。

BEAM 是一個暫存器機器,所有指令都在具名的暫存器上操作。每個暫存器可以包含任何 Erlang 術語,例如整數或元組,可以將它們視為簡單的變數。兩種最重要的暫存器類型是:

  • X:這些暫存器用於暫時資料和在函數之間傳遞資料。它們不需要堆疊框架,並且可以在任何函數中自由使用,但有一些限制,我們稍後會詳細說明。
  • Y:這些暫存器是每個堆疊框架的本地變數,除了需要堆疊框架之外,沒有其他特殊限制。

控制流程由測試特定條件的指令處理,並移動到下一個指令或分支到其失敗標籤,標記為{f,Index}。例如,{test,is_integer,{f,7},[{x,0}]} 檢查 {x,0} 是否包含整數,如果不是,則跳轉到標籤 7。

函數參數從左到右在 X 暫存器中傳遞,從 {x,0} 開始,結果在 {x,0} 中返回。

透過範例來解釋它們如何組合在一起會更容易,所以讓我們來看幾個範例

sum_tail(List) ->
    sum_tail(List, 0).

sum_tail([Head | Tail], Acc) ->
    sum_tail(Tail, Head + Acc);
sum_tail([], Acc) ->
    Acc.

讓我們使用 erlc -S 來逐一查看指令

%% sum_tail/1, entry label is 2
{function, sum_tail, 1, 2}.

  %% Marks a jump target with the label 1.
  {label,1}.

    %% Special instruction that raises a function_clause
    %% exception. Unused in this function.
    {func_info,{atom,primer},{atom,sum_tail},1}.

  {label,2}.
    %% The meat of the function starts here.
    %%
    %% Our only argument - List - is in {x,0} and
    %% since sum_tail/2 expects it to be the first
    %% argument we can leave it be. We'll pass the
    %% integer 0 as the second argument in {x,1}.
    {move,{integer,0},{x,1}}.

    %% Tail call sum_tail/2, whose entry label is 4.
    {call_only,2,{f,4}}.

%% sum_tail/2, entry label is 4
{function, sum_tail, 2, 4}.
  {label,3}.
    {func_info,{atom,primer},{atom,sum_tail},2}.
  {label,4}.

    %% Test whether we have a non-empty list, and jump to
    %% the base case at label 5 if we don't.
    {test,is_nonempty_list,{f,5},[{x,0}]}.

    %% Unpack the list in the first argument, placing the
    %% head in {x,2} and the tail in {x,0}.
    {get_list,{x,0},{x,2},{x,0}}.

    %% Add the head and our accumulator (remember that the
    %% second function argument is in {x,1}), and place
    %% the result in {x,1}.
    %%
    %% A fail label of 0 means that we want the
    %% instruction to throw an exception on error, rather
    %% than jump to a given label.
    {gc_bif,'+',{f,0},3,[{x,2},{x,1}],{x,1}}.

    %% Tail-call ourselves to handle the rest of the list,
    %% the arguments are already in the right registers.
    {call_only,2,{f,4}}.

  {label,5}.
    %% Test whether our argument was the empty list. If
    %% not, we jump to label 3 to raise a function_clause
    %% exception.
    {test,is_nil,{f,3},[{x,0}]}.

    %% Return our accumulator.
    {move,{x,1},{x,0}}.
    return.

很簡單,不是嗎?

不過,我忽略了一個小細節;加法指令中神秘的數字 3。這個數字告訴我們有多少個 X 暫存器保存著有效數據,以防我們需要更多記憶體,因此可以在其餘暫存器被視為垃圾丟棄時保留它們。因此,在此指令之後引用較高的 X 暫存器是不安全的,因為它們的內容可能無效(在本例中為 {x,3} 及以上)。

函數調用類似;每當我們調用或從函數返回時,我們可能會將自己排程出去,並且我們只會在這樣做時保留函數參數/返回值。這表示即使你確信被調用的函數沒有觸碰特定暫存器,除了 {x,0} 之外的所有 X 暫存器在調用後都無效。

這就是 Y 暫存器發揮作用的地方。讓我們以前面的範例為基礎,並將其改為尾遞迴

sum_body([Head | Tail]) ->
    Head + sum_body(Tail);
sum_body([]) ->
    0.
{function, sum_body, 1, 7}.
  {label,6}.
    {func_info,{atom,primer},{atom,sum_body},1}.
  {label,7}.
    {test,is_nonempty_list,{f,8},[{x,0}]}.

    %% Allocate a stack frame with a single Y register.
    %% Since this instruction may need more memory, we
    %% tell the garbage collector that we currently have
    %% one live X register (our list argument in {x,0}).
    {allocate,1,1}.

    %% Unpack the list, placing the head in {y,0} and
    %% the tail in {x,0}.
    {get_list,{x,0},{y,0},{x,0}}.

    %% Body-call ourselves. Note that while this kills all
    %% X registers, it leaves Y registers alone so our
    %% head is still valid.
    {call,1,{f,7}}.

    %% Add the head to our return value and store the
    %% result in {x,0}.
    {gc_bif,'+',{f,0},1,[{y,0},{x,0}],{x,0}}.

    %% Deallocate our stack frame and return.
    {deallocate,1}.
    return.

  {label,8}.
    {test,is_nil,{f,6},[{x,0}]}.

    %% Return the integer 0.
    {move,{integer,0},{x,0}}.
    return.

注意到當我們在堆疊框架中時,調用指令發生了變化嗎?有三種不同的調用指令

  • call:如同範例中的一般調用。當被調用的函數返回時,控制流程將在下一個指令繼續。
  • call_last:當有堆疊框架時的尾調用。目前的框架將在調用之前釋放。
  • call_only:當沒有堆疊框架時的尾調用。

每個指令都有一個用於調用其他模組中函數的變體(例如 call_ext),但它們在其他方面是相同的。

到目前為止,我們只看了使用術語,那麼如何建立它們呢?讓我們來看看

create_tuple(Term) ->
    {hello, Term}.
{function, create_tuple, 1, 10}.
  {label,9}.
    {func_info,{atom,primer},{atom,create_tuple},1}.
  {label,10}.
    %% Allocate the three words needed for a 2-tuple, with
    %% a liveness annotation of 1 indicating that {x,0}
    %% is alive in case we need to GC.
    {test_heap,3,1}.

    %% Create the tuple and place the result in {x,0}
    {put_tuple2,{x,0},{list,[{atom,hello},{x,0}]}}.
  
    return.

這有點神奇,因為有一個不可見的暫存器用於記憶體分配,但分配很少與使用相隔太遠,而且通常很容易追蹤。同樣的原理也適用於列表(consing)、浮點數和 funs,遵循 PR 2765

更複雜的類型,例如映射、大整數、參考等,由特殊指令建立,這些指令可能會自行進行 GC(或在「堆片段」中於堆之外分配),因為它們的大小無法預先靜態決定。

現在讓我們看看一些不常見的東西:例外。

exception() ->
    try
        external:call()
    catch
        throw:example -> hello
    end.
{function, exception, 0, 12}.
  {label,11}.
    {func_info,{atom,primer},{atom,exception},0}.
  {label,12}.
    {allocate,1,0}.
  
    %% Place a catch tag in {y,0}. If an exception is
    %% raised while this tag is the most current one,
    %% the control flow will resume at {f,13} in this
    %% stack frame.
    {'try',{y,0},{f,13}}.

    {call_ext,0,{extfunc,external,call,0}}.

    %% Deactivate the catch tag before returning with the
    %% result from the call.
    {try_end,{y,0}}.

    {deallocate,1}.
    return.

  {label,13}.
    %% Uh oh, we've got an exception. Kill the catch tag
    %% and place the exception class in {x,0}, the error
    %% reason/thrown value in {x,1}, and the stack trace
    %% in {x,2}.
    {try_case,{y,0}}.

    %% Return 'hello' if the user threw 'example'
    {test,is_eq_exact,{f,14},[{x,0},{atom,throw}]}.
    {test,is_eq_exact,{f,14},[{x,1},{atom,example}]}.
    {move,{atom,hello},{x,0}}.
    {deallocate,1}.
    return.

  {label,14}.
    %% Otherwise, rethrow the exception since no catch
    %% clause matched.
    {bif,raise,{f,0},[{x,2},{x,1}],{x,0}}.

到目前為止,你可能已經注意到控制流程是如何只向前移動的;就像 Erlang 本身一樣,迴圈的唯一方法是透過遞迴。這唯一的例外是 receive 結構,它可能會迴圈直到收到符合條件的訊息。

selective_receive(Ref) ->
    receive
        {Ref, Result} -> Result
    end.
{function, selective_receive, 1, 16}.
  {label,15}.
    {func_info,{atom,primer},{atom,selective_receive},1}.
  {label,16}.
    {allocate,1,1}.

    %% We may be scheduled out while waiting for a
    %% message, so we'll preserve our Ref in {y,0}.
    {move,{x,0},{y,0}}.

  {label,17}.
    %% Pick the next message from the process' message box
    %% and place it in {x,0}, jumping to label 19 if the
    %% message box is empty.
    {loop_rec,{f,19},{x,0}}.
  
    %% Does it match our pattern? If not, jump to label 18
    %% and try the next message.
    {test,is_tuple,{f,18},[{x,0}]}.
    {test,test_arity,{f,18},[{x,0},2]}.
    {get_tuple_element,{x,0},0,{x,1}}.
    {test,is_eq_exact,{f,18},[{x,1},{y,0}]}.

    %% We've got a match, extract the result and remove
    %% the message from the mailbox.
    {get_tuple_element,{x,0},1,{x,0}}.
    remove_message.
    {deallocate,1}.
    return.

  {label,18}.
    %% The message didn't match, loop back to handle our
    %% next message. Note that the current message remains
    %% in the inbox since a different receive may be
    %% interested in it.
    {loop_rec_end,{f,17}}.

  {label,19}.
    %% Wait until the next message arrives, returning to
    %% the start of the loop when it does. If there's a
    %% timeout involved, it will be handled here.
    {wait,{f,17}}.

沒有太多其他內容了,如果你覺得可以理解上面的範例,那麼你應該對 JIT 系列沒有任何問題。

如果你好奇有哪些指令,你可以在 genop.tab 中找到每個指令的簡短說明。