BEAM 簡介
這篇文章是關於 BEAM 的簡要入門,BEAM 是在 Erlang 執行時系統(ERTS)中執行使用者程式碼的虛擬機器。它的目的是幫助那些剛接觸 BEAM 的人,能夠理解即將發布的關於 OTP 24 中 JIT 的系列文章,實作細節將在之後討論。
BEAM 經常與 ERTS 混淆,因此區分兩者很重要;BEAM 只是虛擬機器,它沒有進程、埠、ETS 表格等概念。它僅執行指令,儘管 ERTS 影響了它們的設計,但它不會影響程式碼執行時的操作,因此你不需要理解 ERTS 也能理解 BEAM。
BEAM 是一個暫存器機器,所有指令都在具名的暫存器上操作。每個暫存器可以包含任何 Erlang 術語,例如整數或元組,可以將它們視為簡單的變數。兩種最重要的暫存器類型是:
X
:這些暫存器用於暫時資料和在函數之間傳遞資料。它們不需要堆疊框架,並且可以在任何函數中自由使用,但有一些限制,我們稍後會詳細說明。Y
:這些暫存器是每個堆疊框架的本地變數,除了需要堆疊框架之外,沒有其他特殊限制。
控制流程由測試特定條件的指令處理,並移動到下一個指令或分支到其失敗標籤,標記為{f,Index}
。例如,{test,is_integer,{f,7},[{x,0}]}
檢查 {x,0}
是否包含整數,如果不是,則跳轉到標籤 7。
函數參數從左到右在 X
暫存器中傳遞,從 {x,0}
開始,結果在 {x,0}
中返回。
透過範例來解釋它們如何組合在一起會更容易,所以讓我們來看幾個範例
sum_tail(List) ->
sum_tail(List, 0).
sum_tail([Head | Tail], Acc) ->
sum_tail(Tail, Head + Acc);
sum_tail([], Acc) ->
Acc.
讓我們使用 erlc -S
來逐一查看指令
%% sum_tail/1, entry label is 2 {function, sum_tail, 1, 2}. %% Marks a jump target with the label 1. {label,1}. %% Special instruction that raises a function_clause %% exception. Unused in this function. {func_info,{atom,primer},{atom,sum_tail},1}. {label,2}. %% The meat of the function starts here. %% %% Our only argument - List - is in {x,0} and %% since sum_tail/2 expects it to be the first %% argument we can leave it be. We'll pass the %% integer 0 as the second argument in {x,1}. {move,{integer,0},{x,1}}. %% Tail call sum_tail/2, whose entry label is 4. {call_only,2,{f,4}}. %% sum_tail/2, entry label is 4 {function, sum_tail, 2, 4}. {label,3}. {func_info,{atom,primer},{atom,sum_tail},2}. {label,4}. %% Test whether we have a non-empty list, and jump to %% the base case at label 5 if we don't. {test,is_nonempty_list,{f,5},[{x,0}]}. %% Unpack the list in the first argument, placing the %% head in {x,2} and the tail in {x,0}. {get_list,{x,0},{x,2},{x,0}}. %% Add the head and our accumulator (remember that the %% second function argument is in {x,1}), and place %% the result in {x,1}. %% %% A fail label of 0 means that we want the %% instruction to throw an exception on error, rather %% than jump to a given label. {gc_bif,'+',{f,0},3,[{x,2},{x,1}],{x,1}}. %% Tail-call ourselves to handle the rest of the list, %% the arguments are already in the right registers. {call_only,2,{f,4}}. {label,5}. %% Test whether our argument was the empty list. If %% not, we jump to label 3 to raise a function_clause %% exception. {test,is_nil,{f,3},[{x,0}]}. %% Return our accumulator. {move,{x,1},{x,0}}. return.
很簡單,不是嗎?
不過,我忽略了一個小細節;加法指令中神秘的數字 3
。這個數字告訴我們有多少個 X
暫存器保存著有效數據,以防我們需要更多記憶體,因此可以在其餘暫存器被視為垃圾丟棄時保留它們。因此,在此指令之後引用較高的 X
暫存器是不安全的,因為它們的內容可能無效(在本例中為 {x,3}
及以上)。
函數調用類似;每當我們調用或從函數返回時,我們可能會將自己排程出去,並且我們只會在這樣做時保留函數參數/返回值。這表示即使你確信被調用的函數沒有觸碰特定暫存器,除了 {x,0}
之外的所有 X
暫存器在調用後都無效。
這就是 Y
暫存器發揮作用的地方。讓我們以前面的範例為基礎,並將其改為尾遞迴
sum_body([Head | Tail]) ->
Head + sum_body(Tail);
sum_body([]) ->
0.
{function, sum_body, 1, 7}. {label,6}. {func_info,{atom,primer},{atom,sum_body},1}. {label,7}. {test,is_nonempty_list,{f,8},[{x,0}]}. %% Allocate a stack frame with a single Y register. %% Since this instruction may need more memory, we %% tell the garbage collector that we currently have %% one live X register (our list argument in {x,0}). {allocate,1,1}. %% Unpack the list, placing the head in {y,0} and %% the tail in {x,0}. {get_list,{x,0},{y,0},{x,0}}. %% Body-call ourselves. Note that while this kills all %% X registers, it leaves Y registers alone so our %% head is still valid. {call,1,{f,7}}. %% Add the head to our return value and store the %% result in {x,0}. {gc_bif,'+',{f,0},1,[{y,0},{x,0}],{x,0}}. %% Deallocate our stack frame and return. {deallocate,1}. return. {label,8}. {test,is_nil,{f,6},[{x,0}]}. %% Return the integer 0. {move,{integer,0},{x,0}}. return.
注意到當我們在堆疊框架中時,調用指令發生了變化嗎?有三種不同的調用指令
call
:如同範例中的一般調用。當被調用的函數返回時,控制流程將在下一個指令繼續。call_last
:當有堆疊框架時的尾調用。目前的框架將在調用之前釋放。call_only
:當沒有堆疊框架時的尾調用。
每個指令都有一個用於調用其他模組中函數的變體(例如 call_ext
),但它們在其他方面是相同的。
到目前為止,我們只看了使用術語,那麼如何建立它們呢?讓我們來看看
create_tuple(Term) ->
{hello, Term}.
{function, create_tuple, 1, 10}. {label,9}. {func_info,{atom,primer},{atom,create_tuple},1}. {label,10}. %% Allocate the three words needed for a 2-tuple, with %% a liveness annotation of 1 indicating that {x,0} %% is alive in case we need to GC. {test_heap,3,1}. %% Create the tuple and place the result in {x,0} {put_tuple2,{x,0},{list,[{atom,hello},{x,0}]}}. return.
這有點神奇,因為有一個不可見的暫存器用於記憶體分配,但分配很少與使用相隔太遠,而且通常很容易追蹤。同樣的原理也適用於列表(consing)、浮點數和 funs,遵循 PR 2765。
更複雜的類型,例如映射、大整數、參考等,由特殊指令建立,這些指令可能會自行進行 GC(或在「堆片段」中於堆之外分配),因為它們的大小無法預先靜態決定。
現在讓我們看看一些不常見的東西:例外。
exception() ->
try
external:call()
catch
throw:example -> hello
end.
{function, exception, 0, 12}. {label,11}. {func_info,{atom,primer},{atom,exception},0}. {label,12}. {allocate,1,0}. %% Place a catch tag in {y,0}. If an exception is %% raised while this tag is the most current one, %% the control flow will resume at {f,13} in this %% stack frame. {'try',{y,0},{f,13}}. {call_ext,0,{extfunc,external,call,0}}. %% Deactivate the catch tag before returning with the %% result from the call. {try_end,{y,0}}. {deallocate,1}. return. {label,13}. %% Uh oh, we've got an exception. Kill the catch tag %% and place the exception class in {x,0}, the error %% reason/thrown value in {x,1}, and the stack trace %% in {x,2}. {try_case,{y,0}}. %% Return 'hello' if the user threw 'example' {test,is_eq_exact,{f,14},[{x,0},{atom,throw}]}. {test,is_eq_exact,{f,14},[{x,1},{atom,example}]}. {move,{atom,hello},{x,0}}. {deallocate,1}. return. {label,14}. %% Otherwise, rethrow the exception since no catch %% clause matched. {bif,raise,{f,0},[{x,2},{x,1}],{x,0}}.
到目前為止,你可能已經注意到控制流程是如何只向前移動的;就像 Erlang 本身一樣,迴圈的唯一方法是透過遞迴。這唯一的例外是 receive 結構,它可能會迴圈直到收到符合條件的訊息。
selective_receive(Ref) ->
receive
{Ref, Result} -> Result
end.
{function, selective_receive, 1, 16}. {label,15}. {func_info,{atom,primer},{atom,selective_receive},1}. {label,16}. {allocate,1,1}. %% We may be scheduled out while waiting for a %% message, so we'll preserve our Ref in {y,0}. {move,{x,0},{y,0}}. {label,17}. %% Pick the next message from the process' message box %% and place it in {x,0}, jumping to label 19 if the %% message box is empty. {loop_rec,{f,19},{x,0}}. %% Does it match our pattern? If not, jump to label 18 %% and try the next message. {test,is_tuple,{f,18},[{x,0}]}. {test,test_arity,{f,18},[{x,0},2]}. {get_tuple_element,{x,0},0,{x,1}}. {test,is_eq_exact,{f,18},[{x,1},{y,0}]}. %% We've got a match, extract the result and remove %% the message from the mailbox. {get_tuple_element,{x,0},1,{x,0}}. remove_message. {deallocate,1}. return. {label,18}. %% The message didn't match, loop back to handle our %% next message. Note that the current message remains %% in the inbox since a different receive may be %% interested in it. {loop_rec_end,{f,17}}. {label,19}. %% Wait until the next message arrives, returning to %% the start of the loop when it does. If there's a %% timeout involved, it will be handled here. {wait,{f,17}}.
沒有太多其他內容了,如果你覺得可以理解上面的範例,那麼你應該對 JIT 系列沒有任何問題。
如果你好奇有哪些指令,你可以在 genop.tab 中找到每個指令的簡短說明。