Robustness

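The messenger example in A Larger Example has several problems. For example, if a node where a user is logged on goes down without logging off, the user remains in the server's User_List, but the client disappears. The user then cannot log on again, because the server thinks the user is already logged on.

Or what happens if the server goes down in the middle of sending a message, leaving the sending client hanging forever in the await_result function?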

Time-outs

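Before improving the messenger program, let us look at some general principles, using the ping pong program as an example. Recall that when "ping" finishes, it sends the atom finished as a message to "pong" to tell "pong" that it is done, so that "pong" can also finish. Another way to let "pong" finish is to make "pong" exit if it does not receive a message from "ping" within a certain time. This can be done by adding a time-out to pong, as shown in the following example: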

-module(tut19).

-export([start_ping/1, start_pong/0,  ping/2, pong/0]).

ping(0, _Pong_Node) ->
    io:format("ping finished~n", []);

ping(N, Pong_Node) ->
    {pong, Pong_Node} ! {ping, self()},
    receive
        pong ->
            io:format("Ping received pong~n", [])
    end,
    ping(N - 1, Pong_Node).

pong() ->
    receive
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong()
    after 5000 ->
            io:format("Pong timed out~n", [])
    end.

start_pong() ->
    register(pong, spawn(tut19, pong, [])).

start_ping(Pong_Node) ->
    spawn(tut19, ping, [3, Pong_Node]).

After this code is compiled and the file tut19.beam is copied to the necessary directories, the following is seen on (pong@kosken):

(pong@kosken)1> tut19:start_pong().
true
Pong received ping
Pong received ping
Pong received ping
Pong timed out

And the following is seen on (ping@gollum):

(ping@gollum)1> tut19:start_ping(pong@kosken).
<0.36.0>
Ping received pong
Ping received pong
Ping received pong
ping finished

The time-out is set in:

pong() ->
    receive
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong()
    after 5000 ->
            io:format("Pong timed out~n", [])
    end.

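The time-out (after 5000) is started when receive is entered. The time-out is canceled if {ping,Ping_PID} is received. If {ping,Ping_PID} is not received, the actions following the time-out are done after 5000 milliseconds. after must be last in the receive, that is, preceded by all other message reception specifications in the receive. It is also possible to call a function that returns an integer for the time-out: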

after pong_timeout() ->

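In general, there are better ways than time-outs to supervise the parts of a distributed Erlang system. Time-outs are usually appropriate for supervising external events, for example, if you expect a message from some external system within a specified time. For example, a time-out can be used to log a user out of the messenger system if they have not accessed it for ten minutes.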
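
The idle log-out idea can be sketched as a small process that waits for activity and gives up after a period of silence. This is only an illustration and not part of the messenger program: the module name idle_demo, the activity and logged_out messages, and the 200 millisecond limit returned by idle_timeout/0 are all assumptions made for the sketch (ten minutes would be 600000).

```erlang
-module(idle_demo).
-export([start/1, session/1, idle_timeout/0]).

%% Hypothetical idle limit in milliseconds, kept short for experimenting.
idle_timeout() ->
    200.

%% Start a session process that reports back to Owner when it gives up.
start(Owner) ->
    spawn(idle_demo, session, [Owner]).

session(Owner) ->
    receive
        activity ->
            %% Any activity re-enters receive, which restarts the time-out.
            session(Owner)
    after idle_timeout() ->
            %% Nothing arrived in time: report the log-out and finish.
            Owner ! {logged_out, self()}
    end.
```

Calling idle_demo:start(self()) from the shell and then flush() after a short wait should show the {logged_out, Pid} message. Because the time-out value comes from a function call, it can be changed in one place.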

Error Handling

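Before going into the details of supervision and error handling in an Erlang system, let us look at how Erlang processes terminate, or in Erlang terminology, exit.

A process that executes exit(normal), or simply runs out of things to do, has a normal exit.

A process that encounters a runtime error (for example, divide by zero, a bad match, trying to call a function that does not exist, and so on) exits with an error, that is, has an abnormal exit. A process that executes exit(Reason), where Reason is any Erlang term except the atom normal, also has an abnormal exit.

An Erlang process can set up links to other Erlang processes. If a process calls link(Other_Pid), it sets up a bidirectional link between itself and the process called Other_Pid. When a process terminates, it sends something called a signal to all the processes it has links to.

The signal carries information about the pid it was sent from and the exit reason.

The default behaviour of a process that receives a normal exit is to ignore the signal.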

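The default behaviour in the two other cases (that is, abnormal exit) is to:

  • Bypass all messages to the receiving process.
  • Kill the receiving process.
  • Propagate the same error signal to the links of the killed process.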

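In this way you can connect all processes in a transaction together using links. If one of the processes exits abnormally, all the processes in the transaction are killed. As it is often wanted to create a process and link to it at the same time, there is a special BIF, spawn_link, that does the same as spawn but also creates a link to the spawned process.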
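
The propagation described above can be observed on a single node with a sketch like the following. The module name link_demo and its messages are made up for the illustration. It also uses erlang:monitor/2, which is not covered in this section, only so that the calling process can observe the parent's exit reason without being linked to it.

```erlang
-module(link_demo).
-export([run/0]).

%% Spawn an unlinked "parent" that creates a linked worker, crash the
%% worker, and report the reason the parent exits with.
run() ->
    Self = self(),
    Parent = spawn(fun() -> parent(Self) end),
    %% Monitoring (not linking) lets us watch Parent without dying with it.
    Ref = erlang:monitor(process, Parent),
    Worker = receive {worker, W} -> W end,
    Worker ! crash,
    receive
        {'DOWN', Ref, process, Parent, Reason} ->
            {parent_exit_reason, Reason}
    after 1000 ->
            timeout
    end.

parent(Report_To) ->
    %% spawn_link creates the worker and the link in one step.
    Worker = spawn_link(fun worker/0),
    Report_To ! {worker, Worker},
    %% Wait indefinitely; only the worker's exit signal ends this process.
    receive stop -> ok end.

worker() ->
    receive crash -> exit(crashed) end.
```

The parent never receives a stop message, so it only terminates because the abnormal exit signal from the worker reaches it over the link, and it exits with the same reason.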

Now an example of the ping pong program using links to terminate "pong":

-module(tut20).

-export([start/1,  ping/2, pong/0]).

ping(N, Pong_Pid) ->
    link(Pong_Pid),
    ping1(N, Pong_Pid).

ping1(0, _) ->
    exit(ping);

ping1(N, Pong_Pid) ->
    Pong_Pid ! {ping, self()},
    receive
        pong ->
            io:format("Ping received pong~n", [])
    end,
    ping1(N - 1, Pong_Pid).

pong() ->
    receive
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong()
    end.

start(Ping_Node) ->
    PongPID = spawn(tut20, pong, []),
    spawn(Ping_Node, tut20, ping, [3, PongPID]).

(s1@bill)3> tut20:start(s2@kosken).
Pong received ping
<3820.41.0>
Ping received pong
Pong received ping
Ping received pong
Pong received ping
Ping received pong

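This is a slight modification of the ping pong program in which both processes are spawned from the same start/1 function, and the "ping" process can be spawned on a separate node. Notice the use of the link BIF. "Ping" calls exit(ping) when it finishes, which causes an exit signal to be sent to "pong", which then also terminates.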

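It is possible to modify the default behaviour of a process so that it does not get killed when it receives abnormal exit signals. Instead, all signals are turned into normal messages of the format {'EXIT',FromPID,Reason} and added to the end of the receiving process's message queue. This behaviour is set by: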

process_flag(trap_exit, true)

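There are several other process flags, see erlang(3). Changing the default behaviour of a process in this way is usually not done in standard user programs, but is left to the supervisory programs in OTP. However, the ping pong program is modified here to illustrate exit trapping.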

-module(tut21).

-export([start/1,  ping/2, pong/0]).

ping(N, Pong_Pid) ->
    link(Pong_Pid),
    ping1(N, Pong_Pid).

ping1(0, _) ->
    exit(ping);

ping1(N, Pong_Pid) ->
    Pong_Pid ! {ping, self()},
    receive
        pong ->
            io:format("Ping received pong~n", [])
    end,
    ping1(N - 1, Pong_Pid).

pong() ->
    process_flag(trap_exit, true),
    pong1().

pong1() ->
    receive
        {ping, Ping_PID} ->
            io:format("Pong received ping~n", []),
            Ping_PID ! pong,
            pong1();
        {'EXIT', From, Reason} ->
            io:format("pong exiting, got ~p~n", [{'EXIT', From, Reason}])
    end.

start(Ping_Node) ->
    PongPID = spawn(tut21, pong, []),
    spawn(Ping_Node, tut21, ping, [3, PongPID]).

(s1@bill)1> tut21:start(s2@gollum).
<3820.39.0>
Pong received ping
Ping received pong
Pong received ping
Ping received pong
Pong received ping
Ping received pong
pong exiting, got {'EXIT',<3820.39.0>,ping}
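
The same mechanism can be tried on a single node. The following is a minimal sketch, assuming a hypothetical module trap_demo; it traps exits, links to a short-lived process with spawn_link, and returns the reason carried by the resulting 'EXIT' message.

```erlang
-module(trap_demo).
-export([run/0]).

run() ->
    %% process_flag/2 returns the previous setting, so it can be restored.
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> exit(some_reason) end),
    Result = receive
                 %% The exit signal arrives as an ordinary message.
                 {'EXIT', Pid, Reason} -> {trapped, Reason}
             after 1000 ->
                 no_exit_message
             end,
    process_flag(trap_exit, Old),
    Result.
```

Without the trap_exit flag, the abnormal exit of the linked process would terminate the caller instead of producing a message.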

The Larger Example with Robustness Added

Let us return to the messenger program and add changes to make it more robust:

%%% Message passing utility.
%%% User interface:
%%% logon(Name)
%%%     One user at a time can log in from each Erlang node in the
%%%     system messenger: and choose a suitable Name. If the Name
%%%     is already logged in at another node or if someone else is
%%%     already logged in at the same node, login will be rejected
%%%     with a suitable error message.
%%% logoff()
%%%     Logs off anybody at that node
%%% message(ToName, Message)
%%%     sends Message to ToName. Error messages if the user of this
%%%     function is not logged on or if ToName is not logged on at
%%%     any node.
%%%
%%% One node in the network of Erlang nodes runs a server which maintains
%%% data about the logged on users. The server is registered as "messenger"
%%% Each node where there is a user logged on runs a client process registered
%%% as "mess_client"
%%%
%%% Protocol between the client processes and the server
%%% ----------------------------------------------------
%%%
%%% To server: {ClientPid, logon, UserName}
%%% Reply {messenger, stop, user_exists_at_other_node} stops the client
%%% Reply {messenger, logged_on} logon was successful
%%%
%%% When the client terminates for some reason
%%% To server: {'EXIT', ClientPid, Reason}
%%%
%%% To server: {ClientPid, message_to, ToName, Message} send a message
%%% Reply: {messenger, stop, you_are_not_logged_on} stops the client
%%% Reply: {messenger, receiver_not_found} no user with this name logged on
%%% Reply: {messenger, sent} Message has been sent (but no guarantee)
%%%
%%% To client: {message_from, Name, Message},
%%%
%%% Protocol between the "commands" and the client
%%% ----------------------------------------------
%%%
%%% Started: messenger:client(Server_Node, Name)
%%% To client: logoff
%%% To client: {message_to, ToName, Message}
%%%
%%% Configuration: change the server_node() function to return the
%%% name of the node where the messenger server runs

-module(messenger).
-export([start_server/0, server/0,
         logon/1, logoff/0, message/2, client/2]).

%%% Change the function below to return the name of the node where the
%%% messenger server runs
server_node() ->
    messenger@super.

%%% This is the server process for the "messenger"
%%% the user list has the format [{ClientPid1, Name1},{ClientPid2, Name2},...]
server() ->
    process_flag(trap_exit, true),
    server([]).

server(User_List) ->
    receive
        {From, logon, Name} ->
            New_User_List = server_logon(From, Name, User_List),
            server(New_User_List);
        {'EXIT', From, _} ->
            New_User_List = server_logoff(From, User_List),
            server(New_User_List);
        {From, message_to, To, Message} ->
            server_transfer(From, To, Message, User_List),
            io:format("list is now: ~p~n", [User_List]),
            server(User_List)
    end.

%%% Start the server
start_server() ->
    register(messenger, spawn(messenger, server, [])).

%%% Server adds a new user to the user list
server_logon(From, Name, User_List) ->
    %% check if logged on anywhere else
    case lists:keymember(Name, 2, User_List) of
        true ->
            From ! {messenger, stop, user_exists_at_other_node},  %reject logon
            User_List;
        false ->
            From ! {messenger, logged_on},
            link(From),
            [{From, Name} | User_List]        %add user to the list
    end.

%%% Server deletes a user from the user list
server_logoff(From, User_List) ->
    lists:keydelete(From, 1, User_List).


%%% Server transfers a message between user
server_transfer(From, To, Message, User_List) ->
    %% check that the user is logged on and who he is
    case lists:keysearch(From, 1, User_List) of
        false ->
            From ! {messenger, stop, you_are_not_logged_on};
        {value, {_, Name}} ->
            server_transfer(From, Name, To, Message, User_List)
    end.

%%% If the user exists, send the message
server_transfer(From, Name, To, Message, User_List) ->
    %% Find the receiver and send the message
    case lists:keysearch(To, 2, User_List) of
        false ->
            From ! {messenger, receiver_not_found};
        {value, {ToPid, To}} ->
            ToPid ! {message_from, Name, Message},
            From ! {messenger, sent}
    end.

%%% User Commands
logon(Name) ->
    case whereis(mess_client) of
        undefined ->
            register(mess_client,
                     spawn(messenger, client, [server_node(), Name]));
        _ -> already_logged_on
    end.

logoff() ->
    mess_client ! logoff.

message(ToName, Message) ->
    case whereis(mess_client) of % Test if the client is running
        undefined ->
            not_logged_on;
        _ -> mess_client ! {message_to, ToName, Message},
             ok
    end.

%%% The client process which runs on each user node
client(Server_Node, Name) ->
    {messenger, Server_Node} ! {self(), logon, Name},
    await_result(),
    client(Server_Node).

client(Server_Node) ->
    receive
        logoff ->
            exit(normal);
        {message_to, ToName, Message} ->
            {messenger, Server_Node} ! {self(), message_to, ToName, Message},
            await_result();
        {message_from, FromName, Message} ->
            io:format("Message from ~p: ~p~n", [FromName, Message])
    end,
    client(Server_Node).

%%% wait for a response from the server
await_result() ->
    receive
        {messenger, stop, Why} -> % Stop the client
            io:format("~p~n", [Why]),
            exit(normal);
        {messenger, What} ->  % Normal response
            io:format("~p~n", [What])
    after 5000 ->
            io:format("No response from server~n", []),
            exit(timeout)
    end.

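The following changes are added:

The messenger server traps exits. If it receives an exit signal, {'EXIT',From,Reason}, this means that a client process has terminated or is unreachable for one of the following reasons: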

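  • The user has logged off (the "logoff" message to the server has been removed; the client simply exits).
  • The network connection to the client is broken.
  • The node on which the client process resides has gone down.
  • The client process has done some illegal operation.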

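If an exit signal is received as above, the tuple {From,Name} is deleted from the server's User_List using the server_logoff function. If the node on which the server runs goes down, an exit signal (automatically generated by the system) is sent to all of the client processes: {'EXIT',MessengerPID,noconnection}, causing all the client processes to terminate.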
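
The list update itself is a single call to lists:keydelete/3. The sketch below copies server_logoff/2 into a hypothetical module user_list_demo so it can be called directly. Atoms stand in for client pids purely for illustration, and the user names are made up.

```erlang
-module(user_list_demo).
-export([server_logoff/2, demo/0]).

%% Same body as server_logoff/2 in the messenger module.
server_logoff(From, User_List) ->
    lists:keydelete(From, 1, User_List).

demo() ->
    %% Atoms replace real pids here; the names are invented.
    User_List = [{client_a, peter}, {client_b, fred}],
    After_Exit = server_logoff(client_a, User_List),
    %% An 'EXIT' from a process that never logged on changes nothing.
    Unchanged = server_logoff(unknown_client, After_Exit),
    {After_Exit, Unchanged}.
```

Because lists:keydelete/3 returns the list unchanged when no tuple matches, the server does not need to check whether the exiting process was actually logged on.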

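Also, a time-out of five seconds has been introduced in the await_result function. That is, if the server does not reply within five seconds (5000 milliseconds), the client terminates. This is only needed in the logon sequence, before the client and the server are linked.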

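An interesting case is if the client terminates before the server links to it. This is taken care of, because linking to a non-existent process causes an exit signal, {'EXIT',From,noproc}, to be automatically generated. This is as if the process terminated immediately after the link operation.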
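
This case can be reproduced locally with a sketch along the following lines, assuming a hypothetical module noproc_demo. Like the server, the calling process traps exits. It first waits for a linked process to finish, so that the pid is known to be dead, and then links to it again.

```erlang
-module(noproc_demo).
-export([run/0]).

run() ->
    Old = process_flag(trap_exit, true),
    Pid = spawn_link(fun() -> ok end),
    %% With trap_exit set, even a normal exit arrives as a message.
    %% Receiving it tells us that Pid no longer exists.
    receive {'EXIT', Pid, normal} -> ok end,
    %% Linking to the dead process still returns true when trapping exits.
    true = link(Pid),
    Result = receive
                 {'EXIT', Pid, Reason} -> Reason
             after 1000 ->
                 no_signal
             end,
    process_flag(trap_exit, Old),
    Result.
```

The second 'EXIT' message carries the reason noproc, which is what the messenger server sees if a client has already gone away by the time server_logon calls link(From).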