file_sorter (stdlib v6.2)

File sorter.

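This module contains functions for sorting terms on files, merging already sorted files, and checking files for sortedness. Chunks containing binary terms are read from a sequence of files, sorted internally in memory, and written to temporary files, which are then merged to produce one sorted output file. Merging is provided as an optimization: it is faster when the files are already sorted, but sorting always works in place of merging.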

On a file, a term is represented by a header and a binary. Two options define the format of terms on files:

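{header, HeaderLength} - HeaderLength determines the number of bytes that precede each binary and hold the length of the binary in bytes. Defaults to 4. The byte order of the header is defined as follows: if B is a binary containing only a header, the size Size of the binary is calculated as <<Size:HeaderLength/unit:8>> = B (that is, big-endian).
{format, Format} - Option Format determines the function that is applied to binaries to create the terms to be sorted. Defaults to binary_term, which is equivalent to fun binary_to_term/1. Value binary is equivalent to fun(X) -> X end, which means that the binaries are sorted as they are; this is the fastest format. If Format is term, io:read/2 is called to read terms. In that case, only the default value of option header is allowed.
Option format also determines what is written to the sorted output file: if Format is term, io:format/3 is called to write each term; otherwise the binary prefixed by a header is written. Notice that the binary written is the same binary that was read; the result of applying function Format to the binary is dropped once the terms have been sorted. Reading and writing terms using the io module is much slower than reading and writing binaries.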
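A minimal sketch of the default on-file layout, assuming a writable directory Dir. The module name fs_format_demo, its helpers encode/1 and decode/1, and the file names are hypothetical, written only for illustration. Each term is stored as term_to_binary(Term) preceded by a 4-byte length header, which is exactly what {header, 4} and {format, binary_term} describe; the sorted output file keeps the original header-prefixed binaries.

```erlang
%% Hypothetical demo module; file names are illustrative only.
-module(fs_format_demo).
-export([encode/1, decode/1, demo/1]).

%% Prefix the external-format binary of Term with a 4-byte header,
%% i.e. <<Size:HeaderLength/unit:8>> with HeaderLength = 4.
encode(Term) ->
    Bin = term_to_binary(Term),
    <<(byte_size(Bin)):4/unit:8, Bin/binary>>.

%% Split data written in this format back into terms.
decode(<<>>) ->
    [];
decode(<<Size:4/unit:8, Bin:Size/binary, Rest/binary>>) ->
    [binary_to_term(Bin) | decode(Rest)].

demo(Dir) ->
    In = filename:join(Dir, "format_in.bin"),
    Out = filename:join(Dir, "format_out.bin"),
    ok = file:write_file(In, [encode(T) || T <- [c, {b, 2}, 3, "a", a]]),
    %% Both options are the defaults; they are spelled out for clarity.
    ok = file_sorter:sort([In], Out, [{header, 4}, {format, binary_term}]),
    {ok, Data} = file:read_file(Out),
    decode(Data).
```

Because binary_term decodes each binary before comparing, the terms come back in the standard Erlang term order (numbers before atoms, atoms before tuples, tuples before lists).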

Other options are:

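{order, Order} - The default is to sort terms in ascending order, but that can be changed by value descending or by specifying an ordering function Fun. An ordering function must be antisymmetric, transitive, and total. Fun(A, B) must return true if A comes before B in the ordering, otherwise false. An example of a typical ordering function is less than or equal to, =</2. Using an ordering function slows down the sort considerably. Functions keysort, keymerge, and keycheck do not accept ordering functions.
{unique, boolean()} - When sorting or merging files, only the first of a sequence of terms that compare equal (==) is output if this option is set to true. Defaults to false, which implies that all terms that compare equal are output. When checking files for sortedness, setting this option to true adds a check that no pair of consecutive terms compares equal.
{tmpdir, TempDirectory} - The directory where temporary files are put can be chosen explicitly. The default, implied by value "", is to put temporary files in the same directory as the sorted output file. If output is a function (see below), the directory returned by file:get_cwd() is used instead. The names of temporary files are derived from the Erlang node name (node/0), the process identifier of the current Erlang emulator (os:getpid()), and a unique integer (erlang:unique_integer([positive])). A typical name is fs_mynode@myhost_1763_4711.17, where 17 is a sequence number. Existing files are overwritten. Temporary files are deleted unless some uncaught EXIT signal occurs.
{compressed, boolean()} - Temporary files and the output file can be compressed. Defaults to false, which implies that written files are not compressed. Regardless of the value of option compressed, compressed files can always be read. Notice that reading and writing compressed files is significantly slower than reading and writing uncompressed files.
{size, Size} - By default, about 512*1024 bytes read from files are sorted internally at a time. This option is rarely needed.
{no_files, NoFiles} - By default, 16 files are merged at a time. This option is rarely needed.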
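A minimal sketch of options order, unique, and tmpdir on a file in the default format, assuming a writable directory Dir. The module fs_options_demo, its helpers, and the file names are hypothetical. The first call sorts descending and drops repeated terms; the second reaches the same direction through an ordering function, which keeps duplicates because unique is left at false.

```erlang
%% Hypothetical demo module; file names are illustrative only.
-module(fs_options_demo).
-export([demo/1]).

%% Default format: term_to_binary/1 output behind a 4-byte length header.
encode(Term) ->
    Bin = term_to_binary(Term),
    <<(byte_size(Bin)):32, Bin/binary>>.

decode(<<>>) ->
    [];
decode(<<Size:32, Bin:Size/binary, Rest/binary>>) ->
    [binary_to_term(Bin) | decode(Rest)].

read_terms(File) ->
    {ok, Data} = file:read_file(File),
    decode(Data).

demo(Dir) ->
    In = filename:join(Dir, "opts_in.bin"),
    Desc = filename:join(Dir, "opts_desc.bin"),
    ByFun = filename:join(Dir, "opts_fun.bin"),
    ok = file:write_file(In, [encode(T) || T <- [3, 1, 2, 3, 1]]),
    %% Descending order; only the first of each run of equal terms is kept.
    %% Temporary files, if any are needed, go to Dir.
    ok = file_sorter:sort([In], Desc,
                          [{order, descending}, {unique, true}, {tmpdir, Dir}]),
    %% Same direction via an ordering function (slower); duplicates stay.
    ok = file_sorter:sort([In], ByFun, [{order, fun(A, B) -> A >= B end}]),
    {read_terms(Desc), read_terms(ByFun)}.
```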

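As an alternative to sorting files, a function of one argument can be specified as input. When called with argument read, the function is assumed to return one of the following: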

end_of_input or {end_of_input, Value}, when there is no more input (Value is explained below).
{Objects, Fun}, where Objects is a list of binaries or terms, depending on the format, and Fun is a new input function.

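Any other value is immediately returned as the value of the current call to sort or keysort. Each input function is called exactly once. If an error occurs, the last function is called with argument close, and its reply is ignored.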
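A minimal sketch of an input function, assuming the data already sits in memory as a list of chunks and that Dir is writable. The module fs_input_demo, the helper list_input/1, and the file name are hypothetical. Each call with read returns one chunk plus a fresh input function for the rest, and end_of_input once the chunks run out. Since the default format is binary_term, every object handed over must be a binary.

```erlang
%% Hypothetical demo module; file names are illustrative only.
-module(fs_input_demo).
-export([list_input/1, demo/1]).

%% Input function that hands out one chunk per read call.
list_input(Chunks) ->
    fun(close) ->
            ok;
       (read) ->
            case Chunks of
                [] -> end_of_input;
                [Chunk | Rest] -> {Chunk, list_input(Rest)}
            end
    end.

decode(<<>>) ->
    [];
decode(<<Size:32, Bin:Size/binary, Rest/binary>>) ->
    [binary_to_term(Bin) | decode(Rest)].

demo(Dir) ->
    Out = filename:join(Dir, "input_fun_out.bin"),
    %% Default format binary_term: objects must be term_to_binary/1 results.
    Chunks = [[term_to_binary(T) || T <- C] || C <- [[5, 3], [4], [1, 2]]],
    ok = file_sorter:sort(list_input(Chunks), Out),
    {ok, Data} = file:read_file(Out),
    decode(Data).
```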

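A function of one argument can be specified as output. The result of sorting or merging the input is collected in a non-empty sequence of variable-length lists of binaries or terms, depending on the format. The output function is called with one list at a time and is assumed to return a new output function. Any other return value is immediately returned as the value of the current call to the sort or merge function. Each output function is called exactly once. When some output function has been applied to all of the results, or an error occurs, the last function is called with argument close, and the reply is returned as the value of the current call to the sort or merge function.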
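A minimal sketch of an output function that gathers the sorted lists in memory. The module fs_output_demo and the helpers collect/1, input/0, and done/1 are hypothetical. The reply to close becomes the value of file_sorter:sort/3; with {format, binary} the binaries are compared byte by byte as they are.

```erlang
%% Hypothetical demo module.
-module(fs_output_demo).
-export([collect/1, demo/0]).

%% Keep every list handed over; on close, return them concatenated in order.
collect(Acc) ->
    fun(close) ->
            lists:append(lists:reverse(Acc));
       (List) when is_list(List) ->
            collect([List | Acc])
    end.

%% One-shot input function: a single chunk, then end_of_input.
input() ->
    fun(close) -> ok;
       (read) -> {[<<"pear">>, <<"apple">>, <<"fig">>], fun done/1}
    end.

done(read) -> end_of_input;
done(close) -> ok.

demo() ->
    file_sorter:sort(input(), collect([]), [{format, binary}]).
```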

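If a function is specified as input and the last input function returns {end_of_input, Value}, the function specified as output is called with argument {value, Value}. This makes it easy to initiate the sequence of output functions with a value calculated by the input functions.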
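A minimal sketch of passing a value from input to output, assuming the value of interest is the number of objects read. The module fs_value_demo and its functions are hypothetical. The input functions count objects and end with {end_of_input, Count}; the output function receives {value, Count}, stores it, and returns it together with the sorted binaries on close.

```erlang
%% Hypothetical demo module.
-module(fs_value_demo).
-export([demo/0]).

%% Input: hand out chunks while counting objects; finish with the count.
input([], Count) ->
    fun(read) -> {end_of_input, Count};
       (close) -> ok
    end;
input([Chunk | Rest], Count) ->
    fun(read) -> {Chunk, input(Rest, Count + length(Chunk))};
       (close) -> ok
    end.

%% Output: {value, N} records the count; close returns count and objects.
output(Count, Acc) ->
    fun({value, N}) -> output(N, Acc);
       (close) -> {Count, lists:append(lists:reverse(Acc))};
       (List) when is_list(List) -> output(Count, [List | Acc])
    end.

demo() ->
    Chunks = [[<<"b">>, <<"c">>], [<<"a">>]],
    file_sorter:sort(input(Chunks, 0), output(undefined, []),
                     [{format, binary}]).
```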

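As an example, assume that you want to sort the terms on a disk log file. A function that reads chunks from the disk log and returns a list of terms is used as input. The results are collected in a list of terms.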

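sort(Log) ->
    %% Open the disk log read-only; the log name is also its handle.
    {ok, _} = disk_log:open([{name,Log}, {mode,read_only}]),
    Input = input(Log, start),
    Output = output([]),
    %% {format, term}: the input function delivers terms, not binaries.
    Reply = file_sorter:sort(Input, Output, {format,term}),
    ok = disk_log:close(Log),
    Reply.

%% Input function: each read returns the next chunk of terms together
%% with a new input function holding the updated continuation.
input(Log, Cont) ->
    fun(close) ->
            ok;
       (read) ->
            case disk_log:chunk(Log, Cont) of
                {error, Reason} ->
                    %% Any other value aborts the sort and becomes its reply.
                    {error, Reason};
                {Cont2, Terms} ->
                    {Terms, input(Log, Cont2)};
                {Cont2, Terms, _Badbytes} ->
                    %% A read-only log can also report bytes that were not terms.
                    {Terms, input(Log, Cont2)};
                eof ->
                    end_of_input
            end
    end.

%% Output function: accumulate each sorted list; on close, return all
%% terms in order, which becomes the value of file_sorter:sort/3.
output(L) ->
    fun(close) ->
            lists:append(lists:reverse(L));
       (Terms) ->
            output([Terms | L])
    end.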

For more examples of functions as input and output, see the end of the file_sorter module source; the term format is implemented with functions.

The possible values of Reason returned when an error occurs are:

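bad_object, {bad_object, FileName} - Applying the format function failed for some binary, or the key(s) could not be extracted from some term.
{bad_term, FileName} - io:read/2 failed to read some term.
{file_error, FileName, file:posix()} - For an explanation of file:posix(), see file.
{premature_eof, FileName} - End-of-file was encountered inside some binary term.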
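A minimal sketch of handling the Reason values listed above, assuming a writable directory Dir. The module fs_error_demo, the helpers describe/1 and fmt/2, and the file names are hypothetical. The demo writes a file whose header announces more bytes than follow; the sketch only relies on getting some {error, Reason} back, with premature_eof being the likely candidate rather than a guaranteed one.

```erlang
%% Hypothetical demo module; file names are illustrative only.
-module(fs_error_demo).
-export([describe/1, demo/1]).

%% Map each documented Reason shape to a readable message.
describe(bad_object) ->
    "format function failed or key could not be extracted";
describe({bad_object, File}) ->
    fmt("format function failed on an object in ~tp", [File]);
describe({bad_term, File}) ->
    fmt("io:read/2 could not read a term in ~tp", [File]);
describe({file_error, File, Posix}) ->
    fmt("file error ~p on ~tp", [Posix, File]);
describe({premature_eof, File}) ->
    fmt("end of file inside a binary term in ~tp", [File]).

fmt(Format, Args) ->
    lists:flatten(io_lib:format(Format, Args)).

demo(Dir) ->
    In = filename:join(Dir, "truncated.bin"),
    %% The 4-byte header announces 100 bytes, but only 3 bytes follow.
    ok = file:write_file(In, <<100:32, "abc">>),
    case file_sorter:sort([In], filename:join(Dir, "truncated_out.bin")) of
        ok -> ok;
        {error, Reason} -> {error, describe(Reason)}
    end.
```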

Summary

Types

file_name()

file_names()

format()

format_fun()

header_length()

i_command()

i_reply()

infun()

input()

input_reply()

key_pos()

no_files()

o_command()

o_reply()

object()

option()

options()

order()

order_fun()

outfun()

output()

output_reply()

reason()

size()

tmp_directory()

value()

Functions

check(FileName)

Equivalent to check([FileName], []).

check(FileNames, Options)

Checks files for sortedness. If a file is not sorted, the first out-of-order element is returned. The first term on a file has position 1.

keycheck(KeyPos, FileName)

Equivalent to keycheck(KeyPos, [FileName], []).

keycheck(KeyPos, FileNames, Options)

Checks files for sortedness. If a file is not sorted, the first out-of-order element is returned. The first term on a file has position 1.

keymerge(KeyPos, FileNames, Output)

Equivalent to keymerge(KeyPos, FileNames, Output, []).

keymerge(KeyPos, FileNames, Output, Options)

Merges tuples on files. Each input file is assumed to be sorted on key(s).

keysort(KeyPos, FileName)

Sorts tuples on files. keysort(N, FileName) is equivalent to keysort(N, [FileName], FileName).

keysort(KeyPos, Input, Output)

Equivalent to keysort(KeyPos, Input, Output, []).

keysort(KeyPos, Input, Output, Options)

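Sorts tuples on files. The sort is performed on the element(s) mentioned in KeyPos. If two tuples compare equal (==) on one element, the next element according to KeyPos is compared. The sort is stable.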

merge(FileNames, Output)

Equivalent to merge(FileNames, Output, []).

merge(FileNames, Output, Options)

Merges terms on files. Each input file is assumed to be sorted.

sort(FileName)

Sorts terms on files. sort(FileName) is equivalent to sort([FileName], FileName).

sort(Input, Output)

Equivalent to sort(Input, Output, []).

sort(Input, Output, Options)

Sorts terms on files.

Types

file_name()

(not exported)

-type file_name() :: file:name().

file_names()

(not exported)

-type file_names() :: [file:name()].

format()

(not exported)

-type format() :: binary_term | term | binary | format_fun().

format_fun()

(not exported)

-type format_fun() :: fun((binary()) -> term()).

header_length()

(not exported)

-type header_length() :: pos_integer().

i_command()

(not exported)

-type i_command() :: read | close.

i_reply()

(not exported)

-type i_reply() :: end_of_input | {end_of_input, value()} | {[object()], infun()} | input_reply().

infun()

(not exported)

-type infun() :: fun((i_command()) -> i_reply()).

input()

(not exported)

-type input() :: file_names() | infun().

input_reply()

(not exported)

-type input_reply() :: term().

key_pos()

(not exported)

-type key_pos() :: pos_integer() | [pos_integer()].

no_files()

(not exported)

-type no_files() :: pos_integer().

o_command()

(not exported)

-type o_command() :: {value, value()} | [object()] | close.

o_reply()

(not exported)

-type o_reply() :: outfun() | output_reply().

object()

(not exported)

-type object() :: term() | binary().

option()

(not exported)

-type option() ::
          {compressed, boolean()} |
          {header, header_length()} |
          {format, format()} |
          {no_files, no_files()} |
          {order, order()} |
          {size, size()} |
          {tmpdir, tmp_directory()} |
          {unique, boolean()}.

options()

(not exported)

-type options() :: [option()] | option().

order()

(not exported)

-type order() :: ascending | descending | order_fun().

order_fun()

(not exported)

-type order_fun() :: fun((term(), term()) -> boolean()).

outfun()

(not exported)

-type outfun() :: fun((o_command()) -> o_reply()).

output()

(not exported)

-type output() :: file_name() | outfun().

output_reply()

(not exported)

-type output_reply() :: term().

reason()

-type reason() ::
          bad_object |
          {bad_object, file_name()} |
          {bad_term, file_name()} |
          {file_error, file_name(), file:posix() | badarg | system_limit} |
          {premature_eof, file_name()}.

size()

(not exported)

-type size() :: non_neg_integer().

tmp_directory()

(not exported)

-type tmp_directory() :: [] | file:name().

value()

(not exported)

-type value() :: term().

Functions

check(FileName)

-spec check(FileName) -> Reply
               when
                   FileName :: file_name(),
                   Reply :: {ok, [Result]} | {error, reason()},
                   Result :: {FileName, TermPosition, term()},
                   TermPosition :: pos_integer().

Equivalent to check([FileName], []).

check(FileNames, Options)

-spec check(FileNames, Options) -> Reply
               when
                   FileNames :: file_names(),
                   Options :: options(),
                   Reply :: {ok, [Result]} | {error, reason()},
                   Result :: {FileName, TermPosition, term()},
                   FileName :: file_name(),
                   TermPosition :: pos_integer().

Checks files for sortedness. If a file is not sorted, the first out-of-order element is returned. The first term on a file has position 1.
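A minimal sketch of check/2 on files in the default format, assuming a writable directory Dir. The module fs_check_demo, its helper, and the file names are hypothetical. The tests match only the position in each result, since the exact form of the returned file name and element is not spelled out above.

```erlang
%% Hypothetical demo module; file names are illustrative only.
-module(fs_check_demo).
-export([demo/1]).

encode(Term) ->
    Bin = term_to_binary(Term),
    <<(byte_size(Bin)):32, Bin/binary>>.

demo(Dir) ->
    Sorted = filename:join(Dir, "check_sorted.bin"),
    Unsorted = filename:join(Dir, "check_unsorted.bin"),
    ok = file:write_file(Sorted, [encode(T) || T <- [1, 2, 2, 3]]),
    ok = file:write_file(Unsorted, [encode(T) || T <- [1, 2, 5, 4, 6]]),
    %% Sorted file: no result entries.
    R1 = file_sorter:check([Sorted], []),
    %% 4 follows 5, so the first out-of-order term is at position 4.
    R2 = file_sorter:check([Unsorted], []),
    %% With unique, the repeated 2 at position 3 is reported.
    R3 = file_sorter:check([Sorted], [{unique, true}]),
    {R1, R2, R3}.
```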

keycheck(KeyPos, FileName)

-spec keycheck(KeyPos, FileName) -> Reply
                  when
                      KeyPos :: key_pos(),
                      FileName :: file_name(),
                      Reply :: {ok, [Result]} | {error, reason()},
                      Result :: {FileName, TermPosition, term()},
                      TermPosition :: pos_integer().

Equivalent to keycheck(KeyPos, [FileName], []).

keycheck(KeyPos, FileNames, Options)

-spec keycheck(KeyPos, FileNames, Options) -> Reply
                  when
                      KeyPos :: key_pos(),
                      FileNames :: file_names(),
                      Options :: options(),
                      Reply :: {ok, [Result]} | {error, reason()},
                      Result :: {FileName, TermPosition, term()},
                      FileName :: file_name(),
                      TermPosition :: pos_integer().

Checks files for sortedness. If a file is not sorted, the first out-of-order element is returned. The first term on a file has position 1.
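A minimal sketch of keycheck/3 showing that sortedness depends on the key position, assuming a writable directory Dir. The module fs_keycheck_demo, its helper, and the file name are hypothetical. The same file is sorted on neither element, but the first violation appears at different positions.

```erlang
%% Hypothetical demo module; file name is illustrative only.
-module(fs_keycheck_demo).
-export([demo/1]).

encode(Term) ->
    Bin = term_to_binary(Term),
    <<(byte_size(Bin)):32, Bin/binary>>.

demo(Dir) ->
    F = filename:join(Dir, "keycheck.bin"),
    ok = file:write_file(F, [encode(T) || T <- [{1, b}, {2, a}, {2, c}, {1, d}]]),
    %% Keys by element 1: 1, 2, 2, 1 -> first violation at position 4.
    %% Keys by element 2: b, a, c, d -> first violation at position 2.
    {file_sorter:keycheck(1, [F], []), file_sorter:keycheck(2, [F], [])}.
```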

keymerge(KeyPos, FileNames, Output)

-spec keymerge(KeyPos, FileNames, Output) -> Reply
                  when
                      KeyPos :: key_pos(),
                      FileNames :: file_names(),
                      Output :: output(),
                      Reply :: ok | {error, reason()} | output_reply().

Equivalent to keymerge(KeyPos, FileNames, Output, []).

keymerge(KeyPos, FileNames, Output, Options)

-spec keymerge(KeyPos, FileNames, Output, Options) -> Reply
                  when
                      KeyPos :: key_pos(),
                      FileNames :: file_names(),
                      Output :: output(),
                      Options :: options(),
                      Reply :: ok | {error, reason()} | output_reply().

Merges tuples on files. Each input file is assumed to be sorted on key(s).
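A minimal sketch of keymerge/3 on two files that are each already sorted on element 1, assuming a writable directory Dir. The module fs_keymerge_demo, its helpers, and the file names are hypothetical. The keys are chosen without ties so the result does not depend on how equal keys from different files are interleaved.

```erlang
%% Hypothetical demo module; file names are illustrative only.
-module(fs_keymerge_demo).
-export([demo/1]).

encode(Term) ->
    Bin = term_to_binary(Term),
    <<(byte_size(Bin)):32, Bin/binary>>.

decode(<<>>) ->
    [];
decode(<<Size:32, Bin:Size/binary, Rest/binary>>) ->
    [binary_to_term(Bin) | decode(Rest)].

demo(Dir) ->
    A = filename:join(Dir, "km_a.bin"),
    B = filename:join(Dir, "km_b.bin"),
    Out = filename:join(Dir, "km_out.bin"),
    %% Each input is already sorted on element 1.
    ok = file:write_file(A, [encode(T) || T <- [{1, a}, {3, c}]]),
    ok = file:write_file(B, [encode(T) || T <- [{2, b}, {4, d}]]),
    ok = file_sorter:keymerge(1, [A, B], Out),
    {ok, Data} = file:read_file(Out),
    decode(Data).
```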

keysort(KeyPos, FileName)

-spec keysort(KeyPos, FileName) -> Reply
                 when
                     KeyPos :: key_pos(),
                     FileName :: file_name(),
                     Reply :: ok | {error, reason()} | input_reply() | output_reply().

Sorts tuples on files. keysort(N, FileName) is equivalent to keysort(N, [FileName], FileName).
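A minimal sketch of keysort/2, which sorts a file in place (the output overwrites FileName), assuming a writable directory Dir. The module fs_keysort_demo, its helpers, and the file names are hypothetical. The first call shows that the sort is stable on ties in element 2; the second passes a list of key positions so that ties on element 2 are broken by element 1.

```erlang
%% Hypothetical demo module; file names are illustrative only.
-module(fs_keysort_demo).
-export([demo/1]).

encode(Term) ->
    Bin = term_to_binary(Term),
    <<(byte_size(Bin)):32, Bin/binary>>.

decode(<<>>) ->
    [];
decode(<<Size:32, Bin:Size/binary, Rest/binary>>) ->
    [binary_to_term(Bin) | decode(Rest)].

read_terms(File) ->
    {ok, Data} = file:read_file(File),
    decode(Data).

demo(Dir) ->
    F1 = filename:join(Dir, "ks_one.bin"),
    F2 = filename:join(Dir, "ks_two.bin"),
    Tuples = [{x, 1}, {a, 1}, {b, 0}],
    ok = file:write_file(F1, [encode(T) || T <- Tuples]),
    ok = file:write_file(F2, [encode(T) || T <- Tuples]),
    %% Key element 2 only: {x, 1} stays before {a, 1} (stable sort).
    ok = file_sorter:keysort(2, F1),
    %% Keys [2, 1]: ties on element 2 are broken by element 1.
    ok = file_sorter:keysort([2, 1], F2),
    {read_terms(F1), read_terms(F2)}.
```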

keysort(KeyPos, Input, Output)

-spec keysort(KeyPos, Input, Output) -> Reply
                 when
                     KeyPos :: key_pos(),
                     Input :: input(),
                     Output :: output(),
                     Reply :: ok | {error, reason()} | input_reply() | output_reply().

Equivalent to keysort(KeyPos, Input, Output, []).

keysort(KeyPos, Input, Output, Options)

-spec keysort(KeyPos, Input, Output, Options) -> Reply
                 when
                     KeyPos :: key_pos(),
                     Input :: input(),
                     Output :: output(),
                     Options :: options(),
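                     Reply :: ok | {error, reason()} | input_reply() | output_reply().

Sorts tuples on files. The sort is performed on the element(s) mentioned in KeyPos. If two tuples compare equal (==) on one element, the next element according to KeyPos is compared. The sort is stable.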

merge(FileNames, Output)

-spec merge(FileNames, Output) -> Reply
               when
                   FileNames :: file_names(),
                   Output :: output(),
                   Reply :: ok | {error, reason()} | output_reply().

Equivalent to merge(FileNames, Output, []).

merge(FileNames, Output, Options)

-spec merge(FileNames, Output, Options) -> Reply
               when
                   FileNames :: file_names(),
                   Output :: output(),
                   Options :: options(),
                   Reply :: ok | {error, reason()} | output_reply().

Merges terms on files. Each input file is assumed to be sorted.
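A minimal sketch of merge/2 and merge/3 on two already sorted files, assuming a writable directory Dir. The module fs_merge_demo, its helpers, and the file names are hypothetical. The second call adds {unique, true}, so the 3 present in both inputs appears only once.

```erlang
%% Hypothetical demo module; file names are illustrative only.
-module(fs_merge_demo).
-export([demo/1]).

encode(Term) ->
    Bin = term_to_binary(Term),
    <<(byte_size(Bin)):32, Bin/binary>>.

decode(<<>>) ->
    [];
decode(<<Size:32, Bin:Size/binary, Rest/binary>>) ->
    [binary_to_term(Bin) | decode(Rest)].

read_terms(File) ->
    {ok, Data} = file:read_file(File),
    decode(Data).

demo(Dir) ->
    A = filename:join(Dir, "merge_a.bin"),
    B = filename:join(Dir, "merge_b.bin"),
    All = filename:join(Dir, "merge_all.bin"),
    Uniq = filename:join(Dir, "merge_uniq.bin"),
    %% Both inputs are already sorted in ascending order.
    ok = file:write_file(A, [encode(T) || T <- [1, 3, 5]]),
    ok = file:write_file(B, [encode(T) || T <- [2, 3, 4]]),
    ok = file_sorter:merge([A, B], All),
    ok = file_sorter:merge([A, B], Uniq, [{unique, true}]),
    {read_terms(All), read_terms(Uniq)}.
```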

sort(FileName)

-spec sort(FileName) -> Reply
              when
                  FileName :: file_name(),
                  Reply :: ok | {error, reason()} | input_reply() | output_reply().

Sorts terms on files. sort(FileName) is equivalent to sort([FileName], FileName).

sort(Input, Output)

-spec sort(Input, Output) -> Reply
              when
                  Input :: input(),
                  Output :: output(),
                  Reply :: ok | {error, reason()} | input_reply() | output_reply().

Equivalent to sort(Input, Output, []).

sort(Input, Output, Options)

-spec sort(Input, Output, Options) -> Reply
              when
                  Input :: input(),
                  Output :: output(),
                  Options :: options(),
                  Reply :: ok | {error, reason()} | input_reply() | output_reply().

Sorts terms on files.
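A minimal sketch of sort/3 combining several input files into one output with non-default format options, assuming a writable directory Dir. The module fs_sort3_demo, the helpers frame/1 and unframe/1, and the file names are hypothetical. With {format, binary} and {header, 2}, each object is a raw binary behind a 2-byte length header and is compared byte by byte.

```erlang
%% Hypothetical demo module; file names are illustrative only.
-module(fs_sort3_demo).
-export([demo/1]).

%% Prefix a raw binary with a 2-byte header, matching {header, 2}.
frame(Bin) ->
    <<(byte_size(Bin)):2/unit:8, Bin/binary>>.

unframe(<<>>) ->
    [];
unframe(<<Size:2/unit:8, Bin:Size/binary, Rest/binary>>) ->
    [Bin | unframe(Rest)].

demo(Dir) ->
    A = filename:join(Dir, "s3_a.bin"),
    B = filename:join(Dir, "s3_b.bin"),
    Out = filename:join(Dir, "s3_out.bin"),
    ok = file:write_file(A, [frame(X) || X <- [<<"pear">>, <<"fig">>]]),
    ok = file:write_file(B, [frame(<<"apple">>)]),
    %% Two unsorted inputs, one sorted output.
    ok = file_sorter:sort([A, B], Out, [{format, binary}, {header, 2}]),
    {ok, Data} = file:read_file(Out),
    unframe(Data).
```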