Low-Level Routines Are Error-Prone

C++ Primer 4/e gives this warning in the section "The IO Library Revisited": 『In general, we advocate using the higher-level abstractions provided by the library. The IO operations that return int are a good example of why.

It is a common programming error to assign the return from get or one of the other int returning functions to a char rather than an int. Doing so is an error but an error the compiler will not detect. Instead, what happens depends on the machine and on the input data. For example, on a machine in which chars are implemented as unsigned chars, this loop will run forever:

char ch; // Using a char here invites disaster!

     // return from cin.get is converted from int to char and then compared to an int
     while ((ch = cin.get()) != EOF)

The problem is that when get returns EOF, that value will be converted to an unsigned char value. That converted value is no longer equal to the integral value of EOF, and the loop will continue forever.

At least that error is likely to be caught in testing. On machines for which chars are implemented as signed chars, we can’t say with confidence what the behavior of the loop might be. What happens when an out-of-bounds value is assigned to a signed value is up to the compiler. On many machines, this loop will appear to work, unless a character in the input matches the EOF value. While such characters are unlikely in ordinary data, presumably low-level IO is necessary only when reading binary values that do not map directly to ordinary characters and numeric values. For example, on our machine, if the input contains a character whose value is '\377' then the loop terminates prematurely. '\377' is the value on our machine to which -1 converts when used as a signed char. If the input has this value, then it will be treated as the (premature) end-of-file indicator.

Such bugs do not happen when reading and writing typed values. If you can use the more type-safe, higher-level operations supported by the library, do so.』
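As a minimal sketch of the advice in the quotation, the two helpers below read a stream byte by byte without ever storing an end-of-file marker in a char. The names count_bytes and count_bytes_int are hypothetical and chosen only for illustration; they do not come from the book.

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper (not from the book): counts the bytes in a stream
// using the get(char&) overload, which reports end-of-file through the
// stream state instead of through an in-band int value.
std::size_t count_bytes(std::istream& in) {
    std::size_t n = 0;
    char ch;
    while (in.get(ch)) {  // the stream tests false once a read fails
        ++n;
    }
    return n;
}

// If the int-returning get() is needed, keep the result in an int and
// compare it with traits_type::eof() before narrowing it to a char.
std::size_t count_bytes_int(std::istream& in) {
    std::size_t n = 0;
    int c;
    while ((c = in.get()) != std::char_traits<char>::eof()) {
        ++n;
    }
    return n;
}
```

Both helpers can be tried on a std::istringstream built from std::string("ab\377cd", 5). The '\377' byte is counted as ordinary data in both cases, because the end-of-file signal never has to fit into a char.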


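Assigning the return value of get(), or of any other function that returns int, to a char variable is a common programming error. It is an error, yet the compiler cannot detect it. What actually happens depends on the machine and on the input data. On a machine where chars are implemented as unsigned chars, the following loop runs forever: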

char ch; // using a char here invites disaster!
// the value returned by cin.get is converted from int to char, then compared with an int
while ((ch = cin.get()) != EOF)

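The problem is that when get() returns EOF, the value is converted to unsigned char. The converted value no longer equals the integer value of EOF, so the loop never terminates.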
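The conversion can be reproduced in isolation. The sketch below uses an explicit unsigned char so that it behaves the same on every platform, whatever the signedness of plain char. The function name loop_continues_at_eof_unsigned is hypothetical.

```cpp
#include <cstdio>  // EOF

// Hypothetical helper: mimics the test `(ch = cin.get()) != EOF` at the
// moment get() returns EOF, on a platform where plain char is unsigned.
bool loop_continues_at_eof_unsigned() {
    // EOF is a negative int; the conversion wraps it into 0..UCHAR_MAX
    // (255 when EOF is -1 and a char has 8 bits).
    unsigned char ch = static_cast<unsigned char>(EOF);
    // ch is promoted back to a non-negative int, so it can never equal EOF.
    return ch != EOF;
}
```

The function returns true, which means the loop condition stays true even after end-of-file has been reached.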

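At least that error is likely to show up in testing. If your machine implements char as signed char, however, there is no telling what the loop will do: what happens when an out-of-range value is assigned to a signed type is up to the compiler. On many machines the loop appears to work, unless the input stream contains a character equal to the EOF value. Such characters are unlikely in ordinary data, but low-level IO is presumably needed only when reading binary values that do not map directly to ordinary characters and numeric values. On my machine, for example, if the input contains a character whose value is '\377', the loop ends prematurely. On my machine '\377' is the value that -1 converts to when treated as a signed char. If the input contains this value, it is wrongly treated as end-of-file.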
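The signed case can be sketched the same way. The helpers below are hypothetical and assume the common situation in which EOF is -1 and signed char is an 8-bit two's complement type. Before C++20 the result of the narrowing conversion is implementation-defined, so the outcome is typical rather than guaranteed.

```cpp
#include <cstdio>  // EOF

// Hypothetical helper: value_from_get stands for what cin.get() returns
// for one data byte (0..255) or at end-of-file (EOF).
bool mistaken_for_eof_as_signed_char(int value_from_get) {
    // Narrowing 255 to signed char typically yields -1
    // (implementation-defined before C++20, modular since C++20).
    signed char ch = static_cast<signed char>(value_from_get);
    return ch == EOF;  // ch is promoted to int before the comparison
}

// Keeping the value in an int preserves the difference between the data
// byte 255 and the negative end-of-file marker.
bool is_eof_as_int(int value_from_get) {
    return value_from_get == EOF;
}
```

Under the stated assumptions, mistaken_for_eof_as_signed_char(0xFF) returns true although 0xFF is ordinary data, while is_eof_as_int(0xFF) returns false. That difference is why the result of get() belongs in an int.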


