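String types in C++

The most commonly used C++ string type is std::string, which is an instantiation of the template std::basic_string. There are three other instantiations, std::wstring, std::u16string, and std::u32string, but they are used much less often.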

std::basic_string<T>
std::string            std::basic_string<char>
std::wstring           std::basic_string<wchar_t>
std::u16string         std::basic_string<char16_t>
std::u32string         std::basic_string<char32_t>
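
The table above can be checked at compile time. The following is a minimal sketch; the helper name length_in_code_units is hypothetical and only illustrates that size() counts elements of the character type.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>

// Each alias is std::basic_string instantiated with a different character type.
static_assert(std::is_same<std::string, std::basic_string<char>>::value, "string");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value, "wstring");
static_assert(std::is_same<std::u16string, std::basic_string<char16_t>>::value, "u16string");
static_assert(std::is_same<std::u32string, std::basic_string<char32_t>>::value, "u32string");

// Hypothetical helper: size() counts code units of CharT, not bytes.
template <class CharT>
std::size_t length_in_code_units(const std::basic_string<CharT>& s) {
    return s.size();
}
```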

For details, see: http://en.cppreference.com/w/cpp/string/basic_string

std::string

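The standard library provides a great many member functions and related algorithms for std::string. A rough count over the cppreference page for basic_string, including all overloads, gives more than a hundred. Yet in day-to-day work it often feels as though C++ support for string handling is remarkably weak: this "giant" class with its hundred-odd methods always seems to fall short.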

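Many std::string operations are position-based, taking either an index or an iterator. For many tasks we therefore have to call find first, or walk the string by hand, to obtain the position or range to operate on, and only then perform the actual operation. The member functions insert, erase, and replace all work this way.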
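
The two-step pattern described here (locate first, then operate on the position) can be sketched as follows. This is only an illustration using the standard library; the function name erase_first_sketch is hypothetical.

```cpp
#include <string>

// Hypothetical helper: remove the first occurrence of `sub` from a copy of `s`.
std::string erase_first_sketch(std::string s, const std::string& sub) {
    if (sub.empty()) {
        return s;                                   // nothing to remove
    }
    const std::string::size_type pos = s.find(sub); // step 1: locate the substring
    if (pos != std::string::npos) {
        s.erase(pos, sub.size());                   // step 2: position-based erase
    }
    return s;
}
```

Calling erase_first_sketch("AbCdefG123 AbC", "AbC") yields "defG123 AbC"; a string without the substring is returned unchanged.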

At the same time, std::string lacks some commonly needed string-handling functions, such as simple case conversion, joining, and splitting.
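
To make the gap concrete, joining a list of strings with only the standard library typically means writing a small loop by hand. A minimal sketch, assuming a std::vector<std::string> as input; the name join_sketch is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: concatenate `parts`, placing `sep` between neighbours.
std::string join_sketch(const std::vector<std::string>& parts, const std::string& sep) {
    std::string out;
    for (std::size_t i = 0; i < parts.size(); ++i) {
        if (i != 0) {
            out += sep;      // separator goes between elements, never before the first
        }
        out += parts[i];
    }
    return out;
}
```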

C++11 added functions for converting between std::string and numbers:

String ==> number
stoi    string to int
stol    string to long
stoll   string to long long
stoul   string to unsigned long
stoull  string to unsigned long long
stof    string to float
stod    string to double
stold   string to long double
Number ==> string
to_string
to_wstring
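
The conversion functions report failure through exceptions, and std::stoi also accepts an optional pointer that receives the number of characters consumed. The sketch below shows one way to use them; parse_int_or and label_with_number are hypothetical helper names, not part of the standard library.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: parse the whole of `text` as an int, or return `fallback`.
int parse_int_or(const std::string& text, int fallback) {
    try {
        std::size_t consumed = 0;
        const int value = std::stoi(text, &consumed);  // consumed: index of first unparsed char
        return consumed == text.size() ? value : fallback;  // reject trailing junk such as "12abc"
    } catch (const std::invalid_argument&) {  // no conversion could be performed
        return fallback;
    } catch (const std::out_of_range&) {      // value does not fit in an int
        return fallback;
    }
}

// Hypothetical helper: number ==> string via std::to_string.
std::string label_with_number(const std::string& label, int n) {
    return label + std::to_string(n);
}
```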

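String handling in Boost

Boost provides a number of functions for working with C++ strings, in the form of algorithms. They still fall somewhat short of what Java or some dynamic languages offer, but they do make string handling in C++ more convenient.

Besides the ordinary string algorithms, Boost also provides a regular expression library, Boost.Regex. The regular expression library in the C++11 standard is based on Boost.Regex, but the C++ standard library shipped with the commonly used g++ 4.8.x (for example, the default g++ on ubuntu14.04 is 4.8.x, and so is the g++ at my company) does not implement regular expressions yet.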

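In fact, g++ 4.8.x already declares the standard regular expression types and interfaces, but only as placeholders with no real implementation. Code using them compiles, but keeps throwing exceptions at run time. Only gcc 4.9 actually implemented the standard library regular expressions.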
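
For reference, on a toolchain whose standard library does implement <regex> (gcc 4.9 or later, per the paragraph above), the standard interface is used roughly as follows. This is a minimal sketch; the helper name is_two_words is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `s` is exactly two words separated by one whitespace character.
bool is_two_words(const std::string& s) {
    static const std::regex rgx("(\\w+)\\s(\\w+)");
    return std::regex_match(s, rgx);  // regex_match requires the whole string to match
}
```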

The examples below introduce the string algorithms provided by Boost and the usage of Boost.Regex.

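Boost string algorithms

Header: #include <boost/algorithm/string.hpp>
Many of the Boost algorithms that modify a string come in two versions: one that modifies the string passed in (no copy in the name), and one that returns a new string (copy in the name).

Case conversion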

The C++ standard library does not even provide a function for converting the case of a whole string.
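
With the standard library alone, the usual workaround is to apply std::toupper or std::tolower to each character with std::transform. A minimal sketch, assuming single-byte characters in the current C locale; the name to_upper_sketch is hypothetical.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: return an upper-case copy of `s`, one char at a time.
std::string to_upper_sketch(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        // std::toupper expects a value representable as unsigned char
        return static_cast<char>(std::toupper(c));
    });
    return s;
}
```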

boost::algorithm::to_upper() and boost::algorithm::to_lower() modify the string passed in, converting it to upper or lower case.
boost::algorithm::to_upper_copy() and boost::algorithm::to_lower_copy() return a new upper-case or lower-case string.

Example:

#include <boost/algorithm/string.hpp>
#include <string>
#include <iostream>
using namespace std;
int main()
{
    std::string s("AbCdefG123 HijkLmn");
    cout << boost::algorithm::to_upper_copy(s) << endl;
    boost::algorithm::to_lower(s);
    cout << s << endl;
}
Output:
ABCDEFG123 HIJKLMN
abcdefg123 hijklmn

Deleting substrings

std::string provides several erase member functions, all of which delete by position (index or iterator):

basic_string& erase(size_type index = 0, size_type count = npos);
iterator erase(iterator position);
iterator erase(const_iterator position);
iterator erase(iterator first, iterator last);
iterator erase(const_iterator first, const_iterator last);

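The remove family of STL algorithms has to work with every container, so its comparison can only operate on one character at a time (a character in a std::string corresponds to an element in a vector).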

ForwardIt remove(ForwardIt first, ForwardIt last, const T& value);
ForwardIt remove_if(ForwardIt first, ForwardIt last, UnaryPredicate p);
OutputIt remove_copy(InputIt first, InputIt last, OutputIt d_first, const T& value);
OutputIt remove_copy_if(InputIt first, InputIt last, OutputIt d_first, UnaryPredicate p);
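
Because std::remove and std::remove_if only move the kept elements to the front and return the new logical end, they are normally combined with the iterator-range erase shown earlier (the erase-remove idiom). A minimal sketch with hypothetical helper names:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: delete every occurrence of the single character `ch`.
std::string remove_char_sketch(std::string s, char ch) {
    s.erase(std::remove(s.begin(), s.end(), ch), s.end());
    return s;
}

// Hypothetical helper: delete every decimal digit, using a per-character predicate.
std::string remove_digits_sketch(std::string s) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return std::isdigit(c) != 0; }),
            s.end());
    return s;
}
```

Both helpers compare one character at a time, which is why this approach cannot delete a multi-character substring.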

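Boost provides algorithms for deleting substrings of a string.

erase_all() deletes every occurrence of the given substring.
erase_first() deletes the first occurrence of the given substring.
erase_nth() deletes the n-th occurrence of the given substring. **Note that n is counted from 0.**
erase_head() deletes the first n characters of the string.
erase_tail() deletes the last n characters of the string.
The copy versions of the erase family.

Example: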

#include <boost/algorithm/string.hpp>
#include <string>
#include <iostream>
using namespace std;
int main()
{
    std::string s("AbCdefG123 HijkLmn");
    s += s;
    s += s;
    string s0 = s;
    cout << "Input String: " << s0 << endl;
    cout << boost::algorithm::erase_all_copy(s0, "AbC") << endl;
    cout << boost::algorithm::ierase_all_copy(s0, "ABC") << endl;
    cout << boost::algorithm::erase_first_copy(s0, "defG123") << endl;
    cout << boost::algorithm::ierase_first_copy(s0, "DEFG123") << endl;
    cout << boost::algorithm::erase_nth_copy(s0, "HijkLmn", 1) << endl;
    cout << boost::algorithm::ierase_nth_copy(s0, "HIJKLMN", 1) << endl;
    cout << boost::algorithm::erase_head_copy(s0, 3) << endl;
    cout << boost::algorithm::erase_tail_copy(s0, 5) << endl;
}
Output:
Input String: AbCdefG123 HijkLmnAbCdefG123 HijkLmnAbCdefG123 HijkLmnAbCdefG123 HijkLmn
defG123 HijkLmndefG123 HijkLmndefG123 HijkLmndefG123 HijkLmn
defG123 HijkLmndefG123 HijkLmndefG123 HijkLmndefG123 HijkLmn
AbC HijkLmnAbCdefG123 HijkLmnAbCdefG123 HijkLmnAbCdefG123 HijkLmn
AbC HijkLmnAbCdefG123 HijkLmnAbCdefG123 HijkLmnAbCdefG123 HijkLmn
AbCdefG123 HijkLmnAbCdefG123 AbCdefG123 HijkLmnAbCdefG123 HijkLmn
AbCdefG123 HijkLmnAbCdefG123 AbCdefG123 HijkLmnAbCdefG123 HijkLmn
defG123 HijkLmnAbCdefG123 HijkLmnAbCdefG123 HijkLmnAbCdefG123 HijkLmn
AbCdefG123 HijkLmnAbCdefG123 HijkLmnAbCdefG123 HijkLmnAbCdefG123 Hi

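Finding substrings

The Boost find algorithms for strings all return a pair of iterators of type boost::iterator_range.
find_first() finds the first matching substring. std::string::find provides the same functionality. (find_first is a generic algorithm built on Boost's finder objects rather than a wrapper around that member function; personally I find it more convenient to use.)
find_last() finds the last matching substring. std::string::rfind provides the same functionality.
find_nth() finds the n-th (n >= 0) matching substring.
find_head(s, n) returns the first n characters of the string.
find_tail(s, n) returns the last n characters of the string.

Example: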

#include <boost/algorithm/string.hpp>
#include <string>
#include <iostream>
using namespace std;
int main()
{
    std::string s("AbCdefG123 HijkLmn");
    s += s;
    s += s;
    cout << "Input String: " << s << endl;
    boost::iterator_range<std::string::iterator> itRange = boost::algorithm::find_first(s, "123");
    cout << itRange << endl;
    cout << itRange.begin() - s.begin() << endl;
    cout << itRange.end() - s.begin() << endl;
    itRange = boost::algorithm::find_last(s, "123");
    cout << itRange << endl;
    cout << itRange.begin() - s.begin() << endl;
    cout << itRange.end() - s.begin() << endl;
    itRange = boost::algorithm::find_nth(s, "123", 1);
    cout << itRange << endl;
    cout << itRange.begin() - s.begin() << endl;
    cout << itRange.end() - s.begin() << endl;
    itRange = boost::algorithm::find_head(s, 5);
    cout << itRange << endl;
    cout << itRange.begin() - s.begin() << endl;
    cout << itRange.end() - s.begin() << endl;
    itRange = boost::algorithm::find_tail(s, 5);
    cout << itRange << endl;
    cout << itRange.begin() - s.begin() << endl;
    cout << itRange.end() - s.begin() << endl;
}

Output:
Input String: AbCdefG123 HijkLmnAbCdefG123 HijkLmnAbCdefG123 HijkLmnAbCdefG123 HijkLmn
123
7
10
123
61
64
123
25
28
AbCde
0
5
jkLmn
67
72

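Joining strings

Boost provides the join() algorithm, which takes a container of strings as its first argument and joins them using the second argument as the separator.

Example: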

#include <boost/algorithm/string.hpp>
#include <string>
#include <iostream>
#include <vector>
using namespace std;
int main()
{
    vector<string> sVec{"ABC", "def", "GHIJK", "123456"};
    cout << boost::algorithm::join(sVec, "+**+") << endl;
}
Output:
ABC+**+def+**+GHIJK+**+123456

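Replacing substrings

std::string provides many overloads of the replace member function, but all of them replace by position (index or iterator) (http://en.cppreference.com/w/cpp/string/basic_string/replace); none of them finds a substring by comparison and then replaces it.
Boost provides substring replacement algorithms based on comparison.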
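
For comparison, a comparison-based replacement written against the position-based std::string interface needs an explicit find loop. A minimal sketch; replace_all_sketch is a hypothetical name and is not how Boost implements its algorithm.

```cpp
#include <string>

// Hypothetical helper: replace every occurrence of `from` with `to` in a copy of `s`.
std::string replace_all_sketch(std::string s, const std::string& from, const std::string& to) {
    if (from.empty()) {
        return s;                         // an empty pattern would never advance
    }
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);  // position-based replace at the match
        pos += to.size();                 // skip the inserted text so it is not rescanned
    }
    return s;
}
```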

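replace_first() replaces the first matching substring.
replace_nth() replaces the n-th (n >= 0) matching substring.
replace_last() replaces the last matching substring.
replace_all() replaces all matching substrings.
The copy versions of the replace family.

Example: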

#include <boost/algorithm/string.hpp>
#include <string>
#include <iostream>
using namespace std;
int main()
{
    string s("AbcDeFGHIJklmn");
    s += s;
    s += s;
    cout << "Input String: " << s << endl;
    cout << boost::algorithm::replace_all_copy(s, "AbcD", "**") << endl;
    cout << boost::algorithm::replace_first_copy(s, "AbcD", "**") << endl;
    cout << boost::algorithm::replace_last_copy(s, "AbcD", "**") << endl;
    cout << boost::algorithm::replace_nth_copy(s, "AbcD", 1, "**") << endl;
}
Output:
Input String: AbcDeFGHIJklmnAbcDeFGHIJklmnAbcDeFGHIJklmnAbcDeFGHIJklmn
**eFGHIJklmn**eFGHIJklmn**eFGHIJklmn**eFGHIJklmn
**eFGHIJklmnAbcDeFGHIJklmnAbcDeFGHIJklmnAbcDeFGHIJklmn
AbcDeFGHIJklmnAbcDeFGHIJklmnAbcDeFGHIJklmn**eFGHIJklmn
AbcDeFGHIJklmn**eFGHIJklmnAbcDeFGHIJklmnAbcDeFGHIJklmn

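Trimming special characters from both ends of a string

Very often we want to remove the whitespace on the left and right of a string. Boost provides several algorithms for this.

trim_left() removes whitespace on the left of the string.
trim_right() removes whitespace on the right of the string.
trim() removes whitespace on both sides of the string.
The copy versions of the trim family.

Sometimes what we want to remove is not just whitespace at the two ends, but some other specific characters.

trim_left_if()
trim_right_if()
trim_if()
The copy versions of the trim_if family, such as `trim_left_copy_if`...

The Boost _if algorithms usually take a "predicate" argument, such as:

is_any_of(set) whether the character is one of the characters in set.
is_space whether the character is whitespace.
is_alnum whether the character is a letter or a digit.
is_alpha whether the character is a letter.
is_cntrl whether the character is a control character.
is_digit whether the character is a decimal digit.
is_graph whether the character is a graphic character.
is_lower whether the character is a lower-case letter.
is_print whether the character is printable.
is_punct whether the character is a punctuation mark.
is_upper whether the character is an upper-case letter.
is_xdigit whether the character is a hexadecimal digit.
is_from_range(from, to) whether from <= ch <= to.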
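
To show what trimming with a character-set predicate such as is_any_of amounts to, here is a rough standard-library equivalent built on find_first_not_of and find_last_not_of. It is a sketch only; the name trim_any_of_sketch is hypothetical.

```cpp
#include <string>

// Hypothetical helper: strip any of the characters in `chars` from both ends of `s`.
std::string trim_any_of_sketch(const std::string& s, const std::string& chars) {
    const std::string::size_type first = s.find_first_not_of(chars);
    if (first == std::string::npos) {
        return std::string();             // every character belongs to the trim set
    }
    const std::string::size_type last = s.find_last_not_of(chars);
    return s.substr(first, last - first + 1);  // characters inside the string are kept
}
```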

Example:

#include <boost/algorithm/string.hpp>
#include <string>
#include <iostream>
using namespace std;
int main()
{
    string s("   AbcDeF GHIJklmn      ");
    cout << "Input String: <" << s << '>' << endl;
    cout << '<' << boost::algorithm::trim_left_copy(s) << '>' << endl;
    cout << '<' << boost::algorithm::trim_right_copy(s) << '>' << endl;
    cout << '<' << boost::algorithm::trim_copy(s) << '>' << endl;
    cout << endl;
    string s1("==ABCD Efgh=IJK==-==   ");
    cout << "Input String: <" << s1 << '>' << endl;
    cout << '<' << boost::algorithm::trim_copy_if(s1, boost::algorithm::is_any_of(" -=")) << '>' << endl;
}
Output:
Input String: <   AbcDeF GHIJklmn      >
<AbcDeF GHIJklmn      >
<   AbcDeF GHIJklmn>
<AbcDeF GHIJklmn>

Input String: <==ABCD Efgh=IJK==-==   >
<ABCD Efgh=IJK>

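Matching and comparison

starts_with(s, sub) whether s starts with sub, i.e. sub is a prefix of s.
ends_with(s, sub) whether s ends with sub, i.e. sub is a suffix of s.
contains(s, sub) whether s contains sub.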
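
The semantics of the three predicates can be expressed with std::string::compare and std::string::find. The following sketch uses hypothetical names with a _sketch suffix to avoid any confusion with the Boost functions.

```cpp
#include <string>

// Hypothetical helper: does `s` begin with `sub`?
bool starts_with_sketch(const std::string& s, const std::string& sub) {
    return s.size() >= sub.size() && s.compare(0, sub.size(), sub) == 0;
}

// Hypothetical helper: does `s` end with `sub`?
bool ends_with_sketch(const std::string& s, const std::string& sub) {
    return s.size() >= sub.size() &&
           s.compare(s.size() - sub.size(), sub.size(), sub) == 0;
}

// Hypothetical helper: does `sub` occur anywhere in `s`?
bool contains_sketch(const std::string& s, const std::string& sub) {
    return s.find(sub) != std::string::npos;
}
```

All three comparisons are case-sensitive.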

Example:

#include <boost/algorithm/string.hpp>
#include <string>
#include <iostream>
using namespace std;
int main()
{
    string s("abcdefGHIJKLMN");
    cout << boost::algorithm::starts_with(s, "abcd") << endl;
    cout << boost::algorithm::starts_with(s, "abcD") << endl;
    cout << boost::algorithm::ends_with(s, "MN") << endl;
    cout << boost::algorithm::ends_with(s, "mn") << endl;
    cout << boost::algorithm::contains(s, "efG") << endl;
    cout << boost::algorithm::contains(s, "WWW") << endl;
}
Output:
1
0
1
0
1
0

Splitting strings

Boost provides the split algorithm, which splits a string on a given set of delimiter characters.

Example:

#include <boost/algorithm/string.hpp>
#include <string>
#include <iostream>
#include <vector>
using namespace std;
int main()
{
    string s("abc 123 cde");
    vector<string> sVec;
    boost::algorithm::split(sVec, s, boost::algorithm::is_space());
    for (auto& str : sVec)
    {
        cout << str << endl;
    }
    cout << "--------separator--------" << endl;
    s = " abc 123 cde   ";
    boost::algorithm::split(sVec, s, boost::algorithm::is_space());
    for (auto& str : sVec)
    {
        cout << str << endl;
    }
    cout << "--------separator--------" << endl;
    s = "--abc 123--cde-";
    boost::algorithm::split(sVec, s, boost::algorithm::is_any_of(" -"));
    for (auto& str : sVec)
    {
        cout << str << endl;
    }
}
Output:
abc
123
cde
--------separator--------
(empty line)
abc
123
cde
(empty line)
(empty line)
(empty line)
--------separator--------
(empty line)
(empty line)
abc
123
(empty line)
cde
(empty line)
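
The empty tokens in the output come from adjacent, leading, and trailing delimiters: every delimiter character ends one token and starts another. The standard-library sketch below reproduces that behaviour with find_first_of; split_any_of_sketch is a hypothetical name and is not Boost's implementation.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split `s` at every character found in `delims`, keeping empty tokens.
std::vector<std::string> split_any_of_sketch(const std::string& s, const std::string& delims) {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type pos = s.find_first_of(delims, start);
        if (pos == std::string::npos) {
            tokens.push_back(s.substr(start));            // final token, possibly empty
            break;
        }
        tokens.push_back(s.substr(start, pos - start));   // empty when delimiters are adjacent
        start = pos + 1;                                   // continue after the delimiter
    }
    return tokens;
}
```

Splitting "--abc 123--cde-" on " -" gives seven tokens, four of them empty, matching the last block of output above.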

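Note: many (but not all) of the Boost string algorithms also have a case-insensitive version; the only difference is that its name starts with 'i'.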
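
What "case-insensitive" means for such a variant can be sketched with the standard library by comparing characters after lower-casing them. This assumes single-byte characters in the current C locale; iequals_sketch is a hypothetical name.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: compare two strings for equality, ignoring case.
bool iequals_sketch(const std::string& a, const std::string& b) {
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](unsigned char x, unsigned char y) {
                          return std::tolower(x) == std::tolower(y);
                      });
}
```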

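Regular expressions

Introduction

Put simply, Boost provides three types and three algorithms for working with regular expressions:

Three types
- A regular expression is represented by boost::regex.
- The matched substrings are represented by boost::smatch and boost::sub_match.
Three algorithms
- Test whether the whole string matches the regular expression: boost::regex_match()
- Search the string for a substring that matches the regular expression: boost::regex_search()
- Replace all substrings that match the regular expression: boost::regex_replace()

To learn about regular expressions, see this article.

Example

**The example and comments below briefly illustrate the usage.**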

#include <boost/regex.hpp>
#include <iostream>
#include <iterator>
#include <string>
using namespace std;
int main()
{
    boost::regex rgx("(\\w+)\\s(\\w+)");
    string s("abcd efgh");
    // boost::regex_match() returns true when the string matches the regular
    // expression <in full>, and false otherwise.
    cout << boost::regex_match(s, rgx) << endl;
    cout << "========separator========" << endl;
    // boost::regex_search() returns true if it finds a first substring that
    // matches the regular expression, and false otherwise. Details of the
    // match are stored in the boost::smatch argument.
    // boost::smatch is effectively a container of boost::sub_match elements.
    // boost::sub_match derives from std::pair; the matched substring is the
    // range [first, second) given by its first and second members.
    boost::smatch result;
    if (boost::regex_search(s, result, rgx))
    {
        for (size_t i = 0; i < result.size(); ++i)
        {
            // result[0] the match for the whole regular expression.
            // result[1] the match for the first group.
            // result[2] the match for the second group.
            cout << result[i] << endl;
        }
    }
    cout << "========separator========" << endl;

    rgx = "(\\w+)\\s\\w+";
    if (boost::regex_search(s, result, rgx))
    {
        for (size_t i = 0; i < result.size(); ++i)
        {
            // result[0] the match for the whole regular expression.
            // result[1] the match for the group.
            cout << result[i] << endl;
        }
    }
    cout << "========separator========" << endl;

    rgx = "\\w+\\s(\\w+)";
    if (boost::regex_search(s, result, rgx))
    {
        for (size_t i = 0; i < result.size(); ++i)
        {
            cout << result[i] << endl;
        }
    }
    cout << "========separator========" << endl;

    // No groups: only result[0] is available.
    rgx = "\\w+\\s\\w+";
    if (boost::regex_search(s, result, rgx))
    {
        for (size_t i = 0; i < result.size(); ++i)
        {
            cout << result[i] << endl;
        }
    }
    cout << "========separator========" << endl;

    // \d+ does not match "abcd", so nothing is printed here.
    rgx = "(\\d+)\\s(\\w+)";
    if (boost::regex_search(s, result, rgx))
    {
        for (size_t i = 0; i < result.size(); ++i)
        {
            cout << result[i] << endl;
        }
    }
    cout << "========separator========" << endl;

    // Walk through the whole string, one match at a time.
    s = "abcd efgh ijk www";
    rgx = "\\w+\\s\\w+";
    auto begin = s.cbegin();
    auto end = s.cend();
    while (boost::regex_search(begin, end, result, rgx))
    {
        cout << result[0] << endl;
        begin = result[0].second; // resume right after the previous match
    }
    cout << "========separator========" << endl;

    // boost::regex_replace() replaces <all> matching substrings in the string.

    // Write the result to an output iterator.
    boost::regex_replace(std::ostreambuf_iterator<char>(std::cout), s.cbegin(), s.cend(), rgx, "666666");
    cout << endl;
    // Return the result directly.
    cout << boost::regex_replace(s, rgx, "2233333") << endl; // every match is replaced
}

Output:
1
========separator========
abcd efgh
abcd
efgh
========separator========
abcd efgh
abcd
========separator========
abcd efgh
efgh
========separator========
abcd efgh
========separator========
========separator========
abcd efgh
ijk www
========separator========
666666 666666
2233333 2233333
