Boost.Pool - Cry's Blog

#include <cstddef>
#include <cstdlib>   // EXIT_SUCCESS
#include <list>
#include <iostream>

#include <boost/pool/pool_alloc.hpp>
#include <boost/timer.hpp>

int main()
{
  int const n = 10000000;

  boost::timer t;
  double d = 0.0;
  {
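    // Swap the allocator here: std::allocator< int >, boost::pool_allocator< int >,
    // or boost::fast_pool_allocator< int >.
    std::list< int, boost::fast_pool_allocator< int > > l;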
    for( int i = 0; i < n; ++i ){
      l.push_back( 42 );
      //l.pop_back(); // Uncomment this line for the repeated allocation/deallocation case.
    }
    d = t.elapsed();
    std::cout << d << std::endl;
  }
  std::cout << "on destruction: " << t.elapsed() - d << std::endl;

  return EXIT_SUCCESS;
}

MSVC7.1 on Windows XP, Athlon 2500+

Single-threaded build

Repeated allocation/deallocation

std::allocator	3.890s
boost::pool_allocator	3.156s
boost::fast_pool_allocator	0.468s

Repeated allocation, then deallocation at the end (fill time + destruction time)

std::allocator	3.046s + 3.204s
boost::pool_allocator	N/A
boost::fast_pool_allocator	1.203s + 0.343s

Multi-threaded build

Repeated allocation/deallocation

std::allocator	3.937s
boost::pool_allocator	3.843s
boost::fast_pool_allocator	1.093s

Repeated allocation, then deallocation at the end (fill time + destruction time)

std::allocator	5.406s + 4.844s
boost::pool_allocator	N/A
boost::fast_pool_allocator	1.578s + 0.765s

GCC4.1.1 on Linux, XEON 2.4GHz x2

Single-threaded build

Repeated allocation/deallocation

std::allocator	1.08s
boost::pool_allocator	2.24s
boost::fast_pool_allocator	0.56s

Repeated allocation, then deallocation at the end (fill time + destruction time)

std::allocator	1.15s + 0.47s
boost::pool_allocator	N/A
boost::fast_pool_allocator	0.64s + 0.22s

Multi-threaded build

Repeated allocation/deallocation

std::allocator	1.15s
boost::pool_allocator	4.78s
boost::fast_pool_allocator	3.20s

Repeated allocation, then deallocation at the end (fill time + destruction time)

std::allocator	1.14s + 0.46s
boost::pool_allocator	N/A
boost::fast_pool_allocator	2.01s + 1.53s

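Results

The code is simple and idealized, so the numbers don't mean all that much, but never mind!! At the very least, it is clear that boost::pool_allocator's performance degrades sharply as the number of chunks grows (deallocation becomes painfully slow).
Strictly as results on my own machines: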

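On Windows, Boost.Pool consistently outperforms std::allocator. The improvement from fast_pool_allocator is especially striking (though only small chunks are measured here, so of course fast_pool_allocator has the advantage).
With GCC, Boost.Pool may outperform in a single-threaded build. In a multi-threaded build, however, it looks unlikely to beat the default allocation.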

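That is roughly the summary. The code above allocates and deallocates list nodes of 12 bytes each (on a typical 32-bit environment), so fast_pool_allocator consistently beats pool_allocator; for allocation/deallocation of contiguous medium-sized or large chunks, use pool_allocator as the documentation says.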
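Still, granted that allocators involve locking and so the threading setting can change the picture, the Boost.Pool results in the GCC multi-threaded build are just dreadful. Is Boost.Pool's locking weak, or is libstdc++'s default allocator simply tuned to within an inch of its life?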
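In any case, all I can really say is this: make the allocator easy to swap at a low level, and once the program has grown to a reasonable size and its tests run properly, try swapping allocators, compare the performance, and judge on the whole picture.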