Google Benchmark

C++ User Group Aachen
14.07.2016
Sven Johannsen

table of contents

  • installation / build
  • overview
  • examples
    • minimal example
    • string example
    • list vs. vector
    • unique ids
    • ...
  • api overview

Google Benchmark

  • GitHub:google/benchmark
  • a library to support the benchmarking of functions, similar to unit-tests.
  • micro benchmark framework
  • arguments and ranges
  • templated benchmarks
  • reports in csv and json
  • fixtures
  • multithreading support
  • ...

micro benchmarks
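
To make the term concrete, here is a minimal hand-rolled sketch of what a micro benchmark does: run a small piece of code many times and report the average time per call. The helper `average_ns_per_call` is hypothetical and not part of Google Benchmark; the library additionally picks the iteration count on its own, helps keep the optimizer from removing the measured code, and formats the report.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not a Google Benchmark API): calls `fn` `iterations`
// times and returns the average wall-clock time per call in nanoseconds.
template <typename Fn>
double average_ns_per_call(Fn fn, std::int64_t iterations) {
  using clock = std::chrono::steady_clock;
  const auto start = clock::now();
  for (std::int64_t i = 0; i < iterations; ++i) {
    fn();  // the code under test
  }
  const auto stop = clock::now();
  // Convert the elapsed time to a floating point nanosecond count.
  const std::chrono::duration<double, std::nano> elapsed = stop - start;
  return elapsed.count() / static_cast<double>(iterations);
}
```

A call could look like `average_ns_per_call([] { /* code under test */ }, 1000000)`. Such a loop is easy to get wrong (too few iterations, code optimized away), which is the motivation for using a framework.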

Google Benchmark

installation

Download with git
            
git clone https://github.com/google/benchmark.git
cd benchmark
            
          

Google Benchmark

build with cmake

Windows


mkdir build
cd build    
cmake ..
cmake --build . --config RelWithDebInfo
          

Linux


mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=RELEASE
make
          

Google Benchmark

select the generator

  • default generator: latest installed Visual Studio (e.g. VS2015), 32 bit, on Windows; default gcc on Linux
                    
    cmake ..
                
                  
  • Visual Studio 64 bit
                    
    cmake .. -G "Visual Studio 14 2015 Win64"
                    
                  
  • Clang, Linux
                    
    export CC=clang-3.6
    export CXX=clang++-3.6
    cmake ..
                    
                  

Google Benchmark

write your tests

  • add include path (.../include)
  • add library / library path (benchmark.lib)
  • add cpp with BENCHMARK_MAIN() macro
  • add your tests

or add your code to the benchmark library

Google Benchmark

extend the library with your own tests

create a new folder for the tests: demo
            
mkdir demo
echo add_subdirectory(demo) >> CMakeLists.txt
            
          

example 1

minimal example

create 2 new files in the folder demo:
  • CMakeLists.txt
  • minimal.cpp

example 1

CMakeLists.txt

            
add_executable(01-minimal "minimal.cpp")
target_link_libraries(01-minimal benchmark)
            
          

example 1

minimal.cpp

            
#include "benchmark/benchmark.h"

void BM_minimal(benchmark::State& state) {
    while (state.KeepRunning()) {
        auto x = state.range_x(); // ->Arg(7);
        benchmark::DoNotOptimize(x);
    }
}
BENCHMARK(BM_minimal)->Arg(7);

BENCHMARK_MAIN()
            
          

example 1

results

            
Run on (1 X 2195 MHz CPU )
07/01/16 22:41:05
Benchmark           Time           CPU Iterations
-------------------------------------------------
BM_empty            4 ns          4 ns  172307692
            
          

example 1

important warnings


***WARNING*** Library was built as DEBUG. Timings may be affected.
          
A debug build reports meaningless timings: build the library and the tests in a release configuration.

***WARNING*** CPU scaling is enabled, the benchmark real time measurements may be
noisy and will incur extra overhead.
          
CPU runs with SpeedStep, ... enabled: may generate inaccurate results. (Linux)

example 2a

default parameter for std::string


  void foo(const std::string& s = "");
  void bar(const std::string& s = std::string());
  void baz(const std::string& s = std::string(""));
          

example 2a

function implementation


  void foo(const std::string& s) {
    benchmark::DoNotOptimize(s);
  }
  void bar(const std::string& s) {
    benchmark::DoNotOptimize(s);
  }
  void baz(const std::string& s) {
    benchmark::DoNotOptimize(s);
  }
          

example 2a

benchmark test


void BM_StringLiteral(benchmark::State& state) {
    while (state.KeepRunning()) {
      foo();
    }
}
BENCHMARK(BM_StringLiteral);
          

example 2a

results


Run on (1 X 2195 MHz CPU )
07/01/16 23:12:07
Benchmark                     Time           CPU Iterations
-----------------------------------------------------------
BM_StringLiteral             11 ns         11 ns   56000000
BM_EmptyString                4 ns          4 ns  165925926
BM_StringWithLiteral         11 ns         11 ns   64000000
            

example 2b

small string optimization

Length of the small strings for the SSO (Small String Optimization)

void BM_SmallStringOpt(benchmark::State& state)
{
  const std::string sIn(state.range_x(), '!');  
  while (state.KeepRunning()) {
    std::string sOut = sIn;
    benchmark::DoNotOptimize(sOut);
  }
}
//BENCHMARK(BM_SmallStringOpt)->Range(5, 20);
BENCHMARK(BM_SmallStringOpt)->DenseRange(5, 20);
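
The difference between the two registrations is which arguments get generated. The following sketch mimics, as far as I understand the library, how the arguments are expanded: `Range` is assumed to visit the lower bound, the powers of the multiplier (default 8) in between, and the upper bound, while `DenseRange` visits every value. The helpers `sparse_range` and `dense_range` are hypothetical names and not library functions.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: arguments assumed to be produced by Range(lo, hi)
// with the given multiplier: lo, powers of the multiplier strictly between
// lo and hi, then hi.
std::vector<int> sparse_range(int lo, int hi, int multiplier = 8) {
  assert(0 <= lo && lo <= hi && multiplier >= 2);
  std::vector<int> args;
  args.push_back(lo);
  for (long long p = 1; p < hi; p *= multiplier) {
    if (p > lo) args.push_back(static_cast<int>(p));
  }
  if (hi != lo) args.push_back(hi);
  return args;
}

// Hypothetical helper: arguments produced by DenseRange(lo, hi),
// i.e. every value from lo to hi inclusive.
std::vector<int> dense_range(int lo, int hi) {
  assert(lo <= hi);
  std::vector<int> args;
  for (int i = lo; i <= hi; ++i) args.push_back(i);
  return args;
}
```

Under these assumptions `sparse_range(5, 20)` yields only 5, 8 and 20, which is too coarse to locate the SSO threshold; `dense_range(5, 20)` yields all 16 lengths.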
          

example 2b

results


Run on (1 X 2195 MHz CPU )
07/02/16 21:16:36
Benchmark                     Time           CPU Iterations
-----------------------------------------------------------
BM_SmallStringOpt/10         18 ns         18 ns   37333333
BM_SmallStringOpt/11         16 ns         16 ns   44800000
BM_SmallStringOpt/12         18 ns         18 ns   40727273
BM_SmallStringOpt/13         16 ns         16 ns   44800000
BM_SmallStringOpt/14         16 ns         16 ns   44800000
BM_SmallStringOpt/15         16 ns         16 ns   40727273
BM_SmallStringOpt/16         84 ns         84 ns    7466667
BM_SmallStringOpt/17         83 ns         82 ns    8960000
BM_SmallStringOpt/18         83 ns         82 ns    7466667
BM_SmallStringOpt/19         83 ns         84 ns    8960000
BM_SmallStringOpt/20         82 ns         82 ns    7466667
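
The jump between length 15 and 16 can be cross-checked without a benchmark. A minimal sketch, assuming a standard library that implements SSO and reports the inline buffer size as the capacity of an empty string (true for the common implementations, but not guaranteed by the standard); both function names are hypothetical:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Assumption: a default-constructed string reports the size of its inline
// (SSO) buffer as its capacity.
std::size_t sso_capacity() { return std::string().capacity(); }

// Returns true if the character data lives inside the string object itself,
// i.e. no heap allocation was needed for this string.
bool is_stored_inline(const std::string& s) {
  const auto object_begin = reinterpret_cast<std::uintptr_t>(&s);
  const auto object_end = object_begin + sizeof(s);
  const auto data = reinterpret_cast<std::uintptr_t>(s.data());
  return object_begin <= data && data < object_end;
}
```

Strings up to `sso_capacity()` characters should be copied without an allocation; one character more forces a heap allocation, which would explain the step in the timings above.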
          

example 3

std::list vs std::vector

Can I replace std::list with std::vector?

Test 1: add elements at the beginning

example 3

std::list::push_front


class node {};

void BM_pushfront(benchmark::State& state)
{
  while (state.KeepRunning()) {
    std::list<node*> nodes;
    for (int i = 0; i < state.range_x(); ++i) {
      nodes.push_front(nullptr);
    }
    benchmark::DoNotOptimize(nodes);
  }
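  // Note: the item count is scaled by sizeof(nullptr), so the "items/s"
  // column in the results counts pointer bytes, not list nodes.
  state.SetItemsProcessed(state.iterations() * state.range_x() * sizeof(nullptr));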
}            
          

example 3

std::vector::insert


class node {};

void BM_insert_front(benchmark::State& state)
{
  while (state.KeepRunning()) {
    std::vector<node*> nodes;
    for (int i = 0; i < state.range_x(); ++i) {
      nodes.insert(nodes.begin(), nullptr);
    }
    benchmark::DoNotOptimize(nodes);
  }
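  // Note: the item count is scaled by sizeof(nullptr), so the "items/s"
  // column in the results counts pointer bytes, not vector elements.
  state.SetItemsProcessed(state.iterations() * state.range_x() * sizeof(nullptr));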
}
          

example 3

results


Benchmark                     Time           CPU Iterations
-----------------------------------------------------------
BM_pushfront/1              103 ns        104 ns    7478585   36.5748M items/s
BM_pushfront/8              464 ns        459 ns    1495717   66.4997M items/s
BM_pushfront/64            3686 ns       3671 ns     186965   66.4998M items/s
BM_pushfront/512          32499 ns      33585 ns      21367   58.1552M items/s
BM_pushfront/1024         59536 ns      59797 ns      11218    65.325M items/s
BM_insert_front/1            57 ns         56 ns   11217877   68.5778M items/s
BM_insert_front/8           432 ns        428 ns    1602554   71.2496M items/s
BM_insert_front/64         2757 ns       2753 ns     249286   88.6662M items/s
BM_insert_front/512      101314 ns     102216 ns       6410   19.1078M items/s
BM_insert_front/1024     371662 ns     373707 ns       1795   10.4527M items/s            
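
Why does the vector fall behind for 512 and 1024 elements? Every insert at the front has to shift all existing elements by one slot, so the total work grows quadratically. A small sketch that counts these shifts (the function name is hypothetical):

```cpp
#include <cstddef>
#include <vector>

// Counts how many existing elements have to be moved in total when
// n elements are inserted at the front of a std::vector one by one.
std::size_t elements_shifted_by_front_inserts(std::size_t n) {
  std::vector<int> v;
  std::size_t shifted = 0;
  for (std::size_t i = 0; i < n; ++i) {
    shifted += v.size();  // every element already stored moves by one slot
    v.insert(v.begin(), 0);
  }
  return shifted;
}
```

The count is n * (n - 1) / 2: 28 moves for 8 elements, but 523776 moves for 1024 elements. For small sizes the contiguous memory of the vector still wins; for large sizes the shifting dominates.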
          

example 4

unique IDs

collect all unique IDs in a process

  • std::set: stores every value only once.
  • std::vector: stores all IDs; at the end, call std::sort and std::unique to remove all duplicates.

example 4

create collection of unique elements


  std::set<int> s;
  for (const auto val : input) {
    s.insert(val);
  }
          

  std::vector<int> v;
  //v.reserve(input.size());
  for (const auto val : input) {
    v.push_back(val);
  }
  std::sort(v.begin(), v.end());
  v.erase(std::unique(v.begin(), v.end()), v.end());
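
Before comparing the speed of both variants it is worth checking that they produce the same result. A self-contained sketch that wraps the two snippets above into functions (the function names are hypothetical):

```cpp
#include <algorithm>
#include <set>
#include <vector>

// Variant 1: std::set keeps every value only once (and sorted).
std::vector<int> unique_with_set(const std::vector<int>& input) {
  std::set<int> s;
  for (const auto val : input) {
    s.insert(val);
  }
  return std::vector<int>(s.begin(), s.end());
}

// Variant 2: collect everything, then sort and drop adjacent duplicates.
std::vector<int> unique_with_sort(const std::vector<int>& input) {
  std::vector<int> v;
  v.reserve(input.size());
  for (const auto val : input) {
    v.push_back(val);
  }
  std::sort(v.begin(), v.end());
  v.erase(std::unique(v.begin(), v.end()), v.end());
  return v;
}
```

Both functions return the sorted list of distinct IDs, so the benchmark compares two implementations of the same operation.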
          

example 4

set up tests with fixtures


struct UniqueFixture : benchmark::Fixture
{
  void SetUp(const benchmark::State& st)
  {
    size_ = st.range_x();
    for (int i = 0; i < size_; ++i)
      input_.push_back(rand());

    for (int i = 0; i < size_; i+= st.range_y())
      input_[i] = RAND_MAX / 2;
  }
  void TearDown(const benchmark::State&) {}

  int size_;
  std::vector<int> input_;
};
        

example 4

define and register benchmarks with fixtures


BENCHMARK_DEFINE_F(UniqueFixture, BM_set)(benchmark::State& state)
{
  ...
}    

BENCHMARK_REGISTER_F(UniqueFixture, BM_set)->RangePair(64, 1 << 10, 2, 20);      
        

example 5

threads


BENCHMARK(BM_empty)->ThreadRange(1,16);              
            
Runs the benchmark "BM_empty" in parallel with 1, 2, 4, 8 and 16 threads.
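
The list of thread counts follows a doubling scheme. A sketch of how the counts are derived, assuming the library doubles the count starting at the minimum and always finishes with the maximum; `thread_counts` is a hypothetical helper, not a library function:

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: thread counts assumed to be used by
// ThreadRange(min_threads, max_threads).
std::vector<int> thread_counts(int min_threads, int max_threads) {
  assert(min_threads > 0 && max_threads >= min_threads);
  std::vector<int> counts;
  for (int t = min_threads; t < max_threads; t *= 2) {
    counts.push_back(t);  // min, 2 * min, 4 * min, ...
  }
  counts.push_back(max_threads);  // the upper bound is always included
  return counts;
}
```

Under this assumption `thread_counts(1, 16)` gives the five counts named above, and a maximum that is not a power of two is simply appended as the last entry.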

API overview

summary

  • command line parameters
  • functions

API overview

command line parameters


void PrintUsageAndExit() {
  fprintf(stdout,
          "benchmark"                                      // default:
          " [--benchmark_list_tests={true|false}]\n"       // false
          " [--benchmark_filter=<regex>]\n"                // "."       
          " [--benchmark_min_time=<min_time>]\n"           // "0.5 sec"
          " [--benchmark_repetitions=<num_repetitions>]\n" // "1"
          " [--benchmark_format=<console|json|csv>]\n"     // "console"
          " [--color_print={true|false}]\n"                // true
          " [--v=<verbosity>]\n");                         // 0-3
  exit(0);
}            
          

API overview

class Benchmark


Benchmark* Arg(int x);
Benchmark* Unit(TimeUnit unit);
Benchmark* Range(int start, int limit);
Benchmark* DenseRange(int start, int limit);
Benchmark* ArgPair(int x, int y);
Benchmark* RangePair(int lo1, int hi1, int lo2, int hi2);
Benchmark* Apply(void (*func)(Benchmark* benchmark));
Benchmark* RangeMultiplier(int multiplier);
Benchmark* MinTime(double t);
Benchmark* Repetitions(int n);
Benchmark* UseRealTime();
Benchmark* UseManualTime();
Benchmark* Complexity(BigO complexity = benchmark::oAuto);
Benchmark* Complexity(BigOFunc* complexity);
Benchmark* Threads(int t);
Benchmark* ThreadRange(int min_threads, int max_threads);
Benchmark* ThreadPerCpu();
          

see benchmark_api.h

API overview

class State


void PauseTiming();
void ResumeTiming();
void SkipWithError(const char* msg);
void SetIterationTime(double seconds);
void SetBytesProcessed(size_t bytes);
void SetItemsProcessed(size_t items);
void SetLabel(const char* label);
int range_x() const;
int range_y() const;
size_t iterations();
          

see benchmark_api.h

questions???