Benchmarking C++

February 15, 2025 5 min read
how-to

Table Of Contents

  1. Generating the inputs
  2. Measuring the time
  3. Plotting the results

In the Data Structures & Algorithms module I am taking, we had to measure the time each algorithm takes and we had to plot the timings. I am writing the steps I followed, so that I can copy-paste the sample code the next time I need it.

I am explaining an example here, the actual code required for different kinds of algorithms might differ.

# Generating the inputs

In order to benchmark one or more algorithms, a large sample input is required. The type and characteristics of the sample input depends on the type of functions to be benchmarked.

For example, to benchmark different sorting algorithms, the sample input must:

The above characteristics must be met to reduce the error.

In my case, I generated 2460 test cases using ChatGPT. LLMs do a good enough job for that. The inputs must be saved in inputs.txt.

# Measuring the time

I believe it’s better to measure the timings in the language we are writing the functions to be benchmarked. Otherwise, there will OS-level overhead which might mess up the timings a lot.

I have to write the algorithms in C++, so I am using chrono library which tracks time.

The timings can just be printed to the terminal, but that’s not much useful as we can’t compare the results. We have to write the timings to a file (benchmark_results.csv in my case) and use that to plot the graph later.

If you don’t know, csv stands for comma-separated values. It’s a spreadsheet-like format for data. It can be opened in a text file.

Here is the code I am using:

#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <chrono>

using namespace std;
using namespace std::chrono;

// TODO: The functions to be benchmarked must be pasted here

// Function to parse a line of space-separated numbers
vector<int> parseInput(const string &line) {
	vector<int> numbers;
	stringstream ss(line);
	int num;
	while (ss >> num)
	{
		numbers.push_back(num);
	}
	return numbers;
}

int main() {
	ifstream inputFile("inputs.txt");
	ofstream outputFile("benchmark_results.csv");

	if (!inputFile.is_open())
	{
		cerr << "Error: Could not open inputs.txt" << endl;
		return 1;
	}
	if (!outputFile.is_open())
	{
		cerr << "Error: Could not create benchmark_results.csv" << endl;
		return 1;
	}

	// Write CSV header
	outputFile << "Input Size,Input,Program,Execution Time (seconds)\n";

	string line;

	auto start = high_resolution_clock::now();
	auto end = start;
	duration<double> elapsed;
	double elapsed_time;

	while (getline(inputFile, line))
	{
		vector<int> numbers = parseInput(line);
		int size = numbers.size();

		// Benchmark algorithm #1
		start = high_resolution_clock::now();
		// call the algorithm #1 here
		end = high_resolution_clock::now();
		elapsed = end - start;
		elapsed_time = elapsed.count();

		outputFile << size << "," << line << ",algorithm_1," << elapsed_time << "\n";

		// Benchmark algorithm #2
		start = high_resolution_clock::now();
		// call the algorithm #2 here
		end = high_resolution_clock::now();
		elapsed = end - start;
		elapsed_time = elapsed.count();

		outputFile << size << "," << line << ",algorithm_2," << elapsed_time << "\n";
		
		// TODO: Duplicate the above code blocks as much as you want.

		cout << "Benchmarked input size " << size << endl;
	}

	inputFile.close();
	outputFile.close();

	cout << "Benchmarking complete. Results saved in benchmark_results.csv" << endl;
	return 0;
}

The above code:

The benchmark_results.csv file will be written out when the above is ran. A text editor or a spreadsheet application can be used to inspect the file.

The code can be run using:

g++ -o benchmark benchmark.cpp; ./benchmark

# Plotting the results

Now the benchmark results have to be plotted. The plot can be generated using LibreOffice Calc or a similar spreadsheet application. But I believe they require a specific format and that’s not so convenient for me. Instead, I am using matplotlib in Python to plot the graph.

Here is the code:

import matplotlib.pyplot as plt
import csv
from collections import defaultdict

# Store results for each program: input sizes mapped to execution times
program_data = defaultdict(lambda: defaultdict(list))

# Read the CSV file
with open('benchmark_results.csv', mode='r') as file:
    reader = csv.DictReader(file)

    for row in reader:
        input_size = int(row['Input Size'])
        program = row['Program']
        time = float(row['Execution Time (seconds)'])
        
        # Append the time for this input size and program
        program_data[program][input_size].append(time)

# Plot the data
plt.figure(figsize=(12, 7))

for program, size_time_map in program_data.items():
    # Sort the data by input size
    sorted_sizes = sorted(size_time_map.keys())
    avg_times = [sum(size_time_map[size]) / len(size_time_map[size]) for size in sorted_sizes]

    plt.plot(sorted_sizes, avg_times, marker=None, label=program)

# Customize the plot
plt.title('Benchmark Results for Sorting Algorithms')
plt.xlabel('Input Size')
plt.ylabel('Execution Time (seconds)')
plt.legend()
plt.grid(True)

# Show the plot
plt.tight_layout()
plt.show()

One important thing regarding the above code, is that I am taking the average time for a specific input size.