File: CHANGELOG.md
Lines changed: 6 additions & 1 deletion
@@ -1,5 +1,11 @@
1
1
# *perf-cpp*: Changelog
2
2
3
+
## v0.12.2
4
+
5
+
- **Metric Functions**: Metrics now support built-in functions such as `ratio(A, B)` and `sum(A, B, C, ...)`, enabling more expressive and reusable formulas (see the [documentation](docs/metrics.md#functions)).
6
+
- **Optimized Compile-time Event Injection**: The generated runtime event registration class is now only created if it does not already exist, reducing unnecessary recompilation.
7
+
- **Improved Live Event Accuracy**: Live event values now account for partial runtime durations via time scaling, improving accuracy when counters were not active for the full measurement window.
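
The time scaling mentioned in the last entry can be pictured with a minimal sketch. It assumes the usual Linux perf convention of extrapolating a raw count by the ratio of the time an event was enabled to the time it was actually running; `scale_counter` and its parameter names are hypothetical and not taken from the *perf-cpp* sources.

```cpp
#include <cstdint>

/// Hypothetical helper (not part of the perf-cpp API): extrapolates a raw
/// counter value to the full window in which the event was enabled.
///  - time_enabled: time the event was enabled (e.g., in nanoseconds)
///  - time_running: time the event was actually counting on the PMU
double scale_counter(const std::uint64_t raw_value,
                     const std::uint64_t time_enabled,
                     const std::uint64_t time_running)
{
  /// The counter never ran, so there is nothing to extrapolate from.
  if (time_running == 0U) {
    return 0.0;
  }

  /// The counter ran for the entire window: the raw value is already exact.
  if (time_running >= time_enabled) {
    return static_cast<double>(raw_value);
  }

  /// Otherwise, scale the raw value by enabled/running.
  const auto factor = static_cast<double>(time_enabled) / static_cast<double>(time_running);
  return static_cast<double>(raw_value) * factor;
}
```

For instance, a counter that observed 1000 events while running for only half of the enabled window would be reported as roughly 2000 under this scheme.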
8
+
3
9
## v0.12.1
4
10
This update extends event discovery to ARM platforms, improves hardware counter introspection, and enhances the flexibility of metric definitions.
5
11
@@ -29,7 +35,6 @@ The previous flat API is still available but deprecated and will be removed in `
29
35
- **Explicit Latency Attributes**: Vendor-specific latency signals (*cache-access* on Intel and *cache-miss* on AMD) are now surfaced as distinct fields.
30
36
- **Heterogeneous-core Support**: Sampling can target multiple PMU domains (e.g., *cpu_core* and *cpu_atom*) on hybrid Intel processors.
31
37
32
-
33
38
## v0.10.0
34
39
* New feature: The *auxiliary event* is added automatically if required by the (Intel-) hardware (see the [documentation](docs/sampling.md#sapphire-rapids-and-beyond)).
35
40
* New feature: The *Memory Access Analyzer* allows you to describe complex data objects and maps sampled memory addresses in order to report latency and access information (see the [documentation](docs/analyzing-memory-access-patterns)).
File: README.md
Lines changed: 1 addition & 0 deletions
@@ -192,6 +192,7 @@ This is a non-exhaustive list of academic research papers and blog articles (fee
192
192
- [Analyzing memory accesses with modern processors](https://dl.acm.org/doi/abs/10.1145/3399666.3399896) (2020)
193
193
- [Precise Event Sampling on AMD Versus Intel: Quantitative and Qualitative Comparison](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10068807&tag=1) (2023)
194
194
- [Multi-level Memory-Centric Profiling on ARM Processors with ARM SPE](https://arxiv.org/html/2410.01514v1) (2024)
195
+
- [Breaking the Cycle - A Short Overview of Memory-Access Sampling Differences on Modern x86 CPUs](https://dl.acm.org/doi/pdf/10.1145/3736227.3736241) (2025)
195
196
196
197
### Blog Posts
197
198
- [C2C - False Sharing Detection in Linux Perf](https://joemario.github.io/blog/2016/09/01/c2c-blog/) (2016)
File: docs/metrics.md
Lines changed: 50 additions & 43 deletions
@@ -1,41 +1,39 @@
1
1
# Metrics
2
-
Performance metrics are critical for evaluating the efficiency of computer hardware using specific, user-defined calculations based on hardware events.
3
-
One key metric frequently used is the "Cycles per Instruction" (CPI).
4
-
This metric helps to measure how many CPU cycles are consumed for executiong an instruction, providing insight into the system's efficiency—the fewer the cycles needed per instruction, the more efficient the system.
2
+
Performance metrics provide essential insights into hardware efficiency by combining multiple hardware events into meaningful calculations.
3
+
A commonly used metric is "Cycles per Instruction" (CPI), which measures how many CPU cycles are required to execute an instruction.
4
+
This metric reveals system efficiency: fewer cycles per instruction indicate better performance.
5
+
5
6
6
7
> [!TIP]
7
-
> Our examples include a working code-example: **[statistics/metric.cpp](../examples/statistics/metric.cpp)**.
8
+
> Our examples include a working code example: **[statistics/metric.cpp](../examples/statistics/metric.cpp)**.
8
9
>
9
-
> When [defining custom metrics](#creating-custom-metrics), you should take a look at the list of metrics in the [Likwid project](https://github.com/RRZE-HPC/likwid/tree/master/groups).
10
-
11
-
> [!NOTE]
12
-
> Metrics are not applicable for [live events](recording-live-events.md).
10
+
> When [defining custom metrics](#creating-custom-metrics), consider reviewing the comprehensive metric definitions in the [Likwid project](https://github.com/RRZE-HPC/likwid/tree/master/groups).
13
11
14
12
---
15
13
## Table of Contents
16
14
- [Built-in Metrics](#built-in-metrics)
17
-
- [Utilizing Metrics](#utilizing-metrics)
15
+
- [Using Metrics](#using-metrics)
18
16
- [Defining Metrics](#creating-custom-metrics)
19
17
---
20
18
21
19
## Built-in Metrics
22
-
*perf-cpp* comes pre-equipped with several built-in metrics which can be used analogously to events.
23
-
To employ these metrics, include their names in the `perf::EventCounter` instance as shown in the [Utilizing Metrics](#utilizing-metrics) section:
|`gigahertz`| Processor frequency during the measurement (`cycles / (seconds * 1e+09)`). |
26
+
|`cycles-per-instruction`|Number of cycles required per instruction.|
27
+
|`instructions-per-cycle`|Number of instructions executed per cycle.|
28
+
|`cache-hit-ratio`|Ratio of cache hits to total cache accesses.|
29
+
|`cache-miss-ratio`|Ratio of cache misses to total cache accesses.|
30
+
|`dTLB-miss-ratio`|Ratio of data TLB misses to data TLB accesses.|
31
+
|`iTLB-miss-ratio`|Ratio of instruction TLB misses to instruction TLB accesses.|
32
+
|`L1-data-miss-ratio`|Ratio of L1 data cache misses to L1 data cache accesses.|
33
+
|`branch-miss-ratio`|Ratio of branch mispredictions to total executed branches. |
34
+
35
+
## Using Metrics
36
+
Metrics work exactly like hardware events within the `perf::EventCounter`:
39
37
40
38
```cpp
41
39
#include <perfcpp/event_counter.h>
@@ -56,21 +54,15 @@ const auto result = event_counter.result();
56
54
const auto cycles_per_instruction = result.get("cycles-per-instruction");
57
55
```
58
56
59
-
When metrics are used, *perf-cpp* internally counts the required hardware events (like cycles and instructions for CPI) and displays only the specified metrics and events.
57
+
When you use metrics, *perf-cpp* automatically counts the necessary hardware events (such as *cycles* and *instructions* for the *cycles-per-instruction* metric) and presents only the requested metrics and events in the results.
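
To make the relationship between a metric and its underlying events concrete, here is a minimal standalone sketch of the arithmetic behind *cycles-per-instruction*. It deliberately avoids the *perf-cpp* API; the function name and the choice to return `0.0` for an empty instruction count are assumptions made for illustration only.

```cpp
#include <cstdint>

/// Illustrative only (not perf-cpp code): derives the cycles-per-instruction
/// value from the two raw event counts the metric depends on.
double cycles_per_instruction(const std::uint64_t cycles, const std::uint64_t instructions)
{
  /// Assumption: report 0.0 when no instructions were counted,
  /// instead of dividing by zero.
  if (instructions == 0U) {
    return 0.0;
  }

  return static_cast<double>(cycles) / static_cast<double>(instructions);
}
```

With 3000 counted cycles and 1500 counted instructions, this yields a CPI of 2; the inverse quotient would correspond to *instructions-per-cycle*.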
60
58
61
59
## Creating Custom Metrics
62
-
Metrics are often based on the performance counters supported by the underlying hardware.
63
-
You can create custom metrics to tailor them to your specific hardware.
60
+
Custom metrics allow you to leverage the specific performance counters available on your hardware platform.
64
61
65
-
> [!TIP]
66
-
> The [Likwid project](https://github.com/RRZE-HPC/likwid/tree/master) gives an excellent and extensive list of available metrics for various CPUs.
67
-
> Take a look at their [groups/ directory](https://github.com/RRZE-HPC/likwid/tree/master/groups).
68
-
69
-
There are two ways to define custom metrics.
70
-
For both, you will need to create your own instance of the `perf::CounterDefinition` and pass it to the `perf::EventCounter` or `perf::Sampler`.
62
+
*perf-cpp* offers two approaches for defining custom metrics: *formula-based* definitions using text expressions, or implementing custom classes that inherit from the `perf::Metric` interface.
71
63
72
64
### Using Formulas
73
-
The first option is to express a metric as a calculation of several hardware and time events, for example:
65
+
The simplest approach is to define metrics using mathematical expressions that combine hardware events and timing data:
74
66
75
67
```cpp
76
68
auto counter_definition = perf::CounterDefinition{};
auto event_counter = perf::EventCounter{ counter_definition };
81
73
```
82
74
83
-
The formular can use the following **operators**: `+`, `-`, `*`, and `/`.
75
+
This example uses Intel SkylakeX architecture events and is adapted from [Likwid](https://github.com/RRZE-HPC/likwid/blob/master/groups/skylakeX/CYCLE_STALLS.txt).
84
76
85
-
In addition, **scientific numbers** (e.g., `1E5`, `1e-5`) can be used.
77
+
#### Operators
78
+
Formulas support the following **mathematical operators**: `+`, `-`, `*`, and `/`.
79
+
You can also use **scientific notation** (e.g., `1E5`, `1e-5`) for constants.
86
80
87
-
> [!NOTE]
88
-
> In formulas, event names that contain *operators* (like `-` in `L1D-misses`) need to be **escaped** using single quotes, e.g., `'L1D-misses'`.
81
+
#### Functions
82
+
Formulas provide built-in functions for common calculations:
89
83
90
-
The example depends on events from the Intel SkylakeX architecture and is taken from [Likwid](https://github.com/RRZE-HPC/likwid/blob/master/groups/skylakeX/CYCLE_STALLS.txt).
|`ratio(a,b)` or `d_ratio(a,b)`| Calculates the ratio between two operands, e.g., `ratio('branch-misses', 'branches')` calculates the *branch-miss ratio* |
87
+
|`sum(a,b,...)`| Adds together two or more operands, e.g., `sum('mem_load_retired.l1_hit', 'mem_load_retired.l2_hit', 'mem_load_retired.l3_hit')` totals all cache hits |
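
As a rough illustration of what the two functions in the table compute, the following standalone sketch mirrors their semantics in plain C++. The `sketch` namespace, the signatures, and the handling of a zero denominator are assumptions for illustration; they are not the *perf-cpp* implementation, whose behavior in that edge case is not described here.

```cpp
#include <initializer_list>

namespace sketch {

/// Mirrors the documented meaning of ratio(a, b): a divided by b.
/// Assumption: a zero denominator yields 0.0.
double ratio(const double numerator, const double denominator)
{
  return denominator != 0.0 ? numerator / denominator : 0.0;
}

/// Mirrors the documented meaning of sum(a, b, ...): adds all operands.
double sum(const std::initializer_list<double> operands)
{
  auto total = 0.0;
  for (const auto operand : operands) {
    total += operand;
  }
  return total;
}

} // namespace sketch
```

Nesting works as one would expect: `sketch::ratio(sketch::sum({10.0, 20.0, 30.0}), 120.0)` divides the total of three counts by a fourth value.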
88
+
89
+
Functions can be combined within metric expressions:
> The `perf::CounterDefinition` instance is used to store event configurations (e.g., names) and passed as a reference.
41
38
> Consequently, the instance needs to be alive while using the `EventCounter`.
42
39
40
+
> [!IMPORTANT]
41
+
> In our experience, keeping live events separate from "traditional" events leads to more consistent results.
42
+
43
+
> [!NOTE]
44
+
> Live events can capture only hardware events, not metrics.
43
45
44
46
## Initializing the Hardware Counters *(optional)*
45
47
Optionally, prepare the hardware counters ahead of time to exclude configuration time from your measurements; if skipped, this is handled automatically at the start:
@@ -114,17 +116,11 @@ for (auto i = 0U; i < runs; ++i) {
114
116
```
115
117
116
118
## Finalizing and Retrieving Results
117
-
Upon completion, stop the counters and fetch final results for non-live events:
119
+
Upon completion, stop the counters:
118
120
119
121
```cpp
120
122
/// Stop the counter after processing.
121
123
event_counter.stop();
122
-
123
-
/// Calculate the result.
124
-
const auto result = event_counter.result();
125
-
126
-
//// Or print the results as table.
127
-
std::cout << result.to_string() << std::endl;
128
124
```
129
125
130
126
For further information, refer to the [recording basics documentation](recording.md) and the [code example](../examples/statistics/live_events.cpp).