jmuehlig
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 6 additions & 1 deletion
diff --git a/CMakeLists.txt b/CMakeLists.txt
Lines changed: 5 additions & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 1 addition & 0 deletions
diff --git a/docs/metrics.md b/docs/metrics.md
Lines changed: 50 additions & 43 deletions
diff --git a/docs/recording-live-events.md b/docs/recording-live-events.md
Lines changed: 7 additions & 11 deletions
diff --git a/examples/access_benchmark.h b/examples/access_benchmark.h
Lines changed: 15 additions & 0 deletions
diff --git a/examples/sampling/branch.cpp b/examples/sampling/branch.cpp
Lines changed: 3 additions & 5 deletions
diff --git a/examples/sampling/context_switch.cpp b/examples/sampling/context_switch.cpp
Lines changed: 3 additions & 5 deletions
diff --git a/examples/sampling/counter.cpp b/examples/sampling/counter.cpp
Lines changed: 3 additions & 5 deletions
diff --git a/examples/sampling/flame_graph.cpp b/examples/sampling/flame_graph.cpp
Lines changed: 3 additions & 5 deletions
@@ -1,5 +1,11 @@
 # *perf-cpp*: Changelog
 
+## v0.12.2
+
+- **Metric Functions**: Metrics now support built-in functions such as `ratio(A, B)` and `sum(A, B, C, ...)`, enabling more expressive and reusable formulas (see the [documentation](docs/metrics.md#functions)).
+- **Optimized Compile-time Event Injection**: The generated runtime event registration class is now only created if it does not already exist, reducing unnecessary recompilation.
+- **Improved Live Event Accuracy**: Live event values now account for partial runtime durations via time scaling, improving accuracy when counters were not active for the full measurement window.
+
 ## v0.12.1
 This update extends event discovery to ARM platforms, improves hardware counter introspection, and enhances the flexibility of metric definitions.
 
@@ -29,7 +35,6 @@ The previous flat API is still available but deprecated and will be removed in `
 - **Explicit Latency Attributes**: Vendor-specific latency signals (*cache-access* on Intel and *cache-miss* on AMD) are now surfaced as distinct fields.
 - **Heterogeneous-core Support**: Sampling can target multiple PMU domains (e.g., *cpu_core* and *cpu_atom*) on hybrid Intel processors.
 
-
 ## v0.10.0
 * New feature: The *auxiliary event* is added automatically if required by the (Intel-) hardware (see the [documentation](docs/sampling.md#sapphire-rapids-and-beyond)).
 * New feature: The *Memory Access Analyzer* allows users to describe complex data objects and maps sampled memory addresses to them in order to report latency and access information (see the [documentation](docs/analyzing-memory-access-patterns)).
 
@@ -39,13 +39,17 @@ set(PERF_CPP_SRC
     src/exception.cpp
     src/group.cpp
     src/hardware_info.cpp
-    src/metric_expression.cpp
     src/requested_event.cpp
     src/sampler.cpp
     src/sample_decoder.cpp
     src/util/table.cpp
     src/mmap_buffer.cpp
     src/symbol_resolver.cpp
+    src/metric/expression/token.cpp
+    src/metric/expression/tokenizer.cpp
+    src/metric/expression/parser.cpp
+    src/metric/expression/function.cpp
+    src/metric/expression/expression.cpp
     src/analyzer/memory_access.cpp
     src/analyzer/flame_graph_generator.cpp
 )
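
The new `src/metric/expression/` sources above (token, tokenizer, parser, function, expression) back the formula functions announced in the changelog. As a rough illustration of what such a pipeline does, here is a minimal recursive-descent evaluator. It is our own sketch, not perf-cpp's code; the names `sketch::Counters`, `sketch::Parser`, and `sketch::evaluate` are hypothetical. For brevity it tokenizes and parses in a single pass and supports `+`, `-`, `*`, `/`, parentheses, scientific notation, single-quoted event names, and `ratio`/`d_ratio`/`sum`, reading event values from a plain map. Returning `0` from `ratio` for a zero denominator is an assumption made here, not documented perf-cpp behavior.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <numeric>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

namespace sketch {
using Counters = std::unordered_map<std::string, double>; /// Hypothetical: event values keyed by name.

/// Hypothetical single-pass recursive-descent evaluator (tokenizing and parsing combined).
class Parser
{
public:
  Parser(const std::string& formula, const Counters& counters) : _formula(formula), _counters(counters) {}
  double parse()
  {
    const double value = expression();
    skip_whitespace();
    if (!at_end()) { fail("Unexpected input at position " + std::to_string(_pos)); }
    return value;
  }

private:
  const std::string& _formula;
  const Counters& _counters;
  std::size_t _pos{ 0U };
  [[noreturn]] static void fail(const std::string& message) { throw std::runtime_error(message); }
  [[nodiscard]] bool at_end() const noexcept { return _pos >= _formula.size(); }
  [[nodiscard]] unsigned char peek() const noexcept { return static_cast<unsigned char>(_formula[_pos]); }
  void skip_whitespace() { while (!at_end() && std::isspace(peek())) { ++_pos; } }
  /// Consumes `expected` (after optional whitespace) and reports whether it was present.
  bool consume(const char expected)
  {
    skip_whitespace();
    if (at_end() || _formula[_pos] != expected) { return false; }
    ++_pos;
    return true;
  }
  void expect(const char expected) { if (!consume(expected)) { fail(std::string{ "Expected '" } + expected + "'"); } }
  double expression() /// expression := term { ('+' | '-') term }: lowest precedence, left-associative.
  {
    for (double value = term();;) {
      if (consume('+')) { value += term(); } else if (consume('-')) { value -= term(); } else { return value; }
    }
  }
  double term() /// term := factor { ('*' | '/') factor }: binds tighter than '+' and '-'.
  {
    for (double value = factor();;) {
      if (consume('*')) { value *= factor(); } else if (consume('/')) { value /= factor(); } else { return value; }
    }
  }
  /// factor := '(' expression ')' | 'quoted-name' | number | name | name '(' expression { ',' expression } ')'
  double factor()
  {
    if (consume('(')) { const double value = expression(); expect(')'); return value; }
    if (consume('\'')) { /// Quoted names may contain operators, e.g., 'L1D-misses'.
      const auto end = _formula.find('\'', _pos);
      if (end == std::string::npos) { fail("Unterminated quote"); }
      const auto name = _formula.substr(_pos, end - _pos);
      _pos = end + 1U;
      return lookup(name);
    }
    if (!at_end() && std::isdigit(peek())) { /// strtod also parses scientific notation (1E5, 1e-5).
      char* end = nullptr;
      const double value = std::strtod(_formula.c_str() + _pos, &end);
      _pos = static_cast<std::size_t>(end - _formula.c_str());
      return value;
    }
    const auto start = _pos; /// Unquoted names consist of letters, digits, '_', and '.'.
    while (!at_end() && (std::isalnum(peek()) || peek() == '_' || peek() == '.')) { ++_pos; }
    if (start == _pos) { fail("Unexpected input at position " + std::to_string(_pos)); }
    const auto name = _formula.substr(start, _pos - start);
    if (!consume('(')) { return lookup(name); }
    std::vector<double> arguments{ expression() };
    while (consume(',')) { arguments.push_back(expression()); }
    expect(')');
    return call(name, arguments);
  }
  [[nodiscard]] double lookup(const std::string& name) const
  {
    if (const auto iterator = _counters.find(name); iterator != _counters.end()) { return iterator->second; }
    fail("Unknown event: " + name);
  }
  static double call(const std::string& name, const std::vector<double>& args)
  {
    if ((name == "ratio" || name == "d_ratio") && args.size() == 2U) {
      return args[1] == 0.0 ? 0.0 : args[0] / args[1]; /// Assumption: a zero denominator yields 0.
    }
    if (name == "sum" && args.size() >= 2U) { return std::accumulate(args.begin(), args.end(), 0.0); }
    fail("Unknown function or wrong argument count: " + name);
  }
};

inline double evaluate(const std::string& formula, const Counters& counters) { return Parser{ formula, counters }.parse(); }
} // namespace sketch
```

Calling `sketch::evaluate("ratio('branch-misses', 'branches')", counters)` divides the two mapped values. Unquoted names end at the first operator, so `branch-misses` without quotes would be read as `branch` minus `misses`; this is why event names containing operators have to be quoted.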
 
@@ -192,6 +192,7 @@ This is a non-exhaustive list of academic research papers and blog articles (fee
 - [Analyzing memory accesses with modern processors](https://dl.acm.org/doi/abs/10.1145/3399666.3399896) (2020)
 - [Precise Event Sampling on AMD Versus Intel: Quantitative and Qualitative Comparison](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=10068807&tag=1) (2023)
 - [Multi-level Memory-Centric Profiling on ARM Processors with ARM SPE](https://arxiv.org/html/2410.01514v1) (2024)
+- [Breaking the Cycle - A Short Overview of Memory-Access Sampling Differences on Modern x86 CPUs](https://dl.acm.org/doi/pdf/10.1145/3736227.3736241) (2025)
 
 ### Blog Posts
 - [C2C - False Sharing Detection in Linux Perf](https://joemario.github.io/blog/2016/09/01/c2c-blog/) (2016)
 
@@ -1,41 +1,39 @@
 # Metrics
-Performance metrics are critical for evaluating the efficiency of computer hardware using specific, user-defined calculations based on hardware events. 
-One key metric frequently used is the "Cycles per Instruction" (CPI). 
-This metric helps to measure how many CPU cycles are consumed for executiong an instruction, providing insight into the system's efficiency—the fewer the cycles needed per instruction, the more efficient the system.
+Performance metrics provide essential insights into hardware efficiency by combining multiple hardware events into meaningful calculations. 
+A commonly used metric is "Cycles per Instruction" (CPI), which measures how many CPU cycles are required to execute an instruction. 
+This metric reveals system efficiency: fewer cycles per instruction indicate better performance.
+
 
 > [!TIP]
-> Our examples include a working code-example: **[statistics/metric.cpp](../examples/statistics/metric.cpp)**.
+> Our examples include a working code example: **[statistics/metric.cpp](../examples/statistics/metric.cpp)**.
 > 
-> When [defining custom metrics](#creating-custom-metrics), you should take a look at the list of metrics in the [Likwid project](https://github.com/RRZE-HPC/likwid/tree/master/groups).
-
-> [!NOTE]
-> Metrics are not applicable for [live events](recording-live-events.md).
+> When [defining custom metrics](#creating-custom-metrics), consider reviewing the comprehensive metric definitions in the [Likwid project](https://github.com/RRZE-HPC/likwid/tree/master/groups).
 
 ---
 ## Table of Contents
 - [Built-in Metrics](#built-in-metrics)
-- [Utilizing Metrics](#utilizing-metrics)
+- [Using Metrics](#using-metrics)
 - [Defining Metrics](#creating-custom-metrics)
 ---
 
 ## Built-in Metrics
-*perf-cpp* comes pre-equipped with several built-in metrics which can be used analogously to events. 
-To employ these metrics, include their names in the `perf::EventCounter` instance as shown in the [Utilizing Metrics](#utilizing-metrics) section:
-
-| Metric name              | Description                                                           |
-|--------------------------|-----------------------------------------------------------------------|
-| `gigahertz`              | Processor speed during the measurement (`cycles/seconds*1e+09`).      |
-| `cycles-per-instruction` | Represents the number of cycles required per instruction.             |
-| `instructions-per-cycle` | Represents the number of instructions executed per cycle.             |
-| `cache-hit-ratio`        | Indicates the ratio of cache hits to total cache accesses.            |
-| `cache-miss-ratio`       | Indicates the ratio of cache misses to total cache accesses.          |
-| `dTLB-miss-ratio`        | The ratio of data TLB misses to data TLB accesses.                    |
-| `iTLB-miss-ratio`        | The ratio of instruction TLB misses to instruction TLB accesses.      |
-| `L1-data-miss-ratio`     | Reflects the ratio of L1 data cache misses to L1 data cache accesses. |
-| `branch-miss-ratio`      | Reflects the ratio of branch misses to executed branches.             |
-
-## Utilizing Metrics
-Metrics function similarly to hardware events in the  `perf::EventCounter`:
+*perf-cpp* includes several pre-defined metrics that you can use just like hardware events. 
+Simply include their names in your `perf::EventCounter` by treating them as standard events (e.g., `event_counter.add("gigahertz");`):
+
+| Metric name              | Description                                                          |
+|--------------------------|----------------------------------------------------------------------|
+| `gigahertz`              | Processor frequency during the measurement (`cycles/(seconds*1e9)`). |
+| `cycles-per-instruction` | Number of cycles required per instruction.                           |
+| `instructions-per-cycle` | Number of instructions executed per cycle.                           |
+| `cache-hit-ratio`        | Ratio of cache hits to total cache accesses.                         |
+| `cache-miss-ratio`       | Ratio of cache misses to total cache accesses.                       |
+| `dTLB-miss-ratio`        | Ratio of data TLB misses to data TLB accesses.                       |
+| `iTLB-miss-ratio`        | Ratio of instruction TLB misses to instruction TLB accesses.         |
+| `L1-data-miss-ratio`     | Ratio of L1 data cache misses to L1 data cache accesses.             |
+| `branch-miss-ratio`      | Ratio of branch mispredictions to total executed branches.           |
+
+## Using Metrics
+Metrics work exactly like hardware events within the `perf::EventCounter`:
 
 ```cpp
 #include <perfcpp/event_counter.h>
@@ -56,21 +54,15 @@ const auto result = event_counter.result();
 const auto cycles_per_instruction = result.get("cycles-per-instruction");
 ```
 
-When metrics are used, *perf-cpp* internally counts the required hardware events (like cycles and instructions for CPI) and displays only the specified metrics and events.
+When you use metrics, *perf-cpp* automatically counts the necessary hardware events (such as *cycles* and *instructions* for the *cycles-per-instruction* metric) and presents only the requested metrics and events in the results.
 
 ## Creating Custom Metrics
-Metrics are often based on the performance counters supported by the underlying hardware.
-You can create custom metrics to tailor them to your specific hardware. 
+Custom metrics allow you to leverage the specific performance counters available on your hardware platform.
 
-> [!TIP]
-> The [Likwid project](https://github.com/RRZE-HPC/likwid/tree/master) gives an excellent and extensive list of available metrics for various CPUs. 
-> Take a look at their [groups/ directory](https://github.com/RRZE-HPC/likwid/tree/master/groups).
-
-There are two ways to define custom metrics.
-For both, you will need to create your own instance of the `perf::CounterDefinition` and pass it to the `perf::EventCounter` or `perf::Sampler`.
+*perf-cpp* offers two approaches for defining custom metrics: *formula-based* definitions using text expressions, or implementing custom classes that inherit from the `perf::Metric` interface.
 
 ### Using Formulas
-The first option is to express a metric as a calculation of several hardware and time events, for example:
+The simplest approach is to define metrics using mathematical expressions that combine hardware events and timing data:
 
 ```cpp
 auto counter_definition = perf::CounterDefinition{};
@@ -80,17 +72,32 @@ counter_definition.add("stalls-by-mem-loads",
 auto event_counter = perf::EventCounter{ counter_definition };
 ```
 
-The formular can use the following **operators**: `+`, `-`, `*`, and `/`.
+This example uses Intel SkylakeX architecture events and is adapted from [Likwid](https://github.com/RRZE-HPC/likwid/blob/master/groups/skylakeX/CYCLE_STALLS.txt).
 
-In addition, **scientific numbers** (e.g., `1E5`, `1e-5`) can be used. 
+#### Operators
+Formulas support the following **mathematical operators**: `+`, `-`, `*`, and `/`.
+You can also use **scientific notation** (e.g., `1E5`, `1e-5`) for constants.
 
-> [!NOTE]
-> In formulas, event names that contain *operators* (like `-` in `L1D-misses`) need to be **escaped** using single quotes, e.g., `'L1D-misses'`.
+#### Functions
+Formulas provide built-in functions for common calculations:
 
-The example depends on events from the Intel SkylakeX architecture and is taken from [Likwid](https://github.com/RRZE-HPC/likwid/blob/master/groups/skylakeX/CYCLE_STALLS.txt).
+| Function                       | Description                                                                                                                                            |
+|--------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `ratio(a,b)` or `d_ratio(a,b)` | Calculates the ratio between two operands, e.g., `ratio('branch-misses', 'branches')` calculates the *branch-miss ratio*                               |
+| `sum(a,b,...)`                 | Adds together two or more operands, e.g., `sum('mem_load_retired.l1_hit', 'mem_load_retired.l2_hit', 'mem_load_retired.l3_hit')` totals all cache hits |
+
+Functions can be combined within metric expressions:
+
+```cpp
+counter_definition.add("cache-miss-ratio", 
+                        "ratio( sum('mem_load_retired.l1_miss', 'mem_load_retired.l2_miss', 'mem_load_retired.l3_miss'), sum('mem_load_retired.l1_hit', 'mem_load_retired.l2_hit', 'mem_load_retired.l3_hit') )");
+```
+
+> [!NOTE]
+> Event names containing **mathematical operators** (such as the `-` in `L1D-misses`) must be **enclosed in single quotes**, e.g., `'L1D-misses'`.
 
 ### Implementing Metrics using the Interface
-The second option is to define metrics by implementing the `perf::Metric` interface, for example:
+For more complex calculations, you can create custom metric classes by implementing the `perf::Metric` interface:
 
 ```cpp
 #include <perfcpp/metric.h>
@@ -126,7 +133,7 @@ public:
 };
 ````
 
-After implementing custom metrics, incorporate them into the `perf::CounterDefinition` to utilize them effectively:
+After implementing your custom metric, register it with the `perf::CounterDefinition`:
 
 ```cpp
 auto counter_definition = perf::CounterDefinition{};
 
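The `gigahertz` metric from the built-in metrics table reports the processor frequency, that is, cycles per second divided by 1e9. Because `/` and `*` have the same precedence and associate from left to right, the divisor must be parenthesized when the formula is written inline. The following compile-time sketch illustrates the difference; both function names are hypothetical and not part of perf-cpp.

```cpp
/// Hypothetical helper (not perf-cpp's implementation): frequency in GHz = cycles / (seconds * 1e9).
/// The caller must ensure that `seconds` is greater than zero.
[[nodiscard]] constexpr double gigahertz(const double cycles, const double seconds) noexcept
{
  return cycles / (seconds * 1e9);
}

/// Without parentheses, left-to-right evaluation computes (cycles / seconds) * 1e9 instead.
[[nodiscard]] constexpr double misgrouped_gigahertz(const double cycles, const double seconds) noexcept
{
  return cycles / seconds * 1e9;
}

/// 3e9 cycles within 1.5 seconds correspond to 2 GHz.
static_assert(gigahertz(3e9, 1.5) == 2.0, "3e9 cycles in 1.5 s should yield 2 GHz");
```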
@@ -27,10 +27,7 @@ auto event_counter = perf::EventCounter{ counter_definition };
 
 try {
     /// Events for live monitoring.
-    event_counter.add_live({"cache-misses", "cache-references"});
-    
-    /// Traditional events for post-processing analysis.
-    event_counter.add({"instructions", "cycles", "branches", "branch-misses", "cache-misses", "cache-references"});
+    event_counter.add_live({"cache-misses", "cache-references", "branches"});
 } catch (std::runtime_error& e) {
     std::cerr << e.what() << std::endl;
 }
@@ -40,6 +37,11 @@ try {
 > The `perf::CounterDefinition` instance is used to store event configurations (e.g., names) and passed as a reference.
 > Consequently, the instance needs to be alive while using the `EventCounter`.
 
+> [!IMPORTANT]
+> In our experience, keeping live events separate from "traditional" events leads to more consistent results.
+
+> [!NOTE]
+> Live events can only capture hardware events, not metrics.
 
 ## Initializing the Hardware Counters *(optional)*
 Optionally, prepare the hardware counters ahead of time to exclude configuration time from your measurements; if you skip this step, preparation is handled automatically when counting starts:
@@ -114,17 +116,11 @@ for (auto i = 0U; i < runs; ++i) {
 ```
 
 ## Finalizing and Retrieving Results
-Upon completion, stop the counters and fetch final results for non-live events:
+Upon completion, stop the counters:
 
 ```cpp
 /// Stop the counter after processing.
 event_counter.stop();
-
-/// Calculate the result.
-const auto result = event_counter.result();
-
-//// Or print the results as table.
-std::cout << result.to_string() << std::endl;
 ```
 
 For further information, refer to the [recording basics documentation](recording.md) and the [code example](../examples/statistics/live_events.cpp).
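The changelog states that live event values now account for partial runtime durations via time scaling. On Linux, each counter can report how long it was enabled and how long it was actually running (`time_enabled` and `time_running`, requested through the `PERF_FORMAT_TOTAL_TIME_ENABLED` and `PERF_FORMAT_TOTAL_TIME_RUNNING` read-format flags); the usual correction, also applied by `perf stat`, extrapolates the raw value by `time_enabled / time_running`. The sketch below shows only that arithmetic. The function name `scale_counter_value` is hypothetical, and we have not verified that perf-cpp scales live events exactly this way.

```cpp
#include <cstdint>
#include <optional>

/// Hypothetical helper (not part of perf-cpp): extrapolates a raw counter value to the full
/// enabled window, assuming the kernel reported time_enabled and time_running for the counter.
[[nodiscard]] inline std::optional<double> scale_counter_value(const std::uint64_t raw_value,
                                                               const std::uint64_t time_enabled,
                                                               const std::uint64_t time_running) noexcept
{
  /// The counter never ran (e.g., it was multiplexed out the whole time): no estimate is possible.
  if (time_running == 0U) {
    return std::nullopt;
  }

  /// The counter ran for the whole enabled window: the raw value needs no correction.
  if (time_running >= time_enabled) {
    return static_cast<double>(raw_value);
  }

  /// Assume a uniform event rate and extrapolate: value * enabled / running.
  return static_cast<double>(raw_value) * (static_cast<double>(time_enabled) / static_cast<double>(time_running));
}
```

With these semantics, a counter that read 1000 events while running for half of its enabled window reports an estimate of 2000. The estimate is only as good as the assumption that events occur at a roughly uniform rate.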
@@ -69,6 +69,21 @@ class AccessBenchmark
   [[nodiscard]] const std::vector<std::uint64_t>& indices() const noexcept { return _indices; }
   [[nodiscard]] const std::vector<cache_line>& data_to_read() const noexcept { return _data_to_read; }
 
+  /**
+   * Makes the compiler believe that the result is used; consequently, the optimizer cannot optimize the value away.
+   *
+   * @param result Value that should not be optimized away.
+   */
+  template<typename T>
+  inline void pretend_to_use(T& result) const noexcept
+  {
+#ifdef __clang__
+    asm volatile("" : "+r,m"(result) : : "memory");
+#else
+    asm volatile("" : "+m,r"(result) : : "memory");
+#endif
+  }
+
 private:
   /// Indices, defining the order in which the memory chunk is accessed.
   std::vector<std::uint64_t> _indices;
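
The `pretend_to_use` helper added to `AccessBenchmark` replaces the inline `asm` statements previously repeated in each sampling example. It follows the same idea as `benchmark::DoNotOptimize` from Google Benchmark: an empty `asm volatile` statement with a read-write operand and a `"memory"` clobber. Below is a standalone sketch of that technique; `pretend_to_use` as a free function and `sum_values` are hypothetical, and `asm volatile` is a GCC/Clang extension, so the sketch does not compile with MSVC.

```cpp
#include <cstdint>
#include <vector>

/// Hypothetical free-function variant of `AccessBenchmark::pretend_to_use` (GCC/Clang only).
/// The empty asm statement claims to read and modify `value`, held in a register or in memory
/// depending on the chosen constraint alternative, and to clobber memory. The compiler therefore
/// has to materialize `value` and cannot discard the computation that produced it.
template<typename T>
inline void pretend_to_use(T& value) noexcept
{
#ifdef __clang__
  asm volatile("" : "+r,m"(value) : : "memory");
#else
  asm volatile("" : "+m,r"(value) : : "memory");
#endif
}

/// Hypothetical helper mirroring the loops of the sampling examples. There, the sum is never
/// read afterwards, so without the barrier the optimizer could remove the entire loop.
inline std::uint64_t sum_values(const std::vector<std::uint64_t>& values)
{
  auto sum = std::uint64_t{ 0U };
  for (const auto item : values) {
    sum += item;
  }
  pretend_to_use(sum);
  return sum; /// Returned only so that the result can be checked here.
}
```

The asm template is empty, so no instructions are emitted for the statement itself; its only effect is on what the optimizer may assume about `value`.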
 
@@ -52,11 +52,9 @@ main()
   for (auto index = 0U; index < benchmark.size(); ++index) {
     value += branchy_function(benchmark[index]);
   }
-  asm volatile(""
-               : "+r,m"(value)
-               :
-               : "memory"); /// We do not want the compiler to optimize away
-                            /// this unused value.
+
+  /// We do not want the compiler to optimize away this (otherwise) unused value (and consequently the loop above).
+  benchmark.pretend_to_use(value);
 
   /// Stop sampling.
   sampler.stop();
 
@@ -36,11 +36,9 @@ main()
   for (auto index = 0U; index < benchmark.size(); ++index) {
     value += benchmark[index].value;
   }
-  asm volatile(""
-               : "+r,m"(value)
-               :
-               : "memory"); /// We do not want the compiler to optimize away
-                            /// this unused value.
+
+  /// We do not want the compiler to optimize away this (otherwise) unused value (and consequently the loop above).
+  benchmark.pretend_to_use(value);
 
   /// Stop sampling.
   sampler.stop();
 
@@ -42,11 +42,9 @@ main()
   for (auto index = 0U; index < benchmark.size(); ++index) {
     value += benchmark[index].value;
   }
-  asm volatile(""
-               : "+r,m"(value)
-               :
-               : "memory"); /// We do not want the compiler to optimize away
-                            /// this unused value.
+
+  /// We do not want the compiler to optimize away this (otherwise) unused value (and consequently the loop above).
+  benchmark.pretend_to_use(value);
 
   /// Stop sampling.
   sampler.stop();
 
@@ -36,11 +36,9 @@ main()
   for (auto index = 0U; index < benchmark.size(); ++index) {
     value += benchmark[index].value;
   }
-  asm volatile(""
-               : "+r,m"(value)
-               :
-               : "memory"); /// We do not want the compiler to optimize away
-                            /// this unused value.
+
+  /// We do not want the compiler to optimize away this (otherwise) unused value (and consequently the loop above).
+  benchmark.pretend_to_use(value);
 
   /// Stop sampling.
   sampler.stop();