Releases: jmuehlig/perf-cpp
v0.13.1
- Configurable Sampling Triggers: Typed triggers now fully support hardware-specific configuration:
  - Intel: perf::MemoryLoads supports a configurable min_latency for PEBS load latency filtering. perf::MemoryStores and perf::MemoryLoadsAux provide type-safe alternatives to the string-based mem-stores and mem-loads-aux triggers. String-based triggers remain available and documented.
  - AMD: perf::IbsOp and perf::IbsFetch now build their counter configuration directly from hardware capabilities, fully supporting the is_uop, is_l3_miss_only, and is_rand flags. String-based trigger variants (ibs_op_uops, ibs_op_l3missonly, ibs_op_uops_l3missonly, ibs_fetch_l3missonly) still work but are no longer documented. Use typed triggers for full configurability.
- Intel:
  - Fixed-Function PMC Scheduling: On Intel processors, the built-in events instructions, cycles, cpu-cycles, and ref-cycles are now automatically scheduled to dedicated pinned groups. This prevents the kernel scheduler from placing fixed-function PMC events alongside generic events, which would distort multiplexing ratios. Fixed groups do not count against the generic PMC limit.
  - Fixed PMC Detection: Added HardwareInfo::physical_fixed_performance_counters_per_logical_core(), which reads the number of fixed-function performance counters on Intel.
  - Built-in Event: Added ref-cycles (reference cycles at a fixed frequency, unaffected by turbo boost or power-saving states) to the built-in hardware event list.
- Bugfixes:
  - Fixed out-of-bounds access when reading live counter values.
  - Fixed wrong parsing order for throttle events in the sample decoder.
  - Fixed sample decoder always reporting a data access source even when none was available (e.g., when sampling stores on Intel hardware).
  - Fixed various decoding bugs and hardened the sample decoder against malformed records.
  - Adding events to an already-opened EventCounter now raises an exception instead of silently failing.
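The fixed-function scheduling rule for Intel can be illustrated with a small, self-contained sketch. It is not perf-cpp's scheduler: the names Schedule and schedule_events, and the choice of collecting all fixed-function events into a single pinned group, are assumptions made for illustration. It only shows why fixed-function events need not consume slots from the generic counter budget.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result: one group of fixed-function events (assumed pinned)
// plus generic groups that each fit into the generic counter budget.
struct Schedule {
    std::vector<std::string> fixed_group;
    std::vector<std::vector<std::string>> generic_groups;
};

// Illustrative sketch only; not perf-cpp's implementation.
Schedule schedule_events(const std::vector<std::string>& events,
                         std::size_t generic_counters) {
    if (generic_counters == 0) {
        throw std::invalid_argument("need at least one generic counter");
    }
    // Events the release notes list as served by Intel fixed-function PMCs.
    static const std::set<std::string> fixed = {
        "instructions", "cycles", "cpu-cycles", "ref-cycles"};

    Schedule schedule;
    std::vector<std::string> current;
    for (const auto& event : events) {
        if (fixed.count(event) != 0) {
            // Fixed-function events do not occupy a generic counter slot.
            schedule.fixed_group.push_back(event);
            continue;
        }
        current.push_back(event);
        if (current.size() == generic_counters) {
            // Group is full: close it and start a new (multiplexed) group.
            schedule.generic_groups.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        schedule.generic_groups.push_back(current);
    }
    return schedule;
}
```

With four generic counters, seven events of which two (instructions and cycles) are fixed-function produce one fixed group and two generic groups of four and one events, rather than spending two generic slots on events that have dedicated hardware.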
v0.13.0
- Header Restructuring: Headers have been reorganized into counter/, sample/, metric/, analyzer/, and util/ subdirectories and renamed from .h to .hpp. The previous .h headers remain as forwarding includes with deprecation notices and will be removed in v1.0.
- Breaking: EventCounter::add() and start() (including Sampler::start() and all multi-thread/core variants) now return void instead of bool. Errors are communicated via exceptions; the return values were unused.
- Compile Flag for AUX Buffer Support: Added the PERFCPP_NO_SAMPLE_AUX compile flag to disable auxiliary buffer sampling on systems with Linux kernels older than 5.5, which lack PERF_SAMPLE_AUX support. Thanks to @rconnorlawson.
- Perf File Export: Fixed bugs in the perf file format when writing samples to files that can be read via perf [mem] report.
- NMI Watchdog Detection: Hardware counter detection now accounts for the NMI watchdog permanently occupying one hardware PMU counter, fixing incorrect counter counts on systems with the watchdog enabled.
- RAPL Power Metrics: Added built-in watts-pkg, watts-cores, and watts-ram metrics for measuring power consumption via RAPL energy counters (see the documentation).
- Conan Package: Added a Conan 2.x package recipe for easier integration.
- Config Setter Naming: Standardized Config setter naming: setters no longer use the is_ prefix (e.g., pinned(bool) instead of is_pinned(bool)). The old is_pinned(bool) and is_debug(bool) setters are deprecated and will be removed in v1.0.
- SampleResult CSV Export: Added SampleResult::to_csv(), which returns a std::string, complementing the existing file-based overload.
- Per-Element Results: Added result_of_thread(thread_id), result_of_process(process_id), and result_of_core(core_id) to query individual results from MultiThreadEventCounter, MultiProcessEventCounter, and MultiCoreEventCounter. The process and core variants return std::optional<CounterResult>, since the requested ID may not be present.
- Documentation: Rewrote and restructured all documentation pages for consistency, conciseness, and correctness. Documentation is now hosted at jmuehlig.github.io/perf-cpp.
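The watts-* metrics reduce to an energy delta divided by elapsed time. The sketch below shows only that arithmetic and is not perf-cpp's implementation; the function name average_watts is hypothetical. It assumes raw RAPL counts are converted to joules by a per-event scale factor, which Linux typically exposes in files such as /sys/bus/event_source/devices/power/events/energy-pkg.scale (often 2.3283064365386962890625e-10, i.e. 2^-32 J per count); check the value on your own machine.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Converts two raw RAPL energy readings into average power in watts.
// joules_per_count: per-event scale factor (assumed to come from the kernel's
// *.scale file); elapsed_seconds: wall-clock time between the two readings.
double average_watts(std::uint64_t raw_start, std::uint64_t raw_end,
                     double joules_per_count, double elapsed_seconds) {
    if (elapsed_seconds <= 0.0) {
        throw std::invalid_argument("elapsed time must be positive");
    }
    if (raw_end < raw_start) {
        throw std::invalid_argument("energy counter went backwards");
    }
    const double joules =
        static_cast<double>(raw_end - raw_start) * joules_per_count;
    return joules / elapsed_seconds;  // W = J / s
}
```

For example, a delta of 42,949,672,960 counts at 2^-32 J per count is 10 J; measured over 2 s, that averages to 5 W.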
v0.12.5
v0.12.4
- Bugfix: The library crashed when events loaded from an external CSV file contained whitespace (see #8). Thanks to @Liteom.
- Bugfix: The library failed to compile against Linux kernel headers that do not define PERF_MEM_LVLNUM_UNC and PERF_MEM_SNOOPX_PEER (see #7). Thanks to @Raphalex46 for pointing this out.
- Perf Data Export: Samples can now be written as perf data files using Sampler::to_perf_file(), enabling analysis with standard perf ecosystem tools such as perf report (see the documentation). Note that this feature is experimental.
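The whitespace crash fixed in v0.12.4 is a common CSV pitfall. Below is a minimal sketch of whitespace-tolerant parsing; it assumes a hypothetical two-column "name,config" line with a hexadecimal config value and does not reproduce perf-cpp's actual file format or parser. The helper names trim and parse_event_line are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Removes leading/trailing spaces and tabs (and a trailing '\r' from CRLF files).
std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    const auto begin = s.find_first_not_of(ws);
    if (begin == std::string::npos) return "";
    const auto end = s.find_last_not_of(ws);
    return s.substr(begin, end - begin + 1);
}

// Parses a hypothetical "name,config" line. Blank or malformed lines yield
// std::nullopt instead of crashing.
std::optional<std::pair<std::string, std::uint64_t>>
parse_event_line(const std::string& line) {
    const auto comma = line.find(',');
    if (comma == std::string::npos) return std::nullopt;
    const std::string name = trim(line.substr(0, comma));
    const std::string config = trim(line.substr(comma + 1));
    if (name.empty() || config.empty()) return std::nullopt;
    try {
        std::size_t consumed = 0;
        // Base 16 accepts an optional "0x" prefix.
        const auto value = std::stoull(config, &consumed, 16);
        if (consumed != config.size()) return std::nullopt;  // trailing garbage
        return std::make_pair(name, static_cast<std::uint64_t>(value));
    } catch (const std::exception&) {  // invalid_argument or out_of_range
        return std::nullopt;
    }
}
```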
v0.12.3
This update simplifies the handling of counter definitions by introducing a default instance.
- Default Counter Definitions: Supplying a user-defined perf::CounterDefinition to each perf::EventCounter or perf::Sampler is no longer required. If none is provided, a default instance is used automatically. Custom definitions now extend the default set of events instead of duplicating them.
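"Extending instead of duplicating" can be pictured as a layered lookup: a custom definition stores only its own events and falls back to a shared default table. The sketch below models that idea with hypothetical types (EventTable, LayeredDefinitions) and illustrative config values; it is not perf-cpp's CounterDefinition, and the precedence of custom entries over defaults with the same name is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical table: event name -> raw config (values are illustrative).
using EventTable = std::map<std::string, std::uint64_t>;

class LayeredDefinitions {
public:
    explicit LayeredDefinitions(std::shared_ptr<const EventTable> defaults)
        : defaults_(std::move(defaults)) {}

    void add(const std::string& name, std::uint64_t config) { custom_[name] = config; }

    // Custom entries are consulted first (assumed precedence), then the defaults.
    std::optional<std::uint64_t> find(const std::string& name) const {
        if (auto it = custom_.find(name); it != custom_.end()) return it->second;
        if (defaults_) {
            if (auto it = defaults_->find(name); it != defaults_->end()) return it->second;
        }
        return std::nullopt;
    }

    // Number of events stored locally; the defaults are shared, not copied.
    std::size_t own_size() const { return custom_.size(); }

private:
    std::shared_ptr<const EventTable> defaults_;
    EventTable custom_;
};
```

Sharing the default table through a pointer keeps each custom definition small, since only user-added events are stored per instance.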
v0.12.2
- Metric Functions: Metrics now support built-in functions such as ratio(A, B) and sum(A, B, C, ...), enabling more expressive and reusable formulas (see the documentation).
- Optimized Compile-time Event Injection: The generated runtime event registration class is now only created if it does not already exist, reducing unnecessary recompilation.
- Improved Live Event Accuracy: Live event values now account for partial runtime durations via time scaling, improving accuracy when counters were not active for the full measurement window.
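Two ideas from this release fit in a few lines. Scaling a count by time_enabled / time_running is the standard perf_event_open extrapolation for events that were multiplexed off the PMU for part of the window; ratio and sum are written here as plain helpers mirroring the metric functions named above. All function names are hypothetical, the zero-denominator behavior of ratio is an assumption, and this is not perf-cpp's metric engine.

```cpp
#include <cassert>
#include <cstdint>

// Extrapolates a count that was only scheduled for part of the window:
// scaled = raw * time_enabled / time_running.
double scale_count(std::uint64_t raw, std::uint64_t time_enabled,
                   std::uint64_t time_running) {
    if (time_running == 0) return 0.0;  // never scheduled: nothing to extrapolate
    return static_cast<double>(raw) * static_cast<double>(time_enabled) /
           static_cast<double>(time_running);
}

// Mirrors a ratio(A, B) metric function; returns 0 for a zero denominator (assumption).
double ratio(double a, double b) { return b == 0.0 ? 0.0 : a / b; }

// Mirrors a sum(A, B, C, ...) metric function via a C++17 fold expression.
template <typename... Ts>
double sum(Ts... values) {
    return (0.0 + ... + static_cast<double>(values));
}
```

An event counted 1,000 times while running for half of its enabled time extrapolates to 2,000; an IPC-style metric would then be ratio applied to the scaled instruction and cycle counts.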
v0.12.1
- Automatic Event Discovery on ARM: Hardware event types are now automatically detected on ARM architectures when initializing a perf::CounterDefinition instance.
- Hardware Counter Introspection: The number of available physical performance counters per logical core, along with the number of events each counter can multiplex, is now determined automatically when creating a perf::EventCounter.
- Recursive and Scientific Metrics: Metric expressions can now reference other metrics recursively. Support for scientific notation (e.g., 1e5) in formula-based metrics has also been added.
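Recursive metrics need a resolver that looks a name up as either a raw counter or another metric, and that rejects cycles. The sketch below uses a hypothetical MetricRegistry whose formulas are C++ lambdas rather than parsed expressions; it is not perf-cpp's API. Scientific notation appears only through std::stod, which accepts literals such as "1e3".

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical registry illustrating recursive metric resolution.
class MetricRegistry {
public:
    using Formula = std::function<double(MetricRegistry&)>;

    void set_counter(const std::string& name, double value) { counters_[name] = value; }
    void add_metric(const std::string& name, Formula formula) {
        metrics_[name] = std::move(formula);
    }

    // Resolves a name to a raw counter value or, recursively, to another metric.
    double get(const std::string& name) {
        if (const auto counter = counters_.find(name); counter != counters_.end()) {
            return counter->second;
        }
        const auto metric = metrics_.find(name);
        if (metric == metrics_.end()) {
            throw std::out_of_range("unknown counter or metric: " + name);
        }
        // Reject metrics that (indirectly) reference themselves.
        if (!in_progress_.insert(name).second) {
            throw std::runtime_error("cyclic metric definition: " + name);
        }
        try {
            const double value = metric->second(*this);
            in_progress_.erase(name);
            return value;
        } catch (...) {
            in_progress_.erase(name);  // keep the registry usable after an error
            throw;
        }
    }

private:
    std::map<std::string, double> counters_;
    std::map<std::string, Formula> metrics_;
    std::set<std::string> in_progress_;  // names currently being evaluated
};
```

For example, a kilo-instructions metric can divide instructions by std::stod("1e3"), and an mpki metric can then divide cache-misses by kilo-instructions, referencing it by name.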
v0.12.0
This release expands symbolic analysis capabilities, introduces FlameGraph generation, and improves hardware event management through both runtime and compile-time support.
- Symbol Resolution: Instruction pointers captured during sampling can now be resolved to function names using perf::SymbolResolver (see the documentation).
- FlameGraph Export: Sampling data can be converted into formats compatible with visualization tools such as Brendan Gregg's FlameGraph, Speedscope, and flamegraph.com using perf::analyzer::FlameGraphGenerator (see the documentation).
- Built-in Event Definitions: A set of x86-specific hardware events is now bundled in events/x86 and can be loaded at runtime using perf::CounterDefinition. This serves as an alternative to the make perf-list target.
- Compile-time Event Injection: Processor-specific event definitions can now be embedded directly at build time by configuring CMake with -DGEN_PROCESSOR_EVENTS=1. These are immediately available via perf::CounterDefinition (see the documentation).
- Automatic Event Discovery: Additional event types, including RAPL energy counters and AMD IO MMU events, are now automatically detected when a perf::CounterDefinition instance is created (issue #6).
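Symbol resolution and FlameGraph export rest on two simple steps, sketched here independently of perf-cpp: map each sampled instruction pointer to the function whose address range contains it, then count identical stacks in the folded "root;caller;leaf count" text format read by Brendan Gregg's flamegraph.pl. The Symbol struct and the resolve and fold_stacks functions are hypothetical. The sketch assumes a sorted, non-overlapping symbol table and call chains ordered leaf first (innermost frame first).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical symbol-table entry: addresses [start, start + size) belong to name.
struct Symbol {
    std::uint64_t start;
    std::uint64_t size;
    std::string name;
};

// Resolves an instruction pointer by binary search over symbols sorted by start.
std::string resolve(const std::vector<Symbol>& sorted_symbols, std::uint64_t ip) {
    auto it = std::upper_bound(
        sorted_symbols.begin(), sorted_symbols.end(), ip,
        [](std::uint64_t value, const Symbol& s) { return value < s.start; });
    if (it == sorted_symbols.begin()) return "[unknown]";
    --it;  // last symbol starting at or before ip
    return ip < it->start + it->size ? it->name : "[unknown]";
}

// Aggregates leaf-first call chains into folded stacks, one line per distinct stack.
std::string fold_stacks(const std::vector<Symbol>& sorted_symbols,
                        const std::vector<std::vector<std::uint64_t>>& callchains) {
    std::map<std::string, std::uint64_t> counts;
    for (const auto& chain : callchains) {
        std::string stack;
        // Walk the chain backwards so the folded line starts at the root frame.
        for (auto ip = chain.rbegin(); ip != chain.rend(); ++ip) {
            if (!stack.empty()) stack += ';';
            stack += resolve(sorted_symbols, *ip);
        }
        if (!stack.empty()) ++counts[stack];
    }
    std::string folded;
    for (const auto& [stack, count] : counts) {
        folded += stack + " " + std::to_string(count) + "\n";
    }
    return folded;
}
```

With main, foo, and bar occupying consecutive 0x100-byte ranges starting at 0x1000, two samples in bar (called from foo, called from main) and one sample in foo fold to the lines "main;foo 1" and "main;foo;bar 2".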
v0.11.1
v0.11.0
This version rolls out a redesigned sampling API.
Recorded data are now grouped into dedicated sub-structures (such as Metadata, InstructionExecution, and DataAccess) inside perf::Sample (see the sampling documentation).
The previous flat API is still available but deprecated and will be removed in v0.12.
- New Sampling Interface: Samples are now organized into clearly separated sections, which also expose additional AMD IBS fields that perf_event_open records do not surface.
- Explicit Latency Attributes: Vendor-specific latency signals (cache-access latency on Intel and cache-miss latency on AMD) are now exposed as distinct fields.
- Heterogeneous-core Support: Sampling can target multiple PMU domains (e.g., cpu_core and cpu_atom) on hybrid Intel processors.
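The benefit of grouped, optional sample sections can be illustrated with a simplified model. The struct and field names below are hypothetical and do not reproduce perf::Sample's real members. Optional sub-structures make it explicit whether a section or a vendor-specific latency was recorded, so analysis code can skip samples that lack it, and keeping the Intel and AMD latencies in separate fields avoids silently averaging two different measurements.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, simplified stand-ins for grouped sample sections.
struct DataAccess {
    std::optional<std::uint32_t> cache_access_latency;  // assumed Intel-style field
    std::optional<std::uint32_t> cache_miss_latency;    // assumed AMD-style field
};

struct SampleModel {
    std::optional<DataAccess> data_access;  // absent when no memory access was recorded
};

// Averages one latency field over all samples that actually recorded it.
std::optional<double> mean_of(const std::vector<SampleModel>& samples,
                              std::optional<std::uint32_t> DataAccess::*field) {
    std::uint64_t total = 0;
    std::uint64_t count = 0;
    for (const auto& sample : samples) {
        if (!sample.data_access) continue;  // skip samples without this section
        const auto& value = (*sample.data_access).*field;
        if (value) {
            total += *value;
            ++count;
        }
    }
    if (count == 0) return std::nullopt;
    return static_cast<double>(total) / static_cast<double>(count);
}
```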