
Add the ability to release and reacquire performance counters in the perfmon API #728

Open
dassschaf wants to merge 2 commits into RRZE-HPC:master from dassschaf:master

@dassschaf

Hi everyone,

This PR adds the ability to release and (re-)acquire the performance counters in the API (perfmon_acquireCounters, perfmon_releaseCounters); this functionality is extracted from the existing functions perfmon_init and perfmon_finalize.
This addition has several benefits in the context of performance monitoring:

  • A background performance monitoring tool can acquire the counters for each measurement and release them afterwards. Another application then does not have to wait until the tool has finalized LIKWID (e.g., after the tool notices a lockfile or some other mechanism that pauses it for a job requiring the counters), so the force flag does not need to be used.
  • Acquire/release is much faster than a full initialization/finalization (around 5-10 ms vs. around 3 s on the cluster here in Aachen; see the attached code example below). Repeated initialization while monitoring the cluster in the background causes measurable overhead (as in issue [BUG] Extremely long wall clock time when using the marker API in OpenMP code #634), which can be avoided by merely acquiring and releasing the counters.

I refrained from running make format on the changes because it reformatted virtually every line in the two modified files.

To try the attached code example:

$ g++ -std=c++20 -g -O3 -o likwid_example.exe likwid_example.cpp -llikwid
$ ./likwid_example.exe
Starting measurements...
....................
20 times initializing/finalizing access took 54471 ms, avg.: 2723 ms 

....................
20 times acquiring/releasing access took 112 ms, avg.: 5 ms 

@TomTheBear
Member

Thanks for the PR. I'm not 100% sure I understand what you want to solve.

For likwid-perfctr runs, the initialization and finalization happen exactly once. The long initialization time in the linked issue #634 was caused by the lookup of Uncore units on recent Intel systems. This has already been fixed.
In monitoring environments, parts of the initialized state could persist, yes. The biggest issue there is that if LIKWID loses control of the counters to another application, it cannot assume that the register configuration is still correct and needs to re-initialize itself.

I see in your code that you call perfmon_finalizeCountersThread in release. This zeros the configuration for the given HW thread (all HW threads in your code), so after a release, you need to acquire and set up the counters again before you can use them. Setup is a heavy operation. Your example code neither sets up nor reads the counters.

@dassschaf
Author

The idea is to provide a mechanism that lets a monitoring application quickly acquire and release the counters for periodic measurements instead of holding them continuously. If it held them continuously, a user application that needs the counters would either have to wait until the monitoring application next checks whether another application requires them, or would have to force its access (e.g., by running likwid-perfctr with the force flag).

Here's an example to illustrate that use case:

#include <cstdlib>
#include <unistd.h>
#include <iostream>
#include <string>
#include <vector>
#include <filesystem>

#include <likwid.h>

// This lockfile would be created whenever an application starts
// that requests access to the performance counters
#define LOCKFILE "./lockfile"

static std::vector<int> cpuIds;

int getNumCpus() {
    return static_cast<int>(cpuIds.size());
}

const std::vector<int>& getCpuIds() {
    return cpuIds;
}

static int initialize() {
    // Initialize topology information.
    if (topology_init() != 0) {
        return EXIT_FAILURE;
    }

    // Enumerate all CPUs.
    auto topo = get_cpuTopology();
    cpuIds.resize(topo->activeHWThreads);
    for (unsigned int i = 0; i < cpuIds.size(); i++) {
        cpuIds[i] = static_cast<int>(topo->threadPool[i].apicId);
    }

    // Initialize performance monitoring.
    if (perfmon_init(getNumCpus(), cpuIds.data()) != 0) {

        // clean up both modules in case of error
        perfmon_finalize();
        topology_finalize();
        return EXIT_FAILURE;
    }
    
    return EXIT_SUCCESS;
}

static void finalize() {
    // finalize in reverse order of initialization
    perfmon_finalize();
    topology_finalize();
}


int main(int argc, char * argv[]) {

    bool is_initialized = false;
    bool has_counters = false; 
    
    // initialize LIKWID and release the counters as they're not used immediately
    if (initialize() != EXIT_SUCCESS) 
        return -1;
    perfmon_releaseCounters();

    is_initialized = true;
    has_counters = false;

    // periodic monitoring loop
    // suppose the application gets a newline input every time it shall measure
    std::string line;
    while (std::getline(std::cin, line)) {

        // check if the lockfile exists; if so de-initialize and skip this measuring period
        if (std::filesystem::exists(LOCKFILE)) {
            if (is_initialized) {
                finalize();
                is_initialized = false;
                has_counters = false;
            }
                
            std::cout << "Lockfile, not \"measuring\"..." << std::endl;
            continue;

        } else {
            // re-initialize if necessary ...
            if (!is_initialized) {
                if (initialize() != EXIT_SUCCESS) 
                    return -1;

                is_initialized = true; 
                has_counters = true;
            }

            // ... or just re-acquire the counters
            if (!has_counters) {
                if (perfmon_acquireCounters(getNumCpus(), cpuIds.data()) < 0)
                    return -1;
                
                has_counters = true;
            }

            // actual measurements would happen here
            std::cout << "\"Measuring\" ... " << std::endl;

            // release access to the counters again
            if (has_counters) {
                perfmon_releaseCounters();
                has_counters = false;
            }
        }
    }

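    // the loop may have ended while LIKWID was already finalized (lockfile present)
    if (is_initialized)
        finalize();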
    return 0;
}

@TomTheBear
Member

In a separate meeting, we identified that this PR uses a second lockfile, not the one used internally by LIKWID. The original LIKWID lockfile disallows all accesses to the registers as soon as it is toggled. The separate lockfile allows LIKWID to finish its operations, such as cleaning up the measurement configuration in the registers, so that someone else can use the registers without LIKWID_FORCE (which would overwrite configurations).

Having acquire/release calls might be useful nevertheless: in repeated measurements (with lockfile toggles), we do not need to destroy and recreate all fundamental data structures.

@chriswasser
Contributor

Thanks for your evaluation 👍 It sounds like you see no major roadblock for integrating this feature and extending the public API with the two proposed functions perfmon_acquireCounters and perfmon_releaseCounters. Let us know how you would like to continue with this PR and whether you need any additional information or tests from our side 🤓 Greetings
