Skip to content

feat(client): rate-limit and reuse RPC for connectionless instance APIs#35175

Open
Pengrongkun wants to merge 4 commits intomainfrom
fix/main/6487654850
Open

feat(client): rate-limit and reuse RPC for connectionless instance APIs#35175
Pengrongkun wants to merge 4 commits intomainfrom
fix/main/6487654850

Conversation

@Pengrongkun
Copy link
Copy Markdown
Contributor

  • Share a single process-wide RPC client (lazy rpcOpen, closed in taos_cleanup) instead of opening/closing per call, to reduce overhead and churn.
  • Enforce a fixed sliding window: at most 100 calls per second (register + list combined); return TSDB_CODE_TSC_INSTANCE_API_RATE_LIMIT when exceeded.
  • Reset the window when the clock moves backwards (NTP) to avoid stuck state.
  • Always set terrno from taos_list_instances return value so successful calls clear a previous rate-limit errno.

Public API remains connectionless (no TAOS*); management endpoints use firstEp/secondEp from client config.

Description

Issue(s)

  • Close/close/Fix/fix/Resolve/resolve: Issue Link

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

- Share a single process-wide RPC client (lazy rpcOpen, closed in taos_cleanup)
  instead of opening/closing per call, to reduce overhead and churn.
- Enforce a fixed sliding window: at most 100 calls per second (register +
  list combined); return TSDB_CODE_TSC_INSTANCE_API_RATE_LIMIT when exceeded.
- Reset the window when the clock moves backwards (NTP) to avoid stuck state.
- Always set terrno from taos_list_instances return value so successful calls
  clear a previous rate-limit errno.

Public API remains connectionless (no TAOS*); management endpoints use
firstEp/secondEp from client config.
Copilot AI review requested due to automatic review settings April 20, 2026 03:06
Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a client-side rate limiter (100 calls/sec) and a process-wide RPC singleton for instance-related APIs to protect the mnode and optimize resource usage. Review feedback identifies a performance bottleneck where a global mutex is held during blocking network operations, a potential race condition in the cleanup logic where the mutex is destroyed before the ready flag is reset, and inconsistent handling of RPC buffer deallocation between the registration and listing functions.

Comment thread source/client/src/clientMain.c
Comment thread source/client/src/clientMain.c Outdated
Comment thread source/client/src/clientMain.c Outdated
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes connectionless client “instance” management APIs by reusing a process-wide RPC client and adding client-side rate limiting to reduce RPC open/close churn and protect mnode from bursts.

Changes:

  • Add a new client error code/message for instance API rate limiting.
  • Implement a global singleton RPC client for taos_register_instance / taos_list_instances, cleaned up via taos_cleanup.
  • Add a fixed-window (100 calls / 1s) rate limiter shared by register+list, plus a unit/integration test.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File Description
source/util/src/terror.c Adds the error string for the new rate-limit error code.
include/util/taoserror.h Defines TSDB_CODE_TSC_INSTANCE_API_RATE_LIMIT.
source/client/src/clientMain.c Introduces global instance-RPC reuse + shared client-side rate limiting; wires cleanup into taos_cleanup.
source/client/test/instanceTest.cpp Adds a test covering the new rate limiting behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread source/client/src/clientMain.c
Comment thread source/client/test/instanceTest.cpp
Comment thread source/client/src/clientMain.c Outdated
Comment thread source/client/src/clientMain.c
@JinqingKuang
Copy link
Copy Markdown
Contributor

代码审查报告

发现 4 个问题(1 高、1 中、2 低):

🔴 高优先级

[1] instanceRpcAcquire 持锁跨越整个 rpcSendRecv,并发调用被完全串行化

instanceRpcAcquire() 内加锁后,gInstRpcMutex 直到 instanceRpcRelease()(函数末尾,rpcSendRecv 完成之后)才释放。这意味着所有并发调用 taos_register_instance / taos_list_instances 的线程都要阻塞整个 RPC 往返时延(RTT)。原有代码每次调用各开一个 RPC 客户端(可并发),新代码引入共享句柄的同时引入了完全串行化。

修复建议:若 rpcSendRecv 对同一句柄是线程安全的(主流 RPC 框架通常支持),则只在 gInstRpc == NULL 的初始化阶段持锁,初始化完成后立即释放,之后无锁并发调用 rpcSendRecv。若不是线程安全的,需要在代码或文档中明确说明,并评估是否应保留 per-call 客户端方案。


🟡 中优先级

[2] 限流器初始化失败会永久屏蔽所有实例 API 调用,且错误码误导

instRlMutexInit 通过 taosThreadOnce 只执行一次。若 taosThreadMutexInit 失败,gInstRlMutexInited 保持 0,之后所有 instanceApiRateLimitTry 调用均返回 TSDB_CODE_OUT_OF_MEMORY,导致 taos_register_instance / taos_list_instances 永久不可用。TSDB_CODE_OUT_OF_MEMORY 无法准确描述互斥锁初始化失败这一场景。

修复建议:限流器初始化失败时,记录日志后放行(fail-open)——限流器故障不应阻断合法 API 调用;若必须 fail-closed,改用 TSDB_CODE_TSC_INTERNAL_ERROR 并在日志中说明原因。


🔵 低优先级

[3] instanceTest.cpp 末尾缺少换行符

文件以 #pragma GCC diagnostic pop 结束,但没有 trailing newline,会触发 GCC -Wnewline-eof 警告。在末行后添加一个换行即可修复。

[4] taos_list_instances 双变量错误追踪降低可读性

taos_register_instance 全程使用 code,而 taos_list_instances 同时使用 coderetCode,两者风格不一致,错误路径分析困难。建议移除 retCode,统一使用 code

Comment thread source/client/src/clientMain.c Outdated
Comment thread source/client/src/clientMain.c Outdated
Comment thread source/client/test/instanceTest.cpp Outdated
Comment thread source/client/src/clientMain.c Outdated
Copilot AI review requested due to automatic review settings April 21, 2026 06:20
@Pengrongkun Pengrongkun force-pushed the fix/main/6487654850 branch from 6953d4f to 912fcc4 Compare April 21, 2026 06:20
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds client-side safeguards and performance improvements for the connectionless instance management APIs by reusing a single process-wide RPC client and enforcing a per-process rate limit, while extending error reporting and tests accordingly.

Changes:

  • Reuse a singleton RPC client for taos_register_instance / taos_list_instances, with cleanup in taos_cleanup.
  • Add a fixed-window rate limiter (100 calls / 1s) for instance register/list calls and introduce a dedicated error code/message.
  • Add a unit/integration test covering the new rate-limit behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
source/client/src/clientMain.c Implements global instance RPC singleton + rate limiter; hooks singleton cleanup into taos_cleanup; updates register/list to use the shared RPC.
include/util/taoserror.h Adds TSDB_CODE_TSC_INSTANCE_API_RATE_LIMIT error code.
source/util/src/terror.c Adds human-readable error string for the new rate-limit error code.
source/client/test/instanceTest.cpp Adds a new test case for instance API rate limiting.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +3619 to +3621
instanceRpcRelease();
terrno = code;
return code;
Comment on lines +3301 to +3305
static TdThreadMutex gInstRpcMutex;
static TdThreadCond gInstRpcCond;
static volatile int32_t gInstRpcMutexReady = 0;
static volatile int32_t gInstRpcCondReady = 0;
static void *gInstRpc = NULL;
Comment on lines +323 to +338
taosMsleep(1000);

int32_t code = TSDB_CODE_SUCCESS;
int64_t t0 = taosGetTimestampMs();
for (int i = 0; i < 100; i++) {
code = taos_register_instance("id-1", "taosadapter", "desc:test_instance", 100);
ASSERT_EQ(code, TSDB_CODE_SUCCESS);
}
int64_t t1 = taosGetTimestampMs();

code = taos_register_instance("id-1", "taosadapter", "desc:test_instance", 100);
if (t1 - t0 >= 1000) {
ASSERT_EQ(code, TSDB_CODE_SUCCESS);
} else {
ASSERT_EQ(code, TSDB_CODE_TSC_INSTANCE_API_RATE_LIMIT);
}
@Pengrongkun Pengrongkun requested a review from zitsen as a code owner April 22, 2026 05:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants