Skip to content

feat: support federated query#35184

Draft
dapan1121 wants to merge 37 commits into3.0from
feat/6660036900
Draft

feat: support federated query#35184
dapan1121 wants to merge 37 commits into3.0from
feat/6660036900

Conversation

@dapan1121
Copy link
Copy Markdown
Contributor

Description

Issue(s)

  • Close/close/Fix/fix/Resolve/resolve: Issue Link

Checklist

Please check the items in the checklist if applicable.

  • Is the user manual updated?
  • Are the test cases passed and automated?
  • Is there no significant decrease in test coverage?

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements the foundational framework for federated queries in TDengine, enabling integration with external data sources such as MySQL, PostgreSQL, and InfluxDB. The changes encompass new DDL statements for external source management, catalog-level metadata caching, a dedicated FederatedScan executor operator, and a translation layer to convert physical plans into remote SQL dialects. Feedback identifies several high-severity security vulnerabilities, including potential heap and stack buffer overflows due to unsafe snprintf usage in SQL generation and error formatting, as well as the exposure of external source credentials in plaintext during RPC transmission. Furthermore, the review notes functional issues such as the omission of VARCHAR handling in SQL literal generation, incorrect timestamp formatting for remote databases, a potential memory leak in the catalog metadata cleanup path, and an excessively large default VARCHAR length that could impact internal memory management.

Comment on lines +258 to +316
int32_t capacity = 4096;
char* buf = (char*)taosMemoryMalloc(capacity);
if (!buf) return terrno;

int32_t pos = 0;

// SELECT clause
pos += snprintf(buf + pos, capacity - pos, "SELECT ");
bool first = true;
if (pScanCols) {
SNode* pCol = NULL;
FOREACH(pCol, pScanCols) {
if (nodeType(pCol) == QUERY_NODE_COLUMN) {
if (!first) pos += snprintf(buf + pos, capacity - pos, ", ");
pos += appendQuotedId(buf + pos, capacity - pos, ((SColumnNode*)pCol)->colName, dialect);
first = false;
}
}
}
if (first) {
// empty scan columns → SELECT *
pos += snprintf(buf + pos, capacity - pos, "*");
}

// FROM clause
pos += snprintf(buf + pos, capacity - pos, " FROM ");
pos += appendTablePath(buf + pos, capacity - pos, pExtTable, dialect);

// WHERE clause (best-effort push-down)
if (pConditions) {
char condBuf[2048] = {0};
int32_t condLen = 0;
int32_t code = nodesExprToExtSQL(pConditions, dialect, condBuf, sizeof(condBuf), &condLen);
if (TSDB_CODE_SUCCESS == code && condLen > 0) {
pos += snprintf(buf + pos, capacity - pos, " WHERE %s", condBuf);
}
// On error or unsupported expression: skip WHERE (local Filter operator will handle it)
}

*ppSQL = buf;
return TSDB_CODE_SUCCESS;
}

// ---------------------------------------------------------------------------
// nodesRemotePlanToSQL — public API
// ---------------------------------------------------------------------------
int32_t nodesRemotePlanToSQL(const SPhysiNode* pRemotePlan, const SNodeList* pScanCols,
const SExtTableNode* pExtTable, const SNode* pConditions,
EExtSQLDialect dialect, char** ppSQL) {
if (!pExtTable || !ppSQL) return TSDB_CODE_INVALID_PARA;

if (pRemotePlan != NULL) {
// Phase 2: full plan tree → SQL (not yet implemented)
return TSDB_CODE_EXT_SYNTAX_UNSUPPORTED;
}

// Phase 1: fallback path — build SELECT … FROM … WHERE …
return buildFallbackSQL(pScanCols, pExtTable, pConditions, dialect, ppSQL);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The buildFallbackSQL function and its helpers (like appendQuotedId and appendOperatorExpr) use an unsafe snprintf pattern: pos += snprintf(buf + pos, capacity - pos, ...). Because snprintf returns the number of characters that would have been written, if the output is truncated, pos will exceed capacity. Subsequent calls will then perform out-of-bounds writes because capacity - pos will be interpreted as a very large size_t. Given the fixed 4096-byte buffer, a query with many columns or a complex WHERE clause could trigger a heap buffer overflow. Consider using a dynamic string builder or strictly checking snprintf return values.

Comment on lines +60 to +86
int32_t bufLen = (int32_t)sizeof(pInfo->extErrMsg);
int32_t offset = 0;

offset = snprintf(pInfo->extErrMsg, bufLen, "%s [source=%s, type=%s",
tdErrStr, pErr->sourceName, typeName);

if ((EExtSourceType)pErr->sourceType == EXT_SOURCE_MYSQL && pErr->remoteCode != 0) {
offset += snprintf(pInfo->extErrMsg + offset, bufLen - offset,
", remote_code=%d", pErr->remoteCode);
}
if ((EExtSourceType)pErr->sourceType == EXT_SOURCE_POSTGRESQL &&
pErr->remoteSqlstate[0] != '\0') {
offset += snprintf(pInfo->extErrMsg + offset, bufLen - offset,
", remote_sqlstate=%s", pErr->remoteSqlstate);
}
if ((EExtSourceType)pErr->sourceType == EXT_SOURCE_INFLUXDB && pErr->httpStatus != 0) {
offset += snprintf(pInfo->extErrMsg + offset, bufLen - offset,
", http_status=%d", pErr->httpStatus);
}
if (pErr->remoteMessage[0] != '\0') {
offset += snprintf(pInfo->extErrMsg + offset, bufLen - offset,
", remote_message=%s", pErr->remoteMessage);
}
if (offset < bufLen - 1) {
pInfo->extErrMsg[offset] = ']';
pInfo->extErrMsg[offset + 1] = '\0';
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The fedScanFormatError function uses an unsafe snprintf pattern where offset is incremented by the return value of snprintf without checking for truncation. If any part of the error message causes truncation, offset will exceed bufLen, leading to out-of-bounds writes in subsequent snprintf calls and the final assignment at line 84. This can result in a stack buffer overflow.

code = tlvEncodeCStr(pEncoder, PHY_FEDERATED_SCAN_CODE_SRC_USER, pNode->srcUser);
}
if (TSDB_CODE_SUCCESS == code) {
code = tlvEncodeCStr(pEncoder, PHY_FEDERATED_SCAN_CODE_SRC_PASSWORD, pNode->srcPassword);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security-high high

The srcPassword field is encoded as a plaintext string in the TLV message. While the comment in plannodes.h suggests that this field is encrypted during serialization, no encryption is performed here. This exposes external source credentials in plaintext during RPC transmission between the client and the server (Dnode/Qnode). Sensitive fields should be encrypted using the system's internal encryption mechanism before being added to the TLV encoder.

Comment on lines +134 to +137
case TSDB_DATA_TYPE_BINARY: // TSDB_DATA_TYPE_VARCHAR has the same integer value
case TSDB_DATA_TYPE_NCHAR:
return appendEscapedString(buf, bufLen, pVal->datum.p, dialect);
case TSDB_DATA_TYPE_TIMESTAMP:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The TSDB_DATA_TYPE_VARCHAR type is missing from this switch statement. Since the type mapping logic in extTypeMap.c explicitly produces TSDB_DATA_TYPE_VARCHAR for external string columns, its absence here will cause appendValueLiteral to return 0, resulting in missing values in the generated remote SQL and subsequent syntax errors on the external database. Additionally, the comment incorrectly states that VARCHAR has the same integer value as BINARY; in TDengine 3.x, these are distinct types.

    case TSDB_DATA_TYPE_BINARY:
    case TSDB_DATA_TYPE_VARCHAR:
    case TSDB_DATA_TYPE_NCHAR:
      return appendEscapedString(buf, bufLen, pVal->datum.p, dialect);

Comment on lines +140 to +141
default:
return 0; // unsupported; skip silently
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a discrepancy between the code and the comment regarding timestamp formatting. The code prints a raw integer (milliseconds or microseconds), but the comment claims it formats it as an ISO 8601 string. Most external relational databases (MySQL, PostgreSQL) will not correctly interpret a raw TDengine bigint as a timestamp literal without explicit casting or string formatting (e.g., 'YYYY-MM-DD HH:MM:SS.mmm'). This is likely to cause execution failures on the remote side.

}

// Default VARCHAR/NCHAR column length used when no explicit width is given.
#define EXT_DEFAULT_VARCHAR_LEN 65535
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

EXT_DEFAULT_VARCHAR_LEN is set to 65535, which exceeds the standard TDengine VARCHAR column limit (typically 16383). While external sources may support larger strings, mapping them to such a large default size in TDengine can cause allocation failures or internal errors in the executor and data block management layers, which are optimized for smaller fixed-size or bounded variable-size columns. Consider aligning this default with TSDB_MAX_VARCHAR_LEN.

// free them here. pExtTableMetaRsp's SExtTableMeta* objects are owned by SExtTableNode.pExtMeta (freed by
// nodesDestroyNode); free only the array backing here.
taosArrayDestroyEx(pData->pExtSourceInfo, ctgFreeExtSourceInfoPRes);
taosArrayDestroy(pData->pExtTableMetaRsp);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In ctgDestroySMetaData, pData->pExtTableMetaRsp is destroyed using taosArrayDestroy, which only frees the array's internal buffer but not the SExtTableMeta objects pointed to by its elements. Although these are intended to be owned by SExtTableNode, if a query fails during parsing or planning before the transfer occurs, these objects will be leaked. It is safer to use taosArrayDestroyEx with a free function that safely releases any remaining metadata objects.

- Move non-STMT helper nodes (EXTERNAL_TABLE, EXT_OPTION,
  EXT_ALTER_CLAUSE) from STMT section into the non-STMT section
  (syntax/clause nodes, values 68-70), consistent with the
  'non-STMT first, STMT after' ordering convention.
- Fix physical position of ext source DDL STMT block (CREATE /
  ALTER / DROP / REFRESH / SHOW / DESCRIBE, values 230-235):
  was placed before SHOW_CREATE_VIEW_STMT = 181 which is
  physically wrong; now placed after DROP_ENCRYPT_ALGR_STMT (229).
- Fix QUERY_NODE_PHYSICAL_PLAN_FEDERATED_SCAN: was incorrectly
  occupying the UNUSED_15 slot (1152); restore UNUSED_15 and
  append FEDERATED_SCAN at the tail (after ANALYSIS_FUNC, 1168).
SColRef.refType was removed in a prior review fix; the encode/decode
functions were not updated in sync. Fix:
- Replace refType-based branch with always-encoding refSourceName
  (empty string = internal ref, non-empty = external ref)
- Remove the tDecodeIsEnd backward-compat guard (not needed per
  review: dev branch has no prior-version compatibility requirement)
Both fields duplicate info already carried by pExtTableNode (SExtTableNode
has sourceName and schemaName). The fields were only ever written, never
read by any downstream code. Remove them and let callers access the info
directly via ((SExtTableNode*)pScan->pExtTableNode)->sourceName/schemaName.
…enterprise-only init

- Add 4 new global config variables and cfg items (server-only scope):
    tsFederatedQueryMaxPoolSizePerSource (default 8)
    tsFederatedQueryIdleConnTtlSec       (default 600)
    tsFederatedQueryThreadPoolSize       (default 0)
    tsFederatedQueryProbeTimeoutMs       (default 5000)
- Reuse existing variables for the remaining two fields:
    conn_timeout_ms  <- tsFederatedQueryConnectTimeoutMs
    query_timeout_ms <- tsFederatedQueryQueryTimeoutMs
- Wrap extConnectorModuleInit() blocks in #ifdef TD_ENTERPRISE in both
  clientEnv.c and dmEnv.c so the call is skipped in community edition
- Add #include tglobal.h to dmEnv.c to access the new variables
…handlers

Issues fixed:
1. Remove 4 cfg items not in DS §9.2 design spec:
   federatedQueryMaxPoolSizePerSource / federatedQueryIdleConnTtlSec /
   federatedQueryThreadPoolSize / federatedQueryProbeTimeoutMs
   (These 4 SExtConnectorModuleCfg fields remain as internal-default global
   vars but are not exposed as configuration parameters.)

2. Add dynamic update handlers for the 5 DS §9.2 cfg items (all marked
   '支持' for dynamic modification in the spec):
   - taosCfgDynamicOptionsForServer options array: all 5 items
     (SERVER scope items + BOTH scope items, server function accepts BOTH)
   - taosCfgDynamicOptionsForClient options array: BOTH-scope items only
     (federatedQueryEnable + federatedQueryMetaCacheTtlSec)
   Without these entries, CFG_DYN_BOTH was declared but ALTER DNODE would
   not actually update the global variables at runtime.
- SExtTableNode: replace hard-coded lengths (257, TSDB_USER_LEN,
  TSDB_PASSWORD_LEN, TSDB_DB_NAME_LEN, 4096) with correct
  TSDB_EXT_SOURCE_{HOST,USER,PASSWORD,DATABASE,SCHEMA,OPTIONS}_LEN macros
- extSourcesSchema: fix all VARCHAR byte sizes (source_name 64→TSDB_EXT_SOURCE_NAME_LEN,
  user 24→TSDB_EXT_SOURCE_USER_LEN 129, password 24→8 display-only,
  database/schema→TSDB_EXT_SOURCE_{DATABASE,SCHEMA}_LEN,
  options→TSDB_EXT_SOURCE_OPTIONS_LEN)
tmsg.h:
- add globalVer field to SExtSourceHbRsp for global ext-source catalog
  version tracking; client heartbeat carries this version so mnode can
  push full ext-source refresh when any create/alter/drop changes it

cmake/external.cmake:
- fix PostgreSQL (libpq) build on Linux/macOS: generate errcodes.h and
  catalog headers before building src/interfaces/libpq to avoid missing
  header errors during compilation
- install public PostgreSQL headers alongside libpq so libpq-fe.h
  dependencies (postgres_ext.h etc.) are available to extconnector

source/libs/extconnector/CMakeLists.txt:
- stage ext connector shared libraries (libmariadb.so.3, libpq.so.5,
  libarrow*.so.1600) into build/lib/ after extconnector build so that
  make_install.sh and CPack can find them at install/package time
Add install_ext_connector_libs() to install.sh, which installs
MariaDB Connector/C (libmariadb.so.3), PostgreSQL libpq (libpq.so.5),
and Apache Arrow Flight SQL (libarrow*.so.1600) from driver/ into
/usr/local/lib at tar.gz install time, matching make_install.sh behavior.
Call it from both updateProduct() and installProduct().
…inux

Without running distprep for src/backend/nodes and src/backend/utils,
the following generated headers are absent when libpgport is compiled:
  - nodes/nodetags.h  (from src/backend/nodes/gen_node_support.pl)
  - utils/fmgroids.h / fmgrprotos.h  (from src/backend/utils/Gen_fmgrtab.pl)

This causes a fatal compilation error:
  nodes/nodes.h:30:10: fatal error: nodes/nodetags.h: No such file or directory

Fix: insert two additional BUILD_COMMAND steps before building libpq:
  make -C src/backend/nodes distprep generated-header-symlinks
  make -C src/backend/utils distprep generated-header-symlinks
… ext_curl consumers

When `feat: federated query init code` (b8c7e3a) added
`nodesRemotePlanToSQL` to `libnodes.a` and had `libextconnector.a`
call it, a link-order bug was introduced:

1. `taosudf`: `libextconnector.a` appears twice in the transitive link
   command due to duplicate dependency paths. GNU ld extracts object
   files only once per library scan; by the time the second
   `libextconnector.a` is processed, `libnodes.a` has already been
   scanned and `nodesRemotePlanToSQL` is not retained. Fixed by adding
   `-Wl,--undefined=nodesRemotePlanToSQL` to `taosudf` (enterprise
   Linux only), which forces the linker to extract that TU from
   `libnodes.a` on first encounter.

2. `taosmqtt` / `topic-producer`: `libcurl.a` (ext_curl) is built
   against the internal OpenSSL (ext_ssl). When `5569f0000b1` moved
   `ext_curl` to all platforms and added `DEP_ext_curl(common)`, it
   added the Windows-specific extra libs (`crypt32`, etc.) but omitted
   the Linux/macOS equivalents (`libssl.a`, `libcrypto.a`). Fixed by
   appending `ext_ssl_libs` in `DEP_ext_curl_LIB` for non-Windows.
…replacement

- catalogInt.h: add CTG_OP_REPLACE_EXT_SOURCE_CACHE operation and
  SCtgReplaceExtSourceCacheMsg; declare ctgOpReplaceExtSourceCache and
  ctgReplaceExtSourceCacheEnqueue
- ctgCache.c: implement ctgOpReplaceExtSourceCache which atomically swaps
  the entire pExtSourceHash and updates extSrcGlobalVer on the write
  thread; add ctgBuildNewExtSourceHash (called on HB thread to amortise
  allocation); add ctgExtSourceHashFreeFp to bind SExtSourceCacheEntry
  lifetime to hash-node ref-count; add comments on thread-safety of
  ctgFreeExtDbCache / ctgFreeExtSourceCacheEntry
- ctgAsync.c: implement ctgReplaceExtSourceCacheEnqueue — deep-copies
  pSources and enqueues a single REPLACE op so the HB thread does not
  block on catalog write serialisation
- catalog.c: catalogUpdateAllExtSources now calls
  ctgReplaceExtSourceCacheEnqueue instead of per-entry upsert
- ctgRemote.c: propagate globalVer from SExtSourceHbRsp into catalog
- clientHb.c: add error-log coverage for deserialization failure,
  catalogUpdateAllExtSources failure, alloc failure, and hash-put failure;
  expand error path for catalogGetExtSrcGlobalVer
- clientImpl.c: check and warn (non-fatal) on catalogRemoveExtSource
  failures in handleExtSourceError and REFRESH pre-clear path
…ck design

- add EExtEntryState enum (FREE / IDLE / IN_USE) for atomic CAS-based state
- replace inUse/drainOnReturn bools with volatile int32_t state + generation
- add idleNext/freeNext separate link fields so an entry can safely be in
  idleList while eviction modifies freeNext without interference
- add SExtSlab: fixed-capacity slab with flexible entry array, enabling
  append-only slab chain; entries have stable addresses until destroyPool
- rewrite SExtConnPool: remove mutex + entries array; add atomic idleHead/
  freeHead Treiber-stack heads, slabHead slab chain, drainGeneration, and
  per-pool rwlock (config-change guard only)
query.h:
- split TSDB_CODE_EXT_RESOURCE_EXHAUSTED out of
  NEED_CLIENT_RM_EXT_SOURCE_ERROR into a new dedicated macro
  NEED_CLIENT_RETRY_EXT_POOL_ERROR; add it to NEED_CLIENT_HANDLE_EXT_ERROR
  so pool-exhaustion is handled separately from source-not-found errors

clientInt.h:
- add extPoolRetry counter to SRequestObj for pool-exhaustion retry tracking
- export handleExtSourceError for use in clientMain.c

clientImpl.c:
- implement async pool-exhaustion retry via a dedicated timer
  (tscExtPoolTimer, lazily initialised via pthread_once)
- extPoolRetryTimerCb acquires the request by ref ID (not pointer),
  resets pRequest->retry, then calls restartAsyncQuery
- handleExtSourceError handles NEED_CLIENT_RETRY_EXT_POOL_ERROR by
  scheduling a 1-second delayed retry (max EXT_POOL_RETRY_MAX_TIMES=5)

clientMain.c:
- call handleExtSourceError early in handleQueryAnslyseRes and
  doAsyncQueryFromParse so EXT errors are intercepted before the generic
  NEED_CLIENT_HANDLE_ERROR path

ctgAsync.c / federatedscanoperator.c:
- add comments clarifying that TSDB_CODE_EXT_RESOURCE_EXHAUSTED propagates
  to the user callback and retries must use ref ID asynchronously

extConnectorInt.h:
- clarify idleCount comment: approximate (soft cap), not exact
…ture

test_fq_09_stability.py:
- new test class TestFq09Stability covering four stability areas:
  (1) continuous query mix (single-source / cross-source JOIN / vtable)
  (2) fault injection (unreachable source, slow query, throttle, jitter)
  (3) cache stability (meta/capability cache repeated expiry + REFRESH)
  (4) connection-pool stability (high-frequency burst queries)
- each area runs as a short representative cycle in CI; iteration counts
  controllable via FQ_STAB_* environment variables
- pool-exhaustion test uses fq_pool_test MySQL user (MAX_USER_CONNECTIONS=1)
  to trigger TSDB_CODE_EXT_RESOURCE_EXHAUSTED and verify client retry recovery

ensure_ext_env.sh:
- create pool-exhaustion test user fq_pool_test (configurable via
  FQ_POOL_TEST_USER / FQ_POOL_TEST_PASS / FQ_POOL_TEST_MAX_CONN)
  with MAX_USER_CONNECTIONS limited to 1 in _mysql_reset_env()

federated_query_common.py:
- add TSDB_CODE_EXT_RESOURCE_EXHAUSTED error code constant
- add ExtSrcEnv.POOL_TEST_USER / POOL_TEST_PASS / POOL_TEST_MAX_CONN
  attributes (read from FQ_POOL_TEST_* env vars)
- add ExtSrcEnv.mysql_open_connection() helper that returns a raw pymysql
  connection for pool-exhaustion tests to hold open
…crash

parTranslater.c:
- move ext-source cache pre-clear (catalogRemoveExtSource) into
  translateRefreshExtSource so it happens at parse/translate time on the
  catalog pointer carried in STranslateContext; log failures as non-fatal
  warnings via parserWarn

clientImpl.c:
- remove the duplicate REFRESH pre-clear block from launchQueryImpl RPC path
  (now handled by the parser; doing it twice was redundant and fragile)

federatedscanoperator.c:
- guard nodesRemotePlanToSQL behind a pRemotePlan != NULL check; Mode-2
  leaf nodes carry no pRemotePlan and the connector generates SQL internally
- change SQL generation failure from non-fatal to a proper error return:
  close the connection handle, log the error, and propagate via
  QUERY_CHECK_CODE so the operator signals failure rather than silently
  running with an empty remoteSql
…ederated scan

nodesRemotePlanToSQL (plannodes.h / nodesRemotePlanToSQL.c):
- change second parameter from EExtSQLDialect dialect to int8_t sourceType;
  dialect mapping (MySQL/PG/InfluxQL) is now done internally, so callers
  no longer need to depend on EExtSQLDialect or extConnectorInt.h

federatedscanoperator.c / executorInt.h:
- remove fedScanGetDialect() (mapping now inside nodesRemotePlanToSQL)
- remove remoteSql[4096] cached field from SFederatedScanOperatorInfo
  (EXPLAIN no longer reads from the cached buffer; explain.c regenerates)
- log remote SQL via qDebug on each open; free the heap string immediately

tcommon.h:
- move SFederatedScanExplainInfo (fetchedRows, fetchBlockCount, elapsedTimeUs)
  from federatedscanoperator.c to tcommon.h so explain.c can reference it
  without pulling in executor internals

explain.c:
- add EXPLAIN output for QUERY_NODE_PHYSICAL_PLAN_FEDERATED_SCAN:
  Remote SQL line (regenerated from pRemotePlan, shown without ANALYZE)
  Runtime stats line (rows / blocks / elapsed ms, only with ANALYZE)
…Q guards

nodesRemotePlanToSQL.c:
- introduce SDynSQL — a growable heap buffer (dynSQLInit / dynSQLEnsure /
  dynSQLAppendChar / dynSQLAppendLen / dynSQLAppendStr / dynSQLAppendf /
  dynSQLDetach) so generated remote SQL has no fixed-size limit
- rewrite all SQL assembly helpers to use SDynSQL instead of char[]/snprintf
  with fixed-length buffers; eliminates potential truncation for complex
  queries with long table paths, column lists or WHERE clauses

parAstParser.c:
- in community edition builds (#ifndef TD_ENTERPRISE) reject 3/4-segment
  table paths immediately in collectMetaKeyFromRealTable with
  TSDB_CODE_EXT_FEDERATED_DISABLED to avoid a misleading downstream error

parTranslater.c:
- add explicit error messages in translateRealTable for two cases:
  (1) enterprise edition but tsFederatedQueryEnable=false: report
      TSDB_CODE_EXT_FEDERATED_DISABLED with actionable message
  (2) community edition (#else branch): report the same code with a
      message indicating enterprise is required
…eneration

plannodes.h / nodesCloneFuncs.c / nodesUtilFuncs.c:
- add pRemoteLogicPlan field to SScanLogicNode: holds the chain of
  pushed-down Sort/Project logic nodes (topmost first, set by FqPushdown);
  physical plan generation converts it to SFederatedScanPhysiNode.pRemotePlan
- wire CLONE_NODE_FIELD and nodesDestroyNode for the new field

planOptimizer.c — FqPushdown rule (Phase 1):
- add OPTIMIZE_FLAG_FQ_PUSHDOWN flag; register 'FqPushdown' rule at end of
  optimizeRuleSet so it runs after all other rules
- fqFindExternalScan: DFS find the first unprocessed SCAN_TYPE_EXTERNAL node
- fqPushdownOptimize: collect consecutive pushdownable single-child ancestors
  (Sort / Project) of the external scan, rewire main tree to replace topmost
  with the scan, disconnect bottommost from the scan, hand the detached chain
  to pScan->pRemoteLogicPlan
- add Phase 2 stubs (fqHarvestConditions / fqConvertPartition / fqConvertWindow
  / fqHarvestAgg / fqHarvestLimit / fqMergeJoin / fqPushdownSubquery) as
  no-ops; replace the old 'skip all optimizer rules' bypass with real logic
- add FQ guards in existing optimizer passes (scanPathOpt, pdcDealScan,
  eliminateNotNullCond, sortPriKeyOpt, partTagsIsOptimizableNode,
  pushDownLimitTo) to prevent incorrect optimization of external scan nodes

planPhysiCreater.c:
- add remoteLogicNodeToPhysi: convert one Sort/Project logic node to its
  physi counterpart and wire pChild as its single child; propagate LIMIT
  from the logic node down to pLeaf (SFederatedScanPhysiNode) if not set
- add buildRemotePlanFromLogicPlan: collect pRemoteLogicPlan chain top-down,
  then convert bottom-up with pLeaf as initial bottom; on error leave the
  partial chain for caller cleanup
- call buildRemotePlanFromLogicPlan inside createFederatedScanPhysiNode to
  populate SFederatedScanPhysiNode.pRemotePlan from pRemoteLogicPlan
Replace tdLog.exit() with raise AssertionError throughout the federated
query test suite so pytest can capture failures, show full tracebacks,
and still run teardown.

Key changes per file:

federated_query_common.py
- Add _fmt_result_table() to render actual-vs-expected row-by-row diff
- assert_query_result: show full table diff on row count or cell mismatch
- _assert_error_not_syntax / _assert_external_context: include errno (hex)
  and error_info string in failure messages
- assert_plan_contains: dump full numbered plan on keyword-not-found

test_fq_13_explain.py
- _get_explain_output: capture errno/error_info on query failure
- _assert_contain / _assert_not_contain: dump full numbered output on failure
- _get_remote_sql_line: dump full output when Remote SQL line is missing
- _assert_no_local_operator: show offending line + full plan on failure
- _check_analyze_metrics: show both plain and ANALYZE outputs on failure
- _assert_remote_sql_kw / _assert_remote_sql_no_kw: raise AssertionError

test_fq_14_result_parity.py
- _get_rows: catch query errors and surface SQL + errno + error_info
- Add _fmt_result_tables() for side-by-side local vs external diff with X
- _compare_rows: show full table diff on row count, col count, or value mismatch

test_fq_08_system_observability.py
- assert found: add source name to all 5 bare assert found statements
- row[col] == value: add actual/expected to assertion messages

test_fq_01_external_source.py
- assert desc.get(): add expected/actual to all 9 bare field assertions
Replace sequential strcasecmp chains in mysqlTypeMap/pgTypeMap/influxTypeMap
with per-dialect static sorted tables + bsearch, reducing worst-case lookup
from O(N) linear scan to O(log N).

- Add TypeMapEntry struct and bsearchTypeMap() helper
- Build sorted exact-match tables for MySQL (30 entries), PG (26 entries),
  Influx (12 entries)
- After bsearch miss, dispatch by first character to handle prefix/parameterized
  types (BIT(n), VARCHAR(n), DECIMAL(p,s), CHAR, NCHAR, ENUM, SET, arrays, ranges)
- Keep all helper functions (typeHasPrefix, parseTypeLength, setDecimalMapping) unchanged
- Update test_fq_03_type_mapping.py with expanded coverage for all new paths
…eries

Add fqInjectPkOrderBy() to planOptimizer.c:
- When no Sort node is pushed down (user did not specify ORDER BY) and
  no AGG/WINDOW node sits above the external scan (projection-only query),
  append a SSortLogicNode(ORDER BY <pk_col> ASC) to pRemoteLogicPlan.
- This guarantees external DB returns rows ordered by timestamp pk,
  matching TDengine implicit scan ordering (DS §5.2.x fallback flow).

Update fqPushdownOptimize():
- Track hasSortInChain during chain harvest loop.
- Allow empty chain (pure projection scan) to proceed to pk ORDER BY injection
  rather than bailing out early.
- Clarify comments and pParent scoping.

Add test_fq_06_pushdown_fallback.py:
- Covers projection-only scan (pk ORDER BY injected automatically).
- Covers AGG/WINDOW queries (no ORDER BY injection expected).
- Covers explicit ORDER BY passthrough.
- clientImpl.c: remove resolved 'todo refacto the error code mgmt' comment
- nodesCodeFuncs.c: update section comment from 'stubs' to 'serialization'
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant