Skip to content

Fix SQLWCHAR conversion paths on Linux (draft)#291

Draft
fdcastel wants to merge 3 commits intoFirebirdSQL:masterfrom
fdcastel:fix/unicode-widechar-paths
Draft

Fix SQLWCHAR conversion paths on Linux (draft)#291
fdcastel wants to merge 3 commits intoFirebirdSQL:masterfrom
fdcastel:fix/unicode-widechar-paths

Conversation

@fdcastel
Copy link
Copy Markdown
Member

@fdcastel fdcastel commented Apr 20, 2026

Summary

Draft reference implementation against the widechar test suite in #290. Fixes the MainUnicode.cpp Linux branches that assumed sizeof(SQLWCHAR) == sizeof(wchar_t).

Note: this branch is layered on top of #290, so the diff here includes both the tests (from #290) and the Unicode fix. Review the commits individually, or wait for #290 to merge and this will rebase cleanly to show only the Unicode fix.

What this PR does

  • ConvertingString destructor (output path, used by SQLGetDiagRecW / SQLErrorW): Linux branch now does a byte-level widening loop (SQLCHARSQLWCHAR) instead of mbstowcs((wchar_t*)...) — the latter wrote UTF-32 into a UTF-16 buffer on Linux, truncating HY000 to H and smashing the stack on tight output buffers.
  • convUnicodeToString (input path, used by SQLExecDirectW / SQLPrepareW): Linux branch does the inverse byte-narrowing loop instead of wcstombs, and uses a local sqlwcharLen helper instead of wcslen((wchar_t*)...).
  • lengthString = length / sizeof(SQLWCHAR): correct character count in the length-specified ctor.
  • NUL-terminate in SQLWCHAR units. In convUnicodeToString, *ptEndWC = L'\0' through a wchar_t* was a genuine 4-byte write into a 2-byte SQLWCHAR slot on Linux and is the real fix. The sibling *(LPWSTR)... in the destructor is a 2-byte narrowing store — not an overrun, per @irodushka in Add ASAN and Valgrind integration for CI test suite #289 (discussion_r3109004999) — so the replacement there is a style simplification, not a fix.

Mirrors unixODBC's own ansi_to_unicode_copy / unicode_to_ansi_copy approach.

Scope / Out of scope

  • Does not close Tier 1b entirely — the connection->MbsToWcs / WcsToMbs branch and non-ASCII (iconv) handling remain open.
  • This PR also removes the Linux-only GTEST_SKIP guards added in Add widechar ODBC call tests (driving Unicode fix) #290, so the test suite runs on every platform.

Context

Per @irodushka's plan in #289 comment 4279111183. Opened as draft so either:
(a) @irodushka approves and it lands,
(b) @irodushka pushes his own implementation on top, or
(c) @irodushka closes it and opens his own, using #290's tests as the acceptance criteria.

Test plan

…ectW / SQLPrepareW)

Exercises both ConvertingString constructor paths in MainUnicode.cpp:
output buffer (SQLGetDiagRecW / SQLErrorW) and input buffer
(SQLExecDirectW / SQLPrepareW). Linux tests skip with a pointer to the
follow-up Unicode rewrite PR; they are meant to drive that PR's
implementation. Windows runs them for real.

Per the plan in FirebirdSQL#289 comment 4279111183. Tracked as issue FirebirdSQL#287 Tier 1b.
Replace wchar_t-assuming mbstowcs/wcstombs/wcslen calls in the non-
connection Linux branches with SQLWCHAR-aware byte-level widening/
narrowing loops (matches unixODBC's ansi_to_unicode_copy /
unicode_to_ansi_copy). Makes the widechar-tests PR green on Linux.

Remove the Linux-only SKIP guard added by the tests PR — the tests
now run on every platform.

Issue FirebirdSQL#287 Tier 1b is not fully closed by this: the connection
MbsToWcs/WcsToMbs branch and non-ASCII handling remain open.
@irodushka
Copy link
Copy Markdown
Contributor

@fdcastel

NUL-terminate in SQLWCHAR units (no LPWSTR 4-byte write past the end on Linux).

Please read again this explanation #289 (comment)
I tried to explain that the changes you made were not a bug fix, but an optional code styling. That the C/C++ code like

int64_t x = 0;
int32_t y = x;

or

int64_t x = 0;
int32_t y;
int32_t *py = &y;
*py = x;

contains no bug.
Well, maybe you think I'm wrong? Let's discuss it.

LPWSTR is WCHAR* = unsigned short* = SQLWCHAR* on both Windows and
unixODBC/Linux, so `*(LPWSTR)ptr = L'\0'` writes 2 bytes via narrowing
store, not 4 — no overrun. Thanks to @irodushka for the correction
(FirebirdSQL#289 discussion_r3109004999). Keeping the replacement
`unicodeString[len] = 0;` which he endorsed as more compact and
understandable.

The sibling comment in convUnicodeToString stays — that one is a real
4-byte write through an explicit `(wchar_t*)` cast.
fdcastel added a commit to fdcastel/firebird-odbc-driver that referenced this pull request Apr 20, 2026
Per FirebirdSQL#289 discussion with @irodushka: rather than keep ASAN running
under continue-on-error, disable it outright until the Unicode rewrite
lands (draft PR FirebirdSQL#291, tracked as FirebirdSQL#287 Tier 1b). Re-enable the matrix
entry once FirebirdSQL#291 merges.

Valgrind remains strict.
@irodushka
Copy link
Copy Markdown
Contributor

ok!

@fdcastel
Copy link
Copy Markdown
Member Author

@irodushka Agreed on the destructor case — *(LPWSTR)ptr = L'\0' is a 2-byte narrowing store on unixODBC (LPWSTR == SQLWCHAR* == unsigned short*), not an overrun. Already corrected in 0d9bce9: the overrun comment was dropped and unicodeString[len] = 0; was kept only as the style simplification you endorsed. The PR description still overclaimed it as a bug fix — fixing that now.

One distinction worth separating on the sibling case in convUnicodeToString: pre-PR code declared wchar_t *ptEndWC and did *ptEndWC = L'\0';. There the pointee type is already wchar_t, so no narrowing happens — the store is sizeof(wchar_t) == 4 bytes on Linux into a 2-byte SQLWCHAR slot. That one is a genuine 4-byte overrun and the fix stays. Let me know if you read that one differently.

@fdcastel
Copy link
Copy Markdown
Member Author

PR description updated to reflect the distinction.

Comment thread MainUnicode.cpp
#endif
else if ( wcString[length] != 0 )
{
ptEndWC = (wchar_t*)&wcString[length];
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ptEndWC = (wchar_t*)&wcString[length];
ptEndWC = wcString + length;

Comment thread MainUnicode.cpp
{
size_t bytesNeeded;
wchar_t *ptEndWC = NULL;
wchar_t saveWC;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
wchar_t saveWC;
SQLWCHAR saveWC;

Comment thread MainUnicode.cpp
SQLCHAR * convUnicodeToString( SQLWCHAR *wcString, int length )
{
size_t bytesNeeded;
wchar_t *ptEndWC = NULL;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SQLWCHAR *ptEndWC = NULL;

@irodushka
Copy link
Copy Markdown
Contributor

Yep. This place

			ptEndWC = (wchar_t*)&wcString[length];
			saveWC = *ptEndWC;
			*ptEndWC = L'\0';

really looks like overrun. Frankly, two overruns - 1) *ptEndWC = L'\0'; - you've fixed it 2) saveWC = *ptEndWC; - out-of-bound read, not fixed yet)) A read is not so deadly as a write, but it can lead to UB or even crash.

and in the second place, at the end of the function:

		if ( ptEndWC )
			*ptEndWC = saveWC;

this's the third real write overrun)) unfixed)

I think it's easier to rewrite this than to try to fix all those issues...

@fdcastel
Copy link
Copy Markdown
Member Author

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants