Search in files and Find do not work with non-Latin characters #3892

## Problem description

In my Eclipse-based product (sorry, it is not open source), we found that in particular scenarios (such as "whole word only") Search in files and Find do not match non-Latin characters when running on JDK 21 or later.
To implement "whole word only", Eclipse code creates a regex with the search text surrounded by `\b`. On these JDKs that regex no longer matches non-Latin text.
We assumed that the problem was in the Java SDK and opened a ticket for it, but the ticket was rejected with the following explanation:

```
The '\b' meta character behaviour has been changed in Semeru 21 or later versions. The change has been implemented because of the below OpenJDK issue.

OpenJDK Issue and Java 19 Release Note:
-----------------------------------------------------

https://bugs.openjdk.org/browse/JDK-8282129

https://www.oracle.com/java/technologies/javase/19-relnote-issues.html#JDK-8264160

In Semeru 18 or earlier versions, \b was Unicode-aware by default, so it was able to process Hebrew characters successfully.

However, in Semeru 21 or later versions, the \b meta character matches only ASCII word characters by default. To match a Hebrew string, UNICODE_CHARACTER_CLASS must be set because the string contains Unicode characters.

How to set UNICODE_CHARACTER_CLASS ?
-----------------------------------------------------------------

Pattern pattern = Pattern.compile(findString, Pattern.UNICODE_CHARACTER_CLASS);

Reason behind the change:
------------------------------------------

In Semeru 18 or earlier versions, the \b (word boundary) behaviour was inconsistent with \w (word character) behaviour. The \w (word character) matches [a-zA-Z_0-9] in the absence of UNICODE_CHARACTER_CLASS being set. However the \b relies on j.l.Character.isLetterOrDigit along with a check for underscores and isLetterOrDigit method matches some unicode characters in addition to the range specified by \w (word character). However when UNICODE_CHARACTER_CLASS is set, the character range of both \w and \b is consistent.

In Semeru 21 or higher versions, the \b and \w behaviour is consistent whether UNICODE_CHARACTER_CLASS is set or not. The \b matcher now uses the ASCII_WORD() predicate in java.util.regex.CharPredicates to get the same range of characters as \w for determining word boundaries.
```
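
The behaviour described in the reply can be checked outside Eclipse with a small standalone program. This is only a minimal sketch and not the actual Eclipse code: the class `WordBoundaryDemo` and the helper `findsWholeWord` are hypothetical names, and the helper is assumed to approximate what a "whole word only" search does by wrapping the quoted search text in `\b`. The Hebrew sample text is written with Unicode escapes so the file encoding does not matter.

```java
import java.util.regex.Pattern;

// Hypothetical reproduction, not taken from the Eclipse sources.
class WordBoundaryDemo {

    // Wraps the literal search text in \b ... \b and reports whether it is found in the text.
    static boolean findsWholeWord(String word, String text, int flags) {
        String regex = "\\b" + Pattern.quote(word) + "\\b";
        return Pattern.compile(regex, flags).matcher(text).find();
    }

    public static void main(String[] args) {
        String word = "\u05E9\u05DC\u05D5\u05DD";                          // Hebrew "shalom"
        String text = "\u05E9\u05DC\u05D5\u05DD \u05E2\u05D5\u05DC\u05DD"; // "shalom olam"

        // According to the release note, the result with default flags depends on the Java version.
        System.out.println("Java version:            " + System.getProperty("java.version"));
        System.out.println("default flags:           " + findsWholeWord(word, text, 0));

        // With the flag set, \b is Unicode-aware regardless of the Java version.
        System.out.println("UNICODE_CHARACTER_CLASS: "
                + findsWholeWord(word, text, Pattern.UNICODE_CHARACTER_CLASS));
    }
}
```

Running it on both an older and a newer JDK should show the first result changing while the second stays `true`.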

I am aware of two places where the change is required: the `FindReplaceDocumentAdapter` class in `org.eclipse.jface.text` and `PatternConstructor.java` in `org.eclipse.search.core`.
Please set `UNICODE_CHARACTER_CLASS` there.
AFAIK the change has no side effects.
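
To make the request concrete, here is a rough sketch of what the change could look like. It is not the code from `FindReplaceDocumentAdapter` or `PatternConstructor`: the class `WholeWordPatternSketch` and the methods `wholeWordPattern` and `countMatches` are hypothetical, and the sketch assumes the whole-word regex is built by wrapping the quoted search text in `\b`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed fix, not the actual Eclipse implementation.
class WholeWordPatternSketch {

    // Builds a whole-word pattern whose \b is Unicode-aware on every Java version.
    static Pattern wholeWordPattern(String findString, boolean caseSensitive) {
        int flags = Pattern.UNICODE_CHARACTER_CLASS;
        if (!caseSensitive) {
            // UNICODE_CASE makes case-insensitive matching work beyond ASCII.
            flags |= Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE;
        }
        return Pattern.compile("\\b" + Pattern.quote(findString) + "\\b", flags);
    }

    // Counts non-overlapping matches of the pattern in the text.
    static int countMatches(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String word = "\u05E1\u05E4\u05E8";                          // Hebrew "sefer" (book)
        String text = "\u05E1\u05E4\u05E8 \u05E1\u05E4\u05E8\u05D9\u05DD"; // "sefer sfarim" (book, books)

        // Only the standalone word counts; the longer word that starts with it does not.
        System.out.println(countMatches(wholeWordPattern(word, true), text));
    }
}
```

One thing that may be worth verifying for regex searches: per the `Pattern` Javadoc, `UNICODE_CHARACTER_CLASS` also switches the predefined classes such as `\w`, `\d` and `\s` to their Unicode versions and may impose a performance penalty, so user-supplied regexes could match slightly differently with the flag set.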

## Tested under this environment:
* OS & version: Windows 11
* Eclipse IDE/Platform version (as shown in *Help > About*): Version: 2024-03 (4.31.0)


## Community

- [x] I understand reporting an issue to this OSS project does not mandate anyone to fix it. Other contributors may consider the issue, or not, at their own convenience. The most efficient way to get it fixed is that I fix it myself and contribute it back as a good quality patch to the project.

