ICU C++ use example on MacOS
When I copy files from my Mac to my NAS, file names containing composed accented characters are automatically decomposed in the copies. This causes my backup tool to treat those copies as files without originals. See Unicode Normalization Forms for examples of composed characters. As a solution, I wrote a program that renames all the original files to their decomposed form, so the automatic decomposition then changes nothing. Here are the first steps that led to this solution.
Install of icu4c in Brew:
brew install icu4c
Install of pkgconf in Brew:
brew install pkgconf
Make /opt visible in Finder:
sudo chflags nohidden /opt
Setting PKG_CONFIG_PATH to the right value:
PKG_CONFIG_PATH=/opt/homebrew/Cellar/icu4c@77/77.1/lib/pkgconfig
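export PKG_CONFIG_PATH
transliterate example:
#include <iostream>
#include <string>
#include <unicode/unistr.h>
#include <unicode/translit.h>

int main(void) {
    std::string init("t\xC3\xA4st"); // täst, with a composed ä (U+00E4)
    icu::UnicodeString ustrc = icu::UnicodeString::fromUTF8(init.c_str());

    // Print the UTF-16 code units of the composed string.
    const char16_t *ustrc_buf = ustrc.getBuffer();
    for (int i = 0; i < ustrc.length(); i++) {
        std::cout << std::hex << static_cast<unsigned>(ustrc_buf[i]) << " ";
    }
    std::cout << std::endl;

    // Create the NFD (decomposition) transliterator and check for failure.
    UErrorCode status = U_ZERO_ERROR;
    icu::Transliterator *myTrans = icu::Transliterator::createInstance("Any-NFD", UTRANS_FORWARD, status);
    if (U_FAILURE(status)) {
        std::cerr << u_errorName(status) << std::endl;
        return 1;
    }
    myTrans->transliterate(ustrc);
    delete myTrans;

    // The string was modified, so fetch the buffer again before reading it.
    ustrc_buf = ustrc.getBuffer();
    for (int i = 0; i < ustrc.length(); i++) {
        std::cout << std::hex << static_cast<unsigned>(ustrc_buf[i]) << " ";
    }
    std::cout << std::endl;

    // Convert the decomposed string back to UTF-8.
    std::string result;
    icu::StringByteSink<std::string> bs(&result);
    ustrc.toUTF8(bs);
    return 0;
}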
Explanation:
To build:
c++ -o example example.cpp -std=c++17 `pkg-config --libs --cflags icu-uc icu-i18n`
The icu-uc parameter is necessary for the data types, and the icu-i18n parameter is necessary to link with createInstance and transliterate.
The program displays:
74 e4 73 74
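This first line contains e4 because 0x00E4 is the UTF-16 encoding of the composed ä. The second loop prints the string after the Any-NFD transliteration, where ä is decomposed into 0x0061 (a) followed by 0x0308 (combining diaeresis).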