Printing German umlauts with cc65


3

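I want to print German text with umlauts from a compiled C program on the C64. I know these characters are not in the character set, so I will have to change the character set.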

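However, it seems that the umlauts "Ä" and "Ö" are mapped to the ASCII characters $64 and $76, which are printed as the characters "D" and "V" respectively, and I would like to keep those as they are. Is it possible to change the mapping of umlauts to ASCII in the cc65 compiler?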

I tested with this program:

#include <stdio.h>

void main (void)
{
  printf("äöüÄÖÜß\n"); //mapped to ASCII e4 f6 fc 64 76 7c 7f 0d
}

and compiled it with cl65 test.c -o test.prg.

2

I would expect that the CC65 compiler, like most compilers, would output string literals using whatever sequence of bytes appears in the source file. If you want to ensure that particular byte values get included in a string, you can use a backslash followed by a three-digit octal number to include any byte value within a string. While one could use fewer than three octal digits, or else use a hexadecimal syntax, doing so may yield unpleasant results if the next character in the string can be read as another digit of the escape. For example, "\015FUNNY" or "\15FUNNY" would yield a carriage return followed by FUNNY, but "\015012345" and "\15012345" would yield, respectively, a carriage return followed by the digits 012345, and a character with code 104 (150 octal or 0x68 hex) followed by the digits 12345.
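The octal examples above can be checked with any ordinary host C compiler. This is a minimal sketch of the standard C escape rules only; the function name check_octal_escapes is made up for the example, and it makes no claim about whether cc65 additionally runs such bytes through its character translation.

```c
#include <assert.h>
#include <string.h>

/* An octal escape takes at most three octal digits and ends early at the
   first character that is not an octal digit. */
const char three_digits[] = "\015012345"; /* byte 13, then "012345" */
const char two_digits[]   = "\15012345";  /* read as \150: byte 104, then "12345" */
const char before_text[]  = "\15FUNNY";   /* byte 13, then "FUNNY" */

/* Hypothetical self-check; the name is made up for this example. */
void check_octal_escapes(void)
{
    assert((unsigned char)three_digits[0] == 13);
    assert(strcmp(three_digits + 1, "012345") == 0);
    assert((unsigned char)two_digits[0] == 104);
    assert(strcmp(two_digits + 1, "12345") == 0);
    assert((unsigned char)before_text[0] == 13);
    assert(strcmp(before_text + 1, "FUNNY") == 0);
}
```

Calling check_octal_escapes() from main passes silently, which is why always writing all three octal digits is the safer habit.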

Note that on the Commodore 64, I would recommend writing directly to screen memory rather than trying to use "printf". It's going to be faster, and the screen codes are more predictable and easier to work with.
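As a rough sketch of what writing to screen memory could look like: the helper below is hypothetical (put_screen_code and the SCREEN_* names are not from any library), and it assumes the 40x25 text screen sits at its power-up location $0400. The base address is passed in as a pointer so the same routine can be tried on a plain buffer on any machine.

```c
#include <assert.h>

#define SCREEN_COLS 40u
#define SCREEN_ROWS 25u

/* On a stock C64 the caller would pass (unsigned char *)0x0400, the
   power-up location of the text screen (an assumption of this sketch).
   The value stored is a screen code, which is not the same as PETSCII. */
void put_screen_code(unsigned char *screen, unsigned col, unsigned row,
                     unsigned char code)
{
    assert(col < SCREEN_COLS && row < SCREEN_ROWS);
    screen[row * SCREEN_COLS + col] = code;
}
```

On the real machine, put_screen_code((unsigned char *)0x0400, 0, 0, 1) would place screen code 1 in the top-left corner, which is the letter A in the default character set. With a redefined character set, the umlaut glyphs can be given whatever screen codes you choose.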


8

Your source is most likely encoded in ISO 8859-1 (or -15) (*1), so the compiler has to convert between the character literals in your source and the character set of the designated target. Without a specified target, CL65, unlike all the other tools, uses the C64 target by default (*2). As specified in target.c line 193, the C64 target uses the PETSCII table at line 113, which shows exactly the conversions you noted.

Luckily there's a pragma called charmap to change this. So for example if you want to position the Umlauts at the classic 7 bit DIN 66003 positions simply add the following lines (*3):

/* Redefinition of 8859-1 codepoints for Umlauts ("ÄÖÜäöüß") */
/* to ISO-IR-21 aka ISO 646-DE aka DIN 66003                 */
#pragma charmap (0xE4, 0x7B)
#pragma charmap (0xF6, 0x7C)
#pragma charmap (0xFC, 0x7D)
#pragma charmap (0xC4, 0x5B)
#pragma charmap (0xD6, 0x5C)
#pragma charmap (0xDC, 0x5D)
#pragma charmap (0xDF, 0x7E)

This is best put in some generic include for all sources (*4).

Also keep in mind that these changes are not global: they take effect only from the point where they appear, and any of these codes can be reassigned again by a follow-up pragma charmap.
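The "only effective afterwards" behaviour can be pictured with a small host-side model. This is not cc65 code and not how the compiler is implemented; the charmap_* names are hypothetical, and the table starts out as an identity mapping purely to keep the sketch short (the real C64 default table is not the identity).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a 256-entry translation table. */
static unsigned char table[256];

/* Identity mapping as a starting point (a simplification). */
void charmap_reset(void)
{
    unsigned i;
    for (i = 0; i < 256; ++i) table[i] = (unsigned char)i;
}

/* Models one "#pragma charmap (src, dst)"; a later call overrides an earlier one. */
void charmap_set(unsigned char src, unsigned char dst)
{
    table[src] = dst;
}

/* Models translating one literal with the table as it is at that moment. */
void charmap_translate(const unsigned char *in, unsigned char *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i) out[i] = table[in[i]];
}

void charmap_demo(void)
{
    const unsigned char a_umlaut[1] = { 0xE4 };  /* "ä" in ISO 8859-1 */
    unsigned char before[1], after[1];

    charmap_reset();
    charmap_translate(a_umlaut, before, 1);      /* literal seen before the pragma */
    charmap_set(0xE4, 0x7B);
    charmap_translate(a_umlaut, after, 1);       /* literal seen after the pragma */

    assert(before[0] == 0xE4);                   /* the earlier literal is unaffected */
    assert(after[0] == 0x7B);
}
```

In this model a literal translated before the charmap_set call keeps its old bytes, which mirrors why the pragmas belong at the top of a file or in a header that is included first.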


The default mapping maps the whole 0xC0..0xDF section onto 0x60..0x7F, which are (mostly) the upper case letters.

At first glance this seems really strange, until one realizes that this range (0xC1..0xDA) is where the shifted letter keys are returned when reading the keyboard (*5). So for keyboard input this might make some sense.
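The shift of the 0xC0..0xDF block can be checked against the byte dump in the question (64 76 7c 7f for "ÄÖÜß"): each code lands 0x60 lower, in 0x60..0x7F. The sketch below models only that one range; the function name fold_c0_df is made up, and returning every other code unchanged is a simplification rather than a description of the full PETSCII table.

```c
#include <assert.h>

/* Models only the 0xC0..0xDF range; all other codes pass through
   unchanged, which is a simplification of the real table. */
unsigned char fold_c0_df(unsigned char c)
{
    if (c >= 0xC0 && c <= 0xDF) {
        return (unsigned char)(c - 0x60);
    }
    return c;
}

/* Values taken from the byte dump in the question. */
void fold_demo(void)
{
    assert(fold_c0_df(0xC4) == 0x64);  /* Ä */
    assert(fold_c0_df(0xD6) == 0x76);  /* Ö */
    assert(fold_c0_df(0xDC) == 0x7C);  /* Ü */
    assert(fold_c0_df(0xDF) == 0x7F);  /* ß */
    assert(fold_c0_df(0xE4) == 0xE4);  /* ä is outside the range */
}
```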

Still, I have no idea why it's done for character literals. My assumption would be some kind of compatibility situation, or a simple leftover from one. So I'd say it was a trade-off between compatibility and support of chars that are not available on a standard PETSCII machine (*6) anyway, where they had to choose which foot to shoot ... and it hit the Umlauts (*7).


*1 - The CC65 suite assumes ISO 8859-1. 8859-15 differs by the Euro symbol at 0xA4 and a few other characters.

*2 - Never trust defaults. Relying on defaults can cost many wasted hours before you learn that different programs use different defaults or act on them differently. So adding -t c64 is always a good idea.

*3 - Just as example, you may of course use any other assignment.

*4 - Don't change any default files, as that would make your sources even less portable.

*5 - The lower case letters show up at the standard positions 0x41..0x5A.

*6 - Ignoring that real PET/CBM machines were sold in national variants, offering additional symbols by replacing certain graphic symbols, thus keeping full ASCII compatibility and national characters.

*7 - Which is really sad, as CC65 using ISO 8859-1 source would have allowed supporting all of them in a standardized fashion.