Bimon’s best

Ok, here is a word from me: this is no program, no procedure, not just a piece of software. This is a monument.

I want to praise a guy who is still very much alive and who even visited me in hospital a few days ago, looking far better than I do (unforgivable). For the time being, let’s call him “Bimon”. He’s kinda nervous about any sign of popularity, though right now possibly every second Pole knows who he is.

Back in 1986, together with two girl programmers, Bimon wrote the entire software that let us prepare Tygodnik Mazowsze on computers, with no paper, no noisy typewriters, hardly anything that could lead the communist secret police to us.

One of the helpers, whom Bimon calls Ewa (I will add her name if she gives me the green light), wrote a procedure to visualize a page of the newspaper on the 80×25 (80 characters by 25 lines, i.e. 640×200 pixels) mono (green or amber) screen powered by the Hercules graphics card. Yes, right now you are saying this was impossible, and it obviously was impossible, except that Ewa did it. We could manipulate the font size etc. to see whether an article would fit on the page.

Another girl, Hania, wrote a text editor simplified for orangutan use, i.e. for those editorial staff members whose digital IQ ruled out such a complicated tool as the Norton Editor. While te.exe was tiny and fast, I remember it inherited a few functions from ne.com, such as F2-E for exit-with-save. If I’m right, that is, and my memory is not just playing tricks on me now.

Bimon made it a whole and solid piece of epic software. Believe it or not, it worked. From the very beginning it simply worked, while Bimon kept Bimoning, i.e. complaining that this should be corrected, that could be rewritten to work faster, etc.

Well, I did have my small input into the Tygodnik Mazowsze software. Back then, in 1986, I rarely came across 8086- or 80286-based PCs, as I was officially making my money writing and “translating” games for the 65c02 world (Atari XL/XE, C64, Apple II). I did, however, write a bunch of batch files for the editorial staff of Tygodnik Mazowsze to run once each edition was ready to print: a “dobranoc.bat” (goodnight.bat) that physically wiped the data folder and the cache contents of the hard disks in the two laptops, plus other small worms meant to make the job harder for the esbecja (the communist secret police) in case they found us.

Perhaps the most important, difficult, monumental part of the job was the HYPHENATION procedure. Hyphenation is incredibly complicated in Polish. Call it Mission Impossible. Bimon wrote it in like 3-4 weeks, just as if it were a guide to Warsaw’s Chinese restaurants (three of them back then, I guess). aa dodziuk, joasia jzczesna, thousands of texts…
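To give a feel for the kind of rule involved, here is a toy sketch of the vowel-to-vowel idea that the listing further down uses (V-V, V-CV, VC-CV, VC-CCV). It is only an illustration under loud assumptions: plain ASCII letters, every letter treated as its own unit (no "sz", "prz" and other generalized consonants), no exception file. The names `ascii_vowel` and `toy_hyphenate` are made up for this sketch and are not in Bimon's code.

```c
#include <assert.h>
#include <string.h>

/* Plain-ASCII vowel test; the real listing also knows the Polish letters. */
static int ascii_vowel(char c)
{
    return c != '\0' && strchr("aeiouy", c) != NULL;
}

/* Toy hyphenator (hypothetical): stores in breaks[] the number of
   characters that precede each break and returns the number of breaks. */
int toy_hyphenate(const char *word, int breaks[], int max_breaks)
{
    int len, i, gap, pos;
    int prev = -1;      /* index of the previous vowel, -1 if none yet */
    int count = 0;

    assert(word != NULL && breaks != NULL);
    len = (int)strlen(word);
    for (i = 0; i < len; i++) {
        if (!ascii_vowel(word[i])) {
            continue;
        }
        if (prev >= 0) {
            gap = i - prev - 1;         /* consonants between two vowels */
            if (gap <= 3 && count < max_breaks) {
                /* V-V and V-CV break right after the vowel,
                   VC-CV and VC-CCV after the first consonant */
                pos = (gap <= 1) ? prev + 1 : prev + 2;
                /* keep at least 2 characters on each side of the break */
                if (pos >= 2 && len - pos >= 2) {
                    breaks[count++] = pos;
                }
            }
        }
        prev = i;
    }
    return count;
}
```

With these toy rules "matka" gets one break after three characters (mat-ka) and "gazeta" gets two (ga-ze-ta). Everything that makes Polish hard, such as digraphs, prefixes and exceptions, is exactly what this sketch leaves out.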

OK, I shall fully rewrite this article in plainer English (though still Googlish). Just not tonight, sorry.

our own character encoding standard
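Judging only by the listing further down, the encoding seems to have kept the Polish capitals at codes 128-136 and their lowercase forms at 144-152, so that lowercasing is just "add 16". A minimal sketch of that folding, assuming those ranges are right; the helper names `pl_tolower` and `pl_lower_word` are hypothetical and do not appear in the original.

```c
#include <assert.h>

/* Hypothetical helper: fold one byte to lowercase under the assumed
   custom encoding (ASCII plus Polish letters at 128-136 / 144-152). */
unsigned char pl_tolower(unsigned char c)
{
    if (c >= 'A' && c <= 'Z') {
        return (unsigned char)(c + 32);   /* plain ASCII capital */
    }
    if (c >= 128 && c <= 136) {
        return (unsigned char)(c + 16);   /* Polish capital -> lowercase */
    }
    return c;                             /* everything else unchanged */
}

/* Fold a NUL-terminated word in place. */
void pl_lower_word(unsigned char *word)
{
    assert(word != 0);
    for (; *word != 0; word++) {
        *word = pl_tolower(*word);
    }
}
```

The point of such a layout is that case folding and the "is this a Polish letter" test both reduce to a couple of range checks, with no lookup table needed.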

word prefixes, separate rules, constantly updated
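The listing further down reads its exception file as pairs of a number and a word, then compares each stored word against the beginning of the word being hyphenated. A small sketch of that prefix lookup follows. The struct, the function name and the two sample entries are made up for illustration; the real exception file is not reproduced here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical exception entry: a word prefix and the number of
   characters that precede its hyphen. */
struct prefix_rule {
    const char *prefix;
    int hyphen_pos;
};

/* Made-up sample entries, for illustration only. */
static const struct prefix_rule sample_rules[] = {
    { "przed", 5 },
    { "nad",   3 },
};

/* Return the hyphen position of the first rule whose prefix starts
   the word, or 0 if no rule applies. */
int prefix_hyphen(const char *word, const struct prefix_rule *rules, int count)
{
    int i;

    assert(word != NULL && rules != NULL);
    for (i = 0; i < count; i++) {
        size_t len = strlen(rules[i].prefix);
        if (strncmp(word, rules[i].prefix, len) == 0) {
            return rules[i].hyphen_pos;
        }
    }
    return 0;
}
```

Because the first matching entry wins, the order of the entries matters, which may be one reason such a list needs constant care.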

OK, first credits done, let’s continue with the monument construction. Just hyphenate.c here; some *.h files and other additions will be introduced later. Enjoy the read, and remember: this is 1986, and the Polish underground press is being chased by the esbecja…

 

not found and\n",

EX_FILE_TABLE);

printf(" exception file %s not found, aborting\n",
hyphenfile_table);
return(101);
}
}

/* Read the file containing exceptions from hyphenation */
printf("...hyphen: Reading exception data file %s \n", hyphenfile_table);
status = hyphen_ini();
if (status == -1) {
/* Error */
printf("...hyphen: Hyphen exception file error, warning \n");
}

totalword_no = 0;
totalhyphen_no = 0;
w_no = 0;
while ((c = fgetc(in_stream)) != EOF) {
if (c == '\t') {
c = 32;
}
if ((c != 32) && (c != '\n')) {
w_table[w_no ++] = c;
} else if (w_no >= 1) {
/* End of word that is long enough to hyphenate */
w_table[w_no] = 0;
w_tot = w_no;
h_tot = hyphen_2(w_table, h_table);
if (h_tot == -1) {
printf("...hyphen: Critical error, aborting \n");
return(101);
} else if (h_tot == -2) {
printf("...hyphen: Non-critical error, word %s not hyphenated \n",
w_table);
}
h_no = 0;
for (w_no = 0; w_no < w_tot; w_no ++) {
fprintf(out_stream, "%c", w_table[w_no]);
if ((h_no < h_tot) && (w_no == h_table[h_no] - 1)) {
h_no ++;
totalhyphen_no ++;
fprintf(out_stream, "%s", tmp_table);
}
}
w_no = 0;
w_table[w_no] = 0;
/* Copy the ending character as is */
fprintf(out_stream, "%c", c);
totalword_no ++;
} else {
/* End of word that is too short to hyphenate */
w_table[w_no] = 0;
fprintf(out_stream, "%s", w_table);
w_no = 0;
w_table[w_no] = 0;
/* Copy the ending character as is */
fprintf(out_stream, "%c", c);
totalword_no ++;
}
}
printf("...hyphen: Hyphenated text written to the file %s\n", outfile_table);
printf("...hyphen: %d hyphen marks inserted into %d words\n", totalhyphen_no,
totalword_no);
return(0);
}

/*--------------------------------------------------------------------*/

/*
This is CLIB6.C
Version: 23-Jun-1986, 24-Jun-1986, 25-Jun-1986, 26-Jun-1986,
28-Jun-1986, 27-Jul-1986, 5-Oct-1986, 11-Oct-1986,
12-Oct-1986, 8-Feb-1987, 25-Apr-1987, 6-Jun-1987,
8-Jun-1987, 15-Jul-1987, 16-May-1988, 17-May-1988,
9-Jun-1988, 7-Jul-1988, 2-Jan-1989, 14-Jul-1991 by M.Proszynski

This is a library of functions to hyphenate Polish words.
*/

#define EX_MAX 200 /* maximum number of exceptions */
#define C_MAX 100 /* maximum length of input string (or word) */
#define GV1 11
#define GV2 12
#define GS 13
#define GC 20

static unsigned char *ex_ptable[EX_MAX]; /* list of exception words */
static int ex_tot; /* total number of exceptions */
static int exh_table[EX_MAX]; /* position of the first hyphen in ex. word */
static int exl_table[EX_MAX]; /* length of the exception word */
unsigned char w1_table[C_MAX]; /* tmp. storage, then w_table in lowercase */
int ex_no;

/*--------------------------------------------------------------------*/

int hyphen_ini()

/* This routine reads the exception file */

{
/* Read the file containing exceptions from hyphenation */

ex_no = 0;
while ((ex_no < EX_MAX) &&
(fscanf(in_2_stream, "%d%99s", exh_table + ex_no, w1_table) == 2)) {
/* %99s keeps the word within w1_table[C_MAX]; the loop stops at
end of file or at the first malformed entry */
exl_table[ex_no] = strlen(w1_table);
if ((ex_ptable[ex_no] = malloc(exl_table[ex_no] + 1)) == NULL) {
printf("...hyphen: Not enough memory, aborting\n");
return(-1);
}
strcpy(ex_ptable[ex_no], w1_table);
/*
printf(" %d -> %s \n", ex_no, ex_ptable[ex_no]);
*/
ex_no ++;
}
fclose(in_2_stream);
ex_tot = ex_no;
if (ex_tot >= EX_MAX) {
printf("...hyphen: Too many exceptions in the exception file,\n");
printf(" only %d entries allowed, aborting\n", EX_MAX);
return(-1);
} else if (ex_tot == 0) {
printf("...hyphen: No exceptions read in, aborting \n");
} else {
printf("...hyphen: %d exceptions read in \n", ex_tot);
}
return(0);
}

/*--------------------------------------------------------------------*/

int hyphen_2(w_table, h_table)

/*
This routine hyphenates a Polish word and returns:
(i) number of good places to put a hyphen,
(ii) 0 if there is no good place for a hyphen,
(iii) -1 if a critical error is found,
(iv) -2 if a non-critical error is found.
On entry, w_table should contain a word to be hyphenated, ending with \0.
On return, h_table[0] contains a number of characters before the first
good hyphen place, h_table[1] the number before the second good place, etc.
Said differently h_table[0] = 1 means that first good place for a hyphen
is before the character number 1 in the w_table, i.e., after the w_table[0].

If non-alpha character other than a minus sign (hyphen) is found within a
word, the segment containing this character (a segment is a number of
non-vowel characters bounded by two vowels) is not hyphenated. Only
characters with ASCII codes from the dec32-dec159 (inclusive) region are
allowed. When a minus sign is found and it is followed
by at least 2 alpha characters, the position right after the character
preceding it is included as a possible hyphen place.

There must be at least 2 alpha characters before the first acceptable
hyphen and at least 2 alpha characters after the last acceptable hyphen.

It is possible that this routine could be simplified. It works, however.
*/

unsigned char w_table[]; /* input, word to hyphenate */
int h_table[]; /* output, good-hyphen places */

{
unsigned char w2_table[C_MAX]; /* w1_table with gen. consonants and vowels */
int v_table[C_MAX]; /* positions of vowels in the current word */
int w1_no, w1_tot, w2_no, w2_tot, v_no, v_tot, h_no, h_tot;
int delta, prefix_hyphen;
int tmp_no, tmp1;
int special_found; /* flag set to 1, if special character is detected */
unsigned char c;
int vowel_found; /* flag set to 1, if a vowel has been found */
int wx_no; /* counter */
int palpha_no; /* number of Polish alpha characters */

/* Change to the lowercase character set */
w1_no = 0;
while ((w1_no < C_MAX) && ((c = w_table[w1_no]) != 0)) {
if ((c >= 'A') && (c <= 'Z')) {
w1_table[w1_no ++] = c + 32;
} else if ((c >= 'a') && (c <= 'z')) {
w1_table[w1_no ++] = c;
} else if ((c >= 128) && (c <= 136)) {
w1_table[w1_no ++] = c + 16;
} else if ((c >= 144) && (c <= 152)) {
w1_table[w1_no ++] = c;
} else if ((c < 32) || (c > 175)) {
/* Character outside of the dec32 - dec175 range detected;
should not happen. w1_table is not terminated yet, so the
input word is printed instead */
printf("...hyphen: Not allowed ASCII found in %s, warning\n", w_table);
return(-2);
} else {
/* Non-alpha character from within the allowed range */
w1_table[w1_no ++] = c;
}
}
if (w1_no >= C_MAX) {
printf("...hyphen: Too long word %s found, warning\n", w_table);
return(-2);
}
w1_tot = w1_no; /* total number of characters in w1_table */
w1_table[w1_no ++] = 0;

/* Create a table of generalized consonants and vowels */
w1_no = 0;
w2_no = 0;
while (w1_no < w1_tot) {
if (((w1_table[w1_no] == 'i') && (is_vowel(w1_table[w1_no + 1]) == 1)) ||
((w1_table[w1_no] == 'a') && (w1_table[w1_no + 1] == 'y')) ||
((w1_table[w1_no] == 'o') && (w1_table[w1_no + 1] == 'e'))) {
/* iV --> GV2 and co., but the above condition may be not sufficient */
w2_table[w2_no ++] = GV2;
w1_no += 2;
} else if (is_vowel(w1_table[w1_no]) == 1) {
/* V --> GV1 */
w2_table[w2_no ++] = GV1;
w1_no ++;
} else if (is_palpha(w1_table[w1_no]) == 0) {
/* Special character */
w2_table[w2_no ++] = GS;
w1_no ++;
} else if (w1_no < w1_tot - 1) {
/* C... --> GC + ... */
tmp1 = gen_consonant(w1_table + w1_no);
w2_table[w2_no ++] = GC + tmp1;
w1_no += tmp1;
} else {
/* C –> GC, last consonant in this string */
w2_table[w2_no ++] = GC + 1;
w1_no ++;
}
}
w2_tot = w2_no;

/* Create a table of positions of generalized vowels */
v_no = 0;
for (w2_no = 0; w2_no < w2_tot; w2_no ++) {
if ((w2_table[w2_no] == GV1) || (w2_table[w2_no] == GV2)) {
v_table[v_no ++] = w2_no;
}
}
v_tot = v_no; /* total number of generalized vowels */

if ((w1_tot <= 3) || (v_tot <= 1)) {
/* This word is too short or contains less than two generalized vowels */
return(0);
}

/* Check if this word is in the exception dictionary */
prefix_hyphen = 0;
v_no = 0;
h_no = 0;
for (ex_no = 0; ex_no < ex_tot; ex_no ++) {
if (strncmp(w1_table, ex_ptable[ex_no], exl_table[ex_no]) == 0) {
/* Use an entry from the exception dictionary */
prefix_hyphen = exh_table[ex_no];
v_no ++;
h_no ++;
break;
}
}

while (v_no < v_tot - 1) {
delta = v_table[v_no + 1] - v_table[v_no];
/* Check if a hyphen can be placed here */
special_found = 0;
if (delta > 1) {
for (w2_no = v_table[v_no]; w2_no < v_table[v_no + 1]; w2_no ++) {
if (w2_table[w2_no] == GS) {
special_found = 1;
}
}
}
if (special_found == 0) {
/* Try to hyphenate only if no special characters are present */
if ((delta == 1) && (v_table[v_no] < w2_tot)) {
/* V-V and not the very end of the word */
h_table[h_no ++] = v_table[v_no];
} else if (delta == 2) {
/* V-CV */
h_table[h_no ++] = v_table[v_no];
} else if ((delta == 3) && (w2_table[v_table[v_no] + 1] - GC <= 3)) {
/* VC-CV */
h_table[h_no ++] = v_table[v_no] + 1;
} else if ((delta == 4) && (w2_table[v_table[v_no] + 1] - GC <= 3)) {
/* VC-CCV */
h_table[h_no ++] = v_table[v_no] + 1;
} else {
printf("...hyphen: Too complicated word %s\n", w_table);
printf(" to find all possible hyphens, warning\n");
/*
return(-2);
*/
}
}
v_no ++;
}
h_tot = h_no; /* total number of hyphens */

/*
printf("\n");
printf("...hyphen: h_tot = %d \n", h_tot);
for (h_no = 0; h_no < h_tot; h_no++) {
printf("...hyphen: gen. h_table[%3d] = %d \n", h_no, h_table[h_no]);
}
printf("\n");
*/

/* Replace each entry in h_table denoting the number of
generalized consonants and vowels by an entry denoting
the number of real consonants and vowels before each
hyphen; w1_no counts real consonants and vowels,
w2_no counts generalized ones, h_no is an entry number */
h_no = 0;
if (prefix_hyphen > 0) {
h_table[h_no ++] = prefix_hyphen;
}
w1_no = 0;
for (w2_no = 0; (h_tot > 0) && (w2_no <= h_table[h_tot - 1]); w2_no ++) {
if (w2_table[w2_no] == GV1) {
w1_no ++;
} else if (w2_table[w2_no] == GV2) {
w1_no += 2;
} else if (w2_table[w2_no] == GS) {
w1_no ++;
} else if (w2_table[w2_no] >= GC) {
w1_no += w2_table[w2_no] - GC;
}
if (w2_no == h_table[h_no]) {
h_table[h_no ++] = w1_no;
}
/*
printf("\n");
*/
}

/* Check for the minus sign between two alphas (the case of a word
composed of two parts); there must be at least 2 alpha characters
after the minus sign and at least one vowel */
for (w1_no = 1; w1_no < w1_tot - 2; w1_no ++) {
if (w1_table[w1_no] == '-') {
if ((is_palpha(w1_table[w1_no - 1]) == 1) &&
(is_palpha(w1_table[w1_no + 1]) == 1) &&
(is_palpha(w1_table[w1_no + 2]) == 1)) {
vowel_found = 0;
for (wx_no = w1_no; wx_no < w1_tot; wx_no ++) {
if (is_vowel(w1_table[wx_no]) == 1) {
vowel_found = 1;
break;
}
}
if (vowel_found == 1) {
/* Add a hyphen here */
h_no = h_tot - 1;
while ((h_no >= 0) && (h_table[h_no] > w1_no)) {
h_table[h_no + 1] = h_table[h_no];
h_no --;
}
h_table[h_no + 1] = w1_no;
h_tot ++;
}
}
}
}

/* Check for the case of too small a number of alpha characters
before the first or after the last hyphen */
if (h_tot > 0) {
h_no = 0;
palpha_no = 0;
w1_no = 0;
while (w1_no < h_table[h_no]) {
if (is_palpha(w1_table[w1_no]) == 1) {
palpha_no ++;
}
w1_no ++;
}
if (palpha_no < 2) {
/* Less than 2 alpha characters before the first hyphen; remove
the first hyphen */
h_no = 0;
while (h_no < h_tot - 1) {
h_table[h_no] = h_table[h_no + 1];
h_no ++;
}
h_tot --;
}
}
if (h_tot > 0) {
palpha_no = 0;
w1_no = h_table[h_tot - 1];
while (w1_no < w1_tot) {
if (is_palpha(w1_table[w1_no]) == 1) {
palpha_no ++;
}
w1_no ++;
}
if (palpha_no < 2) {
/* Less than 2 alpha characters after the last hyphen; remove
the last hyphen */
h_tot --;
}
}

/*
printf("\n");
printf("...hyphen: w_table = %s \n", w_table);
for (h_no = 0; h_no < h_tot; h_no++) {
printf("...hyphen: norm. h_table[%3d] = %d \n", h_no, h_table[h_no]);
}
printf("\n");
printf("...hyphen: Returning \n");
*/

return(h_tot);
}

/*--------------------------------------------------------------------*/

int is_vowel(c)

/*
Return 1 if c is a Polish vowel, otherwise return 0.
*/

unsigned char c; /* input, character to test */

{
static int vow_tot = 9;
static unsigned char vow_table[9] = {'a', 'e', 'i', 'o', 'u', 'y',
144, 146, 149}; /* >a, >e, >o */
int vow_no;

for (vow_no = 0; vow_no < vow_tot; vow_no ++) {
if (c == vow_table[vow_no]) {
return(1);
}
}
return(0);
}

/*--------------------------------------------------------------------*/

int is_palpha(c)

/*
Return 1 if c is a Polish alpha, otherwise return 0.
*/

unsigned char c; /* input, character to test */

{
if ((c >= 'A') && (c <= 'Z')) {
return(1);
} else if ((c >= 'a') && (c <= 'z')) {
return(1);
} else if ((c >= 128) && (c <= 136)) {
return(1);
} else if ((c >= 144) && (c <= 152)) {
return(1);
}
/* Non-alpha character detected */
return(0);
}

/*--------------------------------------------------------------------*/

int gen_consonant(w_table)

/*
Return the number of consonants (at the beginning of the supplied
part of the word in w_table) that make one generalized consonant.
*/

unsigned char w_table[]; /* part of the word to analyze for consonants */

{
struct gc {
unsigned char *w_ptr;
int length;
};

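/* The table below must be organized in reverse order, with every longer
string placed before the shorter strings it starts with, because the
first match wins */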
struct gc static x_table[] = {

/* "\230n" , 2, v1.1 */ /* >zn */
"\230l" , 2, /* >zl */

"zj" , 2,
"zgrz" , 4,
"zgr" , 3,
"zgn" , 3,
"zg\223" , 3, /* zg>l */
"zgl" , 3,
"zg" , 2, /* v1.1 */
"zd\230" , 3, /* zd>z */
"zd\227" , 3, /* zd>>z */
"zdz" , 3,
"zdr" , 3,
"zd" , 2, /* v1.1 */

"wsz" , 3,
/* "ws" , 2, v1.4 */
"wrz" , 3,
"wr" , 2,
/* "wn" , 2, v1.1 */
"w\223" , 2, /* w>l */
"wl" , 2,
/* "wk" , 2, */

"tw" , 2,
"trw" , 3,
"trz" , 3,
"tr" , 2,
/* "tn" , 2, */
"th" , 2, /* v1.1 */

"\226w" , 2, /* >sw */
"\226l" , 2, /* >sl */
"\226c" , 2, /* v1.4 */

"sz\223" , 3, /* v1.1 sz>l */
/* "szcz" , 4, v1.1 */
"sz" , 2,
"stw" , 3,
"strz" , 4,
"str" , 3,
"st" , 2,
"sprz" , 4,
"spr" , 3,
/* "sp" , 2, v1.1 */
"s\223" , 2, /* s>l */
/* "skt" , 3, */
"skrz" , 4,
"skr" , 3,
"sk\223" , 3, /* sk>l */
"skl" , 3,
"sk" , 2, /* listed after its longer forms so that they can match */
"sj" , 2, /* v1.1 */
"sc" , 2,

"rz" , 2,
/* "rw" , 2, v1.1 */
/* "rdz" , 3, v1.1 */

"prz" , 3,
"pr" , 2,
"pl" , 2, /* v1.2 */

/* "kt" , 2, */
"ksz" , 3, /* v1.1 */
"krz" , 3,
"krw" , 3,
"kr" , 2,
"k\223" , 2, /* k>l */
"kl" , 2,

"grz" , 3,
"gr" , 2,
"gn" , 2,
/* "g\223" , 2, v1.1 */ /* g>l */
"gl" , 2,
"gdz" , 3,
"gd" , 2,

"d\230" , 2, /* d>z */
"d\227" , 2, /* d>>z */
"dz" , 2,
"drz" , 3,
"dr" , 2,
/* "dn" , 2, */
"d\223" , 2,
"dl" , 2,

/* "czn" , 3, */
"cz\223" , 3, /* v1.2 cz>l */
"czk" , 3, /* v1.2 */
"cz" , 2,
"c\223" , 2, /* v1.1 c>l */
/* "ck" , 2, v1.3 */
"cj" , 2,
"chl" , 3,
"ch\223" , 3, /* ch>l */
"chw" , 3,
"ch" , 2, /* listed after its longer forms so that they can match */

"brz" , 3,
"br" , 2,
"b\223" , 2, /* b>l */

"\000" , 0 /* last entry must be a zero */
};

int x_no;

x_no = 0;
while (x_table[x_no].length != 0) {
if (strncmp(w_table, x_table[x_no].w_ptr, x_table[x_no].length) == 0) {
return(x_table[x_no].length);
}
x_no ++;
}
return(1);
}
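
For anyone wondering how the numbers returned in h_table get used: the main loop above prints the word character by character and drops a marker in after every position listed. Below is a small stand-alone sketch of that step, assuming the same convention (each entry is the number of characters before the break, in ascending order). The function name `insert_breaks` and its buffer handling are my own illustration, not part of the original sources.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy word into out, inserting mark at every
   break position. breaks[] holds, in ascending order, the number of
   characters that precede each break. Returns 0 on success, or -1 if
   out is too small. */
int insert_breaks(const char *word, const int breaks[], int break_count,
                  const char *mark, char *out, size_t out_size)
{
    size_t mark_len;
    size_t used = 0;
    int b = 0;
    int i;

    assert(word != NULL && mark != NULL && out != NULL);
    mark_len = strlen(mark);
    for (i = 0; word[i] != '\0'; i++) {
        if (used + 1 >= out_size) {
            return -1;              /* no room for this char plus NUL */
        }
        out[used++] = word[i];
        if (b < break_count && i == breaks[b] - 1) {
            if (used + mark_len >= out_size) {
                return -1;          /* no room for the mark plus NUL */
            }
            memcpy(out + used, mark, mark_len);
            used += mark_len;
            b++;
        }
    }
    if (used >= out_size) {
        return -1;
    }
    out[used] = '\0';
    return 0;
}
```

Given the word "matka" and a single break position of 3, this writes "mat-ka" when the mark is "-". In a typesetting pipeline the mark would more likely be a soft-hyphen code that the page layout step understands.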
