Size and oversize

galois.ai
Oct 10, 2023
15 min read

Updated: Jan 20

Turning the pages of history at the intersection of present-day Slovakia and Austro-Hungary, we noticed an interesting pattern of morphological derivation. The Slovak prefix nad- is quite consistently matched by the prefix over- in Dutch or English. For "proof", here are a few examples from the previous expedition.

časový -> tijdelijk (timely), nadčasový -> tijdloos (timeless)
sledovať -> bekijken (to view), nadsledovať -> overzien (oversee)
hodnotiť -> waarderen (to value), nadhodnotiť -> overwaarderen (overvalue)
merný -> matig (moderate), nadmerný -> overmatig (excessive)
etc.

Let's try to discover a similar pattern in Hungarian, that is, find (if it exists) a prefix that shifts the meaning of the root in the same direction as nad- or over-. Let's look for Hungarian equivalents to the four previous examples.

1. časový -> időszerű (timely), nadčasový -> időtlen (timeless)
2. sledovať -> megtekint (to view), nadsledovať -> felügyel (oversee)
3. hodnotiť -> értékel (to value), nadhodnotiť -> túlértékel (overvalue)
4. merný -> mérsékelt (moderate), nadmerný -> túlzott (excessive)

túl- timidly suggests itself as a prefix for "more". We ask some generative AI to go on and try very hard to find Hungarian equivalents in túl-.

5. baliť -> csomagol (to pack), nadbaliť -> túlcsomagol (overpack)
6. miera -> mérték (measure), nadmiera -> túlmérték (excess)
7. obudnúť -> felébred (to awaken), nadobudnúť -> megszerez (to acquire)
8. priemer -> átlagos (average), nadpriemer -> átlag feletti (above average) | túlátlagos* [Note: "túlátlagos" is not a standard term]
9. riadenie -> irányítás (steering), nadriadenie -> felettes (superior) [Note: No direct "túl-" equivalent]
10. bytok -> lakás (apartment), nadbytok -> bőség (abundance) [Note: No direct "túl-" equivalent]

We're happy to look a little further afield. Our generative AI of choice will go on generating Slovak nad- prefixed words and their Hungarian túl- equivalents.

11. rozvinutý (developed) -> fejlődött, nadmerne rozvinutý (overdeveloped) -> túlfejlett
12. hrnúť -> feltölteni (to fill), nadhrnúť -> túltölteni (to overfill)
13. úvati -> felfújni (to inflate), nafukovať nadúvati -> túlfújni (to overinflate)
14. mierne -> mérsékelt (moderate), nadmierne -> túlzottan (excessively)
15. dojiť -> tejelni (to milk), nadojíť -> túltejelni (to overmilk)
16. váha -> súly (weight), nadváha -> túlsúly (overweight)
17. vláda -> kormány (government), nadvláda -> dominancia (dominance) | túluralom* [Note: "túluralom" is not a standard term]
18. tučniť -> hízni (to fatten), nadtučniť -> túlzsírosít (to overgrease)
19. výšenie -> emelkedés (rising), nadvýšenie -> túlemelkedés (elevating)
20. zvukový -> hangsebesség (sound speed), nadzvukový -> hangsebesség feletti (supersonic)
21. limit -> korlát (limit), nadlimit -> korlát feletti (above limit) | túlkorlát* [Note: "túlkorlát" is not a standard term]
22. sila -> erő (force), nadsila -> túlerő (overpower)
23. chutiť -> ízlik (to taste), nadchutiť -> túlízesít (to overseason, to spice up) [Note: no direct "túl-" equivalent for "ízlik"]
24. robiť -> csinál (to do), nadrobiť -> túlcsinál (to overdo)
25. rýchlosť -> sebesség (speed), nadrýchlosť -> túlsebesség (excessive speed)
26. nádej -> remény (hope), nadnádej -> túlzott remények (excessive hopes) | túlremény* [Note: "túlremény" is not a standard term]
27. výkon -> teljesítmény (performance), nadvýkon -> túlteljesítmény (overperformance)
28. práca -> munka (work), nadpráca -> túlmunka (overwork)
29. význam -> jelentőség (significance), nadvýznam -> túljelentőség (oversignificance)
30. množstvo -> mennyiség (quantity), nadmnožstvo -> túlmennyiség (excess)

A few more, although our intuition is already confirmed.

31. žiť -> élni (to live), nadžiť -> túlélni (to outlive)
32. vydržať -> kitart (to endure), nadvydržať -> túlkitart (to overendure)
33. rásť -> növekszik (to grow), nadrásť -> túlnövekszik (to overgrow)
34. použiť -> használ (to use), nadpoužiť -> túlhasznál (to overuse)
35. pracovať -> dolgozik (to work), nadpracovať -> túldolgozik (to overwork)

In short, the prefix túl- in Hungarian generally conveys the sense of over, beyond, or excessive.
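The inference made by eye above can be sketched mechanically. This is only a minimal illustration, assuming we simply strip the base word from the end of the derived word and count what is left in front; the function name candidate_prefixes is ours, and the pairs are copied from the examples above.

```python
from collections import Counter

# Base/derived Hungarian pairs copied from the examples above.
PAIRS = [
    ("értékel", "túlértékel"),
    ("csomagol", "túlcsomagol"),
    ("mérték", "túlmérték"),
    ("súly", "túlsúly"),
    ("erő", "túlerő"),
    ("munka", "túlmunka"),
    ("élni", "túlélni"),
    ("használ", "túlhasznál"),
    ("mérsékelt", "túlzott"),   # derived form does not contain the base
    ("megtekint", "felügyel"),  # derived form does not contain the base
]


def candidate_prefixes(pairs):
    """Count what precedes the base whenever the derived form ends with it."""
    counts = Counter()
    for base, derived in pairs:
        if derived != base and derived.endswith(base):
            counts[derived[: -len(base)]] += 1
    return counts


counts = candidate_prefixes(PAIRS)
print(counts.most_common(1))
```

Pairs whose derived form does not contain the base are silently skipped, so the count says nothing about how often the pattern fails; it only shows which leftover string recurs when it holds.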

Because we're a blank page in Korean, we'll choose this language to carry out the same attempt at inference.

1. fejlődött -> 발달한 (developed), túlfejlett -> 과발달한 (overdeveloped)
2. feltölteni -> 채우다 (to fill), túltölteni -> 과다하게 채우다 (to overfill)
3. felfújni -> 부풀리다 (to inflate), túlfújni -> 과도하게 부풀리다 (to overinflate)
4. erő -> 힘 (force), túlerő -> 압도적인 힘 (overpower)
5. tejelni -> 젖을 짜다 (to milk), túltejelni -> 젖을 지나치게 짜다 (to overmilk)
6. súly -> 무게 (weight), túlsúly -> 과체중 (overweight)
7. emelés -> 들어올리기 (lifting), túlemelés -> 과도하게 들어올리기 (overlifting)
8. jelentőség -> 중요성 (significance), túljelentőség -> 과도한 중요성 (oversignificance)
9. csomagol -> 포장하다 (to pack), túlcsomagol -> 과도하게 포장하다 (overpack)
10. mérték -> 척도 (measure), túlmérték -> 과도 (excess)
11. teljesítmény -> 성과 (performance), túlteljesítmény -> 과도한 성과 (overperformance)
12. munka -> 일 (work), túlmunka -> 과로 (overwork)

Our eyes, unaccustomed to the new alphabet, have to squint as if to decipher something written very small. Nevertheless, we are often able to recognize the original Korean word (in blue) in its derivation with a nuance of excess. The match is perfect here:

발달한 (developed) -> 과발달한 (overdeveloped)
채우다 (to fill) -> 과다하게 채우다 (to overfill)
부풀리다 (to inflate) -> 과도하게 부풀리다 (to overinflate)

It is partial in 척도 (measure) -> 과도 (excess) and quite clear in 젖을 짜다 (to milk) -> 젖을 지나치게 짜다 (to overmilk). 일 is not far from 로 in 일 (work) -> 과로 (overwork). We highlight in bold the remainder. 과다하게- is the obvious candidate for our Korean túl-. The first syllable block of this alleged prefix, 과, is also found in 과로 and 과도, and its last block, 게, also closes 지나치게, the "remainder" in 젖을 지나치게 짜다.

We have another prefix covering two other cases.

중요성 (significance) -> 과도한 중요성 (oversignificance)
성과 (performance) -> 과도한 성과 (overperformance)

In 과도한-, the first syllable block is the same as in our candidate of choice 과다하게-, while the last, 한, is not far from 하. The following example has yet another modifier, 힘 (force) -> 압도적인 힘 (overpower), but generative AI can think of another equivalent, 과도하게 강한 힘 (excessively strong force).

Likewise, for 무게 (weight) -> 과체중 (overweight), it finds an equivalent along the same lines, 과도하게 무거운 (excessively heavy)... Interesting! So now we seem to have in 과다하게- our Korean equivalent of over- (English, Dutch), túl- (Hungarian) or nad- (Slovak).
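The squinting exercise on syllable blocks can be imitated with a few lines of code. This is a rough sketch under a strong assumption: we only look at derived forms that end with the unchanged base, take whatever stands in front of it, and count the first syllable block of that remainder. The helper name leading_remainders is ours; the pairs come from the Korean list above.

```python
from collections import Counter

# Base/derived Korean pairs copied from the list above.
PAIRS = [
    ("발달한", "과발달한"),
    ("채우다", "과다하게 채우다"),
    ("부풀리다", "과도하게 부풀리다"),
    ("포장하다", "과도하게 포장하다"),
    ("중요성", "과도한 중요성"),
    ("성과", "과도한 성과"),
    ("힘", "압도적인 힘"),
    ("젖을 짜다", "젖을 지나치게 짜다"),  # remainder sits in the middle: skipped
    ("일", "과로"),  # base not preserved in the derived form: skipped
]


def leading_remainders(pairs):
    """Return what stands in front of the base when the derived form ends with it."""
    remainders = []
    for base, derived in pairs:
        if derived != base and derived.endswith(base):
            remainder = derived[: -len(base)].strip()
            if remainder:
                remainders.append(remainder)
    return remainders


remainders = leading_remainders(PAIRS)
# Each Hangul syllable block is a single character, so r[0] is the first block.
first_blocks = Counter(r[0] for r in remainders)
print(remainders)
print(first_blocks.most_common())
```

The count only tells us which block recurs at the front of the remainders. It says nothing about whether those remainders are prefixes, separate words, or something else.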

The truth is, it's not a Korean prefix, but an adverb. 과다하게 means excessively, as does its close synonym 과도하게. And 과도한, seen in 과도한 중요성 (oversignificance) and 과도한 성과 (overperformance), is an adjective meaning excessive, just like 과다한.

과다하게 먹다 (gwadahage meokda) = to eat excessively
과다한 소비 (gwadahan sobi) = excessive consumption

And some further examples using the adjective 과다한, excessive:

소비 (consumption) -> 과다한 소비 (overconsumption)
알코올 (alcohol) -> 과다한 알코올 (excessive alcohol)
스트레스 (stress) -> 과다한 스트레스 (excessive stress)
식욕 (appetite) -> 과다한 식욕 (excessive appetite)
이용 (use) -> 과다한 이용 (excessive use)
정보 (information) -> 과다한 정보 (excessive information)
작업 (work) -> 과다한 작업 (excessive work)
운동 (exercise) -> 과다한 운동 (excessive exercise)
자신감 (confidence) -> 과다한 자신감 (excessive confidence)
관심 (interest) -> 과다한 관심 (excessive interest)

A few further truths for a fuller picture.

Korean uses fewer affixes (prefixes and suffixes) than Indo-European languages like English, Dutch, or Slovak, and often employs compound words or descriptive phrases featuring adjectives or adverbs to convey nuanced meanings. While affixes can attach to many root words to create new forms in Indo-European languages, Korean "compound words" are fixed forms made of several components, each conveying meaning or nuance. We have already discussed the adjective 과도한 (excessive), and the adverb 과다하게 (excessively). Below are additional ways to modify a verb or noun to express the sense of "beyond," "over," or "excess."

1. 초과하다 and 지나치다 are verbs meaning "to exceed" or "to go beyond", e.g., 시속 100km를 초과하다 (to exceed 100 km/h). More often than 지나치다 itself, its derived adverb, 지나치게, is used, e.g., 지나치게 먹다 (to eat excessively).
2. 너무 is an adverb meaning "too" or "excessively", e.g., 너무 많이 먹다 (to eat too much).
3. -배 is a noun used after numbers to indicate multiples or "times", e.g., 3배로 증가하다 (to increase threefold).
4. 과- is a syllable block that can be part of compound nouns, adding a meaning similar to "hyper-" or "over-" in English, e.g., 과속 (overspeeding).

Let's marvel for a moment at the great distance we've traversed thus far.

We started from an observation made earlier, namely that occurrences of the prefix over- in Dutch or English often find an equivalent in nad- in Slovak. That is, prefixing a word with over- or nad- seems to be a way, in English (or Dutch) and Slovak respectively, of modifying a root word's meaning to signify an "excess of it". We have inferred from numerous examples their Hungarian counterpart (the prefix túl-) and their Korean counterparts (adverbs such as 과다하게 or 지나치게, adjectives like 과도한, or a syllable block, 과-, for building a compound noun).

Let's imagine that the set of words, that is, the lexicon, of a given language, occupies a space (in the mathematical sense, a vector space, which can be visualized simply if it's two- or three-dimensional). In this (semantic) space, the position of each word characterizes its meaning, or rather all its meanings in all observed or conceivable contexts of occurrence. It seems that a number of words from the language's lexicon, located somewhere in semantic space, can be shifted to or transformed into other words located further in the same space, according to the same shift or transformation (according to the same vector) that represents "bringing to the initial word the additional meaning of excess".

And we can extend this word representation to longer sequences, e.g., nominal phrases or verbal phrases. In this model, each phrase is represented in a semantic space by a point characterizing its meaning. And some phrases can be transformed, all by the same vector, into their respective "excessive" counterparts. In Slovak, Hungarian, English, Dutch and Korean, we've found "functions" that achieve this shift in meaning quite well, such as adding a given prefix or suffix, or a given adverb or syllable block. And we can assume that almost every language has its own. Knowing such functions, it's easy enough to reconstruct from the original words their counterparts with a nuance of "excess".

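To make the picture of a shared shift vector concrete, here is a toy sketch. Everything in it is an assumption made for illustration: the two-dimensional coordinates are invented by hand (real word embeddings are learned from corpora and have hundreds of dimensions), and the helper names offset, mean_offset, shift and nearest are ours. It estimates one "excess" vector from two known pairs and applies it to a third word.

```python
import math

# Hypothetical 2-D coordinates, invented for illustration only.
SPACE = {
    "work": (1.0, 1.0), "overwork": (1.1, 3.0),
    "value": (3.0, 0.5), "overvalue": (2.9, 2.6),
    "pack": (5.0, 1.5), "overpack": (5.1, 3.4),
    "weight": (7.0, 1.0),
}


def offset(base, derived):
    """Vector that moves `base` onto `derived`."""
    (bx, by), (dx, dy) = SPACE[base], SPACE[derived]
    return (dx - bx, dy - by)


def mean_offset(pairs):
    """Average the offsets of known pairs into one shared 'excess' vector."""
    offsets = [offset(b, d) for b, d in pairs]
    n = len(offsets)
    return (sum(o[0] for o in offsets) / n, sum(o[1] for o in offsets) / n)


def shift(word, vector):
    """Move a word's point by the given vector."""
    x, y = SPACE[word]
    return (x + vector[0], y + vector[1])


def nearest(point, exclude=()):
    """Word of the toy lexicon whose point lies closest to `point`."""
    candidates = [w for w in SPACE if w not in exclude]
    return min(candidates, key=lambda w: math.dist(SPACE[w], point))


excess = mean_offset([("work", "overwork"), ("value", "overvalue")])
guess = nearest(shift("pack", excess), exclude={"pack"})
print(guess)
```

With these made-up coordinates the shifted point for "pack" lands closest to "overpack". In a real semantic space the offsets between pairs would be noisier, which is why the sketch averages several of them rather than trusting a single pair.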
Obviously, beyond excess and surpassing, many other notions and transformations span a language's vocabulary. But it can be assumed that there is only a finite number of them, and that the most important ones can be circumscribed fairly easily. Let's make a first effort towards a list that we will complete in subsequent investigations, and which we hope to make the most exhaustive we can.

We can transform a root word to instill a notion of:

excess, surplus, or surpassing, "beyond" (to achieve -> to overachieve, to work -> to overwork).
defect, insufficiency, or lack of something (to perform -> to underperform, to estimate -> to underestimate).
thoroughness or completeness (to view -> to overview, to look -> to overlook, to come -> to overcome).
increase, improvement, or upward movement (to grade -> to upgrade, to scale -> to upscale).
decrease, degradation, or movement downward (to grade -> to downgrade, to size -> to downsize).
addition, accrual, or augmentation (to join -> to adjoin, to gain -> to accumulate).
removal, reversal, or negation (to value -> to devalue, to inflate -> to deflate, to tie -> to untie).
movement toward, or the fact of coming closer (to move -> to advance, to move -> to approach).
outward movement, rejection or exclusion (to push or to drive -> to expel, to bound -> to outbound, to add -> to exclude).
inclusion, incorporation, or the fact of enveloping or containing (to fold -> to enfold, to close -> to enclose, to add -> to include).
error, incorrectness, or corruption (to understand -> to misunderstand, to use -> to misuse).
temporal sequence, coming before or later in time (history -> prehistory, of birth -> post-partum).
modification, change, or alteration (to form -> to transform, to carry -> to transport).
recurrence, renewal, or the fact of starting again (new -> renewal, to consider -> to reconsider).
etc.

This list calls for a few comments.

Each notion subsumes a number of others, and we could actually subdivide the categories, since each word carries a particular nuance, until we arrive at one notion per word. Pragmatically, what matters most are the large sets, and the linguistic functions, such as "adding a prefix", that are quite systematically associated with them.
Further, the source or root word of such transformations may not be obvious, or may not exist in the lexicon as such. It could be said that the addition of the prefix ex-, expressing exclusion or rejection, transforms pel* into expel, but pel*, from the Latin verb pellere, to push or to drive, is not in the English lexicon. The antecedent of expel can be seen in to push or to drive instead. Similarly, advance appears to add the prefix ad-, expressing movement towards, to vance*, which doesn't exist in English; the word actually goes back, through Old French avancer, to the Latin abante (from before), and the ad- spelling was adopted later. In these and many other instances, the etymology and word formation bear witness to the transformation, but the antecedent is not evident in today's language.
We are far from exhaustive, and in fact, the notions could be organized a little differently, for example by subsuming them under more generic (actually not mutually exclusive) sets. Some are related, literally or figuratively, to location, others to direction or trend, others to movement, still others to recurrence, or number, etc.

In everyday use of a mother tongue, we don't pay attention to the fact that over- in overcome conveys a meaning of completeness or in- in include expresses belonging. But becoming aware of these conceptual clusters (or transformations) and of some associated linguistic functions can be a powerful learning strategy. Such a focus on a language's transformative functions greatly reduces the learning effort. Consider, for instance, the following Slovak vocabulary:

1. rýchlosť (speed) -> nadrýchlosť (excessive speed)
2. nádej (hope) -> nadnádej (excessive hopes)
3. výkon (performance) -> nadvýkon (overperformance)
4. práca (work) -> nadpráca (overwork)
5. význam (significance) -> nadvýznam (oversignificance)
6. množstvo (quantity) -> nadmnožstvo (excess)

Instead of memorizing that rýchlosť means speed and nadrýchlosť, overspeed, výkon, performance and nadvýkon, overperformance, we need only remember the first column and that the function "prefix with nad-" (or, in Hungarian, "prefix with túl-") expresses "excess".
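The idea of remembering one column plus one function per language can be sketched as a small lookup table of functions. This is only an illustration: the table name EXCESS, the language codes and the helper with_excess are our own, and the rules are deliberately naive.

```python
# One naive "excess" function per language, as inferred in this article.
# The language codes and the table itself are illustrative assumptions.
EXCESS = {
    "sk": lambda word: "nad" + word,      # Slovak prefix nad-
    "hu": lambda word: "túl" + word,      # Hungarian prefix túl-
    "en": lambda word: "over" + word,     # English prefix over-
    "nl": lambda word: "over" + word,     # Dutch prefix over-
    "ko": lambda word: "과도한 " + word,  # Korean adjective + noun, not a prefix
}


def with_excess(word, lang):
    """Apply the language's 'excess' function to a base word."""
    return EXCESS[lang](word)


# First column only: the derived forms are reconstructed, not memorized.
slovak_nouns = ["rýchlosť", "výkon", "práca"]
print([with_excess(w, "sk") for w in slovak_nouns])
print(with_excess("teljesítmény", "hu"))
print(with_excess("성과", "ko"))
```

Such a rule overgenerates: as the notes in the Hungarian lists show, forms like túlkorlát are not standard, so a reconstructed word is a hypothesis to check rather than a guaranteed entry of the lexicon.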

To cut a long story short, two closing remarks (for now):

In the foregoing, we have mainly talked about semantic transformations at the word level. But we can also think, for instance, of the transformation of a proposition into its negation, a main proposition into a relative proposition, a passive sentence into an active sentence, and so on. These transformations, too, follow (grammatical) rules or (lexical, structural) usages that are quite systematic. Slovak, for example, (often) forms negation by prefixing an adjective or verb with "ne-", e.g., možný (possible) -> nemožný (impossible), vidieť (to see) -> nevidieť (not to see), or using the determiner žiadny (no, none or any in English), as in Žiadny človek to nevie (Nobody knows it). In general, what matters is to get comfortable with a language's basic building blocks and basic patterns around frequent notions. These include small units of meaning, word modifiers like prefixes and suffixes, methods for creating new words, commonly used adjectives and adverbs, and the rules that govern sentence structure. Understanding these elements and functions greatly simplifies both expression (forming words, phrases or sentences) and understanding (recognizing words or structures).
How do we learn these patterns and pieces of meaning? Precisely this question has inspired our recent experiments. Our method was exposure to a large, targeted parallel corpus: Slovak-Hungarian, then Korean as well. We master the connotations of a given prefix in a known language, and from a large number of translated examples, we infer one or more ways of expressing the same semantic shift in other languages. Starting from functions and rules, dryly learning that túl- is Hungarian for over-, seems less practical. We are convinced that massive exposure to linguistic data is what really pays off. Hypotheses formulated and rules inferred can then seek confirmation (e.g., from the best generative AI). But it's important to ask questions first, to look for analogies, to carry out the sometimes laborious exercise of investigation. This is probably when the links between disparate elements are created in a lasting way, when human neural networks are effectively calibrated and trained.

Finally, a little cultural curiosity. What is the common heredity (if any) between Korean and Hungarian or Slovak respectively?

In terms of linguistic heredity, Korean, Hungarian, and Slovak do not share a common language family. Korean is part of the Koreanic language family, Hungarian belongs to the Uralic language family, and Slovak is a Slavic language, part of the Indo-European family. There is no established linguistic relationship that connects these languages historically.

Nyelvészeti örökség szempontjából a koreai, a magyar és a szlovák nyelv nem oszt meg közös nyelvcsaládot. A koreai a koreai nyelvcsaládhoz tartozik, a magyar az uráli nyelvcsaládhoz, a szlovák pedig egy szláv nyelv, ami az indoeurópai nyelvcsalád része. Nincs megalapozott nyelvészeti kapcsolat, amely történelmileg összekötné ezeket a nyelveket.

언어학적 계승 측면에서 한국어, 헝가리어, 슬로바키아어는 공통의 언어 가족을 공유하지 않습니다. 한국어는 한국어 계열에 속하고, 헝가리어는 우랄어 계열에 속하며, 슬로바키아어는 슬라브어로서 인도유럽어 계열에 속합니다. 이 언어들을 역사적으로 연결하는 확립된 언어학적 관계는 없습니다.

Z hľadiska jazykového dedičstva kórejčina, maďarčina a slovenčina nesdílejú spoločnú jazykovú rodinu. Kórejčina patrí do kórejskej jazykovej rodiny, maďarčina do uralskej a slovenčina je slovanský jazyk, ktorý je súčasťou indoeurópskej jazykovej rodiny. Neexistuje žiadny zavedený jazykový vzťah, ktorý by tieto jazyky historicky spájal.

Wat betreft taalkundige afkomst delen het Koreaans, Hongaars en Slowaaks geen gemeenschappelijke taalfamilie. Koreaans behoort tot de Koreaanse taalfamilie, Hongaars tot de Oeraalse taalfamilie en Slowaaks is een Slavische taal, onderdeel van de Indo-Europese familie. Er is geen vastgestelde taalkundige relatie die deze talen historisch verbindt.

No relationship. Let's do one last exercise. Let's try to identify, without translation and by observation alone, the equivalents of a few expressions highlighted in the English text, precisely around the lexical field of relationship. We identify without fail közös, spoločnú, gemeenschappelijke for common, especially since we know nyelvcsaládot, jazykovú rodinu, and taalfamilie translate language family. Relationship and connects are not complicated either, rendered by kapcsolat, vzťah, relatie and összekötné, spájal, verbindt respectively.

Korean remains an impenetrable forest of symbols. Yet a few hypotheses can be made based on observation. A priori, common language family finds its equivalent somewhere in the proposition 한국어, 헝가리어, 슬로바키아어는 공통의 언어 가족을 공유하지 않습니다. As in English, this certainly begins by listing the three languages, Korean, Hungarian and Slovak. A good candidate for this enumeration is 한국어, 헝가리어, 슬로바키아어는, which contains 어 three times. Then, the rest, 공통의 언어 가족을 공유하지 않습니다, must match do not share a common language family. Besides, the second sentence includes the nominal group language family three times and Slavic language once. The first and last sentences also feature the adjective linguistic, and the plural languages.

Let's squint our eyes in search of language and its derivatives. We make explicit the alleged correspondences between the Korean and English sequences, and highlight in blue the English expressions to which we should pay special attention.

공통의 언어 가족을 공유하지 않습니다.
do not share a common language family.

언어학적 계승 측면에서
In terms of linguistic heredity

한국어는 한국어 계열에 속하고,
Korean is part of the Koreanic language family

헝가리어는 우랄어 계열에 속하며,
Hungarian belongs to the Uralic language family

슬로바키아어는 슬라브어로서 인도유럽어 계열에 속합니다
and Slovak is a Slavic language, part of the Indo-European family.

이 언어들을 역사적으로 연결하는 확립된 언어학적 관계는 없습니다
There is no established linguistic relationship that connects these languages historically.

First, we notice a set of syllable blocks (in red) that are very similar to each other, and could just as well mean family as relationship (belonging to, being part of). Let's now focus on the second sentence, as we keep in mind that 한국어, 헝가리어, 슬로바키아어는 must translate Korean, Hungarian and Slovak.

한국어는 (Korean) 한국어 (Koreanic language) 계열에 속하고,
Korean is part of the Koreanic language family

헝가리어는 (Hungarian) 우랄어 (Uralic language) 계열에 속하며,
Hungarian belongs to the Uralic language family

슬로바키아어는 (Slovak) 슬라브어 (Slavic language)로서 인도유럽어 계열에 속합니다
and Slovak is a Slavic language, part of the Indo-European family.

We can easily spot (in blue) the Korean for each language's name and the associated adjective. 어 seems to be the common syllable block, which surely translates language. 로서 인도유럽어 ends with the same syllable block and matches Indo-European (language) fairly well.
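The hunch about 어 rests on its frequency, which is easy to check by counting. A minimal sketch, assuming the second Korean sentence quoted above as input; the helper is_hangul_syllable is ours and relies on the Unicode Hangul Syllables range (U+AC00 to U+D7A3), in which each syllable block is one character.

```python
from collections import Counter

# Second sentence of the Korean paragraph quoted above.
SENTENCE = (
    "한국어는 한국어 계열에 속하고, 헝가리어는 우랄어 계열에 속하며, "
    "슬로바키아어는 슬라브어로서 인도유럽어 계열에 속합니다."
)


def is_hangul_syllable(ch):
    """True if `ch` is a precomposed Hangul syllable block."""
    return "\uac00" <= ch <= "\ud7a3"


# Frequency of each syllable block, ignoring spaces and punctuation.
block_counts = Counter(ch for ch in SENTENCE if is_hangul_syllable(ch))

# Frequency of each space-separated word, with punctuation stripped.
word_counts = Counter(token.strip(",.") for token in SENTENCE.split())

print(block_counts.most_common(3))
print(word_counts.most_common(1))
```

Counting cannot say what a block means, only that it recurs; here the most frequent block is 어, and the most frequent whole word is 계열에, which appears once per clause.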

This leaves 계열에 as a potential equivalent to family (of language). Let's go back to the piece of interest.

공통의 언어 가족을 공유하지 않습니다.
do not share a common language family.

Highlighted in red, we have 어 and a syllable block containing 어. Linguistic?

공통의 언어 가족을 공유하지 않습니다.
do not share a common language family.

Here the set in red resembles 계열에, so presumably, family. We've now almost certainly solved language family. We are also looking for equivalents of linguistic relationship and connects in:

이 언어들을 역사적으로 연결하는 확립된 언어학적 관계는 없습니다
There is no established linguistic relationship that connects these languages historically.

We can find 언어, linguistic. 관계는 evokes 계열에 and could be connects. Relationship is not obvious, and we're going to stop playing the guessing game. But we have come a long way in Korean, without electronic translation or dictionary, simply by observation and multilingual comparison (here with English).

We didn't do too badly.

언어학적 계승 측면에서 한국어, 헝가리어, 슬로바키아어는 공통의 언어 가족을 공유하지 않습니다. 한국어는 한국어 계열에 속하고, 헝가리어는 우랄어 계열에 속하며, 슬로바키아어는 슬라브어로서 인도유럽어 계열에 속합니다. 이 언어들을 역사적으로 연결하는 확립된 언어학적 관계는 없습니다.

공통의 언어 가족 translates to "common language family." 연결 translates to "connect" or "connection." 언어학적 관계 translates to "linguistic relationship."

언어들을 is in fact the plural languages, and 계열에 means in the family (or series) where the last syllable block,에, means in. 관계 was not connects, but relationship - which is not far off.