Joose Rajamäki 🇫🇮🇪🇺 @joose_rajamaeki Working as a data scientist | Doctor of Science (Tech.) from @AaltoUniversity | Voter of the Green League party @vihreat | Tweets mostly in Finnish and English Feb. 15, 2019

It's interesting to observe the evolution of natural language processing when my own native language (Finnish) is such an adversarial case that it breaks every system. I haven't seen even a functioning spell checker. Here are some reasons why this is the case. #NLP #AI

Thread 1/8

First, every word has dozens of inflected forms. For example, nouns are inflected for case (15) and number (2). So, even in basic situations, each noun can appear in approximately thirty forms. This means that any individual word form occurs extremely rarely.

2/8
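The arithmetic behind "approximately thirty forms" can be sketched as below. This is only a counting illustration: the case names are the 15 usually listed for Finnish, the product is an upper bound (some case and number slots coincide or are rarely used), and `average_count_per_form` with its even split is a hypothetical simplification, not a measured statistic.

```python
from itertools import product

# The 15 noun cases usually listed for Finnish, and the two numbers.
CASES = [
    "nominative", "genitive", "accusative", "partitive",
    "inessive", "elative", "illative",
    "adessive", "ablative", "allative",
    "essive", "translative",
    "instructive", "abessive", "comitative",
]
NUMBERS = ["singular", "plural"]


def noun_slots(cases=CASES, numbers=NUMBERS):
    """Return every (case, number) slot a noun could fill."""
    return list(product(cases, numbers))


def average_count_per_form(lemma_occurrences, slots):
    """Hypothetical even split of a lemma's occurrences over its slots."""
    return lemma_occurrences / len(slots)


slots = noun_slots()
print(len(slots))                           # 30
print(average_count_per_form(3000, slots))  # 100.0
```

Under that even-split assumption, a noun seen 3000 times in a corpus would show up only about 100 times in any one form, which is the sparsity problem in miniature.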

New words can be formed by compounding, just like in German.

For example:
tietokone = knowledge machine (literally) = computer
kämmentietokone = palm knowledge machine (literally) = palmtop computer (PDA)

This makes many words that are common in other languages very rare as individual word forms.

3/8

Even non-compound words can be changed with many modifiers.

For example:
syödä = to eat
syömättäkinköhän = Does he/she mean even without eating, I wonder.

These tags can be added independently, which causes a combinatorial explosion making many everyday words ultra rare.

4/8
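A minimal sketch of that explosion, using the stem and the three tags from the example above. It assumes each tag is simply present or absent and is glued on by naive concatenation in a fixed order; real Finnish also involves vowel harmony and restrictions on which tags combine, so not every generated string is necessarily natural. The function name `optional_suffix_forms` is made up for this illustration.

```python
from itertools import combinations


def optional_suffix_forms(stem, suffixes):
    """Generate stem + every subset of suffixes, keeping the given order."""
    forms = []
    for size in range(len(suffixes) + 1):
        for chosen in combinations(suffixes, size):
            forms.append(stem + "".join(chosen))
    return forms


forms = optional_suffix_forms("syömättä", ["kin", "kö", "hän"])
print(len(forms))                    # 8, i.e. 2 ** 3
print("syömättäkinköhän" in forms)   # True
```

With n independent optional tags the count is 2 ** n per stem, so a handful of tags is enough to spread a word's occurrences over dozens of distinct strings.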

To make the combinatorial explosion even worse, many tags can be permuted within the word.

The following mean roughly the same:
syömättäkinköhän
syömättäköhänkin
syömättähänkökin
syömättäkökinhän
etc.

5/8
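The reordering can be counted the same way. The sketch below assumes, purely for illustration, that all orderings of the three tags are possible; the thread only says that many of them are. The helper name `permuted_forms` is hypothetical.

```python
from itertools import permutations


def permuted_forms(stem, suffixes):
    """Generate stem + every ordering of all the given suffixes."""
    return [stem + "".join(order) for order in permutations(suffixes)]


variants = permuted_forms("syömättä", ["kin", "kö", "hän"])
print(len(variants))  # 6, i.e. 3!

# The four spellings listed above are all among the generated orderings.
listed = [
    "syömättäkinköhän",
    "syömättäköhänkin",
    "syömättähänkökin",
    "syömättäkökinhän",
]
print(all(form in variants for form in listed))  # True
```

A system that treats each string as a separate vocabulary entry sees up to six unrelated tokens here where a speaker sees one word with roughly one meaning.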

Compounding sometimes changes the meaning from literal to figurative:

luotaantyöntävä = unappealing
luotaan työntävä = something that literally thrusts you away

For the latter, Google Translate gives the nonsense translation "I trust the pushing".

6/8

These features of the Finnish language, and many more, make it a very poor fit for #NLP systems. In fact, Finnish isn't very compatible with IT systems in general. You can see that in how Finnish speakers have to adapt the language when using hashtags.

7/8

All in all, I don't see deep learning being the way to #AGI before I see even a well-functioning Finnish-language spell checker. Additionally, machine translation to and from Finnish is usually just garbage. (Which luckily protects us from some foreign disinformation.)

8/8

P.S. This might be of interest to @GaryMarcus.

P.P.S. Try translating these tweets to Finnish and back and see what's lost.

