Scott MacLeod (WUaS)
World University and School, like Wikipedia with best STEM-centric OCW
About
Posts

Post is pinned.
Apply to matriculate (rolling admission) for free CC OCW BA/BS degrees @WorldUnivAndSch ~http://worlduniversityandschool.org with ~http://ocw.mit.edu ~http://oyc.yale.edu >https://wiki.worlduniversityandschool.org/wiki/Nation_States … Planned as universities in ~200 countries' official & wiki schools in ALL 8k LANGUAGES ~


Please join us at our open, hour-long World Univ & Sch monthly business meeting - https://worlduniversityandschool.blogspot.com/2018/07/agenda-news-for-world-univ-schs-open.html - this Saturday, July 14, 2018, at 9 am Pacific Time, held electronically in a Google group video Hangout. If interested, please email WUaS at info@worlduniversityandschool.org.

https://twitter.com/WorldUnivAndSch/status/1017472250025369600


World University and School News and Q & A Live Hangout on Air https://t.co/333k6Nsl0P 
Mon, July 9, 2018, 10 am PT, 5pm UTC
Topic: Your Questions, Ideas & Thoughts about WUaS re @WorldUnivAndSch & @WUaSPress
The Hangout is hosted by Scott MacLeod
https://t.co/N22Ucd0wg2 ~
Twitter https://twitter.com/WorldUnivAndSch/status/1015729983124221952 ...


World University and School News and Q & A Live Hangout on Air https://www.youtube.com/WorldUnivandSch
Mon, Jun 25, 2018, 10 am PT, 5pm UTC
Topic: Your Questions, Ideas & Thoughts about WUaS re @WorldUnivAndSch & @WUaSPress
The Hangout is hosted by Scott MacLeod
~ http://worlduniversityandschool.org ~
- https://twitter.com/WorldUnivAndSch/status/1010841106253168640 -


This is a great family tree of alphabets — and isn't very conjectural at all, since we actually know how writing spread. The color codes are by kinds of writing.

The oldest forms of writing are true pictograms, not shown here; these are scripts like the earliest forms of Egyptian, Sumerian, and Chinese writing, which are basically pictures (slightly stylized) of physical objects. These aren't full writing systems, in that they can generally only code things like "three sheep, four barrels of wine..."

These quickly evolved into logograms, a few of which are shown here in blue — not only the bulk of ancient Egyptian, but also modern Chinese and part of modern Japanese writing as well. In logograms, a small group of symbols represents a word, not phonetically but conceptually. (This is why the different Chinese languages, which sound almost nothing alike, can nonetheless share a single writing system! The writing codes ideas, not sounds.)

A common extension of logograms is to add sound representations, typically starting by using a word-sign to stand for its homonyms (the rebus principle), and then adding logographic marks to indicate "the word symbolized by <X> which sounds like <Y>" to disambiguate those homonyms, and so on. Nearly all logographic writing systems adopted this.

Ancient Egyptian did in particular, and an entire subbranch of its writing system started to adopt this more seriously, starting to use purely phonetic representations — that is, symbols that described sounds instead of concepts. This is one of the earliest forms of alphabetic writing.

This kind of "phonetic writing" then has a history which you can see here.

Abjads are scripts like Hebrew and Arabic, where each letter stands for a syllable but only uniquely specifies its consonant; you're supposed to supply the vowels from context. These work well in languages where the vowels vary following predictable rules and primarily indicate parts of speech, and so are still used in such languages to this day. (The name "abjad" comes from the first four letters of the old Arabic alphabet: a, b, j, and d.)

Abugidas (green) and alphabets (red) take this further, adding accent marks (in abugidas) or separate letter-signs (alphabets) for the vowels, as well. As Barry Powell argued in Homer and the Origins of the Greek Alphabet, this likely emerged as a pattern whenever abjads reached areas where the local language didn't have the same kinds of rules for vowels as Semitic languages, and the ability to explicitly code vowels was important for telling words apart — and, critically, for recording poetry and verse.

Finally, featural alphabets take the march towards phonetic clarity even further. The classic example of this is Hangul, the script invented for Korean in the 15th century. In these writing systems, symbols go beyond coding for sounds — they code for individual features, like "plosive sounds" (you stop the air and then suddenly release it, like t or p), "aspirated sounds" (with a breath), and so on. So for example, ㅌ can be immediately recognized as a voiceless, aspirated, alveolar plosive, or tʰ.
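This arithmetic regularity is baked right into Unicode: every modern Hangul syllable block is composed from its jamo by a simple formula, so the featural components can be recovered mechanically. A minimal Python sketch (the jamo tables and the 588/28 arithmetic follow the Unicode "Hangul Syllables" block layout, U+AC00 onward):

```python
# Decompose a precomposed Hangul syllable into its jamo (letter) parts.
# Each syllable's code point is 0xAC00 + lead*588 + vowel*28 + tail.

LEADS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")                 # 19 initial consonants
VOWELS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")             # 21 vowels
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 27 finals + "no final"

def decompose(syllable):
    s = ord(syllable) - 0xAC00           # offset into the Hangul Syllables block
    lead, rest = divmod(s, 21 * 28)      # 21 vowels x 28 tail slots = 588
    vowel, tail = divmod(rest, 28)
    return LEADS[lead], VOWELS[vowel], TAILS[tail]

print(decompose("한"))  # ('ㅎ', 'ㅏ', 'ㄴ')
```

So 한 ("han") falls apart into h + a + n by pure arithmetic, something no other widely used script's encoding allows.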

English is in many ways a strange case in this family. The Latin alphabet that it uses is a true alphabet: someone reading Latin immediately knows how to pronounce any word they see, just like someone reading Spanish or Polish would. But English both assembled its lexicon from a bunch of languages, and standardized its spelling system much earlier than most other modern languages — and unfortunately, did so not too long before a major change in how words were pronounced, which gives it all sorts of oddities like "silent e," how tough it is to cough through a rough slough, and so on. (There are not many languages where "being able to spell things correctly" is a televised sport!) In fact, despite its use of an alphabet, English is in many ways moving back towards being a logographic language, where you have to know what a word is (and which language it comes from) to know how to pronounce it.

Via +John Hardy

Hi Olya, Wikibabel, GerardM, and Wikidatans,

How does Wikidata's new lexicographical project work with regard to Swahili (since it is a Wikipedia / Wikidata language) and with Google Translate / GNMT, re your statement "Our approach leverages Google Translate to make English Wikipedia articles accessible to underserved communities" (https://medium.com/@oirzak/wikibabel-equalizing-information-access-on-a-budget-4038f750e90e)?

Will a Wikibabel team you help create add Swahili lexemes to the lexicographical project - https://www.wikidata.org/wiki/Wikidata:Lexicographical_data - and will Google GNMT - the end-to-end translation system ... https://1.bp.blogspot.com/-jwgtcgkgG2o/WDSBrwu9jeI/AAAAAAAABbM/2Eobq-N9_nYeAdeH-sB_NZGbhyoSWgReACLcB/s1600/image01.gif (https://ai.googleblog.com/2016/11/zero-shot-translation-with-googles.html) - then use this new Swahili lexicographical data by processing it through its algorithms?
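As a rough sketch of what pulling such Swahili lexemes out of the lexicographical project might look like, one could build a query for the Wikidata SPARQL endpoint. This is illustrative only: it assumes Q7838 is the Wikidata item for Swahili (substitute the correct Q-id), and uses the lexeme RDF predicates `dct:language` and `wikibase:lemma` from Wikidata's lexeme data model.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

def lexeme_query(language_qid, limit=10):
    """SPARQL listing lexemes (and their lemmas) in the given language."""
    return f"""
    SELECT ?lexeme ?lemma WHERE {{
      ?lexeme dct:language wd:{language_qid} ;
              wikibase:lemma ?lemma .
    }} LIMIT {limit}
    """

def query_url(language_qid, limit=10):
    # A URL one could fetch (e.g. with urllib.request) for JSON results.
    params = urlencode({"query": lexeme_query(language_qid, limit),
                        "format": "json"})
    return f"{WDQS_ENDPOINT}?{params}"

# Q7838 is assumed here to be Swahili.
print(query_url("Q7838")[:60])
```

Whether GNMT could then consume such lexeme data is, of course, exactly the open question above.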

(WUaS seeks to facilitate machine translation in all 7,097 living languages, growing out of Google GNMT; WUaS donated itself to Wikidata for co-development in 2015.)

Cheers,
Scott



Hi Olya, Lucie, and Wikidatans,

Very interesting projects. And thanks for publishing, Lucie - very helpful!

With regard to Swahili, Arabic (both African languages!) and Esperanto, and leveraging Google Translate / GNMT, I've been looking at this Google GNMT gif image - https://1.bp.blogspot.com/-jwgtcgkgG2o/WDSBrwu9jeI/AAAAAAAABbM/2Eobq-N9_nYeAdeH-sB_NZGbhyoSWgReACLcB/s1600/image01.gif - and wondering how the triples of Wikidata's Linked Open Data structured knowledge base (KB) would stream through this in multiple smaller languages?

I couldn't deduce from this paper - https://arxiv.org/pdf/1803.07116.pdf - here, for example ...

2.1 Encoding the Triples

The encoder part of the model is a feed-forward architecture that encodes the set of input triples into a fixed-dimensionality vector, which is subsequently used to initialise the decoder. Given a set of un-ordered triples F_E = {f_1, f_2, ..., f_R : f_j = (s_j, p_j, o_j)}, where s_j, p_j and o_j are the one-hot vector representations of the respective subject, property and object of the j-th triple, we compute an embedding h_{f_j} for the j-th triple by forward propagating as follows:

    h_{f_j} = q( W_h [ W_{in} s_j ; W_{in} p_j ; W_{in} o_j ] )    (1)
    h_{F_E} = W_F [ h_{f_1} ; ... ; h_{f_{R-1}} ; h_{f_R} ]        (2)

where h_{f_j} is the embedding vector of each triple f_j, h_{F_E} is a fixed-length vector representation for all the input triples F_E, q is a non-linear activation function, and [ ... ; ... ] represents vector concatenation. W_{in}, W_h, W_F are trainable weight matrices. Unlike (Chisholm et al., 2017), our encoder is agnostic with respect to the order of input triples. As a result, the order of a particular triple f_j in the triples set does not change its significance towards the computation of the vector representation of the whole triples set, h_{F_E}.

... whether this would address streaming triples through GNMT?
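To make the quoted encoder concrete, here is a minimal numpy sketch of equations (1) and (2). The dimensions, random weight initialisation, and the choice of tanh as the non-linearity q are my own illustrative assumptions, not the paper's actual hyperparameters:

```python
import numpy as np

def encode_triples(triples, vocab_size, d=16, seed=0):
    """Sketch of the triple encoder, eqs. (1)-(2) of arXiv:1803.07116.

    triples: list of (s, p, o) index tuples into a shared vocabulary.
    Returns a single fixed-length vector h_FE for the whole triple set.
    """
    rng = np.random.default_rng(seed)
    W_in = rng.normal(size=(d, vocab_size)) * 0.1   # shared input projection
    W_h = rng.normal(size=(d, 3 * d)) * 0.1         # per-triple combiner
    W_F = rng.normal(size=(d, len(triples) * d)) * 0.1  # combines all R triples

    def one_hot(i):
        v = np.zeros(vocab_size)
        v[i] = 1.0
        return v

    # eq. (1): h_fj = q(W_h [W_in s_j ; W_in p_j ; W_in o_j]), with q = tanh
    h_f = [np.tanh(W_h @ np.concatenate([W_in @ one_hot(s),
                                         W_in @ one_hot(p),
                                         W_in @ one_hot(o)]))
           for (s, p, o) in triples]

    # eq. (2): h_FE = W_F [h_f1 ; ... ; h_fR], a fixed-length vector
    return W_F @ np.concatenate(h_f)

h = encode_triples([(0, 1, 2), (3, 1, 4)], vocab_size=5)
print(h.shape)  # (16,)
```

Note this encodes a triple set into one vector for a decoder to verbalise; it says nothing yet about feeding such vectors into GNMT, which is the question here.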

Would this? And since Swahili, Arabic and Esperanto are all active languages in https://translate.google.com/, no further coding on the GNMT side would be necessary. (I'm curious how best WUaS could grow small languages not yet among Wikipedia/Wikidata's 287-301 languages or GNMT's ~100+ languages.)

How could your Wikidata / Wikibabel work interface with Google GNMT more fully with time, building on your great Wikidata coding/papers?

Cheers,
Scott

https://en.wikipedia.org/wiki/User:Scott_WUaS

*
Posted these here too - https://scott-macleod.blogspot.com/2018/06/indian-stone-curlew-world-univ-and-sch.html -


