Course:CPSC312-2017-Blissymbols-Interface

Concept

Natural language processing has traditionally focused on written language rather than visual language.

sample Blissymbols

Blissymbols are one such visual language.[1] In the 1940s their creator, Charles Bliss, set out to design an intuitive and universal language with lax grammar yet complete semantics. Each Blissymbol represents an atomic unit of meaning, which can either stand on its own or be combined with other Blissymbols to express new concepts. Consequently, Blissymbols are both adaptable to any language and adept at expressing meaning, and so occupy a unique niche in natural language processing.

The Wizard of Oz partly in Blissymbols

At present, Blissymbols have no natural language interface. As a visual and universalizable language, Blissymbols could serve as an effective intermediary for translating between languages, allowing any language to be represented conceptually through Blissymbols. Because their derivational structure is simple and formulaic, Blissymbols lend themselves readily to language processing tasks such as sentiment analysis, making them a useful modeling language for uncovering the subtext of spoken languages.

Much of the above information on Blissymbols draws from the Radiolab podcast, which released an episode on Charles Bliss in 2012.[2]

Something Extra

We intend to prototype a natural language processor that categorizes the visual language of Blissymbols in much the same way as written language. Specifically, we will construct a natural language interface for processing Blissymbols and use this interface to craft new Blissymbols from derivative concepts, as well as to determine Blissymbol translations in various languages.

Since written and visual languages differ in grammar and semantics, their domains of representation will naturally differ, as will their implementations. Unlike written language, which is composed of Unicode-encoded characters, Blissymbols have no standard Unicode representation. To work around this, we will use the Unicode encoding for Blissymbols proposed by Michael Everson.[3] Since the Blissymbols in this proposed encoding represent atomic concepts, and these atoms can merge to form new concepts, we believe Prolog is well suited to a Blissymbols interface: Prolog can represent each Blissymbol atom as well as all their possible combinations.
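As a rough illustration of the idea of atoms that merge into new concepts, here is a minimal sketch in Python (not the project's Prolog code). The atom names, the compound table, and the function `combine` are hypothetical stand-ins chosen for this example; they are not taken from the real Blissymbol vocabulary or from Everson's proposed code points.

```python
# Hypothetical atoms: plain names only, not real glyphs or code points.
ATOMS = {"dog", "eat", "cat", "house", "feeling"}

# Hypothetical compounds: an ordered sequence of atoms -> the concept it denotes.
COMPOUNDS = {
    ("dog", "eat"): "dog food",
    ("house", "feeling"): "home",
}


def combine(*parts):
    """Return the concept denoted by a sequence of atoms, or None if unknown."""
    unknown = [p for p in parts if p not in ATOMS]
    if unknown:
        raise ValueError(f"not atomic symbols: {unknown}")
    if len(parts) == 1:
        # A single atom stands on its own.
        return parts[0]
    return COMPOUNDS.get(tuple(parts))


print(combine("dog", "eat"))
print(combine("cat"))
```

In Prolog the same tables would most naturally be written as facts, with the lookup expressed as a rule over them.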

Conclusion

Prolog's natural support for recursion made it straightforward to break down complex units, such as sentences, complex words, and concept-denoting words, into their component parts. Combined with the nature of Blissymbols as discrete atomic units with definite properties, this yields a set of simple and effective tools for decomposing natural language.

Because Blissymbols exist in either atomic or complex-atomic form, it is often difficult to determine where one complex atom ends and the next begins. For example, in the sentence "The dog eats the cat." it is ambiguous whether "dog eats" should be read as two separate symbols or as the complex Blissymbol for dog food, since the complex symbol for dog food contains the symbols for "dog" and "eat". In the current implementation this poses a challenge when parsing long chains of adjectives as well as some complex actions.
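The ambiguity can be made concrete with a small Python sketch (again, not the project's Prolog code) that recursively enumerates every way to split a sequence of atoms into single atoms and known compounds. The compound table and the function `segmentations` are hypothetical and exist only for this example.

```python
# Hypothetical compound table, following the "dog food" example in the text.
COMPOUNDS = {("dog", "eat"): "dog food"}


def segmentations(atoms):
    """Yield every split of `atoms` into single atoms and known compounds."""
    if not atoms:
        yield []
        return
    # Reading 1: the first atom stands on its own.
    for rest in segmentations(atoms[1:]):
        yield [atoms[0]] + rest
    # Reading 2: a known compound starts at the first atom.
    for size in range(2, len(atoms) + 1):
        concept = COMPOUNDS.get(tuple(atoms[:size]))
        if concept is not None:
            for rest in segmentations(atoms[size:]):
                yield [concept] + rest


# "The dog eats the cat." reduced to its content atoms.
for reading in segmentations(["dog", "eat", "cat"]):
    print(reading)
# prints ['dog', 'eat', 'cat'] and then ['dog food', 'cat']
```

Both readings are structurally valid, so choosing between them requires information beyond the symbol sequence itself.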

Were we to invest further in this project, we would focus on enabling the parser to use contextual analysis to define the boundaries of complex atoms in a given sentence more precisely. Princeton WordNet provides Prolog files containing extensive synonym sets (synsets) for the English language. These files connect the definitions of synonymous words in a fashion that Prolog can easily trace, and they would make a valuable addition to our parser. Using these WordNet files, we could derive atomic Blissymbols that approximate unknown or non-atomic meanings by tracing through the connected synonyms of a word as well as its gloss (i.e., its dictionary definition). This process would make the natural language parser more robust and would give Blissymbol generation more to work with, allowing it to create more accurate approximations of abstract words.
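The synonym-tracing idea might look roughly like the following Python sketch, which searches breadth-first through a synonym graph until it reaches a word that has an atomic symbol. The tiny graph here is hand-written for illustration and is not actual WordNet data; `SYNONYMS`, `ATOMS`, and `nearest_atom` are hypothetical names.

```python
from collections import deque

# Hand-written toy synonym links (an assumption, not WordNet content).
SYNONYMS = {
    "canine": ["hound"],
    "hound": ["dog"],
    "devour": ["consume"],
    "consume": ["eat"],
}

# Words assumed to have an atomic Blissymbol.
ATOMS = {"dog", "eat"}


def nearest_atom(word):
    """Return the closest word with an atomic symbol, or None if unreachable."""
    seen = {word}
    queue = deque([word])
    while queue:
        current = queue.popleft()
        if current in ATOMS:
            return current
        for neighbour in SYNONYMS.get(current, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return None


print(nearest_atom("canine"))
print(nearest_atom("devour"))
```

Breadth-first order means the first atom found is the one fewest synonym links away, which is one plausible notion of the closest approximation.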

Code located here: https://www.dropbox.com/sh/5uwown9dkyp1rih/AACgBkSjn4Jr5fVyBusYNr9Za?dl=0

  1. Maintained by Blissymbolics International: http://www.blissymbolics.org
  2. Radiolab episode here: http://www.radiolab.org/story/257194-man-became-bliss/
  3. Encoding proposal here: http://std.dkuug.dk/JTC1/SC2/WG2/docs/n1866.pdf