Course:FNEL 382/Rapid Word Collection & WeSay

From UBC Wiki
Tools of the Week
Week 4
FNEL is a program focused on revitalizing endangered languages in Canada and around the world. This page contributes to that goal by examining two dictionary-making tools.

FNEL 382 is a course on lexicography, or 'dictionary-making', and how certain tools might be used to document and revitalize endangered languages. Each week, one student creates a presentation and a wiki on one or more tools.

The tools for this week (January 26th, 2017) were Rapid Word Collection and WeSay. The following sections describe the theory behind them, how they work in practice, and how they are funded.

Dictionary Development Process

Ronald Moe, who worked as a linguist for over 20 years, developed the Dictionary Development Process (DDP) with SIL International [1].

Semantic Domains

Semantic domains, the key theoretical basis for the DDP, are groups of closely related words organized by lexical relation.

For example, 'Person' is one top-level domain ('2'), which covers states and parts connected with the human form. Within it is the subdomain 'Body' ('2.1'), which covers only the parts of the human body. One level deeper is 'Head' ('2.1.1'), and so on.

Currently, there are 1,792 subdomains [2], with the deepest categories numbered five levels deep (e.g. 1.2.3.4.2). These are the 9 top-level domains:

  1. Universe/Creation
  2. Person
  3. Language and Thought
  4. Social Behaviour
  5. Daily Life
  6. Work and Occupation
  7. Physical Actions
  8. States
  9. Grammar
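
The numbering scheme described above can be read as a path through a tree: each dot-separated number is one level deeper. The following is a minimal sketch of that idea in Python, not part of the DDP or of any SIL tool. The `DOMAINS` table is a hypothetical excerpt containing only the three domains named in the example above, and the function names are made up for illustration.

```python
# Hypothetical excerpt: only the three domains named in the text above.
DOMAINS = {
    "2": "Person",
    "2.1": "Body",
    "2.1.1": "Head",
}

def ancestors(code):
    """Return every code on the path from the top-level domain to `code`."""
    parts = code.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]

def depth(code):
    """Number of levels in a domain code (1 for a top-level domain)."""
    return len(code.split("."))

def path_labels(code, table=DOMAINS):
    """Names along the path; '?' marks codes missing from the excerpt."""
    return [table.get(c, "?") for c in ancestors(code)]

print(ancestors("2.1.1"))    # ['2', '2.1', '2.1.1']
print(path_labels("2.1.1"))  # ['Person', 'Body', 'Head']
print(depth("1.2.3.4.2"))    # 5
```

Because a code carries its own ancestry, a word filed under '2.1.1' can also be found by anyone browsing '2.1' or '2' without storing any extra links.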

SIL International

SIL International is a faith-based, nonprofit organization with the goal of building language development, confidence, and literacy in communities around the world. It does so by researching and translating languages, then building learning and teaching materials.

Rapid Word Collection and WeSay are two tools funded largely by SIL International or its subsidiaries and partners.

Rapid Word Collection

Rapid Word Collection (RWC) is the practice of using semantic domains to quickly recall similar words and create a word list (and eventually a dictionary). The developers of RWC recommend doing so in a workshop held in the language community and lasting around 2 weeks, during which they suggest that around 10,000 to 15,000 words can be collected. Because many speakers of endangered languages are elderly and live in less developed rural communities, many are not technologically proficient, so RWC provides paper forms that can be filled out with pen or pencil.

Logo of Rapid Word Collection. SIL International's connection is clearly visible.

Rapid Word Collection is usually practiced in communities with at least 25 participating community members and one linguist, who trains the word collectors and keeps the process running smoothly. The community members who speak the 'vernacular' language in question split up into groups of 5 or 6 people. Because the language being documented most likely does not have a standardized written form, the questionnaires used to elicit words are written in another (usually national) language. Each group therefore needs a speaker who can also read that other language. The questionnaire is organized by semantic domain, with each domain having at least one, and sometimes several, questions to elicit words. The creators of the program stress that participants should not simply translate the example words given, but should think in terms of the spoken language itself (i.e. an attempt at a more emergent dictionary).

After the words are collected, a smaller group of 2 or 3 people gives each word a short gloss, or a 1- or 2-word definition, in the other language. This results in a bilingual wordlist, or dictionary.
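
The two steps above (collecting words by semantic domain, then glossing them) can be sketched as a small data structure. This is an illustration only, assuming a simple in-memory model; it is not how RWC forms, WeSay, or FLEx store data. The `Entry` class, the function names, and the `placeholder-*` words are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    word: str                                   # vernacular form written down by a group
    domains: set = field(default_factory=set)   # semantic domain codes it was elicited under
    gloss: str = ""                             # short gloss added in the second step

def collect(entries, word, domain):
    """Record `word` under `domain`; a repeated word gains another domain."""
    entry = entries.setdefault(word, Entry(word))
    entry.domains.add(domain)
    return entry

def domain_key(code):
    """Sort codes numerically level by level rather than as plain strings."""
    return tuple(int(part) for part in code.split("."))

def wordlist(entries):
    """Return (word, gloss, domains) rows, alphabetized by word."""
    ordered = sorted(entries.values(), key=lambda e: e.word)
    return [(e.word, e.gloss, sorted(e.domains, key=domain_key)) for e in ordered]

entries = {}
collect(entries, "placeholder-b", "2.1.1")
collect(entries, "placeholder-a", "2.1")
collect(entries, "placeholder-b", "2.1")   # same word elicited under a second domain

# Glossing step: a smaller group fills in short glosses afterwards.
entries["placeholder-a"].gloss = "body"
entries["placeholder-b"].gloss = "head"

for row in wordlist(entries):
    print(row)
# ('placeholder-a', 'body', ['2.1'])
# ('placeholder-b', 'head', ['2.1', '2.1.1'])
```

Keying entries by word means a form elicited under several domains is stored once, with each domain recorded against it.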

In the next stage, other community members (or, if the area is particularly remote, a team of people elsewhere) enter the data into a computer program such as WeSay or SIL FieldWorks (FLEx).

WeSay

WeSay was created as a co-production of PALASO, SIL Papua New Guinea, and SIL International. The Payap Language Software Development Group (PALASO) is part of the Linguistics Institute at Payap University, which is itself a partnership between Payap University and SIL International.

WeSay Logo

WeSay is fully functioning, open-source dictionary-making software that takes up a relatively small 53 MB of space. It is designed to be easy to use and technologically simple so that it can run on as many computers as possible, including in hot or humid environments. Additionally, the software contains a 'Configuration Tool', which can be used to choose the specific aspect of dictionary construction the user would like to focus on. Once a choice is made, the Configuration Tool closes and the program runs as configured. This allows for focused progress on one part of the lexicographical process, instead of losing direction by trying to tackle too much of a project at once. The separation between the Configuration Tool and the program itself also makes it possible for speakers of an endangered language, who may be less technologically proficient, to be guided by a linguist or other community member to work only on the tasks required.

It can be used to input words collected during an RWC workshop, as it is organized with the same 1,792 semantic domains and the same questions found on the RWC questionnaires.

A user of WeSay, David Rowbory, has created a series of videos on how to set up and use WeSay.

References