For GW project

4th December 2019

Report for Gamilaraay/Yuwaalaraay lexical data management

DJN 4th Dec 2019

A Libre Office Base database now contains all the lexical data. It has 12 tables, representing all the data extracted from Gayarragi Winangali. In addition, much of the data has been checked, corrected, and processed (normalised). The data is now in a standard, "relational" form that can be implemented in any SQL or other relational database system, or lightly processed and exported to a format such as LIFT for app development.

The database can be downloaded here: gy-1.odb [Open Office Base, 400KB]

The Libre Office database "Base" is being used as a transitional testbed ahead of a move to a full server-hosted version; it is also portable and free to install on either Windows or Mac computers. Note that by itself, the database does not display a dictionary; it is just a repository for storing formally correct structured data. To download Libre Office, go to https://www.libreoffice.org/download/download/

Tables:

lemma
audio
entry-sense
sense
sensenote
pos
pos-n
language
language-n
subentry
finderlist
person
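
As a rough illustration of how the relational form could be carried into an SQL system, here is a minimal sketch using Python's built-in sqlite3 module. Only the table names (lemma, sense, entry-sense) come from the list above; every column name, the idea that entry-sense is a link table between lemmas and senses, and the sample row are assumptions made for the example, not the actual structure of gy-1.odb.

```python
import sqlite3

# Hypothetical columns: only the table names are taken from the database.
SCHEMA = """
CREATE TABLE lemma (
    id   INTEGER PRIMARY KEY,
    form TEXT NOT NULL
);
CREATE TABLE sense (
    id         INTEGER PRIMARY KEY,
    definition TEXT NOT NULL
);
-- A hyphenated table name has to be double-quoted in SQL.
CREATE TABLE "entry-sense" (
    lemma_id INTEGER NOT NULL REFERENCES lemma(id),
    sense_id INTEGER NOT NULL REFERENCES sense(id),
    snum     INTEGER,
    PRIMARY KEY (lemma_id, sense_id)
);
"""


def build_demo():
    """Create an in-memory database holding one illustrative entry."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO lemma VALUES (1, 'balan')")
    con.execute("INSERT INTO sense VALUES (1, 'zero')")
    con.execute('INSERT INTO "entry-sense" VALUES (1, 1, 1)')
    return con


def senses_for(con, form):
    """Return the definitions linked to a lemma form, in sense-number order."""
    rows = con.execute(
        'SELECT s.definition FROM lemma l '
        'JOIN "entry-sense" es ON es.lemma_id = l.id '
        'JOIN sense s ON s.id = es.sense_id '
        'WHERE l.form = ? ORDER BY es.snum',
        (form,),
    )
    return [row[0] for row in rows]
```

Names such as entry-sense, pos-n and language-n may be easier to work with in other systems if the hyphen is replaced by an underscore on export.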

Of particular note is the method used to encode special data types within the entries (senses and sense notes). These use an adapted form of "markdown", which provides simple, human-readable versions of linked and tagged data. These are designed for easy parsing and formatting, eg using PHP or JS and CSS, while remaining human-readable (ie "falling back" to readable content) if not processed. The conventions I have designed here are:

language content, which can be rendered directly or formatted suitably, goes in plain angle brackets (<...>)
linked special content, such as language content, can be followed by a link with the syntax (</...>). Each link consists of a label from a closed set, and a target value. The labels are:
- /lid - link to lexical id
- /rec - link to named recorder
- /spk - link to named speaker
- /tax - Latin (taxonomic) term
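
The conventions above could be parsed along the following lines. This is a minimal sketch in Python rather than PHP or JS, and it assumes that a link is written as the label, then whitespace, then the target (for example </lid 1234>); the exact separator is not specified above, and the function names are invented for the example.

```python
import re

LINK_LABELS = ("lid", "rec", "spk", "tax")  # the closed set of labels listed above

# Matches either a link such as </lid 1234> (assumed form) or language content <...>.
TOKEN = re.compile(
    r"</(?P<label>" + "|".join(LINK_LABELS) + r")\s+(?P<target>[^<>]+)>"
    r"|<(?P<lang>[^</>][^<>]*)>"
)


def parse_markup(text):
    """Split a sense or sense-note string into (kind, value) tokens."""
    tokens = []
    pos = 0
    for match in TOKEN.finditer(text):
        if match.start() > pos:
            tokens.append(("text", text[pos:match.start()]))
        if match.group("label"):
            tokens.append((match.group("label"), match.group("target").strip()))
        else:
            tokens.append(("lang", match.group("lang")))
        pos = match.end()
    if pos < len(text):
        tokens.append(("text", text[pos:]))
    return tokens


def to_plain_text(text):
    """Drop the links and brackets, keeping ordinary text and language content."""
    return "".join(v for kind, v in parse_markup(text) if kind in ("text", "lang"))
```

A formatter could then map each token kind to its own HTML element or CSS class; anything that does not match the assumed patterns is passed through unchanged as text.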

Note: the database does not contain these resources, which are stored separately:

audio : 30 songs, 14 stories, 2677 words, approx 900 sentences (best versions to be confirmed)
text: other texts as separate text files: 14 story-texts, 30 song texts, 958 sentences
images : 185 images, mostly interface elements but also including thumbnail images of speakers

23rd December 2017

Second draft of JSON export from GW

Download: gydic2.json approx 1 MB

I added an "audio" attribute whose value should be the filename for the lexical item. Note that there is not a perfect correspondence between John's set (here: https://www.dropbox.com/sh/u7rtl0mlyiv80fn/AAAOTPrad7Kgu7cX3cN4wi2Ka/WordsS%2BM?dl=0) and the names mapped by Gayarragi, Winangali and previously adapted for Ma! I imagine that some changes were made to the lexical stock for Ma! However, I have made as close a likely correspondence set as I can. Someone should check the matches between written and audio forms later. I can supply a spreadsheet of the correspondences that have been used to generate the JSON.
I took away the long-form language name attribute "lgs2" as it's probably not needed.
I omitted to mention in previous notes (22nd Dec 2017) that I have also supplied attributes for sense numbers ("snum") where there is more than one sense for an entry.

22nd December 2017

First draft of JSON export from GW

Download: gydic.json approx 1 MB

Following discussion with Ben and my work with the existing dictionary data and lexicographic principles, the following suggested amendments/additions have been made to the JSON schema:

Subentries. The provided schema had no provision for subentries, whether embedded in their parent entries or otherwise distinguished. I added an attribute "issub" at the headword level which indicates whether an entry is a subentry or not (these are in serial order, ie a subentry will logically (but not syntactically) be a child of a preceding head entry).
Part of speech. The supplied schema has a "ps" attribute (which I presume is part of speech) - but at the sense level, which is an unusual lexicographic practice. I have placed a "ps" attribute at the headword level.
Languages. As discussed, John wants the relevant language(s) indicated. Language information can apply at the headword or the sense level, so I have used our data to supply a "lgs" attribute at the headword level and a "slgs" attribute at the sense level. Also I added the other data we have at the headword level with full names of the languages "lgs2" - I can delete this by regenerating with different rules or using regex if it's a problem.
Status (eg 'new'). John wants the capability to indicate if words are new (and perhaps other related info, such as links to discussion), so I added an attribute at the headword level "newstatus" and also at the sense level "snewstatus" (because existing words can be given new senses).
Sounds - as I think we discussed in Cairns, in Gayarragi, Winangali the pronounced words are represented as a group of word sounds on a fixed time grid within single audio files, where individual words are played at appropriate calculated offsets. I think John said that he has individual sound files but I don't have them, so I am not sure of their filenames. John may have said that the sound files were simply named as the "id" attribute value plus the appropriate extension for the audio format. So this might need a little further investigation or organising. If you can't generate the audio filenames/links, then John could let me know the exact forms of the audio filenames and I can add them to the JSON.
As discussed, the senses have only "def", not "ge".
There are a small number (about 30) of cross references (coded as a kind of HTML link) which I have left in place - these could be deleted via simple regex if they are a problem (or let me know, I can do it).
One thing I couldn't understand is why senses (and sentence examples) are represented as objects if there is one instance only, but as arrays if there are multiple instances. This made generation a little more complex to program and I am not convinced that it is good "data modeling" (they should be consistent, e.g. arrays of one or more objects). Nevertheless, I have supplied the data with those structures.
I checked the JSON file for validity and it was valid according to two different validators, so I am confident that it is well formed.
If you need any changes to the JSON, just let me know, as I can easily make modifications to the generating program now that the main logic is written.
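
For anyone consuming the JSON, the object-or-array inconsistency noted above can be smoothed over on the reading side. The following is a minimal Python sketch; the key name "sense" is a placeholder, since the schema's actual key names are not listed in these notes (only "def" and "snum" are mentioned).

```python
import json


def as_list(value):
    """Return value as a list: [] for a missing value, [value] for a single object."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Two illustrative entries: one with a single sense object, one with an array.
# The "sense" key is hypothetical.
single = json.loads('{"id": "a1", "sense": {"def": "zero"}}')
multi = json.loads(
    '{"id": "a2", "sense": [{"snum": 1, "def": "x"}, {"snum": 2, "def": "y"}]}'
)

for entry in (single, multi):
    for sense in as_list(entry.get("sense")):
        print(entry["id"], sense["def"])
```

With a helper like this, the rest of the reading code can treat senses and example sentences uniformly as lists.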

30 June 2014

Third Mac version of Gayarragi, winangali!

Download: GW.dmg approx 300 MB

Note: this is the application only - if you want the links to resources and help to work, place the new app (once taken out of the downloaded volume etc) in the previous folder together with 'resources' and 'gwhelp'

A revised version of GW.app with these fixes:

the 'little rogue blocks' that appeared were artifacts of semi-incompatibility of Flash-based radio button objects on some Macs. To address this, I built new native-Director composite programmed objects and replaced all the Flash buttons and checkbox sets. This brings several improvements: the new controls look better, scale graphically, and can be clicked on their labels as well as on the buttons themselves.
as part of the above, also changed the behaviour of "Play to end" in songs/stories - now play to end is the default (rather than just play this page as before)
incorrect menu highlight on viewing sentences is now fixed
some character entities in the main dictionary entry body panel did not display correctly in Mac (these were the flags and quotes for example sentences). I have replaced these using 'example: ' as flag and single quotes around the sentence gloss.

Other minor notes (to add to previous changelog):

sound for bunduun 'sacred kingfisher' - end slightly cut off
previously noted graphical screens that we will probably need to ask Christine to update - also include screen with text 'about this CD' - should be replaced with 'about this app'
(minor) Baa buluuy verse 2 missing space 2nd line

17 June 2014

Second Mac version of Gayarragi, winangali!

More details soon to come about many changes, corrections, and fixes in this version.

This version: draft of 17 June 2014
Download: GW-mac2.dmg approx 300 MB
Notes:

To run, download DMG file, mount it (double click), open the device/folder, then copy the 3 items (one app file and two folders) to your desktop, or to a single new folder on your desktop
File with more details about changes, corrections, and fixes to come - 20 June: click here (MS Word file GWchangelog-20140619.doc)
Known issues: Intro and credit screens need updating (but are images originally from Christine); Help needs some updating; at some times when running the app, a row of little dark blocks appears at the top left side of the header (doesn't affect running, but I need more time to investigate it).

8 Dec 2013

1. Tab delimited file for subentries
John requested this. This file has all subentries, extracted and formatted as if they are normal headword entries.

Download: DictionaryTarget-Subentries.txt 213KB - tab-delimited file

2. Correction for entry balan
John noted that the entry for balan does not display properly. This was due to an error in the original data (also in the original CD, now corrected). In the Data Upload Template, sheet Dictionary Target, please change Detailed Entry (HTML) cell for 13002 balan as follows:

CHANGE:
<div class='lemma'><span class='form-main'>balan</span> <span class='pos'>noun</span> </div><div class='defblock'><span class='gloss'>z</span><div class='sensenotes'>ero</div></div>

TO:
<div class='lemma'><span class='form-main'>balan</span> <span class='pos'>noun</span> </div><div class='defblock'><span class='gloss'>zero</span></div>

Details are also here as a plain text file: Download: balan-correction.txt 1KB - plain text file

26 Oct 2013

Tab delimited file for full languages data

Download: Languages2.txt 119KB - tab-delimited file (columns: Dictionary target ID -- Target Word -- Languages (global for word) -- Source word (English: gloss string + languages))

25 Oct 2013

Tab delimited file for additional languages data

Download: Languages.txt 106KB - tab-delimited file (columns: Dictionary target ID -- Target Word -- Languages -- Source word (English: gloss string))

24 Oct 2013

Tab delimited file for Data Upload Template, worksheet Dictionary Source

Download: DictionarySource.txt 101KB - tab-delimited file (columns: Dictionary Source ID -- Dictionary target ID -- Source word (English) -- Part of Speech -- Target Word)

Tab delimited file for Data Upload Template, worksheet Dictionary Target

Download: DictionaryTarget.txt 1.05MB - tab-delimited file (columns: Dictionary Target ID -- Target Word -- Languages -- Audio URL)

23 Oct 2013

File of HTML strings corresponding to display entries for each headword. Indexed by IDs. In this version, subentries are nested under their corresponding main entries.

Download: GW-entries-HTML.txt 1MB - tab-delimited file (columns: Original_ID -- new_ID -- display entry HTML)

Simple CSS file for display of HTML display entries - can be adapted for other purposes.

Download: gydex.css 1KB css/text file

22 Oct 2013

Gloss-based list with IDs. I'm still not clear how glosses are to be handled but here is a first export for comment. Note: the suffix forms are preceded by an en-rule, not hyphen (just to make it easier to work with this data in Excel)

Download: GW-glosses.txt 104KB - tab-delimited file

21 Oct 2013

First simple export of GW data for Ma!

It's not clear to me what data we'll settle on, so I thought to start by an initial simple export and get response/request from there for planning next steps.

Download: lexData.txt 124KB - tab-delimited text file (columns: Original_ID -- new_ID -- part of speech -- word form (with conjugation marker) -- gloss(es))
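
A file like this can be read with Python's standard csv module. The sketch below assumes the five columns appear in the order listed above, that there is no header row, and that fields are not quoted; the short field names and the sample line in the usage note are invented for the example.

```python
import csv
import io

# Short names for the five columns, in the order given above (names are hypothetical).
COLUMNS = ["original_id", "new_id", "pos", "form", "glosses"]


def read_lex_data(handle):
    """Read tab-delimited rows into dictionaries keyed by COLUMNS."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    # csv.reader yields an empty list for a blank line, so those are skipped.
    return [dict(zip(COLUMNS, row)) for row in reader if row]


# Invented sample line, only to show the shape of the result.
sample = io.StringIO("101\t13002\tnoun\tbalan\tzero\n")
rows = read_lex_data(sample)
print(rows[0]["form"], rows[0]["glosses"])
```

For the real file, pass an open file handle instead of the in-memory sample, for example open("lexData.txt", encoding="utf-8", newline=""); the encoding is a guess and should be checked against the file.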

Here is a persistent file with the mapping of all lexical IDs from the current GW app to new IDs that should be suitable for Ma!

Download: maIDmapping.xls 1200KB - MS Excel file

HTML display versions of all entries have been generated (and can be supplied in modified form) - see this intermediate web version

29 April 2013

First Mac version of Gayarragi, winangali!

There is still much work to do, but the major hurdles have been overcome.

This version: draft of 28 April 2013
Download: GW-Mac-001.dmg approx 600 MB
This version has:

New custom modules for regex to run on Mac Director 11.5
Sentences now show speaker thumbnails with rollover information
Changes to dictionary Topics menu and search options popup menu
Story notes now seamlessly available from Story menu page

30 July 2012

Final GW user survey results. The survey is now closed. We got a total of 52 responses. Please view the documents below before our discussion about how to proceed.

all_summary.pdf Summary of all responses (except graphs don't display)
all_graphs.pdf All the graphs
all_details.pdf Full details of responses including comments/feedback

teacher_summary.pdf Summary of teacher responses
all_teachercomments.pdf Relevant details of teacher responses

learner_summary.pdf Summary of learner responses
all_learner-comments.pdf Relevant details of learner responses

parent_summary.pdf Summary of parent responses

GW-survey-questions.pdf For records only - original blank survey
GW-survey-notes.txt Summary notes

12 July 2012

GW-survey-results-3.doc Early GW user survey results

30 May 2012

Draft survey, to get evaluations and feedback in preparation for Mac and 2nd edition

11 MARCH 09

Update files only, for new stories
Download: newfiles.zip approx 30 MB

Notes: Please just write these files over the existing versions, and leave other files as they are. Note that they are not all in the same folder, eg main.cxt is in the top level, and the stories folder goes inside the mvcst folder (just look for matches on filenames). Let me know if you have any problems.

3 MARCH 09

Latest version
Download: movie.zip approx 162 MB

13 NOVEMBER 2008

Latest version
Download: GY20081113.zip approx 160 MB

Please see the Googledoc for details

25 AUGUST 08

Latest version
Download: gy.zip approx 160 MB
This version has:

All sentences implemented and corrected
Most of new design material implemented
Misc

1 July 08

Latest version
Download: gy.zip approx 100 MB
This version has:

Topics section fully implemented (top level navigation needs better look)
New menu system
New background for search screen (not all controls yet migrated)
New opening screen

16 May 2007

Mac version, almost same as 27 April (with minor changes only)
Download: GY2.DMG approx 200 MB

27 April 2007

Updated version - to keep download smaller, please re-use previous versions of the following:
startGY.exe, xtras folder, songs folder, word_sounds.cxt, sentence_sounds.cxt
Download: PROD14_Win.zip approx 8 MB
This version has:

Browse GY-English section - some scrolling and queuing issues fixed
English - GY index/finderlist completed
1 new story implemented (Burraalgaa)
Change to search options panel
Other minor improvements and bugfixes
Sorry I didn't get the Mac version completed by this date, but it will definitely be available next week
Please keep your previous version in case of any problems with this download.

10 March 2007

Updated version - please re-use previous versions of songs, stories and xtras folders, and startGY.exe.
Download: PROD13_Win.zip approx 60 MB
This version has:

Browse GY-English section working
All extra sounds converted and installed with system for playing exceptions to 10's
Some bugs fixed in scrollbars: scroll persistent for GY-E main list
Authored also for Macintosh - Mac version to follow
Note also that other fields such as cross references are yet to be included. Please keep your previous version in case of any problems with this download.

28 December 2006

Updated version - please re-use previous versions of songs, stories and xtras folders, and word_sounds.cxt, and startGY.exe.
Download: PROD12.zip approx 10 MB
This version has:

Sentence display and navigation developed further
Sentence numbers shown (sequential, rather than original sentence number)
Sentence sounds play, via icons
Forward and back buttons usability improved
Notes: some earlier feedback and ideas about main dictionary panel not implemented yet. Note also that other fields such as cross references are yet to be included. Please keep your previous version in case of any problems with this download.

17 November 2006

Updated version - re-use previous versions of songs, stories and xtras folders, and word_sounds.cxt, and startGY.exe.
Download: PROD11.zip approx 2.1 MB
This version has:

Word sounds play on click speaker icon
Several bugs in word history fixed
Sentence navigation and display system now working in first draft version
Notes: some earlier feedback and ideas about main dictionary panel not implemented yet. Note also that other fields such as cross references are yet to be included. Please keep your previous version in case of any problems with this download.

17 Oct 2006

Updated version - re-use previous versions of songs and stories folders, and word_sounds.cxt, and startGY.exe. This download has a new xtras folder.
Download: PROD10.zip approx 4 MB
This version has:

Crossword game, 2 levels
Sentences partly implemented but not yet displayed
Note: Also KEEP your previous version in case you find instabilities in this version!
Note: some earlier feedback and ideas about main dictionary panel not implemented yet. Note also that other fields such as cross references are yet to be included.

10 Oct 2006

Updated version - re-use previous versions of songs and stories folders, and word_sounds.cxt, and startGY.exe.
Download: PROD9.zip approx 2 MB
This version has:

Queue/history of displayed dictionary entries
Queue of pages visited (remembers last viewed page only of any song or story)
Search box accepts arrow keys, shows "..." if text overflows box
Note: some earlier feedback and ideas about main dictionary panel not implemented yet. Note also that other fields such as cross references are yet to be included.

28 September 2006

Updated version - re-use previous versions of songs and stories folders, and word_sounds.cxt, and startGY.exe. Hopefully it will all work!
Download: PROD8.zip approx 2 MB
This version has:

Dictionary display now includes all sentences
Dictionary display panel reorganised, larger font etc
Dictionary search options moved into new pop-up tabbed panel. This panel disappears once it loses focus, unless it is positioned outside the main window.
Note: some earlier feedback and ideas about main dictionary panel not implemented yet - I wanted to wait until we had some maximal dictionary entries. Note also that other fields such as cross references are yet to be included.

19 August 2006

Updated version - re-use previous versions of songs and stories folders, and word_sounds.cxt, and startGY.exe. Hopefully it will all work!
Download: PROD7.zip approx 2 MB
This version has:

click on entry form in search hits now displays full entry
... headword, part of speech, languages
... senses, with alpha numbers, and languages (where different to entry-global languages)
... sense comments
... language materials in comments have popup note
... language materials in comments can be clicked to launch search - this function needs more refining, comments and suggestions welcome
colour, font etc of full entry components is modifiable and there could eventually be user settings or a set of "looks" to choose from
other content, such as sentences, entry-global comments, cross references, not yet implemented

5 June 2006

Updated version - all files except for folders "txts" and StartGY.exe (re-use previous versions of these and copy into correct locations).
Download: PROD6.zip approx 65 MB
This version has:

improved ranking of search hits
- better recognition of whole-expression strings
- recognises homophones properly
- weighted to distinguish by POS (ie suffixes from lexical items)
- tunable to balance effect of length, POS
system to search words by each language (GR, YR, YY) - controls and indexes
- note: not yet completely correctly operational - still some bugs in algorithm
many more word sounds installed and playing
stories system now working (data for one story converted so far)
sentences data installed (not visible yet)

5 May 2006

Updated version - all files except for folders "txts" and "xtras".
Download: GY_5May06.zip approx 3.8 MB
This version has:

Searched-for string is now highlighted in list of hits
Better management of large numbers of hits
Hitbox overflowed where multiple entries wrapped - I designed and programmed an automatically adaptive scrollbar system
Other misc bugs fixed and search sped up
Clicking on GY entry in hit-box plays the sound - only first 30 words have sound available at this stage (perhaps this was also on prev version)
Shift-click on morphemes in song player takes you to dictionary search for that form

To install this update, unzip all the files, then place the whole of the previously-sent directories "txts" and "xtras" in their relevant locations (ie same as in previous versions). Let me know if there are any problems.

25 April 2006

Updated version - replacement files only.
Download: PROD3.zip approx 1.7 MB
This version has:

added "starts with" to search types
search returns ranked hits
- English search ranked:
  whole gloss unit > whole word > whole parenthetic > word with apostrophe or hyphen > anything else THEN by shorter > longer
- GY search ranked:
  whole lexical unit > anything else THEN by shorter > longer
bug with radio buttons fixed (second click on same button cancels)
some indexing and speed optimisations
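
The English ranking order described above can be pictured as a two-part sort key: a tier number, then a length. The Python sketch below is an illustration only, not the Director code used in the program. The tests for each tier are my reading of the tier names, gloss units are assumed to be separated by commas or semicolons, and "shorter" is taken to mean the length of the whole gloss string.

```python
import re


def english_tier(query, gloss):
    """Return 0 (best) to 4 (worst) for how the query matches the gloss."""
    q = query.lower()
    g = gloss.lower()
    esc = re.escape(q)
    outside = re.sub(r"\([^)]*\)", " ", g)  # gloss with parentheticals removed
    units = [u.strip() for u in re.split(r"[,;]", outside)]
    if q in units:
        return 0  # whole gloss unit
    if re.search(rf"(?<![\w'\-]){esc}(?![\w'\-])", outside):
        return 1  # whole word, not joined by apostrophe or hyphen
    if q in [p.strip() for p in re.findall(r"\(([^)]*)\)", g)]:
        return 2  # whole parenthetic
    if re.search(rf"(?:(?<=['\-]){esc}(?!\w)|(?<!\w){esc}(?=['\-]))", g):
        return 3  # word joined to another by apostrophe or hyphen
    return 4  # anything else


def rank_hits(query, glosses):
    """Sort glosses by tier, then shorter before longer."""
    return sorted(glosses, key=lambda g: (english_tier(query, g), len(g)))
```

For example, rank_hits("man", [...]) places a gloss that is exactly "man" ahead of "tall man", which in turn comes ahead of "old-man saltbush" and "woman". The GY ranking would need only two tiers on the same pattern.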

Notes:

This update has only the changed files. Unzip the package then replace the files in your existing installation with the updated files. Check that files belong in two locations. Let me know if the program does not work properly.
The search ranking is working reasonably well, but can be improved - suggestions welcome.
Next step will be to add sentences, and to create full-entry displays (that will be displayed after clicking on hits)

9 April 2006

Updated version of GY CD with dictionary search implemented.
Download: PROD3.zip approx 40 MB
See previous version info for unpack/install instructions. This version has:

core dictionary data installed and indexed
system for searching dictionary data, with various options:
- search in GY or English
- search in 3 modes: normal (forms and glosses), extended (includes latin, other strongly related data such as cross reference data fields, language content in comments), and deep (will search in sentences and other data: not yet implemented)
- search for whole words only or for string in any part of word
- search returns formatted summary form of entries as list of hits in sets of 10 (configurable)
google-like navigation in search hits
songs morpheme line connected to dictionary search

Related but not yet implemented:

you cannot yet click on search hits to get full entries or further info
sentence data not yet included
other dictionary fields, eg comments, not fully implemented
hits are presented in default order, not ranked

Previous versions:

24 Jan 2006:

Updated version of GY CD with song player.
Download: GY_songs_draft2.zip 40 MB
New song player engine. New song menu etc. Other improvements. All requested changes made.
Morphemes link to dictionary index - temporary popup shows basic data.
Some errors remain - missing data, incorrect trs files etc.
Comments cannot be synchronised with pages as there is no supporting data.
Please inform me re any other errors etc.

Previous version:

Download: GY_songs_draft1.zip 41 MB
Very first draft of GY CD with song player

Notes/please check the following:

To install, unzip all the files into a new, empty folder - they should organise themselves correctly
To run, double click on StartGYSongs.exe
Let me know immediately if it does not work for you
The Sounds area is the only development that you can see at this stage
Click on a song name from the song list to go to player. Use menu to go back to song list (later, "Back" will do this too)
Check the organisation of the songs into verses/screens etc
Check all the metadata (titles, other info) carefully for content (and layout). Record all concerns. Some are too long or should be split up differently (and cause overlap with song text)
I know there are at least some errors with sound/text allocation/alignment - please check rigorously and record all you find. Eg in song 14 there are misalignments.
Check for any typos etc in songs
Check carefully the line wrapping - this is done automatically and is the most complex bit. I haven't found errors yet but ...
Font sizes and colours (and line length and spaces between text groups) can all be easily changed. I chose the current sizes to suit the chunking I made for the sense of the songs, so that they would not run off the screen.
Also screen layout is not difficult to change, eg lines can be shorter (but will wrap more), free gloss could be moved independently of interlinear stuff etc
Slight (time) gap in going from one screen to next will depend on your computer - but I will be optimising the programming later, once all the functionalities have settled
Morpheme gloss line is "live" (ie highlighted on mouse over, also program knows the dictionary id number of the word you are over) but no related function yet till the dictionary is installed
There may be programming errors - would show up as "Script Error" boxes, but you can probably continue. If this occurs, please note which page, song you are on and if possible, what you were doing at the time. Note that the live text function (see prev point) is work in progress and causes some of these errors.

Email me with any other questions etc

David