For GW project
4th December 2019
Report for Gamilaraay/Yuwaalaraay lexical data management
DJN 4th Dec 2019
A Libre Office Base database now contains all the lexical data. It has 12 tables, representing all the data extracted from Gayarragi Winangali. In addition, much of the data has been checked, corrected, and processed (normalised). The data is now in a standard, "relational" form that can be implemented in any SQL or other relational database system, or lightly processed and exported to a format such as LIFT for app development.
The database can be downloaded here: gy-1.odb [Open Office Base, 400KB]
The Libre Office database application "Base" is being used as a transitional testbed on the way to a full server-hosted version. It is also portable and free to install on either Windows or Mac computers. Note that the database does not by itself display a dictionary; it is just a repository for storing formally correct structural data. Libre Office can be downloaded from https://www.libreoffice.org/download/download/
Tables:
- lemma
- audio
- entry-sense
- sense
- sensenote
- pos
- pos-n
- language
- language-n
- subentry
- finderlist
- person
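As a rough illustration of what the "relational" form means, here is a minimal Python/sqlite3 sketch of three of the tables. The notes list table names only, so every column name below (lemma_id, form, sense_id, definition, snum) is hypothetical. The sketch also assumes that "entry-sense" is a link table joining lemmas to their senses. The sample row reuses the balan 'zero' entry from the 8 Dec 2013 correction.

```python
import sqlite3

# Hypothetical columns: only the table names come from the notes.
SCHEMA = """
CREATE TABLE lemma (
    lemma_id INTEGER PRIMARY KEY,
    form     TEXT NOT NULL
);
CREATE TABLE sense (
    sense_id   INTEGER PRIMARY KEY,
    definition TEXT NOT NULL
);
-- Assumed link table: one lemma can have several senses, in order.
CREATE TABLE "entry-sense" (
    lemma_id INTEGER NOT NULL REFERENCES lemma(lemma_id),
    sense_id INTEGER NOT NULL REFERENCES sense(sense_id),
    snum     INTEGER NOT NULL,
    PRIMARY KEY (lemma_id, sense_id)
);
"""


def senses_for(conn, form):
    """Return the definitions for a headword, in sense-number order."""
    rows = conn.execute(
        """SELECT s.definition
           FROM lemma AS l
           JOIN "entry-sense" AS es ON es.lemma_id = l.lemma_id
           JOIN sense AS s ON s.sense_id = es.sense_id
           WHERE l.form = ?
           ORDER BY es.snum""",
        (form,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO lemma VALUES (1, 'balan')")
    conn.execute("INSERT INTO sense VALUES (10, 'zero')")
    conn.execute('INSERT INTO "entry-sense" VALUES (1, 10, 1)')
    print(senses_for(conn, "balan"))  # ['zero']
```

The hyphenated table name has to be double-quoted in SQL. This is one small reason a later server-hosted version might prefer underscores.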
Of particular note is the method used to encode special data types within the entries (senses and sense notes). These use an adapted form of "markdown", which provides simple, human-readable versions of linked and tagged data. They are designed for easy parsing and formatting, eg using PHP or JS and CSS, while still "falling back" to human-readable content if not processed. The conventions I have designed here are:
- language content, which can be rendered directly or formatted suitably, is enclosed in plain angle brackets (<...>)
- linked special content, such as language content, can be followed by a link with the syntax </...>. A link consists of one of a closed set of initial labels, followed by a target value. The labels are:
- /lid - link to lexical id
- /rec - link to named recorder
- /spk - link to named speaker
- /tax - Latin (taxonomic) term
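A minimal Python sketch of how these conventions might be parsed. The notes do not say how a label is separated from its target value, so the sketch assumes a space (eg <bunduun></tax Todiramphus sanctus>). It only handles links that follow language content. The HTML class names it emits are also hypothetical.

```python
import html
import re

# Assumed syntax: <form> optionally followed directly by </label target>.
LINK_LABELS = ("lid", "rec", "spk", "tax")
TOKEN = re.compile(
    r"<(?!/)(?P<form>[^<>]+)>"  # language content in plain angle brackets
    r"(?:</(?P<label>" + "|".join(LINK_LABELS) + r")\s+(?P<target>[^<>]+)>)?"
)


def parse_markup(text):
    """Split a sense or sense-note string into (kind, value, link) segments."""
    segments, pos = [], 0
    for m in TOKEN.finditer(text):
        if m.start() > pos:
            segments.append(("text", text[pos:m.start()], None))
        link = (m["label"], m["target"]) if m["label"] else None
        segments.append(("lang", m["form"], link))
        pos = m.end()
    if pos < len(text):
        segments.append(("text", text[pos:], None))
    return segments


def to_html(text):
    """Render the markup as HTML spans (class names are illustrative only)."""
    out = []
    for kind, value, link in parse_markup(text):
        if kind == "text":
            out.append(html.escape(value))
            continue
        span = f'<span class="lang">{html.escape(value)}</span>'
        if link:
            label, target = link
            span += f' <span class="{label}">{html.escape(target)}</span>'
        out.append(span)
    return "".join(out)
```

If the markup is never processed, the raw string still reads naturally. This is the "fall back" property described above.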
Note: the database does not contain these resources, which are stored separately:
- audio: 30 songs, 14 stories, 2677 words, approx 900 sentences (best versions to be confirmed)
- text: other texts as separate text files: 14 story texts, 30 song texts, 958 sentences
- images: 185 images, mostly interface elements but also including thumbnail images of speakers
23rd December 2017
Second draft of JSON export from GW
Download: gydic2.json approx 1 MB
- I added an "audio" attribute whose value should be the filename for the lexical item. Note that there is not a perfect correspondence between John's set (here: https://www.dropbox.com/sh/u7rtl0mlyiv80fn/AAAOTPrad7Kgu7cX3cN4wi2Ka/WordsS%2BM?dl=0) and the names mapped by Gayarragi, Winangali and previously adapted for Ma! I imagine that some changes were made to the lexical stock for Ma! However, I have made the closest likely set of correspondences that I can. Someone should check the matches between written and audio forms later. I can supply a spreadsheet of the correspondences that have been used to generate the JSON.
- I took away the long-form language name attribute "lgs2" as it's probably not needed.
- I omitted to mention in previous notes (22nd Dec 2017) that I have also supplied attributes for sense numbers ("snum") where there is more than one sense for an entry.
22nd December 2017
First draft of JSON export from GW
Download: gydic.json approx 1 MB
Following discussion with Ben and my work with the existing dictionary data and lexicographic principles, the following suggested amendments/additions have been made to the JSON schema:
- Subentries. The provided schema had no provision for subentries, whether embedded in their parent entries or otherwise distinguished. I added an attribute "issub" at the headword level which indicates whether or not an entry is a subentry. Entries are in serial order, ie a subentry is logically (but not syntactically) a child of the preceding head entry.
- Part of speech. The supplied schema has a "ps" attribute (which I presume is part of speech) - but at the sense level, which is an unusual lexicographic practice. I have placed a "ps" attribute at the headword level.
- Languages. As discussed, John wants the relevant language(s) indicated. Language information can apply at the headword or the sense level, so I have used our data to supply a "lgs" attribute at the headword level and a "slgs" attribute at the sense level. I also added the other language data we have at the headword level: the full names of the languages, as "lgs2". I can delete this by regenerating with different rules, or by using regex, if it's a problem.
- Status (eg 'new'). John wants the capability to indicate if words are new (and perhaps other related info, such as links to discussion), so I added an attribute at the headword level "newstatus" and also at the sense level "snewstatus" (because existing words can be given new senses).
- Sounds - as I think we discussed in Cairns, in Gayarragi, Winangali the pronounced words are represented as a group of word sounds on a fixed time grid within single audio files, where individual words are played at appropriate calculated offsets. I think John said that he has individual sound files but I don't have them, so I am not sure of their filenames. John may have said that the sound files were simply named as the "id" attribute value plus the appropriate extension for the audio format. So this might need a little further investigation or organising. If you can't generate the audio filenames/links, then John could let me know the exact forms of the audio filenames and I can add them to the JSON.
- As discussed, the senses have only "def", not "ge".
- There are a small number (about 30) of cross references (coded as a kind of HTML link) which I have left in place - these could be deleted via simple regex if they are a problem (or let me know, I can do it).
- One thing I couldn't understand is why senses (and sentence examples) are represented as objects if there is one instance only, but as arrays if there are multiple instances. This made generation a little more complex to program and I am not convinced that it is good "data modeling" (they should be consistent, e.g. arrays of one or more objects). Nevertheless, I have supplied the data with those structures.
- I checked the JSON file for validity and it was valid according to two different validators, so I am confident that it is well formed.
- If you need any changes to the JSON, just let me know, as I can easily make modifications to the generating program now that the main logic is written.
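The object-versus-array inconsistency and the serial-order subentries can both be normalised after loading the JSON. Below is a minimal Python sketch. It assumes the entries are a list of dicts in serial order and that senses sit under a key called "senses" (a hypothetical name, since the notes do not give one). It also assumes "issub" holds a boolean or a 0/1 value, and the "subentries" key it creates is likewise our own.

```python
def as_list(value):
    """Return value as a list: object -> [object], list -> list, missing -> []."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def is_sub(value):
    """Interpret an "issub" value; accepted encodings are an assumption."""
    return value in (True, 1, "1", "true")


def normalise(entries, sense_key="senses"):
    """Make senses always a list and nest each subentry under its head entry."""
    heads = []
    for entry in entries:
        entry = dict(entry)  # shallow copy: leave the loaded data untouched
        entry[sense_key] = as_list(entry.get(sense_key))
        if is_sub(entry.get("issub")):
            if not heads:
                raise ValueError(f"subentry {entry.get('id')} has no preceding head entry")
            heads[-1].setdefault("subentries", []).append(entry)
        else:
            heads.append(entry)
    return heads
```

Consuming code then only ever iterates over lists, which is the consistency argued for above.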
30 June 2014
Third Mac version of Gayarragi, winangali!
Download: GW.dmg approx 300 MB
Note: this is the application only. If you want the links to resources and help to work, take the new app out of the downloaded volume and place it in the previous folder, together with the 'resources' and 'gwhelp' folders.
A revised version of GW.app with these fixes:
- the 'little rogue blocks' that appeared were artifacts of a partial incompatibility between Flash-based radio button objects and some Macs. To address this, I built new composite programmed objects in native Director and replaced all the Flash buttons and checkbox sets. This brings several improvements: the new controls look better, are graphically scalable, and can be clicked on their labels as well as on the buttons themselves.
- as part of the above, also changed the behaviour of "Play to end" in songs/stories - now play to end is the default (rather than just play this page as before)
- incorrect menu highlight on viewing sentences is now fixed
- some character entities in the main dictionary entry body panel did not display correctly on Mac (these were the flags and quotes for example sentences). I have replaced these with 'example: ' as the flag and single quotes around the sentence gloss.
Other minor notes (to add to previous changelog):
- sound for bunduun 'sacred kingfisher' - end slightly cut off
- previously noted graphical screens that we will probably need to ask Christine to update - also include screen with text 'about this CD' - should be replaced with 'about this app'
- (minor) Baa buluuy verse 2: missing space in 2nd line
17 June 2014
Second Mac version of Gayarragi, winangali!
More details soon to come about many changes, corrections, and fixes in this version.
This version: draft of 17 June 2014
Download: GW-mac2.dmg approx 300 MB
Notes:
- To run, download DMG file, mount it (double click), open the device/folder, then copy the 3 items (one app file and two folders) to your desktop, or to a single new folder on your desktop
- File with more details about changes, corrections, and fixes (added 20 June): GWchangelog-20140619.doc (MS Word file)
- Known issues: Intro and credit screens need updating (but are images originally from Christine); Help needs some updating; at some times when running the app, a row of little dark blocks appears at the top left side of the header (doesn't affect running, but I need more time to investigate it).
8 Dec 2013
1. Tab delimited file for subentries
John requested this. This file has all subentries, extracted and formatted as if they are normal headword entries.
Download: DictionaryTarget-Subentries.txt 213KB - tab-delimited file
2. Correction for entry balan
John noted that the entry for balan does not display properly. This was due to an error in the original data (also in the original CD, now corrected). In the Data Upload Template, sheet Dictionary Target, please change Detailed Entry (HTML) cell for 13002 balan as follows:
CHANGE:
<div class='lemma'><span class='form-main'>balan</span> <span class='pos'>noun</span> </div><div class='defblock'><span class='gloss'>z</span><div class='sensenotes'>ero</div></div>
TO:
<div class='lemma'><span class='form-main'>balan</span> <span class='pos'>noun</span> </div><div class='defblock'><span class='gloss'>zero</span></div>
Details are also available as a plain text file: Download: balan-correction.txt 1KB - plain text file
26 Oct 2013
Tab delimited file for full languages data
Download: Languages2.txt 119KB - tab-delimited file (columns: Dictionary target ID -- Target Word -- Languages (global for word) -- Source word (English: gloss string + languages))
25 Oct 2013
Tab delimited file for additional languages data
Download: Languages.txt 106KB - tab-delimited file (columns: Dictionary target ID -- Target Word -- Languages -- Source word (English: gloss string))
24 Oct 2013
Tab delimited file for Data Upload Template, worksheet Dictionary Source
Download: DictionarySource.txt 101KB - tab-delimited file (columns: Dictionary Source ID -- Dictionary target ID -- Source word (English) -- Part of Speech -- Target Word)
Tab delimited file for Data Upload Template, worksheet Dictionary Target
Download: DictionaryTarget.txt 1.05MB - tab-delimited file (columns: Dictionary Target ID -- Target Word -- Languages -- Audio URL)
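A minimal Python sketch for reading these tab-delimited exports. It assumes UTF-8 text with no header row. The snake_case column names are ours, taken from the column list given for DictionaryTarget.txt. Quoting is switched off because glosses may contain quote characters, which the default csv dialect would otherwise treat as field delimiters.

```python
import csv
import io

# Column order as listed for DictionaryTarget.txt; the names are our own.
TARGET_COLUMNS = ["target_id", "target_word", "languages", "audio_url"]


def read_tab_file(stream, columns):
    """Yield one dict per non-blank row of a tab-delimited export."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        row = row + [""] * (len(columns) - len(row))  # pad short rows
        yield dict(zip(columns, row))


if __name__ == "__main__":
    sample = io.StringIO("13002\tbalan\tGR\t\n")
    print(list(read_tab_file(sample, TARGET_COLUMNS)))
```

For a real file, open it with `open("DictionaryTarget.txt", encoding="utf-8", newline="")` and pass the file object instead of the sample. The same function works for the other exports if you pass their column lists.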
23 Oct 2013
File of HTML strings corresponding to display entries for each headword. Indexed by IDs. In this version, subentries are nested under their corresponding main entries.
Download: GW-entries-HTML.txt 1MB - tab-delimited file (columns: Original_ID -- new_ID -- display entry HTML)
Simple CSS file for display of HTML display entries - can be adapted for other purposes.
Download: gydex.css 1KB css/text file
22 Oct 2013
Gloss-based list with IDs. I'm still not clear how glosses are to be handled but here is a first export for comment. Note: the suffix forms are preceded by an en-rule, not hyphen (just to make it easier to work with this data in Excel)
Download: GW-glosses.txt 104KB - tab-delimited file
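If the gloss list is processed outside Excel, the leading en rules can be turned back into hyphens. Excel treats a cell that begins with a minus sign as the start of a formula, which is presumably why the en rule was used. Below is a minimal sketch; the function name is our own.

```python
EN_RULE = "\u2013"  # the en rule used in place of a leading hyphen


def restore_suffix_hyphen(form):
    """Replace a leading en rule with a hyphen; leave other forms unchanged."""
    if form.startswith(EN_RULE):
        return "-" + form[len(EN_RULE):]
    return form
```

Only a leading en rule is changed, so any en rules inside glosses are left alone.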
21 Oct 2013
First simple export of GW data for Ma!
It's not clear to me what data we'll settle on, so I thought I would start with a simple initial export and plan the next steps from your responses and requests.
Download: lexData.txt 124KB - tab-delimited text file (columns: Original_ID -- new_ID -- part of speech -- word form (with conjugation marker) -- gloss(es))
Here is a persistent file with the mapping of all lexical IDs from the current GW app to new IDs that should be suitable for Ma!
Download: maIDmapping.xls 1200KB - MS Excel file
HTML display versions of all entries have been generated (and can be supplied in modified form) - see this intermediate web version
29 April 2013
First Mac version of Gayarragi, winangali!
There is still much work to do, but the major hurdles have been overcome.
This version: draft of 28 April 2013
Download: GW-Mac-001.dmg approx 600 MB
This version has:
- New custom modules for regex to run on Mac Director 11.5
- Sentences now show speaker thumbnails with rollover information
- Changes to dictionary Topics menu and search options popup menu
- Story notes now seamlessly available from Story menu page
30 July 2012
Final GW user survey results. The survey is now closed. We got a total of 52 responses. Please view the documents below before our discussion about how to proceed.
all_summary.pdf Summary of all responses (except graphs don't display)
all_graphs.pdf All the graphs
all_details.pdf Full details of responses including comments/feedback
teacher_summary.pdf Summary of teacher responses
all_teachercomments.pdf Relevant details of teacher responses
learner_summary.pdf Summary of learner responses
all_learner-comments.pdf Relevant details of learner responses
parent_summary.pdf Summary of parent responses
GW-survey-questions.pdf For records only - original blank survey
GW-survey-notes.txt Summary notes
12 July 2012
GW-survey-results-3.doc Early GW user survey results
30 May 2012
Draft survey, to get evaluations and feedback in preparation for Mac and 2nd edition
11 MARCH 09
Update files only, for new stories
Download: newfiles.zip approx 30 MB
Notes: Please just write these files over the existing versions, and leave other files as they are. Note that they are not all in the same folder, eg main.cxt is in the top level, and the stories folder goes inside the mvcst folder (just look for matches on filenames). Let me know if you have any problems.
3 MARCH 09
Latest version
Download: movie.zip approx 162 MB
13 NOVEMBER 2008
Latest version
Download: GY20081113.zip approx 160 MB
Please see the Google Doc for details
25 AUGUST 08
Latest version
Download: gy.zip approx 160 MB
This version has:
- All sentences implemented and corrected
- Most of new design material implemented
- Misc
1 July 08
Latest version
Download: gy.zip approx 100 MB
This version has:
- Topics section fully implemented (top level navigation needs better look)
- New menu system
- New background for search screen (not all controls yet migrated)
- New opening screen
16 May 2007
Mac version, almost same as 27 April (with minor changes only)
Download: GY2.DMG approx 200 MB
27 April 2007
Updated version - to keep download smaller, please re-use previous versions of the following:
startGY.exe, xtras folder, songs folder, word_sounds.cxt, sentence_sounds.cxt
Download: PROD14_Win.zip approx 8 MB
This version has:
- Browse GY-English section - some scrolling and queuing issues fixed
- English - GY index/finderlist completed
- 1 new story implemented (Burraalgaa)
- Change to search options panel
- Other minor improvements and bugfixes
- Sorry I didn't get the Mac version completed by this date, but it will definitely be available next week
- Please keep your previous version in case of any problems with this download.
10 March 2007
Updated version - please re-use previous versions of songs, stories and xtras folders, and startGY.exe.
Download: PROD13_Win.zip approx 60 MB
This version has:
- Browse GY-English section working
- All extra sounds converted and installed with system for playing exceptions to 10's
- Some bugs fixed in scrollbars: scroll persistent for GY-E main list
- Authored also for Macintosh - Mac version to follow
- Note also that other fields such as cross references are yet to be included. Please keep your previous version in case of any problems with this download.
28 December 2006
Updated version - please re-use previous versions of songs, stories and xtras folders, and word_sounds.cxt, and startGY.exe.
Download: PROD12.zip approx 10 MB
This version has:
- Sentence display and navigation developed further
- Sentence numbers shown (sequential, rather than original sentence number)
- Sentence sounds play, via icons
- Forward and back buttons usability improved
- Notes: some earlier feedback and ideas about main dictionary panel not implemented yet. Note also that other fields such as cross references are yet to be included. Please keep your previous version in case of any problems with this download.
17 November 2006
Updated version - re-use previous versions of songs, stories and xtras folders, and word_sounds.cxt, and startGY.exe.
Download: PROD11.zip approx 2.1 MB
This version has:
- Word sounds play on click speaker icon
- Several bugs in word history fixed
- Sentence navigation and display system now working in first draft version
- Notes: some earlier feedback and ideas about main dictionary panel not implemented yet. Note also that other fields such as cross references are yet to be included. Please keep your previous version in case of any problems with this download.
17 Oct 2006
Updated version - re-use previous versions of songs and stories folders, and word_sounds.cxt, and startGY.exe. This download has a new xtras folder.
Download: PROD10.zip approx 4 MB
This version has:
- Crossword game, 2 levels
- Sentences partly implemented but not yet displayed
- Note: Also KEEP your previous version in case you find instabilities in this version!
- Note: some earlier feedback and ideas about main dictionary panel not implemented yet. Note also that other fields such as cross references are yet to be included.
10 Oct 2006
Updated version - re-use previous versions of songs and stories folders, and word_sounds.cxt, and startGY.exe.
Download: PROD9.zip approx 2 MB
This version has:
- Queue/history of displayed dictionary entries
- Queue of pages visited (remembers last viewed page only of any song or story)
- Search box accepts arrow keys, shows "..." if text overflows box
- Note: some earlier feedback and ideas about main dictionary panel not implemented yet. Note also that other fields such as cross references are yet to be included.
28 September 2006
Updated version - re-use previous versions of songs and stories folders, and word_sounds.cxt, and startGY.exe. Hopefully it will all work!
Download: PROD8.zip approx 2 MB
This version has:
- Dictionary display now includes all sentences
- Dictionary display panel reorganised, larger font etc
- Dictionary search options moved into new pop-up tabbed panel. This panel disappears once it loses focus, unless it is positioned outside the main window.
- Note: some earlier feedback and ideas about main dictionary panel not implemented yet - I wanted to wait until we had some maximal dictionary entries. Note also that other fields such as cross references are yet to be included.
19 August 2006
Updated version - re-use previous versions of songs and stories folders, and word_sounds.cxt, and startGY.exe. Hopefully it will all work!
Download: PROD7.zip approx 2 MB
This version has:
- click on entry form in search hits now displays full entry
- ... headword, part of speech, languages
- ... senses, with alpha numbers, and languages (where different to entry-global languages)
- ... sense comments
- ... language materials in comments have popup note
- ... language materials in comments can be clicked to launch search - this function needs more refining, comments and suggestions welcome
- colour, font etc of full entry components is modifiable and there could eventually be user settings or a set of "looks" to choose from
- other content, such as sentences, entry-global comments, cross references, not yet implemented
5 June 2006
Updated version - all files except the "txts" folder and StartGY.exe (re-use previous versions of these and copy them into the correct locations).
Download: PROD6.zip approx 65 MB
This version has:
- improved ranking of search hits
- better recognition of whole-expression strings
- recognises homophones properly
- weighted to distinguish by POS (ie suffixes from lexical items)
- tunable to balance effect of length, POS
- system to search words by each language (GR, YR, YY) - controls and indexes
- note: not yet completely correctly operational - still some bugs in algorithm
- many more word sounds installed and playing
- stories system now working (data for one story converted so far)
- sentences data installed (not visible yet)
5 May 2006
Updated version - all files except for folders "txts" and "xtras".
Download: GY_5May06.zip approx 3.8 MB
This version has:
- Searched-for string is now highlighted in list of hits
- Better management of large numbers of hits
- Hitbox overflowed where multiple entries wrapped - I designed and programmed an automatically adaptive scrollbar system
- Other misc bugs fixed and search sped up
- Clicking on GY entry in hit-box plays the sound - only first 30 words have sound available at this stage (perhaps this was also on prev version)
- Shift-click on morphemes in song player takes you to dictionary search for that form
To install this update, unzip all the files, then place the whole of the previously sent "txts" and "xtras" directories in their relevant locations (ie the same as in previous versions). Let me know if there are any problems.
25 April 2006
Updated version - replacement files only.
Download: PROD3.zip approx 1.7 MB
This version has:
- added "starts with" to search types
- search returns ranked hits
- English search ranked:
whole gloss unit > whole word > whole parenthetic > word with apostrophe or hyphen > anything else THEN by shorter > longer
- GY search ranked:
whole lexical unit > anything else THEN by shorter > longer
- bug with radio buttons fixed (second click on same button cancels)
- some indexing and speed optimisations
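The ranking rules above amount to sorting on a (match category, length) pair. Below is a minimal Python sketch. How gloss units, parentheticals and hyphen or apostrophe word parts are detected is our own assumption; for example, it treats commas and semicolons as the separators between gloss units. It is not the original Director code.

```python
import re

# English match categories from the notes, best first (lower sorts earlier).
WHOLE_GLOSS, WHOLE_WORD, WHOLE_PAREN, WORD_PART, ANYTHING, NO_MATCH = range(6)


def english_rank(query, gloss):
    """Classify how a query matches an English gloss string."""
    q, g = query.strip().lower(), gloss.lower()
    if q in (u.strip() for u in re.split(r"[;,]", g)):  # assumed unit separators
        return WHOLE_GLOSS
    words = re.findall(r"[\w'-]+", g)
    if q in words:
        return WHOLE_WORD
    if q in (p.strip() for p in re.findall(r"\(([^)]*)\)", g)):
        return WHOLE_PAREN
    if any(q in re.split(r"['-]", w) for w in words):
        return WORD_PART  # eg "kingfisher" inside "kingfisher's"
    return ANYTHING if q in g else NO_MATCH


def gy_rank(query, form):
    """GY forms: whole lexical unit first, then any other match."""
    q, f = query.strip().lower(), form.lower()
    if f == q:
        return 0
    return 1 if q in f else NO_MATCH


def rank_hits(query, items, rank_fn):
    """Sort matching items by category, then shorter before longer."""
    scored = sorted((rank_fn(query, s), len(s), s) for s in items)
    return [s for rank, _, s in scored if rank != NO_MATCH]
```

For example, `rank_hits("kingfisher", ["kingfishers", "sacred kingfisher", "kingfisher"], english_rank)` puts the whole-gloss match first, the whole-word match second, and the substring match last.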
Notes:
- This update has only the changed files. Unzip the package, then replace the files in your existing installation with the updated files. Check where each file belongs, as they go in two different locations. Let me know if the program does not work properly.
- The search ranking is working reasonably well, but can be improved - suggestions welcome.
- Next step will be to add sentences, and to create full-entry displays (that will be displayed after clicking on hits)
9 April 2006
Updated version of GY CD with dictionary search implemented.
Download: PROD3.zip approx 40 MB
See previous version info for unpack/install instructions.
This version has:
- core dictionary data installed and indexed
- system for searching dictionary data, with various options:
- search in GY or English
- search in 3 modes: normal (forms and glosses), extended (includes Latin, other strongly related data such as cross reference data fields, language content in comments), and deep (will search in sentences and other data: not yet implemented)
- search for whole words only or for string in any part of word
- search returns formatted summary form of entries as list of hits in sets of 10 (configurable)
- google-like navigation in search hits
- songs morpheme line connected to dictionary search
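The search options above can be sketched as choosing a set of fields per mode, matching whole words or substrings, and slicing the results into pages of 10. Below is a minimal Python sketch. The field names for each mode are hypothetical, since the notes describe the modes only in general terms, and the GY/English choice is left out for brevity.

```python
import re

# Hypothetical field names per mode.
MODE_FIELDS = {
    "normal": ("form", "gloss"),
    "extended": ("form", "gloss", "latin", "xref", "comment_lang"),
    "deep": ("form", "gloss", "latin", "xref", "comment_lang", "sentences"),
}


def matches(query, text, whole_words):
    """Case-insensitive match, either as a whole word or anywhere in a word."""
    if whole_words:
        # Lookarounds instead of \b so queries starting with "-" still work.
        pattern = r"(?<!\w)" + re.escape(query) + r"(?!\w)"
        return re.search(pattern, text, re.IGNORECASE) is not None
    return query.lower() in text.lower()


def search(entries, query, mode="normal", whole_words=True):
    """Return the entries whose fields for the chosen mode match the query."""
    fields = MODE_FIELDS[mode]
    return [e for e in entries
            if any(matches(query, e.get(f, ""), whole_words) for f in fields)]


def page(hits, page_no, per_page=10):
    """Return one page of hits (pages numbered from 1) and the page count."""
    total_pages = max(1, -(-len(hits) // per_page))  # ceiling division
    start = (page_no - 1) * per_page
    return hits[start:start + per_page], total_pages
```

The page count returned alongside each page gives the Google-like navigation enough information to draw its page links.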
Related but not yet implemented:
- you cannot yet click on search hits to get full entries or further info
- sentence data not yet included
- other dictionary fields, eg comments, not fully implemented
- hits are presented in default order, not ranked
Previous versions:
24 Jan 2006:
Updated version of GY CD with song player.
Download: GY_songs_draft2.zip 40 MB
New song player engine. New song menu etc. Other improvements. All requested changes made.
Morphemes link to dictionary index - temporary popup shows basic data.
Some errors remain - missing data, incorrect trs files etc.
Comments cannot be synchronised with pages as there is no supporting data.
Please inform me re any other errors etc.
Previous version:
Download: GY_songs_draft1.zip 41 MB
Very first draft of GY CD with song player
Notes/please check the following:
- To install, unzip all the files into a new, empty folder - they should organise themselves correctly
- To run, double click on StartGYSongs.exe
- Let me know immediately if it does not work for you
- The Sounds area is the only development that you can see at this stage
- Click on a song name from the song list to go to player. Use menu to go back to song list (later, "Back" will do this too)
- Check the organisation of the songs into verses/screens etc
- Check all the metadata (titles, other info) carefully for content (and layout). Record all concerns. Some are too long or should be split up differently (and cause overlap with song text)
- I know there are at least some errors with sound/text allocation/alignment - please check rigorously and record all you find. Eg in song 14 there are misalignments.
- Check for any typos etc in songs
- Check carefully the line wrapping - this is done automatically and is the most complex bit. I haven't found errors yet but ...
- Font sizes and colours (and line length and spaces between text groups) can all be easily changed. I chose the current sizes to suit the chunking I made for the sense of the songs, so that they would not run off the screen.
- Also screen layout is not difficult to change, eg lines can be shorter (but will wrap more), free gloss could be moved independently of interlinear stuff etc
- Slight (time) gap in going from one screen to next will depend on your computer - but I will be optimising the programming later, once all the functionalities have settled
- Morpheme gloss line is "live" (ie highlighted on mouseover, and the program also knows the dictionary ID number of the word you are over) but has no related function yet until the dictionary is installed
- There may be programming errors - would show up as "Script Error" boxes, but you can probably continue. If this occurs, please note which page, song you are on and if possible, what you were doing at the time. Note that the live text function (see prev point) is work in progress and causes some of these errors.
Email me with any other questions etc
David