World Archives Project: International Characters and Diacritics
There are many projects within the World Archives Project that use international characters or diacritics. In order to maintain the integrity of documents, keyers are asked to key all diacritical marks as seen on the original image.
What is a diacritic?
A diacritic is a glyph that accompanies a letter and is mainly used to change the sound of the letter. A diacritical mark typically appears above or below a letter, but it can also be positioned within the letter itself.
Examples: Ã Ę Ł
Map of International Characters used in the World Archives Project
How to enter diacritical marks in the keying tool
In some cases you'll need to enter international characters that are not found on your computer's keyboard. To enter these characters, click the International Characters icon located in the menu bar just above the form where data is entered. Once the International Characters window appears, click the character you wish to enter to highlight it, then select the Insert button. Alternatively, you can double-click the character to insert it.
When you key letters that have diacritical marks, use only the international character set provided or the numeric shortcut shown in the keying tool. In the image above, the capital letter A with acute (Á) is highlighted. This letter also has a numeric shortcut of Alt+0193.
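As a rough illustration of where the number in this shortcut comes from: on Windows systems set up for a Western European language, Alt codes typed with a leading zero are generally looked up in the Windows-1252 code page, where position 193 is the capital A with acute. The Python sketch below assumes that code page. The function name alt_code_to_char is hypothetical and is not part of the keying tool.

```python
def alt_code_to_char(code: int, codepage: str = "cp1252") -> str:
    """Return the character that a zero-prefixed Alt code selects.

    Assumes the Alt code is looked up in a single-byte Windows code page
    (Windows-1252 by default), which is an assumption for illustration.
    """
    if not 0 <= code <= 255:
        raise ValueError("Alt codes of this form are single-byte values (0-255)")
    return bytes([code]).decode(codepage)


char = alt_code_to_char(193)
print(char, f"U+{ord(char):04X}")  # prints: Á U+00C1
```

The shortcuts displayed in the keying tool remain the reference for keying. This sketch only explains why Alt+0193 and the highlighted Á refer to the same character.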
Note: Not all diacritics will have a shortcut displayed. Do not use other keyboard shortcuts.
Other key combinations can produce a character that looks the same on screen, but a character entered that way will not be recognized during processing in the World Archives tool and will cause errors.
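One general reason that two characters can look identical yet be treated as different is that Unicode allows more than one encoding of the same accented letter. The Python sketch below compares a single precomposed Á with a plain A followed by a combining acute accent. It is only an illustration of the general idea. How the World Archives tool actually compares keyed values is not described here, and the variable names are made up for the example.

```python
import unicodedata

# Two ways of producing what looks like the same letter on screen.
precomposed = "\u00c1"   # one code point: LATIN CAPITAL LETTER A WITH ACUTE
combined = "A\u0301"     # two code points: A + COMBINING ACUTE ACCENT

# A plain comparison treats them as different strings.
print(precomposed == combined)          # False
print(len(precomposed), len(combined))  # 1 2

# Normalizing to NFC composes the pair into the single code point.
normalized = unicodedata.normalize("NFC", combined)
print(normalized == precomposed)        # True
```

A system that compares entries without normalizing them first would see these as two different values, which is consistent with the instruction to use only the provided character set or shortcut.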
Known problems with Windows XP
There are some diacritics that older Microsoft Windows operating systems do not fully support. The default fonts in Windows XP, for example, do not include some characters, such as the Romanian T-comma. When your operating system cannot display a particular character, you may see words in the wiki pages or in the keying tool (particularly during review and arbitration) that look like they have a box inserted in place of a letter.
Example: Mie□dzyrzecz instead of Międzyrzecz.
These errors typically occur with letters that have a ring (such as ů), a comma or cedilla (ș, ş), or an ogonek, as with the ę in the example above.
Windows XP Remedies
One way to remedy this is to upgrade your operating system to Windows Vista or Windows 7.
If upgrading is not possible and you are running Windows XP, you may be able to install the European Union Expansion Font Update, which adds support for six additional characters: Ș, ș, Ț, ț, Ѝ, ѝ (S and T with comma, Cyrillic I with grave). If these characters already display correctly on your computer, you do not need to install the update.
Note: Ancestry.com does not endorse, guarantee or provide support for any of these products.
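For readers who want to identify the six characters covered by the font update precisely, the Python sketch below prints each one with its Unicode code point and official name, using the standard unicodedata module. It also shows that s with cedilla (ş) is a separate character from s with comma (ș), even though the two look similar. The helper name describe is made up for this example.

```python
import unicodedata

# The six characters named in the font update description:
# S and T with comma below (both cases), Cyrillic I with grave (both cases).
FONT_UPDATE_CHARS = "\u0218\u0219\u021a\u021b\u040d\u045d"


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in FONT_UPDATE_CHARS:
    print(ch, describe(ch))

# s with cedilla looks similar to s with comma but is a different character.
print("\u015f", describe("\u015f"))
```

If any of these characters appears as a box when the output is viewed, the font in use does not contain a glyph for it.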
Notes for Arbitrators and Reviewers
Arbitrators and reviewers who are using Windows XP: If you see the □ displayed when you are reviewing or arbitrating an image set, consider installing the font update described above. Because you cannot see the correct diacritic on screen, key it using the international character set provided in the keying tool to ensure that the proper diacritic is entered. If all keyers use the same method of entering diacritics, the system will not flag them during arbitration as discrepancies.