World Archives Project: International Characters and Diacritics

From Ancestry.com Wiki

Revision as of 13:31, 27 April 2011

There are many projects within the World Archives Project that use international characters or diacritics. In order to maintain the integrity of documents, keyers are asked to key all diacritical marks as seen on the original image.

What is a diacritic?

A diacritic is a glyph that accompanies a letter and is mainly used to change the sound of the letter. A diacritical mark typically appears above or below a letter, but it can also be positioned within the letter itself.

Examples: Ã Ę Ł

Map of International Characters used in the World Archives Project

AWAP International Characters.png

How to enter diacritical marks in the keying tool

When you key letters that have diacritical marks, use only the international character set provided or the numeric shortcut shown in the keying tool.

NOTE: Not all diacritics will have a shortcut displayed. Do not use other keyboard shortcuts.

Other special key combinations can make the same diacritic appear on screen, but a character entered that way will not be recognized during processing in the World Archives tool and will cause errors.
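One plausible reason for this, offered as an assumption and not as a description of how the World Archives tool works internally, is that the same visible letter can be stored as different Unicode sequences. The short Python sketch below uses "é" purely as an illustration (it is not taken from the keying tool) and the helper name `code_points` is hypothetical. It shows that two entries which look identical on screen are not the same text to a computer.

```python
def code_points(text):
    """Return the Unicode code points in text as 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in text]

# Two ways of producing what looks like the same letter:
precomposed = "\u00e9"  # one character: LATIN SMALL LETTER E WITH ACUTE
combined = "e\u0301"    # the letter "e" followed by COMBINING ACUTE ACCENT

print(code_points(precomposed))  # ['U+00E9']
print(code_points(combined))     # ['U+0065', 'U+0301']
print(precomposed == combined)   # False: the stored characters differ
```

Using only the character set or shortcuts provided in the keying tool avoids this kind of hidden difference.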

Known problems with Windows XP

Older Microsoft Windows operating systems do not fully recognize some diacritics. The default Windows XP fonts, for example, do not support certain diacritics such as the Romanian T-comma. When your operating system does not recognize a particular diacritic, you may see words in the wiki pages or in the keying tool (particularly during review and arbitration) that look as if a box has been inserted in place of a letter.

Example: Mie□dzyrzecz instead of Międzyrzecz.

These errors typically occur with letters that have a ring, such as ů; letters with a comma or cedilla below (ș, ş); or, as in the example above, the E caudate (ę).
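If you are unsure which character is involved, for instance because two marks look alike on screen, the Unicode name of each character can be looked up. The following is a minimal Python sketch using the standard `unicodedata` module. The function name `describe_non_ascii` is hypothetical and the sample text is taken from the examples on this page. It is not part of the keying tool.

```python
import unicodedata

def describe_non_ascii(text):
    """List each non-ASCII character in text with its code point and Unicode name."""
    report = []
    for ch in text:
        if ord(ch) > 127:
            name = unicodedata.name(ch, "UNKNOWN")
            report.append((ch, f"U+{ord(ch):04X}", name))
    return report

# Escapes are used so the sample does not depend on the fonts installed:
# \u0119 = e with ogonek, \u0219 = s with comma below,
# \u015f = s with cedilla, \u016f = u with ring above
sample = "Mi\u0119dzyrzecz \u0219 \u015f \u016f"
for character, code_point, name in describe_non_ascii(sample):
    print(code_point, name)
```

The output makes clear that the s with a comma below (U+0219) and the s with a cedilla (U+015F) are separate characters, even though they can look nearly identical.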


One way to remedy this is to upgrade your operating system to Windows Vista or Windows 7. If upgrading is not possible and you are running Windows XP, you may be able to install the European Union Expansion Font Update, which adds support for many missing diacritics. http://www.microsoft.com/downloads/en/details.aspx?FamilyID=0ec6f335-c3de-44c5-a13d-a1e7cea5ddea&DisplayLang=en Note: Ancestry.com and Ancestry World Archives Project are not responsible for any third-party software updates, and any updates you install are installed at your own risk.

Notes for Arbitrators and Reviewers

Arbitrators and reviewers who are using Windows XP operating systems: If you see the □ displayed when you are reviewing or arbitrating an image set, consider getting the font update described above. Because you cannot see the correct diacritic, you will need to key it using the international character symbols provided to ensure that the proper diacritic is entered. If all keyers use the same method of entering diacritics, the system will not flag them during arbitration as a discrepancy to be arbitrated.
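To illustrate why a consistent entry method matters, here is a small Python sketch of how two keyers' entries could be compared. It is a hypothetical illustration only: the function `compare_entries` and its result labels are assumptions, and nothing here is known about how the World Archives system actually performs its comparison. The sketch uses Unicode NFC normalization from the standard `unicodedata` module to separate entries that differ only in how a diacritic was typed from entries that contain different letters.

```python
import unicodedata

def compare_entries(entry_a, entry_b):
    """Classify two keyed values (hypothetical labels, for illustration only)."""
    if entry_a == entry_b:
        return "match"
    normalized_a = unicodedata.normalize("NFC", entry_a)
    normalized_b = unicodedata.normalize("NFC", entry_b)
    if normalized_a == normalized_b:
        # Same letters, but keyed with different character sequences
        return "same letters, different entry method"
    return "different text"

# Keyer A used the single character e-with-ogonek; keyer B typed "e"
# followed by a separate combining ogonek (U+0328).
print(compare_entries("Mi\u0119dzyrzecz", "Mie\u0328dzyrzecz"))

# s with comma below versus s with cedilla: different characters entirely.
print(compare_entries("\u0219", "\u015f"))
```

A system that compares entries character by character, without any normalization step, would treat the first pair as a discrepancy even though both keyers read the same letter from the image.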