Frank "derFrankie" Neulichedl

Convert Dual Greek text to Unicode

Here is a handy macro I wrote for converting text written in the Dual Greek codepage to Unicode. Having worked for many internationally operating firms, I have run into plenty of language and font issues. This is the solution to a common problem: converting legacy texts to Unicode.

What are codepages and why should you care?

In the days before OpenType, a font could hold only a limited number of glyphs (single characters): 256, to be exact. The first 128 characters (codes 0 to 127) were defined by a standard (ASCII, an ANSI standard); the rest were not. So if you needed country-specific glyphs, they were placed in the upper 128 positions. These arrangements are the codepages. To make it short: it was a bit of a mess. To write a text in Greek, for example, you had to set up your PC for a certain codepage, choose the right font and have the right keyboard. Unicode was on its way but took a long time to be adopted, and still not all professional publishing programs fully support Unicode (Xpress added it only in one of its latest versions, and Framemaker has supported it since Version 8, released in 2008). This means that documents created a couple of years ago very likely use non-Unicode fonts and codepages. If you copy such text into a new document or change the font to a Unicode font (all OpenType fonts are), your text will NOT be readable.

Why use Unicode?

Unicode fixes all the problems mentioned above. You can think of it as a giant table in which every character in the world has its fixed place. There is plenty of room for more, and every OpenType font uses these "places", meaning that if you have text set in a Unicode font and you change it to another Unicode font, the characters stay exactly the same. In the words of the Unicode Consortium: "Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language."

Converting legacy text to Unicode

Until now (unless you wanted to spend thousands of dollars) your only option was to retype legacy texts in a Unicode font. I had quite a lot of text to convert, so I wrote this Word macro. It searches for single characters and replaces each one with its Unicode counterpart. It's a table transformation: if the legacy font stores the letter "α" at position 225, the same letter in Unicode is number 945, a difference of 720 (the nOffset value in the macro). In my example I look for characters of Dual Greek (Codepage 437 - click here for the explanation) and replace them with Unicode characters:
Attribute VB_Name = "Frankie_Macros"
Sub Convert_DualGreek()
Attribute Convert_DualGreek.VB_Description = "Converts DualGreek - Codepage 437 text to Unicode - www.frankie.bz"
Attribute Convert_DualGreek.VB_ProcData.VB_Invoke_Func = "Normal.Frankie_Macros.Convert_DualGreek"
'
' Convert_Dual_Greek_to_Unicode Code Page Macro
'
' (C)2007 Frank Neulichedl
'
' Creative Commons 3.0 Attribution
'
Dim nDummy as Integer
Dim nOffset as Integer

nOffset=720 ' Change this for other codepages

	' Shift the legacy codes 180 to 210 up by nOffset
	For nDummy=180 to 210
		Selection.Find.ClearFormatting
		Selection.Find.Replacement.ClearFormatting
		With Selection.Find
			.Text = Chr(nDummy)
			.Replacement.Text = ChrW(nDummy + nOffset)
			.Forward = True
			.Wrap = wdFindAsk
			.Format = False
			.MatchCase = True
			.MatchWholeWord = False
			.MatchWildcards = False
			.MatchSoundsLike = False
			.MatchAllWordForms = False
		End With
		Selection.Find.Execute Replace:=wdReplaceAll
		Selection.Find.ClearFormatting
		Selection.Find.Replacement.ClearFormatting
	Next

	' Shift the legacy codes 211 to 255 up by nOffset
	For nDummy=211 to 255
		Selection.Find.ClearFormatting
		Selection.Find.Replacement.ClearFormatting
		With Selection.Find
			.Text = Chr(nDummy)
			.Replacement.Text = ChrW(nDummy + nOffset)
			.Forward = True
			.Wrap = wdFindAsk
			.Format = False
			.MatchCase = True
			.MatchWholeWord = False
			.MatchWildcards = False
			.MatchSoundsLike = False
			.MatchAllWordForms = False
		End With
		Selection.Find.Execute Replace:=wdReplaceAll
		Selection.Find.ClearFormatting
		Selection.Find.Replacement.ClearFormatting
	Next

	' ---------- This section is for single special characters - to customize entirely
	With Selection.Find
        .Text = Chr(164)
        .Replacement.Text = ChrW(8364)
        .Forward = True
        .Wrap = wdFindAsk
        .Format = False
        .MatchCase = True
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = Chr(170)
        .Replacement.Text = ChrW(890)
        .Forward = True
        .Wrap = wdFindAsk
        .Format = False
        .MatchCase = True
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
    With Selection.Find
        .Text = Chr(175)
        .Replacement.Text = ChrW(8213)
        .Forward = True
        .Wrap = wdFindAsk
        .Format = False
        .MatchCase = True
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
    Selection.Find.Execute Replace:=wdReplaceAll
    Selection.Find.ClearFormatting
    Selection.Find.Replacement.ClearFormatting
End Sub
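The mapping the macro applies can also be written as a small function, which makes it easy to check single codes before running the conversion on a real document. This is only a sketch: LegacyToUnicode and Demo_LegacyToUnicode are hypothetical names that are not part of the macro above, and the values are simply copied from it.

```vba
' Hypothetical helper (not part of the original macro): returns the Unicode
' code point that the macro above substitutes for a legacy character code.
Function LegacyToUnicode(ByVal nCode As Long) As Long
    Const GREEK_OFFSET As Long = 720            ' same value as nOffset in the macro
    Select Case nCode
        Case 164: LegacyToUnicode = 8364        ' euro sign
        Case 170: LegacyToUnicode = 890         ' Greek ypogegrammeni
        Case 175: LegacyToUnicode = 8213        ' horizontal bar
        Case 180 To 255: LegacyToUnicode = nCode + GREEK_OFFSET
        Case Else: LegacyToUnicode = nCode      ' codes the macro leaves untouched
    End Select
End Function

' Prints two sample mappings to the Immediate window.
Sub Demo_LegacyToUnicode()
    Debug.Print LegacyToUnicode(225)    ' 945, the code point of "α"
    Debug.Print LegacyToUnicode(164)    ' 8364, the euro sign
End Sub
```

Run Demo_LegacyToUnicode from the VBA editor and read the results in the Immediate window (Ctrl+G).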
Please refer to the Microsoft Help to learn how to use macros. To adapt this code to another codepage, change the offset (which covers the most common characters) and the section where single characters are converted. To find the offset, look up the originating codepage and then look here to find the right codes. It should be straightforward.
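As a rough sketch of such a customization, the repeated Find blocks can be folded into one helper, so that only the range, the offset and the list of exceptions need to change for another codepage. ReplaceCode and Convert_CustomCodepage are hypothetical names, and the values shown are the ones from the Dual Greek macro. Unlike the original, this variant searches the main text of the active document (ActiveDocument.Content) instead of starting from the current selection.

```vba
' Hypothetical helper: replaces every occurrence of one legacy character
' in the main text of the active document with one Unicode character.
Private Sub ReplaceCode(ByVal nLegacy As Long, ByVal nUnicode As Long)
    With ActiveDocument.Content.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = Chr(nLegacy)
        .Replacement.Text = ChrW(nUnicode)
        .Forward = True
        .Wrap = wdFindContinue
        .Format = False
        .MatchCase = True
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
        .Execute Replace:=wdReplaceAll
    End With
End Sub

' Hypothetical driver: edit the three constants and the exception list
' to match the codepage you are converting from.
Sub Convert_CustomCodepage()
    Const RANGE_FIRST As Long = 180     ' first legacy code shifted by the offset
    Const RANGE_LAST As Long = 255      ' last legacy code shifted by the offset
    Const RANGE_OFFSET As Long = 720    ' Unicode code point minus legacy code
    Dim n As Long
    Dim vPair As Variant

    ' Characters that follow the simple "legacy code + offset" rule
    For n = RANGE_FIRST To RANGE_LAST
        ReplaceCode n, n + RANGE_OFFSET
    Next n

    ' Single characters that need an explicit (legacy, Unicode) pair
    For Each vPair In Array(Array(164, 8364), Array(170, 890), Array(175, 8213))
        ReplaceCode vPair(0), vPair(1)
    Next vPair
End Sub
```

Try it on a copy of the document first, since a wrong offset replaces every matching character in one pass.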
Photo by jakebouma