Revision: 7291
at July 17, 2008 06:51 by scarfboy


Initial Code
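import re
import unicodedata

# Combining marks: Diacritical Marks, its Supplement, Marks for Symbols, Half Marks
reCombining = re.compile(u'[\u0300-\u036f\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]', re.U)

def remove_diacritics(s):
    """Decompose the string (NFD), then remove combining characters."""
    return reCombining.sub('', unicodedata.normalize('NFD', str(s)))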
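The hard-coded ranges cover the four Unicode blocks of combining diacritics, but combining marks from other blocks (Hebrew points, for instance) fall outside them. A possible alternative, sketched below and not the author's method, swaps the regex for `unicodedata.combining()`, which returns a character's canonical combining class and is nonzero for most combining marks. The criterion differs at the edges: enclosing marks inside the original ranges have class 0 and would be kept. The name `remove_combining` is hypothetical.

```python
import unicodedata

def remove_combining(s):
    """Decompose s (NFD), then drop every character whose canonical
    combining class is nonzero."""
    decomposed = unicodedata.normalize('NFD', str(s))
    return ''.join(c for c in decomposed if not unicodedata.combining(c))

print(remove_combining('Crème brûlée'))  # Creme brulee
print(remove_combining('naïve façade'))  # naive facade
print(remove_combining('Søren'))         # Søren
```

Either version only removes what NFD splits off: letters with no canonical decomposition, such as "ø" or "ł", pass through unchanged.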


Initial Description
Useful when building canonical forms of strings for indexing, e.g. so that "café" and "cafe" are stored under the same key.
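A minimal sketch of that indexing use, assuming the same combining-mark ranges as the snippet and adding `str.casefold()` so case differences collapse too. The helpers `index_key` and `build_index` are hypothetical illustrations, not part of the original snippet.

```python
import re
import unicodedata
from collections import defaultdict

# Same combining-mark ranges as the snippet's reCombining.
reCombining = re.compile('[\u0300-\u036f\u1dc0-\u1dff\u20d0-\u20ff\ufe20-\ufe2f]')

def remove_diacritics(s):
    return reCombining.sub('', unicodedata.normalize('NFD', str(s)))

def index_key(word):
    """Hypothetical canonical key: strip diacritics, then casefold."""
    return remove_diacritics(word).casefold()

def build_index(words):
    """Map each canonical key to the original spellings that share it."""
    index = defaultdict(list)
    for w in words:
        index[index_key(w)].append(w)
    return index

index = build_index(['Café', 'cafe', 'CAFÉ', 'résumé', 'resume'])
print(index[index_key('cafe')])    # ['Café', 'cafe', 'CAFÉ']
print(index[index_key('Résumé')])  # ['résumé', 'resume']
```

Keeping the original spellings as values lets a search on the canonical key still display each word as it was written.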

Initial Title
remove diacritics

Initial Tags
python

Initial Language
Python