Created February 28, 2015 11:18
Removes all diacritics from strings (e.g. names) in a Google spreadsheet
=SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(SUBSTITUTE(C2,"ö","o"),"ü","u"),"ó","o"),"ő","o"),"ú","u"),"é","e"),"á","a"),"ű","u"),"í","i"),"Ö","O"),"Ü","U"),"Ó","O"),"Ő","O"),"Ú","U"),"É","E"),"Á","A"),"Ű","U"),"Í","I") |
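For comparison, the same 18 substitutions can be written as a lookup table in JavaScript, for example as a custom function in an Apps Script project. This is only a sketch: the names `HUNGARIAN_MAP` and `stripHungarianDiacritics` are made up for illustration and are not part of the gist, and it covers only the Hungarian letters handled by the formula above.

```javascript
// Hypothetical lookup table mirroring the 18 SUBSTITUTE pairs in the formula.
const HUNGARIAN_MAP = {
  'ö': 'o', 'ü': 'u', 'ó': 'o', 'ő': 'o', 'ú': 'u',
  'é': 'e', 'á': 'a', 'ű': 'u', 'í': 'i',
  'Ö': 'O', 'Ü': 'U', 'Ó': 'O', 'Ő': 'O', 'Ú': 'U',
  'É': 'E', 'Á': 'A', 'Ű': 'U', 'Í': 'I',
};

// Hypothetical helper: replaces every mapped letter, leaves all others as-is.
function stripHungarianDiacritics(text) {
  // Normalize to composed form (NFC) so that "o" + combining accent
  // becomes a single character that can be found in the table.
  const composed = String(text).normalize('NFC');
  return Array.from(composed, (ch) =>
    Object.prototype.hasOwnProperty.call(HUNGARIAN_MAP, ch) ? HUNGARIAN_MAP[ch] : ch
  ).join('');
}

console.log(stripHungarianDiacritics('Árvíztűrő tükörfúrógép'));
// prints "Arvizturo tukorfurogep"
```

A table keeps each pair on its own, so adding a letter is a one-entry change instead of another level of nesting.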
@NoSubstitute, I believe you are talking about the `removeDiacritics()` function.

The reason it does not work is the regex defined on L44, which lists the characters that should not be replaced. It contains `\s`, which matches `[ \t\r\n\f]` (whitespace characters: a space, a tab, a carriage return, a line feed, or a form feed) (src), so a space is kept instead of being replaced.

It also contains a mistake: inside a character class, `,-.` is read as a range from a comma to a full stop, not as three literal characters. The hyphen happens to be the only character between those two, so the range still matches `,`, `-` and `.`, but only by coincidence. A hyphen that is meant literally should always go at the end of the list.

I have updated the script in my GitLab repository. I moved it to a new repository and, as the function is plain JS without any dependency on Google APIs, into the `js/` folder.

I think I want to keep `\s` in the list, so whitespace is not replaced. When you copy the updated function, all you need to do is:

- add `' ': '-'` (as you did) to `replacements` (before L34);
- comment out (with `//`) L41.

Anyway, I also updated the function in the gSheets script. However, as I didn't remove `\s` from `charsToKeep`, it won't work for your case as is.
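The two points above (how a hyphen behaves inside a character class, and why `\s` in `charsToKeep` blocks the space replacement) can be checked with a small sketch. The body of `removeDiacritics()` is not shown in this thread, so `replaceChars` below is a hypothetical stand-in that assumes kept characters are tested before `replacements` is consulted. Only the names `replacements` and `charsToKeep` come from the comment; the table contents and patterns are illustrative.

```javascript
// Inside a character class, `,-.` is a range from U+002C to U+002E.
const rangeClass = /^[,-.]$/;
// With the hyphen last it is a literal hyphen, which is the safer spelling.
const literalClass = /^[,.-]$/;

console.log(rangeClass.test('-'));   // true: '-' is U+002D, inside the range
console.log(literalClass.test('-')); // true: hyphen listed literally

// Illustrative stand-ins for the script's tables (assumed shapes).
const replacements = { ' ': '-', 'á': 'a', 'É': 'E' };
const charsToKeepWithSpace = /[a-zA-Z0-9\s,.-]/; // \s present: spaces are kept
const charsToKeepNoSpace = /[a-zA-Z0-9,.-]/;     // \s removed: spaces can be replaced

// Hypothetical helper: kept characters pass through untouched,
// everything else is looked up in `replacements` (unknown ones stay as-is).
function replaceChars(text, charsToKeep) {
  return Array.from(text, (ch) => {
    if (charsToKeep.test(ch)) return ch;
    return Object.prototype.hasOwnProperty.call(replacements, ch) ? replacements[ch] : ch;
  }).join('');
}

console.log(replaceChars('Kovács Éva', charsToKeepWithSpace)); // "Kovacs Eva"
console.log(replaceChars('Kovács Éva', charsToKeepNoSpace));   // "Kovacs-Eva"
```

Under that assumption, adding `' ': '-'` to `replacements` has no effect until `\s` is also taken out of the kept characters.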