March 5th, 2019 19:30


Ana Gelez

There is this issue on GitHub about making clean slugs (identifiers/URLs) for users, blogs and articles. There are two problems with allowing any Unicode character (basically any character that exists in the world). First, many websites don't display such characters nicely when they make links: you end up with %-encoded characters everywhere. Second, it makes phishing and impersonating someone else easier, since visually similar characters are quite easy to find.
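A quick illustration of both problems, using only Python's standard library. The names are made up for the example:

```python
from urllib.parse import quote

# A user name with an accented letter (hypothetical example).
name = "Élise"
print(quote(name))  # %C3%89lise: what a %-encoded link looks like

# Cyrillic "а" (U+0430) renders almost exactly like Latin "a" (U+0061).
latin = "ana"
spoofed = "\u0430na"
print(latin == spoofed)  # False: two different identifiers that look the same
print(quote(spoofed))    # %D0%B0na
```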

And as I explained there, I don't know how we should deal with that.

We could allow only ASCII characters (basically the English alphabet, digits and a few symbols), but what about people using languages with accents, or even worse, people who don't use the Latin alphabet at all?

We could replace characters that look too similar to ASCII ones with their ASCII equivalent, but that would require a huge amount of work (Unicode defines well over a hundred thousand characters; how could we handle them all?).
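For accented Latin letters specifically, one partial approach (an assumption on my part, not something decided in the issue) is Unicode NFKD normalization followed by dropping the combining marks. It does not handle lookalikes from other scripts such as Cyrillic "а". Mapping those would need a confusables table like the one published with Unicode Technical Standard #39. The function name `strip_accents` is hypothetical:

```python
import unicodedata

def strip_accents(text: str) -> str:
    # NFKD splits "é" into "e" + U+0301 COMBINING ACUTE ACCENT, and also
    # folds compatibility characters such as the "ﬁ" ligature into "fi".
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep the base characters, drop the combining marks.
    return "".join(c for c in decomposed if not unicodedata.combining(c))

print(strip_accents("Crème brûlée"))  # Creme brulee
print(strip_accents("\u0430na"))      # Cyrillic "а" is left as-is
print(strip_accents("Привет"))        # unchanged: no Latin equivalent
```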

We could also decide to just take the risk of phishing or impersonation. It will always be more or less possible anyway, since you can always create an account with the same name on another instance if you want.

Or maybe there is a better solution?

What do you think?


tcit March 6th, 2019 08:27

Isn't that the whole point of slugs? Using URI-safe characters, stripping non-ASCII characters, replacing symbols with dashes, for example? I'm not sure what the issue is here, since the slug is only displayed in the URL.
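A minimal Python sketch of the slug rules described above. The name `slugify` and the accent-folding step are assumptions for illustration, not the project's actual code:

```python
import re
import unicodedata

def slugify(title: str) -> str:
    # Fold accented Latin letters to their base letters (é -> e + mark).
    text = unicodedata.normalize("NFKD", title)
    # Strip everything that is not ASCII (combining marks, Cyrillic, CJK, ...).
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    # Replace every run of characters other than a-z and 0-9 with one dash.
    text = re.sub(r"[^a-z0-9]+", "-", text)
    # Remove dashes left at both ends.
    return text.strip("-")

print(slugify("Hello, World!"))  # hello-world
print(slugify("Crème brûlée"))   # creme-brulee
print(slugify("Привет, мир"))    # prints an empty line
```

With a title written only in a non-Latin script, nothing survives these rules and the slug comes out empty.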


Ana Gelez March 7th, 2019 11:28

Yes, it is, but the problem is what to do when your title contains only non-ASCII characters (not everybody uses the Latin alphabet).
And some "slugs" are also displayed in the interface, since they are also used as blog or user identifiers for federation.