Substring seems to break multi-byte characters

substring(𝐱𝐱𝐱𝐱𝔁𝔁𝔁𝔁;0;1) returns a broken character. Does anyone know a workaround?

Hi @Michael_Friedman,

The first character in your string is Unicode U+1D431. In UTF-16 it is made of two code units (a surrogate pair), which is why substring(string;0;2) returns that character while substring(string;0;1) returns only half of it.
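
To make the surrogate pair visible, here is a minimal TypeScript sketch. It assumes a runtime with UTF-16 strings (as in JavaScript); whether Make behaves identically internally is an assumption based on the behaviour described above. The names `boldX` and `codeUnitsHex` are made up for illustration.

```typescript
// U+1D431 (𝐱) written as its two UTF-16 code units, a surrogate pair.
const boldX: string = "\uD835\uDC31";

// Hypothetical helper: lists the UTF-16 code units of a string as hex.
function codeUnitsHex(text: string): string[] {
  const units: string[] = [];
  for (let i = 0; i < text.length; i++) {
    units.push(text.charCodeAt(i).toString(16));
  }
  return units;
}

// length counts code units, so one visible character reports 2.
console.log(boldX.length);

// Both halves of the pair: high surrogate, then low surrogate.
console.log(codeUnitsHex(boldX));

// Taking a single unit keeps only the high surrogate: a broken character.
console.log(codeUnitsHex(boldX.slice(0, 1)));

// Taking two units returns the complete character.
console.log(boldX.slice(0, 2) === boldX);
```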

I have found this interesting blog post: https://thenewtoys.dev/blog/2021/01/26/what-is-a-string/

@loic.wiseflow, thanks for the info.

I’m not sure whether Make runs on JavaScript or not - we certainly do not have access to the full JavaScript language.

However, any modern system should handle Unicode characters no matter how many bytes they have in them.

Make integrates with third-party systems whose data contains these Unicode characters. For example, we are manipulating data from LinkedIn that contains them because people use them in posts to simulate bold and italic - this is the raw data in the wild.

We need to load data into Monday because that is our core company operating system.

Both LinkedIn and Monday properly handle Unicode.

We need to truncate text to put the data into some Monday fields. Monday will not accept data with broken code points.

How can we do this?

Hi @Michael_Friedman,

If you truncate at an even number of characters, you should not get errors.

That could work if you just truncate long strings and it does not need to be super accurate.

@loic.wiseflow I do not think that works. You are assuming that all the characters in the string use surrogate pairs. If so, that would work. But realistically, these characters are mixed with regular characters.
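
A small TypeScript sketch of that objection, assuming UTF-16 strings as in JavaScript. The helper name `endsWithLoneHighSurrogate` and the sample strings are invented for illustration; they are not taken from the real LinkedIn data.

```typescript
// Hypothetical helper: true when the last UTF-16 code unit is a high
// surrogate, meaning the cut separated it from its low surrogate.
function endsWithLoneHighSurrogate(text: string): boolean {
  if (text.length === 0) return false;
  const last = text.charCodeAt(text.length - 1);
  return last >= 0xd800 && last <= 0xdbff;
}

const pair: string = "\uD835\uDC31";      // U+1D431 as a surrogate pair
const allPairs: string = pair + pair;     // only surrogate pairs
const mixed: string = "a" + pair + pair;  // a regular letter comes first

// Even cut on a string made only of pairs: the cut falls between characters.
console.log(endsWithLoneHighSurrogate(allPairs.slice(0, 2)));

// Same even cut on mixed text: "a" shifts everything by one unit,
// so the cut lands in the middle of a pair.
console.log(endsWithLoneHighSurrogate(mixed.slice(0, 2)));
```

A single one-unit character ahead of the pairs is enough to shift every later pair onto an odd offset, so an even length gives no guarantee.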

Hi Michael, yeah sorry you are right. I tried a bunch of string manipulation and I can’t figure out a way to truncate those strings consistently either.
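
For anyone who can run the truncation in code outside Make's built-in string functions, one possible approach is to cut at the limit and then back off by one unit if the cut left a dangling high surrogate. This is only a sketch under assumptions: it presumes UTF-16 strings, the function name `truncateWithoutSplittingPairs` is hypothetical, the limit is counted in UTF-16 code units (how Monday counts field length is not stated in this thread), and it does not keep multi-code-point sequences such as combining marks or joined emoji together.

```typescript
// Hypothetical helper: keeps at most maxUnits UTF-16 code units and never
// ends the result on an unpaired high surrogate.
function truncateWithoutSplittingPairs(text: string, maxUnits: number): string {
  if (maxUnits <= 0) return "";
  if (text.length <= maxUnits) return text;

  let end = maxUnits;
  const lastKept = text.charCodeAt(end - 1);

  // A high surrogate at the cut point cannot be completed within the
  // limit, so drop it rather than emit a broken character.
  if (lastKept >= 0xd800 && lastKept <= 0xdbff) {
    end -= 1;
  }
  return text.slice(0, end);
}

const sample: string = "a" + "\uD835\uDC31" + "\uD835\uDC31"; // "a𝐱𝐱"

console.log(truncateWithoutSplittingPairs(sample, 2)); // cut would split the first pair
console.log(truncateWithoutSplittingPairs(sample, 3)); // cut falls between characters
```

The result can be one unit shorter than the requested limit, which seems an acceptable trade-off when the target field rejects broken code points.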