This is a strange take. In Japanese it’s literally a consonant cluster [ts], which is to say it’s literally a Japanese “t” followed by a Japanese “s”. The Japanese “t” and “s” are not exactly the same as English, but they’re close enough, and English has the same cluster in, say, the plural “mats” of “mat”.
What “tsunami” breaks in English is not really the sound, but instead just the fact that English doesn’t allow [ts] unless it’s preceded by a vowel.
Where are you getting this information? This “pull your cheeks together a bit” sounds completely out of left field to me.