I used your model. I trained on the open-source Biaobei (Chinese) and LJSpeech (English) datasets for 22,000 steps, and the model successfully synthesized mixed Chinese and English speech. However, the Chinese portions sound like the Biaobei speaker while the English portions sound like the LJSpeech speaker, rather than a single consistent voice.
Is the number of training steps insufficient?
Thanks