There are some DNA strings in the datasets that either partially or entirely consist of masked strings, e.g., the 7th sequence in the DemoHumanOrWorm training set (checked via dset[6]), is a string of 'NNNNNNN....NNNN'. Maybe consider extracting the DNA strings from the unmasked genome?
There are some DNA strings in the datasets that either partially or entirely consist of masked strings, e.g., the 7th sequence in the DemoHumanOrWorm training set (checked via dset[6]), is a string of 'NNNNNNN....NNNN'. Maybe consider extracting the DNA strings from the unmasked genome?