This issue is a follow-up to the results obtained for different genes in #52. It is still not clear why a few oncogenes produced such poor results. Before analyzing the genes themselves, I was puzzled by one thing in the code.
If we want to run the classifier for a different gene, the only part that currently changes is y, i.e., the vector of labels y=Y[GENE]. The matrix X, which contains our feature values, remains the same. This means that the same set of feature values can belong to class '0' in one iteration and to class '1' in another. Even though each iteration corresponds to a different gene, the classifier sees it only as another combination of '0' and '1' labels for which a model has to be built.
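To make the setup concrete, here is a minimal sketch of the loop described above. It is not the project's actual code: the data are synthetic, the gene names `GENE_A` and `GENE_B` are hypothetical, and `LogisticRegression` is only a stand-in for whatever classifier is really used. It shows that X stays fixed, only the label column changes, and some samples switch class between genes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
samples = [f"sample_{i}" for i in range(40)]

# X: one row of feature values per sample; identical for every gene.
X = pd.DataFrame(rng.normal(size=(40, 5)), index=samples)

# Y: one binary label column per gene (hypothetical names, synthetic labels).
Y = pd.DataFrame(
    {
        "GENE_A": (X[0] > 0).astype(int),  # labels tied to feature 0
        "GENE_B": (X[1] > 0).astype(int),  # labels tied to feature 1
    },
    index=samples,
)

models = {}
for gene in Y.columns:
    y = Y[gene]  # the only thing that changes between iterations
    models[gene] = LogisticRegression().fit(X, y)

# Samples whose class differs between the two genes, although their row of X is the same.
flipped = Y.index[Y["GENE_A"] != Y["GENE_B"]]
print(f"{len(flipped)} of {len(Y)} samples change class between genes")
```

Each gene is therefore a separate binary classification problem over the same samples, and each fitted model gets its own coefficients.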
If the matrix X is static, i.e., we treat its values as completely reliable, I guess the main question is how reliable the labels given in the matrix Y are, and whether it would be possible to measure that reliability.
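One possible proxy, offered only as a sketch: for each label vector, compare the cross-validated AUROC against the scores obtained after shuffling the labels, using scikit-learn's `permutation_test_score`. This does not tell us whether individual labels are correct; it only indicates whether a label vector carries any signal that X can explain. The data below are synthetic, and the two label vectors (`y_signal`, `y_noise`) are hypothetical stand-ins for columns of Y.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, permutation_test_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))

# Two synthetic label vectors over the same X.
y_signal = (X[:, 0] + 0.5 * rng.normal(size=120) > 0).astype(int)  # depends on X
y_noise = rng.integers(0, 2, size=120)                             # independent of X

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, y in {"signal": y_signal, "noise": y_noise}.items():
    # score: AUROC with the true labels; p_value: how often shuffled labels do as well.
    score, perm_scores, p_value = permutation_test_score(
        LogisticRegression(), X, y,
        scoring="roc_auc", cv=cv, n_permutations=50, random_state=0,
    )
    results[name] = {"positives": int(y.sum()), "auroc": score, "p_value": p_value}
    print(name, results[name])
```

A gene whose AUROC is indistinguishable from the shuffled-label scores would be a candidate for a closer look at how its labels were derived, and the count of positives helps flag genes with too few mutated samples to learn from.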