Improve intermediate layer extraction explanation #1338
palonso wants to merge 3 commits into MTG:master
dbogdanov left a comment
This looks good! I've left a proposal to improve the description of the algorithms' output in the DOC string.
"Note: The output of this algorithm is 2D, which is suitable for extracting embeddings or "
"class activations (the output shape is, e.g., [time, number of classes]). If the output "
"parameter is set to an intermediate layer with more dimensions, the output will be "
"flattened to 2D.\n"
Rephrased version (trying to simplify):
Note: The algorithm outputs a time series of class activations or embedding vectors, with a 2D shape [time, feature vector]. Feature vector values will be flattened if the output parameter is set to extract an intermediate layer with multiple dimensions.
"class activations (the output shape is, e.g., [time, number of classes]). If the output "
"parameter is set to an intermediate layer with more dimensions, the output will be "
"flattened to 2D.\n"
"\n"
Same comments as for TensorflowPredictEffnetDiscogs
"Note: The output of this algorithm is 2D, which is suitable for extracting embeddings or "
"class activations (the output shape is, e.g., [time, number of classes]). If the output "
"parameter is set to an intermediate layer with more dimensions, the output will be "
"flattened to 2D.\n"
Same comment as for TensorflowPredictEffnetDiscogs
"Note: The output of this algorithm is 2D, which is suitable for extracting embeddings or "
"class activations (the output shape is, e.g., [time, number of classes]). If the output "
"parameter is set to an intermediate layer with more dimensions, the output will be "
"flattened to 2D.\n"
Same comment as for TensorflowPredictEffnetDiscogs
_featsSize = tensor.dimension(3);

if (_channels != 1 && !_warned) {
  E_WARNING("TensorToVectorReal: The channel axis (dimension 1) of the input tensor has size larger than 1, but the output of this algorithm is 2D. The batch, channel, and time axes (dimensions 0, 1, 2) will be flattened to the first dimension of the output matrix.");
We output a vector of vectors of reals, so the "matrix" terminology may be misleading.
TensorToVectorReal converts tensors to 2D arrays by flattening all axes but the last one into the first dimension.
Model-specific prediction algorithms (e.g., TensorflowPredictVGGish) use this algorithm to return 2D arrays, since they are primarily intended for time-wise predictions or embeddings. However, these algorithms can also be used to extract intermediate layers of the models, which may have more than two dimensions. In this case, all dimensions but the last one will be flattened. To address this:
Note that it is also possible to retrieve intermediate layers with their original shape using TensorflowPredict as discussed here.
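For readers who want to see the flattening concretely, here is a minimal standalone sketch of what collapsing all axes but the last one looks like. It is not the Essentia implementation: `flattenToRows` is a hypothetical helper, and the sketch assumes a 4D tensor with shape [batch, channels, time, feats] stored as a flat row-major buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of Essentia): collapses a row-major 4D tensor
// with shape [batch, channels, time, feats] into rows of length `feats`.
std::vector<std::vector<float>> flattenToRows(
    const std::vector<float>& data,
    const std::array<std::size_t, 4>& shape) {
  const std::size_t feats = shape[3];
  // The batch, channel, and time axes (dimensions 0, 1, 2) merge into the row axis.
  const std::size_t rows = shape[0] * shape[1] * shape[2];
  assert(data.size() == rows * feats);

  std::vector<std::vector<float>> out;
  out.reserve(rows);
  for (std::size_t r = 0; r < rows; ++r) {
    // In row-major order, each output row is a contiguous run of `feats` values.
    const auto first = data.begin() + static_cast<std::ptrdiff_t>(r * feats);
    out.emplace_back(first, first + static_cast<std::ptrdiff_t>(feats));
  }
  return out;
}
```

With a tensor of shape [1, 2, 3, 4], this yields 6 rows of 4 values each: the channel and time axes are interleaved along the first output dimension, which is why the original layer shape cannot be recovered from the 2D output without knowing it in advance.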