HIVE-29551: Avoid quadratic runtime in ColumnStatsSemanticAnalyzer#ge…#6443
tanishq-chugh wants to merge 2 commits into apache:master
Conversation
if (typeInfo.getCategory() != ObjectInspector.Category.PRIMITIVE) {
  logTypeWarning(colName, type);
} else {
  nonPrimColNames.add(colName);
The variable name should be PrimColNames instead of nonPrimColNames, since primitive types take the else branch.
@Aggarwal-Raghav My bad, I validated the column types/names being returned for primitive types but used the wrong variable name. Updated in commit 4a6804d.
Thanks for pointing this out!
} else {
  colTypes.add(type);
}
Map<String, String> colTypeMap = new HashMap<>();
Thanks for the PR! When I created HIVE-29551, I had in mind doing it without a HashMap if possible. There are two kinds of usage, depending on where the column names come from:
- ColumnStatsSemanticAnalyzer#getColumnName
- Utilities.getColumnNamesFromFieldSchema
The latter iterates over a list of FieldSchema, so the type info can be obtained from these items as well.
The HashMap is only needed when the ASTNode has 3 children.
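To make the two cases concrete, here is a rough sketch of what I mean. It is not the actual Hive code: `Field` is a hypothetical stand-in for `FieldSchema`, the class and method names are made up for illustration, and skipping unknown column names is an assumption rather than the analyzer's real behaviour.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnTypeLookupSketch {

  /** Hypothetical stand-in for Hive's FieldSchema: a column name plus its type string. */
  static final class Field {
    final String name;
    final String type;

    Field(String name, String type) {
      this.name = name;
      this.type = type;
    }
  }

  /**
   * Case 1: the column names were derived from the same field list,
   * so the types can be read in the same single pass. No map is needed.
   */
  static List<String> typesFromFields(List<Field> fields) {
    List<String> types = new ArrayList<>(fields.size());
    for (Field f : fields) {
      types.add(f.type);
    }
    return types;
  }

  /**
   * Case 2: the column names were given explicitly (e.g. parsed from the query),
   * so build the name-to-type map once and do a constant-time lookup per name
   * instead of rescanning the field list for every column.
   */
  static List<String> typesForNames(List<Field> fields, List<String> colNames) {
    Map<String, String> typeByName = new HashMap<>();
    for (Field f : fields) {
      typeByName.put(f.name, f.type);
    }
    List<String> types = new ArrayList<>(colNames.size());
    for (String name : colNames) {
      String type = typeByName.get(name);
      if (type != null) { // assumption: unknown names are simply skipped
        types.add(type);
      }
    }
    return types;
  }
}
```

Both paths are linear in the number of columns; the only difference is that the first one avoids allocating the map at all.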



…tColumnTypes
What changes were proposed in this pull request?
Improve time complexity in ColumnStatsSemanticAnalyzer#getColumnTypes
Why are the changes needed?
Performance Improvement
Does this PR introduce any user-facing change?
No
How was this patch tested?
Manual Testing + CI