HIVE-29551: Avoid quadratic runtime in ColumnStatsSemanticAnalyzer#ge…#6443

Open
tanishq-chugh wants to merge 2 commits into apache:master from tanishq-chugh:HIVE-29551

@tanishq-chugh
Contributor

…tColumnTypes

What changes were proposed in this pull request?

Improve the time complexity of ColumnStatsSemanticAnalyzer#getColumnTypes so that it no longer runs in quadratic time.

Why are the changes needed?

Performance Improvement

Does this PR introduce any user-facing change?

No

How was this patch tested?

Manual Testing + CI

if (typeInfo.getCategory() != ObjectInspector.Category.PRIMITIVE) {
  logTypeWarning(colName, type);
} else {
  nonPrimColNames.add(colName);
Contributor

The variable name should be PrimColNames instead of nonPrimColNames, since primitive types enter the else branch.

Contributor Author

@Aggarwal-Raghav My bad, I validated the column types/names being returned for primitive types and used the wrong variable name. Updated in commit 4a6804d.
Thanks for pointing this out!

@sonarqubecloud

} else {
  colTypes.add(type);
}
Map<String, String> colTypeMap = new HashMap<>();
Contributor

Thanks for the PR! When I created HIVE-29551, I had in mind doing it without a HashMap if possible. There are two kinds of usage, depending on where the column names come from:

  • ColumnStatsSemanticAnalyzer#getColumnName
  • Utilities.getColumnNamesFromFieldSchema

The latter iterates over a list of FieldSchema, so the type info can be obtained from these items as well.

The HashMap is only needed when the ASTNode has 3 children.
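
A minimal sketch of the two cases, not the actual Hive code. `Field` is a hypothetical stand-in for FieldSchema (just a name and a type string), and `typesFromSchema` / `typesForNames` are illustrative names. It assumes that the map-based path corresponds to the case where the column names arrive separately from the schema list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnTypeLookupSketch {

  /** Hypothetical stand-in for FieldSchema: a column name and its type string. */
  public static final class Field {
    final String name;
    final String type;

    public Field(String name, String type) {
      this.name = name;
      this.type = type;
    }
  }

  /**
   * Names and types come from the same schema list, so a single pass
   * collects the types in order and no map is needed.
   */
  public static List<String> typesFromSchema(List<Field> schema) {
    List<String> types = new ArrayList<>(schema.size());
    for (Field f : schema) {
      types.add(f.type);
    }
    return types;
  }

  /**
   * Names were supplied separately (assumed case): index the schema once,
   * then resolve each name with a constant-time lookup instead of
   * rescanning the schema for every name.
   */
  public static List<String> typesForNames(List<Field> schema, List<String> colNames) {
    Map<String, String> typeByName = new HashMap<>();
    for (Field f : schema) {
      typeByName.put(f.name, f.type);
    }
    List<String> types = new ArrayList<>(colNames.size());
    for (String name : colNames) {
      String type = typeByName.get(name);
      if (type == null) {
        throw new IllegalArgumentException("Unknown column: " + name);
      }
      types.add(type);
    }
    return types;
  }

  public static void main(String[] args) {
    List<Field> schema = Arrays.asList(
        new Field("a", "int"),
        new Field("b", "string"),
        new Field("c", "array<int>"));

    System.out.println(typesFromSchema(schema));                        // [int, string, array<int>]
    System.out.println(typesForNames(schema, Arrays.asList("c", "a"))); // [array<int>, int]
  }
}
```

Both methods are linear in the number of columns. The first avoids the extra allocation entirely, which is why the map is worth keeping only for the path where names and schema are not iterated together.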
