
Question: entity-to-node hybrid search #52

@emrec1

Description

How can we improve the accuracy of DecisionNode retrieval for a given entity?

Main function: decision_main() in
/home/ecalik/CardioGuidelinesGraph/src/cardio_graph_core/query/query_helper_functions.py

It runs a hybrid search over DecisionNode using:
  - lexical (full-text) search on entity_original + entity_standardized_candidate
  - vector search on embedding_entity_standardized
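
If hybrid_search_decisionnodes passes query_text straight to a Neo4j full-text index (the helper is not shown in this issue, so this is an assumption), entity strings containing characters such as "/", "(" or ":" are parsed as Lucene query syntax rather than matched literally: ":" reads as a field separator, "/" opens a regular expression, and an unbalanced "(" is a parse error. That can silently drop or distort lexical hits. A minimal sketch of a hypothetical escape_lucene helper that backslash-escapes Lucene's reserved characters:

```python
import re

# Characters reserved by Lucene's classic query syntax (the same set that
# Lucene's QueryParser.escape() handles); each gets a backslash prefix.
_LUCENE_RESERVED = re.compile(r'([+\-!(){}\[\]^"~*?:\\/&|])')


def escape_lucene(text: str) -> str:
    """Return text with Lucene operator characters escaped so they match literally."""
    return _LUCENE_RESERVED.sub(r"\\\1", text)


if __name__ == "__main__":
    print(escape_lucene("ACE-inhibitor/ARB"))  # prints ACE\-inhibitor\/ARB
```

Because the same query_text is also embedded for the vector channel, the escaping would belong inside the full-text branch of the search helper, so the embedder still sees the unescaped entity.
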
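from neo4j import GraphDatabase

# SimpleOllamaEmbedder, embed_decisionnodes_and_create_indexes,
# hybrid_search_decisionnodes and the collapse/filter/print helpers are
# assumed to be defined in (or imported into) query_helper_functions.py.


def decision_main(
    URI,
    AUTH,
    entity,
    model="mxbai-embed-large:latest",
    host="http://localhost:11434",
    embed=False,
):
    # Embedding client for the local Ollama server; the same model must be
    # used for indexing and for querying.
    embedder = SimpleOllamaEmbedder(
        model=model,
        host=host,
    )

    if embed:
        # Optional setup: embed DecisionNodes from entity_standardized_candidate
        # and create the indexes. dimensions must match the model's output size
        # (1024 for mxbai-embed-large).
        embed_decisionnodes_and_create_indexes(
            uri=URI,
            auth=AUTH,
            embedder=embedder,
            dimensions=1024,
            source_property="entity_standardized_candidate",
            embedding_property="embedding_entity_standardized",
        )

    # The context manager closes the driver even if a step below raises.
    with GraphDatabase.driver(URI, auth=AUTH) as driver:
        results = hybrid_search_decisionnodes(
            driver=driver,
            embedder=embedder,
            query_text=entity,
            top_k_vector=100,        # candidates requested from the vector index
            top_k_fulltext=100,      # candidates requested from the full-text index
            min_vector_score=0.8,    # drop vector hits scoring below 0.8
            min_fulltext_score=0.0,  # effectively keeps every full-text hit
            final_limit=200,         # cap on the merged result list
        )

        pretty_print_decisionnode_hits(results)
        # Collapse the raw hits into groups, then apply the strict grounding
        # filter for short queries.
        collapsed = collapse_decisionnode_hits(results)
        filtered = filter_grounded_groups_strict_short_query(entity, collapsed)

        pretty_print_filtered_groups(filtered)

    return filtered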
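
Full-text (Lucene) relevance scores are unbounded and not on the same scale as vector similarity scores, so the fixed cut-offs min_vector_score=0.8 and min_fulltext_score=0.0 treat the two channels very differently, and any merge that sorts on raw scores mixes incomparable numbers. A common remedy is reciprocal rank fusion (RRF), which scores each node only by its rank in each list: score(d) = sum over lists of 1 / (k + rank(d)). The sketch below is a hypothetical standalone helper, assuming each channel can return node ids ordered best-first; how hybrid_search_decisionnodes currently merges its two result lists is not shown in the issue.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def reciprocal_rank_fusion(
    ranked_lists: Sequence[Sequence[str]],
    k: int = 60,
    limit: int = 200,
) -> List[Tuple[str, float]]:
    """Fuse ranked lists of node ids with reciprocal rank fusion (RRF).

    Each id scores sum(1 / (k + rank)) over the lists that contain it,
    with rank starting at 1. Raw channel scores are never used, so the
    lexical and vector scores do not need to be comparable.
    """
    fused: Dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        seen = set()
        for rank, node_id in enumerate(ranking, start=1):
            if node_id in seen:
                continue  # count only the best rank of an id within one list
            seen.add(node_id)
            fused[node_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the output is deterministic.
    ordered = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return ordered[:limit]


if __name__ == "__main__":
    # Hypothetical DecisionNode ids, best match first in each channel.
    vector_hits = ["dn_12", "dn_7", "dn_3"]
    fulltext_hits = ["dn_7", "dn_40", "dn_12"]
    for node_id, score in reciprocal_rank_fusion([vector_hits, fulltext_hits]):
        print(f"{node_id}\t{score:.5f}")
```

Here dn_7 ranks first because it sits near the top of both lists, even though neither channel's raw score is consulted. k=60 is the conventional default and is worth tuning on a few labelled entities; limit plays the role of final_limit.
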
