Exception: ValueError
--------------------------------------------------------------------------------
Message: Metadata length (134) is longer than chunk size (128). Consider increasing the chunk size or decreasing the size of your metadata to avoid this.
--------------------------------------------------------------------------------
Traceback: Traceback (most recent call last):
File "/tmp/ray/session_2025-06-05_07-19-04_642991_1676/runtime_resources/py_modules_files/_ray_pkg_9d9949a5341176cd/syftr/tuner/qa_tuner.py", line 349, in objective
obj1, obj2, metrics, flow_json = evaluate(params, study_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/ray/session_2025-06-05_07-19-04_642991_1676/runtime_resources/py_modules_files/_ray_pkg_9d9949a5341176cd/syftr/tuner/qa_tuner.py", line 112, in evaluate
obj1, obj2, results = _evaluate(params, study_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/ray/session_2025-06-05_07-19-04_642991_1676/runtime_resources/py_modules_files/_ray_pkg_9d9949a5341176cd/syftr/tuner/qa_tuner.py", line 305, in _evaluate
flow = build_flow(params, study_config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/ray/session_2025-06-05_07-19-04_642991_1676/runtime_resources/py_modules_files/_ray_pkg_9d9949a5341176cd/syftr/tuner/qa_tuner.py", line 186, in build_flow
rag_retriever, rag_docstore = build_rag_retriever(study_config, params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/ray/session_2025-06-05_07-19-04_642991_1676/runtime_resources/py_modules_files/_ray_pkg_9d9949a5341176cd/syftr/retrievers/build.py", line 163, in build_rag_retriever
dense_index, dense_docstore = get_or_build_dense_index(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/ray/session_2025-06-05_07-19-04_642991_1676/runtime_resources/py_modules_files/_ray_pkg_9d9949a5341176cd/syftr/retrievers/build.py", line 47, in get_or_build_dense_index
index, docstore = _build_dense_index(
^^^^^^^^^^^^^^^^^^^
File "/tmp/ray/session_2025-06-05_07-19-04_642991_1676/runtime_resources/py_modules_files/_ray_pkg_9d9949a5341176cd/syftr/retrievers/build.py", line 75, in _build_dense_index
nodes = pipeline.run(
^^^^^^^^^^^^^
File "/tmp/ray/session_2025-06-05_07-19-04_642991_1676/runtime_resources/pip/dd5f956fcc327946303d03fcf07dea86900ea86c/virtualenv/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 324, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/tmp/ray/session_2025-06-05_07-19-04_642991_1676/runtime_resources/pip/dd5f956fcc327946303d03fcf07dea86900ea86c/virtualenv/lib/python3.12/site-packages/llama_index/core/ingestion/pipeline.py", line 550, in run
nodes = run_transformations(
^^^^^^^^^^^^^^^^^^^^
File "/tmp/ray/session_2025-06-05_07-19-04_642991_1676/runtime_resources/pip/dd5f956fcc327946303d03fcf07dea86900ea86c/virtualenv/lib/python3.12/site-packages/llama_index/core/ingestion/pipeline.py", line 98, in run_transformations
nodes = transform(nodes, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/ray/session_2025-06-05_07-19-04_642991_1676/runtime_resources/pip/dd5f956fcc327946303d03fcf07dea86900ea86c/virtualenv/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 324, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/tmp/ray/session_2025-06-05_07-19-04_642991_1676/runtime_resources/pip/dd5f956fcc327946303d03fcf07dea86900ea86c/virtualenv/lib/python3.12/site-packages/llama_index/core/node_parser/interface.py", line 194, in __call__
return self.get_nodes_from_documents(nodes, **kwargs) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/ray/session_2025-06-05_07-19-04_642991_1676/runtime_resources/pip/dd5f956fcc327946303d03fcf07dea86900ea86c/virtualenv/lib/python3.12/site-packages/llama_index/core/node_parser/interface.py", line 166, in get_nodes_from_documents
nodes = self._parse_nodes(documents, show_progress=show_progress, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/ray/session_2025-06-05_07-19-04_642991_1676/runtime_resources/pip/dd5f956fcc327946303d03fcf07dea86900ea86c/virtualenv/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 324, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/tmp/ray/session_2025-06-05_07-19-04_642991_1676/runtime_resources/pip/dd5f956fcc327946303d03fcf07dea86900ea86c/virtualenv/lib/python3.12/site-packages/llama_index/core/node_parser/interface.py", line 261, in _parse_nodes
splits = self.split_text_metadata_aware(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/ray/session_2025-06-05_07-19-04_642991_1676/runtime_resources/pip/dd5f956fcc327946303d03fcf07dea86900ea86c/virtualenv/lib/python3.12/site-packages/llama_index/core/instrumentation/dispatcher.py", line 324, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/tmp/ray/session_2025-06-05_07-19-04_642991_1676/runtime_resources/pip/dd5f956fcc327946303d03fcf07dea86900ea86c/virtualenv/lib/python3.12/site-packages/llama_index/core/node_parser/text/token.py", line 122, in split_text_metadata_aware
raise ValueError(
ValueError: Metadata length (134) is longer than chunk size (128). Consider increasing the chunk size or decreasing the size of your metadata to avoid this.
{'additional_context_enabled': False,
'few_shot_embedding_model': 'sentence-transformers/paraphrase-multilingual-mpnet-base-v2',
'few_shot_enabled': True,
'few_shot_top_k': 15,
'hyde_enabled': False,
'lats_max_rollouts': 2,
'lats_num_expansions': 3,
'rag_embedding_model': 'BAAI/bge-multilingual-gemma2',
'rag_method': 'dense',
'rag_mode': 'lats_rag_agent',
'rag_query_decomposition_enabled': False,
'rag_top_k': 9,
'reranker_enabled': False,
'response_synthesizer_llm': 'Qwen/Qwen2.5',
'splitter_chunk_exp': 7,
'splitter_chunk_overlap_frac': 0.0,
'splitter_method': 'token',
'template_name': 'concise'}
Describe the bug
This error occurs repeatedly when using the sustainable living subset of the BRIGHT dataset.
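The arithmetic behind the failure can be sketched in a few lines. This is a minimal illustration, not syftr or LlamaIndex code: the helper names are hypothetical, and it assumes the chunk size is derived as `2 ** splitter_chunk_exp` (which matches `splitter_chunk_exp: 7` and the reported chunk size of 128) and that metadata must be strictly shorter than the chunk size to leave room for text.

```python
def chunk_size_from_exp(chunk_exp: int) -> int:
    """Assumed mapping from the tuner's exponent to a token chunk size."""
    # Assumption: chunk size is 2 ** splitter_chunk_exp (2 ** 7 == 128).
    return 2 ** chunk_exp


def metadata_fits(metadata_len: int, chunk_size: int) -> bool:
    """Return True if some room is left for text after the metadata."""
    # Assumption: metadata equal to the chunk size also fails,
    # because no tokens would remain for the chunk text itself.
    return metadata_len < chunk_size


def min_chunk_exp(metadata_len: int) -> int:
    """Smallest exponent whose chunk size strictly exceeds the metadata length."""
    exp = 0
    while 2 ** exp <= metadata_len:
        exp += 1
    return exp


# Values taken from the error message and parameters above.
chunk_size = chunk_size_from_exp(7)
print(chunk_size, metadata_fits(134, chunk_size), min_chunk_exp(134))
```

Under these assumptions, any trial that samples `splitter_chunk_exp` of 7 or lower would hit the same error for documents carrying 134 tokens of metadata, which may explain why the issue repeats on this subset.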
To Reproduce
The following configuration causes the issue: