apis/python/src/tiledbsoma/soma_dataframe.py (19 additions, 21 deletions)
@@ -169,9 +169,9 @@ def read(
         # TODO: batch_size
         # TODO: partition,
         # TODO: platform_config,
-    ) -> Iterator[pa.RecordBatch]:
+    ) -> Iterator[pa.Table]:
         """
-        Read a user-defined subset of data, addressed by the dataframe indexing column, optionally filtered, and return results as one or more ``Arrow.RecordBatch``.
+        Read a user-defined subset of data, addressed by the dataframe indexing column, optionally filtered, and return results as one or more ``Arrow.Table``.

         :param ids: Which rows to read. Defaults to ``None``, meaning no constraint -- all rows.
@@ -217,18 +217,16 @@ def read(
         else:
             iterator = query.df[ids]

-        for df in iterator:
-            batches = df.to_batches()
-            for batch in batches:
-                # XXX COMMENT MORE
-                # This is the 'decode on read' part of our logic; in dim_select we have the
-        This is a convenience method around ``read``. It iterates the return value from ``read`` and returns a concatenation of all the record batches found. Its nominal use is to simply unit-test cases.
+        This is a convenience method around ``read``. It iterates the return value from ``read`` and returns a concatenation of all the table-pieces found. Its nominal use is to simply unit-test cases.
-        Write an Arrow.RecordBatch to the persistent object.
+        Write an Arrow.Table to the persistent object.
-        :param values: An Arrow.RecordBatch containing all columns, including the index columns. The schema for the values must match the schema for the ``SOMADataFrame``.
+        :param values: An Arrow.Table containing all columns, including the index columns. The schema for the values must match the schema for the ``SOMADataFrame``.

-        The ``values`` Arrow RecordBatch must contain a ``soma_rowid`` (uint64) column, indicating which rows are being written.
+        The ``values`` Arrow Table must contain a ``soma_rowid`` (uint64) column, indicating which rows are being written.
+        This is a convenience method around ``read``. It iterates the return value from ``read`` and returns a concatenation of all the table-pieces found. Its nominal use is to simply unit-test cases.
+        """
+        return util_arrow.concat_tables(
+            self.read(
+                row_ids=row_ids,
+                col_ids=col_ids,
+                result_order=result_order,
+            )
         )

     def write_tensor(
         self,
         coords: SOMADenseNdCoordinates,
+        *,
+        row_ids: Optional[Sequence[int]] = None,
+        col_ids: Optional[Sequence[int]] = None,
+        set_index: Optional[bool] = False,
+    ) -> pa.Table:
+        """
+        This is a convenience method around ``read_as_pandas``. It iterates the return value from ``read_as_pandas`` and returns a concatenation of all the table-pieces found. Its nominal use is to simply unit-test cases.
+        """
+        dataframes = []
+        generator = self.read_as_pandas(
+            row_ids=row_ids,
+            col_ids=col_ids,
+            set_index=set_index,
+        )
+        for dataframe in generator:
+            dataframes.append(dataframe)
+        return pd.concat(dataframes)
+
+    def write(
+        self,
+        # TODO: rework callsites with regard to the very latest spec rev
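The added ``read_as_pandas`` convenience wrapper above collects DataFrame pieces from a generator and concatenates them with ``pd.concat``. A self-contained sketch of that pattern, with a hypothetical ``fake_read_as_pandas`` generator standing in for the real method:

```python
import pandas as pd

def fake_read_as_pandas():
    """Stand-in for read_as_pandas: yields DataFrame pieces.
    (Hypothetical data; the real generator reads from storage.)"""
    yield pd.DataFrame({"soma_rowid": [0, 1], "value": [10, 20]})
    yield pd.DataFrame({"soma_rowid": [2], "value": [30]})

# Same shape as the diff's wrapper: gather pieces, then concatenate.
pieces = []
for dataframe in fake_read_as_pandas():
    pieces.append(dataframe)
result = pd.concat(pieces)
assert len(result) == 3
```

Note ``pd.concat`` preserves each piece's index by default; callers that want a clean 0..n-1 index can pass ``ignore_index=True``.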
apis/python/src/tiledbsoma/soma_indexed_dataframe.py (14 additions, 16 deletions)
@@ -209,9 +209,9 @@ def read(
         column_names: Optional[Sequence[str]] = None,
         result_order: Optional[SOMAResultOrder] = None,
         # TODO: more arguments
-    ) -> Iterator[pa.RecordBatch]:
+    ) -> Iterator[pa.Table]:
         """
-        Read a user-defined subset of data, addressed by the dataframe indexing columns, optionally filtered, and return results as one or more Arrow.RecordBatch.
+        Read a user-defined subset of data, addressed by the dataframe indexing columns, optionally filtered, and return results as one or more Arrow.Table.

         :param ids: for each index dimension, which rows to read. Defaults to ``None``, meaning no constraint -- all IDs.
@@ -258,14 +258,12 @@ def read(
         else:
             iterator = query.df[ids]

-        for df in iterator:
-            batches = df.to_batches()
-            for batch in batches:
-                # XXX COMMENT MORE
-                # This is the 'decode on read' part of our logic; in dim_select we have the
-        This is a convenience method around ``read``. It iterates the return value from ``read`` and returns a concatenation of all the record batches found. Its nominal use is to simply unit-test cases.
+        This is a convenience method around ``read``. It iterates the return value from ``read`` and returns a concatenation of all the table-pieces found. Its nominal use is to simply unit-test cases.
-        Write an Arrow.RecordBatch to the persistent object. As duplicate index values are not allowed, index values already present in the object are overwritten and new index values are added.
+        Write an Arrow.Table to the persistent object. As duplicate index values are not allowed, index values already present in the object are overwritten and new index values are added.
-        :param values: An Arrow.RecordBatch containing all columns, including the index columns. The schema for the values must match the schema for the ``SOMAIndexedDataFrame``.
+        :param values: An Arrow.Table containing all columns, including the index columns. The schema for the values must match the schema for the ``SOMAIndexedDataFrame``.
         This is a convenience method around ``read``. It iterates the return value from ``read`` and returns a concatenation of all the table-pieces found. Its nominal use is to simply unit-test cases.
         Return the sparse array as a single Pandas DataFrame containing COO data.
+        This is a convenience method around ``read_as_pandas``. It iterates the return value from ``read_as_pandas`` and returns a concatenation of all the table-pieces found. Its nominal use is to simply unit-test cases.
-    Iterates a generator of ``pyarrow.RecordBatch`` (e.g. ``SOMADataFrame.read``) and returns a concatenation of all the record batches found. The nominal use is to simply unit-test cases.
+    Iterates a generator of ``pyarrow.Table`` (e.g. ``SOMADataFrame.read``) and returns a concatenation of all the table-pieces found. The nominal use is to simply unit-test cases.