Streaming Columnar Data with Apache Arrow

ื“ื™ ืื™ื‘ืขืจื–ืขืฆื•ื ื’ ืคื•ืŸ ื“ืขื ืึทืจื˜ื™ืงืœ ืื™ื– ื’ืขื•ื•ืขืŸ ืฆื•ื’ืขื’ืจื™ื™ื˜ ืกืคึผืึทืกื™ืคื™ืงืœื™ ืคึฟืึทืจ ื“ื™ ืกื˜ื•ื“ืขื ื˜ืŸ ืคื•ืŸ ื“ืขื ืงื•ืจืก "ื“ืึทื˜ืึท ื™ื ื–ืฉืขื ื™ืจ".


Over the past few weeks, Nong Li and I have added a binary streaming format to Apache Arrow, complementing the existing random access / IPC file format. We have Java and C++ implementations and Python bindings. In this article, I will explain how the format works and show how you can achieve very high data throughput to a pandas DataFrame.

Streaming columnar data

ื ืคึผืจืึธืกื˜ ืงืฉื™ื ืื™ืš ื‘ืึทืงื•ืžืขืŸ ืคื•ืŸ ืขืจืึธื• ื™ื•ื–ืขืจื– ืื™ื– ื“ื™ ื”ื•ื™ืš ืคึผืจื™ื™ึทื– ืคื•ืŸ ืžื™ื™ื’ืจื™ื™ื˜ื™ื ื’ ื’ืจื•ื™ืก ืฉื˜ืขืœื˜ ืคื•ืŸ ื˜ืึทื‘ื•ืœืึทืจ ื“ืึทื˜ืŸ ืคื•ืŸ ืึท ืจื•ื“ืขืจืŸ ืึธื“ืขืจ ืจืขืงืึธืจื“-ืึธืจื™ืขื ื˜ื™ื“ ืคึฟืึธืจืžืึทื˜ ืฆื• ืึท ื–ื™ื™ึทืœ-ืึธืจื™ืขื ื˜ื™ื“ ืคึฟืึธืจืžืึทื˜. ืคึฟืึทืจ ืžืึทืœื˜ื™-ื’ื™ื’ืื‘ื™ื™ื˜ ื“ืึทื˜ืึทืกืขืฅ, ื˜ืจืึทื ืกืคึผืึธืกื™ื ื’ ืื™ืŸ ื–ื›ึผืจื•ืŸ ืึธื“ืขืจ ืื•ื™ืฃ ื“ื™ืกืง ืงืขื ืขืŸ ื–ื™ื™ืŸ ืึท ืึธื•ื•ื•ืขืจื•ื•ืขืœืžื™ื ื’ ืึทืจื‘ืขื˜.

For streaming data, whether the source data is row-oriented or column-oriented, one option is to send small batches of rows, each with a columnar layout internally.
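To make the idea concrete, here is a minimal pure-Python sketch of cutting a row-oriented source into small batches and transposing only one batch at a time. The helper name `columnar_batches` and the sample rows are made up for illustration; this is not how Arrow itself does it, only the general shape of the approach.

```python
from itertools import islice

def columnar_batches(rows, batch_size):
    """Yield one dict per batch, mapping column name -> list of values.

    `rows` is any iterable of dicts that all share the same keys.
    """
    it = iter(rows)
    while True:
        chunk = list(islice(it, batch_size))
        if not chunk:
            return
        # Transpose only this small chunk: row dicts -> one list per column.
        yield {name: [row[name] for row in chunk] for name in chunk[0]}

# Hypothetical row-oriented source with five records.
rows = [{"id": i, "value": i * 0.5} for i in range(5)]
batches = list(columnar_batches(rows, batch_size=2))
print(batches[0])
```

Because only `batch_size` rows are held and transposed at once, the memory needed stays bounded no matter how large the source is.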

ืื™ืŸ ืึทืคึผืึทื˜ืฉื™ ืขืจืึธื•, ื“ื™ ื–ืึทืžืœื•ื ื’ ืคื•ืŸ ืื™ืŸ-ื–ื™ืงืึธืจืŸ ื–ื™ื™ึทืœ ืขืจื™ื™ื– ืจืขืคึผืจื™ื–ืขื ื˜ื™ื ื’ ืึท ื˜ื™ืฉ ืคึผื™ื™ึทื“ืข ืื™ื– ื’ืขืจื•ืคืŸ ืึท ืจืขืงืึธืจื“ ืคึผืขืงืœ. ืฆื• ืคืึธืจืฉื˜ืขืœืŸ ืึท ืื™ื™ืŸ ื“ืึทื˜ืŸ ืกื˜ืจื•ืงื˜ื•ืจ ืคื•ืŸ ืึท ืœืึทื“ื–ืฉื™ืงืึทืœ ื˜ื™ืฉ, ืขื˜ืœืขื›ืข ื‘ืึทื˜ืฉืึทื– ืคื•ืŸ ืจืขืงืึธืจื“ืก ืงืขื ืขืŸ ื–ื™ื™ืŸ ื’ืขื–ืืžืœื˜.

In the existing "random access" file format, we write metadata containing the table schema and block locations at the end of the file, which lets you select any record batch or any column of a dataset very cheaply. In the streaming format, we send a series of messages: the schema, followed by one or more record batches.

ื“ื™ ืคืึทืจืฉื™ื“ืขื ืข ืคึฟืึธืจืžืึทื˜ื™ืจื•ื ื’ืขืŸ ืงื•ืงืŸ ืขืคึผืขืก ื•ื•ื™ ื“ืึธืก:

[Figure: diagram comparing the layout of the file format and the streaming format]

Streaming data in PyArrow: usage

To show you how this works, I will create an example dataset representing a single stream chunk:

import time
import numpy as np
import pandas as pd
import pyarrow as pa

def generate_data(total_size, ncols):
    # Number of float64 rows so that all columns together take total_size bytes
    nrows = int(total_size / ncols / np.dtype('float64').itemsize)
    return pd.DataFrame({
        'c' + str(i): np.random.randn(nrows)
        for i in range(ncols)
    })

Now, suppose we want to write 1 GB of data made up of chunks of 1 MB each, for a total of 1024 chunks. To start, let's create the first 1 MB DataFrame with 16 columns:

KILOBYTE = 1 << 10
MEGABYTE = KILOBYTE * KILOBYTE
DATA_SIZE = 1024 * MEGABYTE
NCOLS = 16

df = generate_data(MEGABYTE, NCOLS)

ื“ืขืจื ืึธืš ืื™ืš ื’ืขืจ ื–ื™ื™ ืฆื• pyarrow.RecordBatch:

batch = pa.RecordBatch.from_pandas(df)

ืื™ืฆื˜ ืื™ืš ื•ื•ืขืœ ืžืึทื›ืŸ ืึท ืจืขื–ื•ืœื˜ืึทื˜ ื˜ื™ื™ึทืš ื•ื•ืึธืก ื•ื•ืขื˜ ืฉืจื™ื™ึทื‘ืŸ ืฆื• ื‘ืึทืจืึทืŸ ืื•ืŸ ืฉืึทืคึฟืŸ StreamWriter:

sink = pa.InMemoryOutputStream()
stream_writer = pa.StreamWriter(sink, batch.schema)

ื“ืขืจื ืึธืš ืžื™ืจ ื•ื•ืขืœืŸ ืฉืจื™ื™ึทื‘ืŸ 1024 ื˜ืฉืึทื ื’ืงืก, ื•ื•ืึธืก ืœืขืกืึธืฃ ื•ื•ืขื˜ ื–ื™ื™ืŸ 1 ื’ื‘ ืคื•ืŸ ื“ืึทื˜ืŸ ืฉื˜ืขืœืŸ:

for i in range(DATA_SIZE // MEGABYTE):
    stream_writer.write_batch(batch)

Since we wrote to RAM, we can get the entire stream as a single buffer:

In [13]: source = sink.get_result()

In [14]: source
Out[14]: <pyarrow.io.Buffer at 0x7f2df7118f80>

In [15]: source.size
Out[15]: 1074750744

Since this data is in memory, reading back the Arrow record batches is a zero-copy operation. I open a StreamReader, read the data into a pyarrow.Table, and then convert it to a pandas DataFrame:

In [16]: reader = pa.StreamReader(source)

In [17]: table = reader.read_all()

In [18]: table
Out[18]: <pyarrow.table.Table at 0x7fae8281f6f0>

In [19]: df = table.to_pandas()

In [20]: df.memory_usage().sum()
Out[20]: 1073741904

This is all very nice, of course, but you may have questions. How fast is it? How does the chunk size affect the performance of reconstructing the pandas DataFrame?

Streaming performance

As the streaming chunk size decreases, the cost of reconstructing a contiguous columnar DataFrame in pandas increases because of cache-inefficient memory access patterns. There is also some overhead from working with the C++ data structures and arrays and their memory buffers.

ืคึฟืึทืจ 1 ืžื‘, ื•ื•ื™ ืื•ื™ื‘ืŸ, ืื•ื™ืฃ ืžื™ื™ืŸ ืœืึทืคึผื˜ืึทืคึผ (ืงื•ื•ืึทื“-ื”ืึทืจืฅ Xeon E3-1505M) ืขืก ื˜ื•ืจื ืก ืื•ื™ืก:

In [20]: %timeit pa.StreamReader(source).read_all().to_pandas()
10 loops, best of 3: 129 ms per loop

That is an effective throughput of 7.75 GB/s for reconstructing a 1 GB DataFrame from 1024 chunks of 1 MB each. What happens if we use larger or smaller chunks? Here are the results:

[Figure: read performance for each chunk size]
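As a quick check on the 7.75 GB/s figure for the 1 MB case, the arithmetic is simply the data size divided by the measured time per read. The sketch assumes "GB" here means 2^30 bytes, which is what reproduces the quoted number.

```python
GIB = 1 << 30

data_size_bytes = 1024 * (1 << 20)   # 1024 chunks of 1 MB each
seconds_per_read = 0.129             # 129 ms, from the %timeit output

throughput = data_size_bytes / GIB / seconds_per_read
print(f"{throughput:.2f} GB/s")      # 7.75 GB/s
```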

ืคืึธืจืฉื˜ืขืœื•ื ื’ ื˜ืจืืคื ืก ื‘ืื˜ื™ื™ื˜ื™ืง ืคื•ืŸ 256 ืง ืฆื• 64 ืง ื˜ืฉืึทื ื’ืงืก. ืื™ืš ืื™ื– ื’ืขื•ื•ืขืŸ ืกืึทืคึผืจื™ื™ื–ื“ ืึทื– 1 ืžื‘ ื˜ืฉืึทื ื’ืงืก ื–ืขื ืขืŸ ืคึผืจืึทืกืขืกื˜ ืคืึทืกื˜ืขืจ ื•ื•ื™ 16 ืžื‘ ื˜ืฉืึทื ื’ืงืก. ืขืก ืื™ื– ื›ื“ืื™ ืฆื• ื“ื•ืจื›ืคื™ืจืŸ ืึท ืžืขืจ ื’ืจื•ื ื˜ื™ืง ืœืขืจื ืขืŸ ืื•ืŸ ืคึฟืึทืจืฉื˜ื™ื™ืŸ ืฆื™ ื“ืึธืก ืื™ื– ืึท ื ืึธืจืžืึทืœ ืคืึทืจืฉืคึผืจื™ื™ื˜ื•ื ื’ ืึธื“ืขืจ ืฆื™ ืขืก ืื™ื– ืขืคึผืขืก ืึทื ื“ืขืจืฉ ืื™ืŸ ืฉืคึผื™ืœ.

In the current implementation of the format, the data is not compressed at all, so the in-memory size and the "on the wire" size are about the same. Compression may become an option in the future.

Summary

Streaming columnar data can be an efficient way to feed large datasets into columnar analytics tools such as pandas in small chunks. Data services that use row-oriented storage can transpose and transmit small chunks of data that are friendlier to your processor's L2 and L3 caches.

ื’ืึทื ืฅ ืงืึธื“

import time
import numpy as np
import pandas as pd
import pyarrow as pa

def generate_data(total_size, ncols):
    # int() is required: np.random.randn does not accept a float row count
    nrows = int(total_size / ncols / np.dtype('float64').itemsize)
    return pd.DataFrame({
        'c' + str(i): np.random.randn(nrows)
        for i in range(ncols)
    })

KILOBYTE = 1 << 10
MEGABYTE = KILOBYTE * KILOBYTE
DATA_SIZE = 1024 * MEGABYTE
NCOLS = 16

def get_timing(f, niter):
    # Average wall-clock time of one call to f over niter runs
    start = time.perf_counter()
    for i in range(niter):
        f()
    return (time.perf_counter() - start) / niter

def read_as_dataframe(klass, source):
    reader = klass(source)
    table = reader.read_all()
    return table.to_pandas()

NITER = 5
results = []

CHUNKSIZES = [16 * KILOBYTE, 64 * KILOBYTE, 256 * KILOBYTE, MEGABYTE, 16 * MEGABYTE]

for chunksize in CHUNKSIZES:
    nchunks = DATA_SIZE // chunksize
    batch = pa.RecordBatch.from_pandas(generate_data(chunksize, NCOLS))

    sink = pa.InMemoryOutputStream()
    stream_writer = pa.StreamWriter(sink, batch.schema)

    for i in range(nchunks):
        stream_writer.write_batch(batch)

    source = sink.get_result()

    elapsed = get_timing(lambda: read_as_dataframe(pa.StreamReader, source), NITER)

    result = (chunksize, elapsed)
    print(result)
    results.append(result)

Source: www.habr.com
