How to copy data from a Cassandra table to another structure for better performance

You can use cqlsh COPY command :
To copy your invoices data into csv file use :

COPY invoices(id_invoice, year, id_client, type_invoice) TO 'invoices.csv';

And Copy back from csv file to table in your case invoices_yr use :

COPY invoices_yr(id_invoice, year, id_client, type_invoice) FROM 'invoices.csv';

If you have huge data you can use sstable writer to write and sstableloader to load data faster. http://www.datastax.com/dev/blog/using-the-cassandra-bulk-loader-updated

To echo what was said about the COPY command, it is a great solution for something like this.

However, I will disagree with what was said about the Bulk Loader, as it is infinitely harder to use. Specifically, because you need to run it on every node (whereas COPY needs to only be run on a single node).

To help COPY scale for larger data sets, you can use the PAGETIMEOUT and PAGESIZE parameters.

COPY invoices(id_invoice, year, id_client, type_invoice) 
  TO 'invoices.csv' WITH PAGETIMEOUT=40 AND PAGESIZE=20;

Using these parameters appropriately, I have used COPY to successfully export/import 370 million rows before.

For more info, check out this article titled: New options and better performance in cqlsh copy.

An alternative to using COPY command (see other answers for examples) or Spark to migrate data is to create a materialized view to do the denormalization for you.

CREATE MATERIALIZED VIEW invoices_yr AS
       SELECT * FROM invoices
       WHERE id_client IS NOT NULL AND type_invoice IS NOT NULL AND id_client IS NOT NULL
       PRIMARY KEY ((type_invoice), year, id_client)
       WITH CLUSTERING ORDER BY (year DESC)

Cassandra will fill the table for you then so you wont have to migrate yourself. With 3.5 be aware that repairs don't work well (see CASSANDRA-12888).

Note: that Materialized Views are probably not best idea to use and has been changed to "experimental" status

How to copy data from a Cassandra table to another structure for better performance

Tags:

Cassandra

Cql

Cql3

Cqlsh

Related

Recent Posts