
Cassandra CQL3 Schema Design
$100
- Posted:
- Proposals: 1
- Remote
- #458651
- Awarded
Description
Experience Level: Intermediate
General information for the business: Analytics
Description of requirements/functionality: I need a Cassandra schema for the following purpose:
I will be collecting time series data from multiple sources. Each Source has multiple Subsources (how many is arbitrary). Each Subsource has multiple data values (data1, data2, …, dataN). Data arrives from each Source 2-3 times per minute, with a timestamp at each arrival.
I plan to bucket each row by Source every hour. RowKey = Source_YYMMDDHH
This is important because I want to retrieve data time-locally so I don't want it spread on multiple machines too much.
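The hourly bucketing described above could be sketched as follows. This is only an illustration: the helper names `row_key` and `row_keys_between` are hypothetical, and it assumes timestamps are timezone-aware and that buckets are cut on UTC hours.

```python
from datetime import datetime, timedelta, timezone


def row_key(source: str, ts: datetime) -> str:
    """Build the hourly bucket key Source_YYMMDDHH (UTC hours assumed)."""
    return f"{source}_{ts.astimezone(timezone.utc):%y%m%d%H}"


def row_keys_between(source: str, start: datetime, end: datetime) -> list:
    """List every hourly bucket key that a [start, end] time range touches."""
    # Truncate the start to the top of its hour, then step hour by hour.
    hour = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    keys = []
    while hour <= end:
        keys.append(row_key(source, hour))
        hour += timedelta(hours=1)
    return keys
```

A time-local read then only has to visit the few buckets returned by `row_keys_between`, rather than scanning everything stored for a Source.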
Since the row width is arbitrary, to prevent the row from getting too fat I will use row_config::max_len (not today's problem).
I have gone over these:
http://planetcassandra.org/blog/post/getting-started-with-time-series-data-modeling/
http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/
http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
The last one is most relevant. I need the Data Column Family, but there are no actual code examples for creating the Table (Column Family).
Maybe the CQL3 part at the end of this one is relevant:
http://planetcassandra.org/blog/post/datastax-developer-blog-cql3-for-cassandra-experts/
How do I create such a table, and why?
I am unsure how to specify the Keys (compound, composite, or just primary).
According to this, I am pretty sure I do NOT want composite partition keys (which will spread the data over multiple nodes):
http://www.datastax.com/documentation/cql/3.0/cql/ddl/ddl_compound_keys_c.html
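The concern about composite partition keys can be illustrated with a toy placement function. This is a deliberately simplified sketch, not how Cassandra assigns data: the node list is hypothetical, MD5 with a modulo stands in for the real partitioner and token ranges, and `owner` is an invented name. The point is only that every column in the partition key feeds the hash, while clustering columns do not.

```python
import hashlib

NODES = ["node1", "node2", "node3"]  # hypothetical three-node cluster


def owner(*partition_key_parts: str) -> str:
    """Pick a node from a hash of ALL partition key columns (toy model)."""
    digest = hashlib.md5("|".join(partition_key_parts).encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]


bucket = "sensorA_14091513"

# Single-column partition key: the Subsource is not hashed, so every
# Subsource in this hourly bucket is placed on the same node.
same_node = {sub: owner(bucket) for sub in ("sub1", "sub2", "sub3")}

# Composite partition key (bucket, Subsource): each pair is hashed on its
# own, so rows of one bucket are free to land on different nodes.
per_pair = {sub: owner(bucket, sub) for sub in ("sub1", "sub2", "sub3")}
```

Under this model, keeping Source_YYMMDDHH as the only partition key column and making the remaining key columns clustering columns keeps an hour of one Source together.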
Is there anything else I am missing?
(If the below is totally wrong, make it from scratch.)
CREATE TABLE DataStore (
    Source_yymmddhh text,
    Source text,
    ip text,
    time_current timestamp,
    Subsource text,
    data1 int,
    data2 int,
    data3 int,
    PRIMARY KEY (Source_yymmddhh, ????????)
);
Will I need to do the following?
CREATE INDEX ON DataStore(Source);
CREATE INDEX ON DataStore(Subsource);
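One possible shape for the table, offered as a sketch rather than a definitive answer: Source_yymmddhh stays the single partition key column, and time_current plus Subsource become clustering columns, giving a compound primary key. The clustering order, the three data columns, and the helper `store_sample` are assumptions made for illustration. The `session` argument is assumed to behave like a DataStax Python driver session, whose `execute` takes a query string with `%s` placeholders and a parameter tuple.

```python
from datetime import datetime

# Compound primary key: one partition key column, two clustering columns.
CREATE_TABLE = """
CREATE TABLE DataStore (
    Source_yymmddhh text,
    time_current timestamp,
    Subsource text,
    Source text,
    ip text,
    data1 int,
    data2 int,
    data3 int,
    PRIMARY KEY (Source_yymmddhh, time_current, Subsource)
) WITH CLUSTERING ORDER BY (time_current DESC, Subsource ASC)
"""

INSERT_ROW = (
    "INSERT INTO DataStore (Source_yymmddhh, time_current, Subsource, "
    "Source, ip, data1, data2, data3) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s)"
)

# Time-local read inside one hourly bucket: equality on the partition key,
# range on the first clustering column.
SELECT_RANGE = (
    "SELECT time_current, Subsource, data1, data2, data3 FROM DataStore "
    "WHERE Source_yymmddhh = %s AND time_current >= %s AND time_current < %s"
)


def store_sample(session, source, ts, subsource, ip, data):
    """Insert one sample; ts is assumed to already be expressed in UTC."""
    key = f"{source}_{ts:%y%m%d%H}"  # hourly bucket, Source_YYMMDDHH
    session.execute(INSERT_ROW, (key, ts, subsource, source, ip, *data))
```

With this layout the Source is already part of the partition key, so looking up one Source by hour would not need a secondary index. Whether an index on Subsource is needed depends on the queries: if most reads are per Subsource, placing Subsource before time_current in the clustering columns may fit better than an index.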
Specific technologies required: Cassandra
Extra notes:
Pete P.
- 100% (9)
- Projects completed: 12
- Freelancers worked with: 12
- Projects awarded: 65%
- Last project: 15 Sep 2014
- United States