Cassandra NoSQL

Ramses Alexander Coraspe Valdez

Created on July 4, 2020

Cassandra and CQL

Cassandra is one of the most popular NoSQL solutions today, it offers more advantageous features compared to other NoSQL technologies. Its main focus is on the ability to distribute information on various servers. Getting started with Cassandra is easy if you already have experience with some SQL databases.

The Cassandra Query Language (CQL) is the query language for communicating with Cassandra database using cqlsh, it is very similar to SQL. CQLSH facilitates the interaction with Cassandra.

Keyspaces

A keyspace is a collection of column families and can be seen as the outermost container for all the data in Cassandra for a particular project. All the data in Cassandra should live inside a keyspace. This can be like a database in RDBMS which is a collection of tables.

CREATE KEYSPACE verano2020 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

Table

Table is equivalent to a “column families” in Cassandra.

CREATE TABLE verano2020.perfil_estudiante (
    nombre TEXT,
    apellido TEXT,
    matricula TEXT,
    programa TEXT,
    materiasactivas SET<TEXT>,
    materiasterminadas SET<TEXT>,
    otros LIST<TEXT>,
    calificacionmateria MAP<TEXT, FLOAT>,
    PRIMARY KEY (matricula, programa, apellido, nombre)
);

Insert, Update and Select

SET

A set consists of a group of elements with unique values. Duplicate values will not be stored distinctly. The values of a set are stored unordered, but will return the elements in sorted order when queried. Use the set data type to store data that has a many-to-one relationship with another column. For example, in the example below, a set called teams stores all the teams that a cyclist has been a member of during their career.

https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useSet.html

LIST:

A list has a form much like a set, in that a list groups and stores values. Unlike a set, the values stored in a list do not need to be unique and can be duplicated. In addition, a list stores the elements in a particular order and may be inserted or retrieved according to an index value.

Use the list data type to store data that has a possible many-to-many relationship with another column. For example, in the example below, a list called events stores all the race events on an upcoming calendar. Each month/year pairing might have several events occurring, and the races are stored in a list. The list can be ordered so that the races appear in the order that they will take place, rather than alphabetical order.

https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useList.html

MAP:

A map relates one item to another with a key-value pair. For each key, only one value may exist, and duplicates cannot be stored. Both the key and the value are designated with a data type.

.Using the map type, you can store timestamp-related information in user profiles. Each element of the map is internally stored as one Cassandra column that you can modify, replace, delete, and query. Each element can have an individual time-to-live and expire when the TTL ends.

https://docs.datastax.com/en/cql-oss/3.3/cql/cql_using/useMap.html

INSERT INTO verano2020.perfil_estudiante (matricula, programa, apellido, nombre) VALUES ('A00354xxx', 'MCC09', 'coraspe', 'ramses');

SELECT * FROM verano2020.perfil_estudiante;

alt text

UPDATE  verano2020.perfil_estudiante
SET materiasactivas = materiasactivas + {'data analytic', 'data mining'}  WHERE matricula = 'A00354xxx' and programa = 'MCC09' and apellido = 'coraspe' and nombre = 'ramses';

SELECT * FROM verano2020.perfil_estudiante;

alt text

UPDATE  verano2020.perfil_estudiante
SET materiasactivas = materiasactivas + {'extra subject'}  WHERE matricula = 'A00354xxx' and programa = 'MCC09' and apellido = 'coraspe' and nombre = 'ramses';
SELECT * FROM verano2020.perfil_estudiante;

alt text

UPDATE  verano2020.perfil_estudiante
SET materiasactivas = materiasactivas + {'extra subject 2'}  WHERE matricula = 'A00354xxx' and programa = 'MCC09' and apellido = 'coraspe' and nombre = 'ramses';
SELECT * FROM verano2020.perfil_estudiante;

alt text

elements will be removed from the SET with the ‘-’ symbol

UPDATE  verano2020.perfil_estudiante
SET materiasactivas = materiasactivas - {'extra subject'}  WHERE matricula = 'A00354xxx' and programa = 'MCC09' and apellido = 'coraspe' and nombre = 'ramses'
SELECT * FROM verano2020.perfil_estudiante;

alt text

Removing multiple elements from the SET

UPDATE verano2020.perfil_estudiante
SET materiasactivas = materiasactivas - {'extra subject','extra subject 2'} WHERE matricula = 'A00354xxx' AND programa = 'MCC09' AND apellido = 'coraspe' AND nombre = 'ramses';
SELECT * FROM verano2020.perfil_estudiante;

alt text

UPDATE verano2020.perfil_estudiante
  SET materiasactivas = materiasactivas + {'Math 2'}, materiasterminadas = materiasterminadas + {'Math 1'} WHERE matricula = 'A00354xxx' AND programa = 'MCC09' AND apellido = 'coraspe' AND nombre = 'ramses';
  SELECT * FROM verano2020.perfil_estudiante;

alt text

 UPDATE verano2020.perfil_estudiante  
SET materiasactivas = materiasactivas + { 'Math 3', 'Math 4'} , calificacionmateria = calificacionmateria + {'data analytic':10, 'data mining':10}
WHERE matricula = 'A00354xxx' AND programa = 'MCC09' AND apellido = 'coraspe' AND nombre = 'ramses';

alt text

Exporting the data to a file

COPY  verano2020.perfil_estudiante(matricula,programa,apellido,nombre,calificacionmateria,materiasactivas,materiasterminadas,otros) TO 'C:\Users\ramse\Desktop\Data_analytic\Cassandra\datos.csv' WITH HEADER = TRUE and DELIMITER = ',';

Importing the data to a Cassandra database

CREATE KEYSPACE verano2020_version2 WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };

CREATE TABLE verano2020_version2.perfil_estudiante (
    nombre TEXT,
    apellido TEXT,
    matricula TEXT,
    programa TEXT,
    materiasactivas SET<TEXT>,
    materiasterminadas SET<TEXT>,
    otros LIST<TEXT>,
    calificacionmateria MAP<TEXT, FLOAT>,
    PRIMARY KEY (matricula, programa, apellido, nombre)
);

COPY verano2020_version2.perfil_estudiante(matricula,programa,apellido,nombre,calificacionmateria,materiasactivas,materiasterminadas,otros) FROM 'C:\Users\ramse\Desktop\Data_analytic\Cassandra\datos.csv' WITH HEADER = TRUE and DELIMITER = ',';

select * from verano2020_version2.perfil_estudiante

alt text