Cassandra Admin Interview Questions – 2021

1. What is Cassandra used for, and why use it?

Cassandra was designed to handle big data workloads across multiple nodes without any single point of failure. The main reasons to use Cassandra are:

  • It is fault tolerant and offers tunable consistency.
  • It scales from gigabytes to petabytes.
  • It is a wide-column (column-family) database.
  • No single point of failure.
  • No need for a separate caching layer.
  • Flexible schema design.
  • It has flexible data storage, easy data distribution, and fast writes.
  • It provides atomic, isolated, and durable writes at the row level, although it does not offer full ACID transactions (see question 21).
  • Multi-data center and cloud capable.
  • Data compression.

2. On what platforms does Cassandra run?

Since Cassandra is a Java application, it can run on any platform that provides a Java Runtime Environment (JRE) or Java Virtual Machine (JVM). It is commonly deployed on Red Hat, CentOS, Debian, and Ubuntu Linux.

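3. Which ports does Cassandra use?

By default, Cassandra uses port 7000 for cluster management (inter-node communication) and 9160 for Thrift clients. JMX used port 8080 in early releases and uses 7199 in later ones, and CQL clients connect on port 9042 by default. These are all TCP ports. In early releases they could be edited in bin/cassandra.in.sh; later releases set them in conf/cassandra.yaml (and the JMX port in conf/cassandra-env.sh).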

4. What is a composite type in Cassandra?

A composite type lets you define a key or a column name as a concatenation of values of different types. Composite types can be used in two places:

  • Row Key
  • Column Name

5. How does Cassandra store data?

All data is stored as bytes. When you specify a validator, Cassandra ensures that those bytes are encoded as required. A comparator then orders the columns according to the ordering specific to that encoding. Composites are just byte arrays with a specific encoding: for each component, Cassandra stores a two-byte length, followed by the byte-encoded component, followed by a termination byte.
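The composite layout described above can be sketched in Python. This is only an illustration: the function names are hypothetical, and the terminator is assumed to be a single zero byte, which is an assumption about the plain (non-range) case rather than something stated in the answer.

```python
import struct

# Assumed terminator for an ordinary component (hypothetical constant name).
END_OF_COMPONENT = b"\x00"


def encode_composite(components):
    """Encode byte-string components as: 2-byte length + bytes + terminator."""
    out = bytearray()
    for comp in components:
        out += struct.pack(">H", len(comp))  # big-endian unsigned 16-bit length
        out += comp
        out += END_OF_COMPONENT
    return bytes(out)


def decode_composite(data):
    """Split an encoded composite back into its components."""
    parts = []
    pos = 0
    while pos < len(data):
        (length,) = struct.unpack_from(">H", data, pos)
        pos += 2
        parts.append(data[pos:pos + length])
        pos += length + 1  # skip the component and its terminator
    return parts


encoded = encode_composite([b"user42", b"2021"])
decoded = decode_composite(encoded)
```

Here `decoded` round-trips to the original two components, which is a quick way to check that the length prefix and terminator are handled consistently.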

6. How do you write a query in Cassandra?

Queries are written in CQL (Cassandra Query Language). The cqlsh shell is used to interact with the database.

7. What is a Column Family?

A column family is a structure that can hold an unbounded number of rows. Each row is a set of key-value pairs, where the key is the name of the column and the value is the column data. It is similar to a hashmap in Java or a dictionary in Python. The rows are not limited to a predefined list of columns: the column family is flexible, so one row may have 100 columns while another has only 2.

8. What is the difference between Column and Super Column?

Both are tuples with a name and a value. However, a column’s value is a single byte string, while a super column’s value is a map of columns that may hold different data types. Unlike columns, super columns do not carry the third component, a timestamp.

9. What is a cluster in Cassandra?

A cluster is a container for keyspaces. A Cassandra database is distributed over several machines that operate together. The cluster is the outermost container; it arranges the nodes in a ring and assigns data to them. The data on each node is replicated to other nodes, which take over if that node fails.

10. What is a keyspace in Cassandra?

In Cassandra, a keyspace is a namespace that determines how data is replicated on nodes. A cluster typically contains one keyspace per application.

11. What is the syntax to create a keyspace in Cassandra?

The syntax for creating a keyspace in Cassandra is:

CREATE KEYSPACE <identifier> WITH <properties>
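As a concrete instance of this syntax, the sketch below builds a CREATE KEYSPACE statement as a Python string. The helper name, the keyspace name demo_ks, and the choice of SimpleStrategy with a replication factor of 3 are illustrative assumptions; the resulting text could be pasted into cqlsh.

```python
import re


def create_keyspace_cql(name, replication_factor=3):
    """Build a CREATE KEYSPACE statement (hypothetical helper for illustration)."""
    # Only accept plain unquoted identifiers to keep the example safe.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_]*", name):
        raise ValueError(f"invalid keyspace name: {name!r}")
    return (
        f"CREATE KEYSPACE IF NOT EXISTS {name} "
        f"WITH replication = {{'class': 'SimpleStrategy', "
        f"'replication_factor': {replication_factor}}};"
    )


statement = create_keyspace_cql("demo_ks")
print(statement)
```

SimpleStrategy is generally meant for single data center setups; a multi-data center cluster would normally use NetworkTopologyStrategy instead.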

12. Explain the concept of compaction in Cassandra.

Compaction is a maintenance process in which Cassandra merges and reorganizes SSTables to optimize the data structures on disk. It is needed because every memtable flush produces a new SSTable. There are two types of compaction in Cassandra.

Minor compaction: It starts automatically when a new SSTable is created. Cassandra merges similarly sized SSTables into one.

Major compaction: It is triggered manually using nodetool. It compacts all SSTables of a column family into one.
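The merge step at the heart of compaction can be illustrated with a toy model. In this sketch each SSTable is assumed to be a plain dictionary mapping a key to a (timestamp, value) pair, and the function name is hypothetical; real SSTables are on-disk files, and real compaction strategies are far more involved.

```python
def compact(sstables):
    """Merge several toy SSTables, keeping the newest value for each key."""
    merged = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            # The entry with the highest timestamp wins.
            if key not in merged or timestamp > merged[key][0]:
                merged[key] = (timestamp, value)
    # Return the result sorted by key, as an SSTable would be.
    return dict(sorted(merged.items()))


older = {"b": (1, "blue"), "a": (1, "apple")}
newer = {"a": (2, "avocado"), "c": (2, "cyan")}
result = compact([older, newer])
```

After the merge, `result` holds one entry per key, with "a" taken from the newer table.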

13. What values are stored in a Cassandra column?

A Cassandra column holds three values:

  • Column Name
  • Value
  • Time Stamp

14. What is Cassandra cqlsh?

cqlsh is a command-line shell for CQL, the query language that lets users communicate with the database. Using cqlsh, you can do the following:

  • Define a schema
  • Insert data
  • Execute a query

15. Explain Tombstone in Cassandra.

A tombstone is a marker written to a row to indicate that a column has been deleted. The marked columns are physically removed during compaction, once the grace period (gc_grace_seconds) has passed. Tombstones matter because Cassandra is eventually consistent: the deletion has to reach replicas that missed it, otherwise the deleted data could reappear.
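A toy model can make the life cycle of a tombstone concrete. In this sketch a table is assumed to be a dictionary of key to (timestamp, value), a sentinel object stands in for the deletion marker, and the grace period mirrors the idea behind Cassandra's gc_grace_seconds table option; all function names are hypothetical.

```python
TOMBSTONE = object()  # sentinel standing in for a deletion marker


def delete(table, key, now):
    """A delete writes a tombstone instead of removing the entry."""
    table[key] = (now, TOMBSTONE)


def read(table, key):
    """Reads treat a tombstoned entry as missing."""
    entry = table.get(key)
    if entry is None or entry[1] is TOMBSTONE:
        return None
    return entry[1]


def purge_tombstones(table, now, gc_grace_seconds):
    """Compaction-time cleanup: drop tombstones older than the grace period."""
    return {
        key: (ts, value)
        for key, (ts, value) in table.items()
        if not (value is TOMBSTONE and now - ts >= gc_grace_seconds)
    }


table = {"name": (100, "Ada")}
delete(table, "name", now=200)
still_marked = purge_tombstones(table, now=250, gc_grace_seconds=100)
purged = purge_tombstones(table, now=300, gc_grace_seconds=100)
```

The marker survives the first cleanup because the grace period has not elapsed, and disappears in the second.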

16. Can you add or remove column families in a working cluster?

Yes. Current versions let you add or drop tables online with the CQL statements CREATE TABLE and DROP TABLE. In early releases, removing a column family required the following steps:

  • Clear the commitlog with ‘nodetool drain’
  • Shut down Cassandra and verify that no data is left in the commitlog
  • Delete the SSTable files for the removed CFs

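17. What is the replication factor in Cassandra?

The replication factor is the number of copies of each piece of data kept across the cluster. When authentication is enabled, it is important to raise the replication factor of the system_auth keyspace as well, so that users can still log into the cluster if a node is down.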

18. What do the shell commands “Capture” and “Consistency” do?

cqlsh provides various shell commands. The “Capture” command captures the output of a command and appends it to a file, while the “Consistency” command displays the current consistency level or sets a new one.

19. What is a Memtable in Cassandra?

Cassandra first writes data to an in-memory structure known as a memtable. It acts as a write-back cache whose content is stored in key/column format and sorted by key. Each ColumnFamily has its own memtable, from which column data is retrieved by key. The memtable accumulates writes until it is full and is then flushed to disk.

20. What does an SSTable consist of?

An SSTable consists mainly of two files:

  • Index file (Bloom filter and key/offset pairs)
  • Data file (actual column data)

21. Does Cassandra support ACID transactions?

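Unlike relational databases, Cassandra does not support full ACID transactions. It does provide atomic, isolated, and durable writes at the row level, and lightweight transactions for conditional updates.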

22. Define the use of the source command in Cassandra.

Source command is used to execute a file consisting of CQL statements.

23. What is a Bloom filter used for in Cassandra?

A Bloom filter is a space-efficient data structure used to test whether an element is a member of a set. It can return false positives but never false negatives. In Cassandra, it is used to determine whether an SSTable may contain data for a particular row, which saves I/O when performing a key lookup.
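To make the idea concrete, here is a minimal Bloom filter sketch. It is not Cassandra's implementation: the class name, the sizes, and the use of SHA-256 to derive bit positions are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: answers 'definitely absent' or 'possibly present'."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits)  # one byte per bit, for simplicity

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # False means the key was never added; True may be a false positive.
        return all(self.bits[pos] for pos in self._positions(key))


row_filter = BloomFilter()
row_filter.add("row-17")
```

A read path would consult `might_contain` first and skip the SSTable entirely when it returns False, which is where the I/O saving comes from.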

24. How does Cassandra write?

Cassandra performs a write in two steps: it first appends the write to a commit log on disk, and then applies it to an in-memory structure known as a memtable. Once both steps succeed, the write is complete. When the memtable fills up, its contents are flushed to disk as an SSTable (sorted string table). Because writes are sequential appends and in-memory updates, Cassandra offers fast write performance.
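The write path above can be sketched as a small in-memory model. Everything here is a simplification under stated assumptions: the class name is hypothetical, the commit log is a Python list rather than a file, and the memtable is flushed after a fixed number of keys rather than by size.

```python
class ToyNode:
    """Toy model of the write path: commit log, memtable, SSTable flush."""

    def __init__(self, memtable_limit=2):
        self.commit_log = []   # stands in for the on-disk commit log
        self.memtable = {}     # in-memory writes, keyed by row key
        self.sstables = []     # immutable, sorted snapshots
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        self.commit_log.append((key, value))  # step 1: log for durability
        self.memtable[key] = value            # step 2: apply in memory
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Persist the memtable as a sorted, immutable SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed data no longer needs the log

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            for k, v in sstable:
                if k == key:
                    return v
        return None


node = ToyNode(memtable_limit=2)
node.write("b", 2)
node.write("a", 1)   # second key triggers a flush
node.write("a", 3)   # newer value lives in the fresh memtable
```

Reads check the memtable before the SSTables, so the newer value for "a" shadows the flushed one.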

25. Define the management tools in Cassandra.

  • DataStax OpsCenter: a web-based management and monitoring solution for Cassandra clusters and DataStax. It is free to download and includes an additional edition of OpsCenter.
  • SPM: primarily monitors Cassandra metrics along with various OS and JVM metrics. Besides Cassandra, SPM also monitors Hadoop, Spark, Solr, Storm, ZooKeeper, and other big data platforms. Its main features include correlation of events and metrics, distributed transaction tracing, real-time graphs with zooming, anomaly detection, and heartbeat alerting.

26. Can we change the replication factor on a live cluster?

Yes, but after changing it you must run repair so that the existing data is copied to match the new replica count.

27. How do you iterate over all rows in a Column Family?

Use get_range_slices, a call in the Thrift API. Start the iteration with an empty string as the start key; after each iteration, the last key read serves as the start key for the next one.
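The paging pattern can be sketched without a cluster. In this toy version `get_range` is a hypothetical stand-in for get_range_slices over an in-memory sorted key list, keys are assumed to come back in key order, and the start key is assumed to be inclusive, so every page after the first repeats one key that must be dropped.

```python
import bisect


def get_range(sorted_keys, start_key, count):
    """Hypothetical stand-in for a range query: keys >= start_key, up to count."""
    i = bisect.bisect_left(sorted_keys, start_key)
    return sorted_keys[i:i + count]


def iterate_all(sorted_keys, page_size=3):
    """Yield every key once, paging with the last key as the next start key."""
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    start = ""   # the empty string sorts before every key
    first = True
    while True:
        page = get_range(sorted_keys, start, page_size)
        if not first:
            page = page[1:]  # start key is inclusive, so drop the repeat
        if not page:
            break
        yield from page
        start = page[-1]
        first = False


keys = ["a", "b", "c", "d", "e"]
seen = list(iterate_all(keys, page_size=2))
```

Each key appears exactly once in `seen`, even though consecutive pages overlap by one key.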

28. What is SSTable? How is it different from other relational tables?

SSTable stands for ‘Sorted String Table’. It is the on-disk data file to which Cassandra regularly flushes memtables. SSTables are stored on disk and exist for each Cassandra table. They are immutable: once written, no data items can be added or removed, whereas relational tables are updated in place. For each SSTable, Cassandra also creates three separate structures: a partition index, a partition summary, and a Bloom filter.

29. Explain the concept of Bloom Filter.

A Bloom filter is an off-heap data structure associated with each SSTable. Cassandra uses it to check whether the SSTable may contain the requested data before performing any disk I/O.

30. Which operating systems does Cassandra support?

Windows and Linux.

31. What is the Cassandra Data Model?

The Cassandra data model consists of four main components:

  • Cluster: made up of multiple nodes and keyspaces
  • Keyspace: a namespace that groups multiple column families, typically one per application
  • Column: consists of a column name, value, and timestamp
  • ColumnFamily: multiple columns referenced by a row key

32. What is Thrift?

Thrift is a legacy RPC protocol and API combined with a code generation tool. Cassandra used Thrift to make the database accessible from many programming languages; it has since been superseded by CQL.

33. Explain Tombstone in Cassandra?

A tombstone is a marker written to a row to indicate that a column has been deleted. The marked columns are removed during compaction. Tombstones matter because Cassandra is eventually consistent, so a deletion must be able to reach replicas that missed it.

34. What platforms does Cassandra run on?

Since Cassandra is a Java application, it can run on any platform that provides a Java Runtime Environment (JRE) or Java Virtual Machine (JVM). Cassandra also runs on Red Hat, CentOS, Debian, and Ubuntu Linux platforms.

35. What is the main objective of creating Cassandra?

The main objective of Cassandra is to handle large amounts of data. It also aims to provide fault tolerance together with fast data transfer.

36. Define data replication.

Data replication is an operation in which data from one node is copied to different nodes in the cluster. This operation ensures redundancy and fault tolerance in the database. The replication factor decides the number of copies and the replication strategy decides the nodes in which the data is copied.
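How a replication factor and a placement strategy interact can be shown with a toy ring. This sketch assumes a simple strategy that hashes each node name and key onto a ring and walks clockwise to pick replicas; the function names and the use of SHA-256 as the hash are illustrative assumptions, not Cassandra's actual partitioner.

```python
import bisect
import hashlib


def token(value):
    """Map a string to a position on the ring (stand-in hash function)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def replicas_for(key, nodes, replication_factor):
    """Pick the nodes that store a key: the owner plus the next ones clockwise."""
    ring = sorted((token(node), node) for node in nodes)
    tokens = [t for t, _ in ring]
    # First node whose token is >= the key's token, wrapping around the ring.
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


nodes = ["node-a", "node-b", "node-c", "node-d"]
owners = replicas_for("user:42", nodes, replication_factor=3)
```

With four nodes and a replication factor of 3, each key lands on three distinct nodes, so the data stays available when any single node is down.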

37. Define commit log.

The commit log is a mechanism used to recover data if the database crashes. Every write operation is recorded in the commit log, and the data can be recovered by replaying it.

38. How does Cassandra differ from Hadoop?

 The primary difference between Cassandra and Hadoop is that Cassandra targets real-time/operational data, while Hadoop has been designed for batch-based analytic work.

There are many technical differences between Cassandra and Hadoop, including Cassandra’s underlying data structure (based on Google’s Bigtable), its fault-tolerant, peer-to-peer architecture, multi-data center capabilities, tunable data consistency, and the fact that all nodes are the same (there is no concept of a namenode).

39. How does Cassandra differ from HBase?

HBase is an open-source, column-oriented data store modeled after Google Bigtable, and is designed to offer Bigtable-like capabilities on top of data stored in Hadoop. However, while HBase shares the Bigtable design with Cassandra, its foundational architecture is quite different.

A Cassandra cluster is much easier to set up and configure than a comparable HBase cluster. HBase’s reliance on the Hadoop namenode means there is a single point of failure in HBase, whereas with Cassandra, because all nodes are the same, there is no such issue.

In internal performance tests conducted at DataStax, Cassandra delivered 5X better performance on writes and 4X better performance on reads than HBase.

40. How does Cassandra differ from MongoDB?

 MongoDB is a document-oriented database that is built upon a master-slave/sharding architecture. MongoDB is designed to store/manage collections of JSON-styled documents.

By contrast, Cassandra uses a peer-to-peer, write/read-anywhere styled architecture that is based on a combination of Google BigTable and Amazon Dynamo. This allows Cassandra to avoid the various complications and pitfalls of master/slave and sharding architectures. Moreover, Cassandra offers linear performance increases as new nodes are added to a cluster, scales to terabyte-petabyte data volumes, and has no single point of failure.

GoLogica Technologies Private Limited. All rights reserved 2024.