sstable | bigdata processing in golang
kandi X-RAY | sstable Summary
bigdata processing in golang.
Top functions reviewed by kandi - BETA
- sort2Disk takes a reader and a mapper and maps the input to memory.
- Reduce takes a number of chunks and calls the Reduce function.
- Add adds an entry to the set.
- findUnique finds all the files in r.
- newDataSetReader returns a new dataSetReader.
- newStreamReader returns a new streamReader.
- newDataSet creates a new dataSet.
- newStreamAggregator returns a new streamAggregator.
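The function names above suggest a sort-to-disk plus merge pipeline: map the input, spill sorted chunks to disk, then stream-merge them back. The sketch below illustrates that general pattern in Go; every name and signature in it is illustrative only and is not this package's actual API.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
)

// sortChunkToDisk sorts one in-memory chunk and writes it to a temporary
// file, one key per line, returning the file path (illustrative only).
func sortChunkToDisk(chunk []string) (string, error) {
	sort.Strings(chunk)
	f, err := os.CreateTemp("", "chunk-*.sst")
	if err != nil {
		return "", err
	}
	defer f.Close()
	w := bufio.NewWriter(f)
	for _, k := range chunk {
		fmt.Fprintln(w, k)
	}
	return f.Name(), w.Flush()
}

// mergeChunks streams the sorted chunk files back, repeatedly emitting the
// smallest head key: a simple k-way merge without a heap.
func mergeChunks(paths []string, emit func(string)) error {
	var scanners []*bufio.Scanner
	var heads []string
	for _, p := range paths {
		f, err := os.Open(p)
		if err != nil {
			return err
		}
		defer f.Close()
		s := bufio.NewScanner(f)
		if s.Scan() {
			scanners = append(scanners, s)
			heads = append(heads, s.Text())
		}
	}
	for len(scanners) > 0 {
		min := 0
		for i := range heads {
			if heads[i] < heads[min] {
				min = i
			}
		}
		emit(heads[min])
		if scanners[min].Scan() {
			heads[min] = scanners[min].Text()
		} else {
			scanners = append(scanners[:min], scanners[min+1:]...)
			heads = append(heads[:min], heads[min+1:]...)
		}
	}
	return nil
}

func main() {
	chunks := [][]string{{"banana", "apple"}, {"cherry", "apricot"}}
	var paths []string
	for _, c := range chunks {
		p, err := sortChunkToDisk(c)
		if err != nil {
			panic(err)
		}
		paths = append(paths, p)
	}
	// Prints apple, apricot, banana, cherry in sorted order.
	_ = mergeChunks(paths, func(k string) { fmt.Println(k) })
}
```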
Community Discussions
Trending Discussions on sstable
QUESTION
I am trying to understand SSTable overlaps in Cassandra, which are not suitable for TWCS. I found references like https://thelastpickle.com/blog/2016/12/08/TWCS-part1.html, but I still don't understand what overlap means and how it is caused by read repairs. Can anyone please provide a simple example that would help me understand? Thanks
...ANSWER
Answered 2022-Mar-08 at 09:15
For TWCS, data is compacted into "time windows". If you've configured a time window of 1 hour, TWCS will compact (combine) all partitions written within a one-hour window into a single SSTable. Over a 24-hour period you will end up with 24 SSTables, one for each hour of the day.
Let's say you inspect the SSTable generated at 9am. The minimum and maximum [write] timestamps in that SSTable would be between 8am and 9am.
Now consider a scenario where a replica has missed a few mutations (writes) around 10am. All the writes between 10am and 11am will get compacted to one SSTable. If a repair runs at 3pm, the missed mutations from earlier in the day will get included in the 3pm to 4pm time window even though they really belong to the SSTable from the 10-11am time window.
In TWCS, SSTables from different time windows will not get compacted together. This means that the data from two different time windows is fragmented across two SSTables. Even if the 10-11am SSTable has expired, it cannot be dropped (deleted) from disk because there is data in the 3-4pm SSTable that overlaps with it. The 10-11am SSTable will not get dropped until all the data in the 3-4pm SSTable has also expired.
There's a simplified explanation of how TWCS works in How data is maintained in Cassandra. It includes a nice diagram which would hopefully make it easier for you to visualise how data could possibly overlap across SSTables. Cheers!
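For reference, this is roughly how a one-hour TWCS window like the one described above is configured. The sketch assumes the gocql driver, a reachable local node, and a keyspace named tsdata; the table, column names, and TTL value are placeholders.

```go
package main

import (
	"log"

	"github.com/gocql/gocql"
)

func main() {
	// Assumes a locally reachable cluster and an existing keyspace "tsdata".
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "tsdata"
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Time-series table compacted in one-hour windows, matching the 1-hour
	// TWCS example above. The table-level TTL lets whole SSTables expire
	// together once their window has fully aged out.
	const ddl = `
	CREATE TABLE IF NOT EXISTS sensor_readings (
	    sensor_id  text,
	    reading_ts timestamp,
	    value      double,
	    PRIMARY KEY (sensor_id, reading_ts)
	) WITH compaction = {
	      'class': 'TimeWindowCompactionStrategy',
	      'compaction_window_unit': 'HOURS',
	      'compaction_window_size': 1
	  }
	  AND default_time_to_live = 86400`

	if err := session.Query(ddl).Exec(); err != nil {
		log.Fatal(err)
	}
}
```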
QUESTION
I am running nodetool rebuild, and there is a table with 400 SSTables on the one node from which streaming is happening. Only one file is being streamed at a time. Is there any way to parallelize this operation so that multiple SSTables can be streamed in parallel rather than files being streamed sequentially?
...ANSWER
Answered 2022-Mar-05 at 08:01
It isn't possible to increase the number of streaming threads. In any case, there are several factors which affect the speed of the streaming, not just network throughput. The type of disks as well as the data model have a significant impact on how quickly the JVM can serialise the data to stream, as well as how quickly it can clean up the heap (GC).
I see that you've already tried to increase the streaming throughput. Note that you'll need to increase it for both the sending and receiving nodes (and really, all nodes) otherwise, the stream will only be as fast as the slowest node. Cheers!
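Since the stream is only as fast as the slowest participant, the throughput setting has to be raised on every node. Below is a small sketch of applying that with nodetool setstreamthroughput from Go; the node addresses and the 200 Mbit/s value are placeholders, and it assumes nodetool is on the PATH with JMX reachable on the default port 7199.

```go
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// Placeholder node addresses; replace with your own.
	nodes := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}

	// Streaming throughput is specified in megabits per second.
	const throughputMbits = "200"

	for _, n := range nodes {
		// Raise the cap on every node, not just the sender.
		cmd := exec.Command("nodetool", "-h", n, "-p", "7199",
			"setstreamthroughput", throughputMbits)
		out, err := cmd.CombinedOutput()
		fmt.Printf("%s: %s (err=%v)\n", n, out, err)
	}
}
```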
QUESTION
I am trying to understand the storage mechanism of Cassandra under the hood.
From reading the official doc it seems like:
- write requests are written to a mutable memtable
- when the memtable gets too large, it is written to an SSTable
So I have the following questions:
- is the memtable durable?
- if there is heavy update QPS, does it mean that there are going to be multiple versions of stale data in both the memtable and SSTables, such that read latency can increase? How does Cassandra get the latest data? And how are multiple versions of data stored?
- if there is heavy update QPS, does this mean there are a lot of tombstones?
ANSWER
Answered 2022-Feb-06 at 14:39
is the memtable durable?
The memtable is flushed to disk based on size and a few other settings, but at the point the write is accepted it is not durable in the memtable alone. There is also an entry placed in the commitlog, which by default will flush every 10 seconds (so on RF 3, you would expect a flush every 3.33 seconds). The flushing of the commitlog makes the write durable on that specific node. To entirely lose the write before this flush has occurred would require all replicas to have failed before any of them had performed a commitlog flush. As long as one of them flushed, it would be durable.
if there is heavy update QPS, does it mean that there are going to be multiple versions of stale data in both the memtable and SSTables, such that read latency can increase?
In terms of the memtable, no, there will not be stale data. In terms of the SSTables on disk, yes, there can be multiple versions of a record as it is updated over time, which can lead to an increase in read latencies. A good metric to look at is the SSTablesPerRead metric, which gives you a histogram of how many SSTables are being accessed per table for the queries you run. The p95 or higher is the main value to look at; those will be the scenarios causing slowness.
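The SSTables-per-read histogram mentioned above is exposed per table by nodetool tablehistograms. Here is a minimal sketch that shells out to it; the keyspace and table names are placeholders, and nodetool is assumed to be on the PATH.

```go
package main

import (
	"fmt"
	"log"
	"os/exec"
)

func main() {
	// The output includes an "SSTables" column whose p95/p99 rows show how
	// many SSTables each read is touching for this table.
	out, err := exec.Command("nodetool", "tablehistograms",
		"my_keyspace", "my_table").CombinedOutput()
	if err != nil {
		log.Fatalf("nodetool failed: %v\n%s", err, out)
	}
	fmt.Println(string(out))
}
```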
How does Cassandra get the latest data? And how are multiple versions of data stored?
During a read, Cassandra uses the read path (bloom filters, partition summary, etc.), reads all versions of the row, and discards the parts which are not needed before returning the records to the calling application. The multiple versions of the row are a consequence of it existing in more than one SSTable.
Part of the role of compaction is to manage this scenario: it brings together the multiple copies, older and newer versions of a record, and writes out new SSTables which only retain the newer version (the SSTables it compacted together are removed).
if there is heavy update QPS, does this mean there are a lot of tombstones?
This depends on the type of update; for most normal updates, no, this does not generate tombstones. Updates on list collection types, though, can and will generate tombstones. If you are issuing deletions, then yes, it will generate tombstones.
If you are going to be running a scenario of heavy updates, then I would recommend considering LeveledCompactionStrategy instead of the default SizeTieredCompactionStrategy - it is likely to provide better read performance, but at a higher compaction IO cost.
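Switching the compaction strategy is a single ALTER TABLE. A hedged sketch using the gocql driver follows; the cluster address, keyspace, table, and sstable_size_in_mb value are assumptions, and expect a burst of compaction I/O after the change while existing SSTables are re-levelled.

```go
package main

import (
	"log"

	"github.com/gocql/gocql"
)

func main() {
	// Placeholder cluster address and keyspace.
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "my_keyspace"
	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// Switch an update-heavy table from the default STCS to LCS.
	const alter = `
	ALTER TABLE my_table WITH compaction = {
	    'class': 'LeveledCompactionStrategy',
	    'sstable_size_in_mb': 160
	}`

	if err := session.Query(alter).Exec(); err != nil {
		log.Fatal(err)
	}
}
```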
QUESTION
We deleted some old data within our 3-node Cassandra cluster (v3.11) some days ago, which shall now be restored from a snapshot. Is there a possibility to restore the data from the snapshot without losing updates made since the snapshot was taken?
There are two approaches which come to my mind
A)
- Create export via COPY keyspace.table TO xy.csv
- Truncate table
- Restore table from snapshot via sstableloader
- Reimport newer data via COPY keyspace.table FROM xy.csv
B)
- Just copy the SSTable files of the snapshot into the current table directory
Is A) a feasible option? What do we need to consider so that the COPY FROM/TO commands get synchronized over all nodes? For option B), I read that the deletion commands that happened may be executed again (tombstone rows). Can I ignore this warning if we make sure the deletion commands happened more than 10 days ago (gc_grace_seconds)?
...ANSWER
Answered 2022-Feb-01 at 18:35
For exporting/importing data from Apache Cassandra®, there is an efficient tool -- DataStax Bulk Loader (aka DSBulk). You could refer to more documentation and examples here. For getting consistent reads and writes, you could leverage --datastax-java-driver.basic.request.consistency LOCAL_QUORUM in your unload & load commands.
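A sketch of what those unload and load invocations could look like, wrapped in Go for scripting; the keyspace, table, and export path are placeholders, and it assumes dsbulk is on the PATH. The truncate/sstableloader step from option A would go between the two calls.

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

// run shells out to dsbulk; all keyspace, table and path names below are
// placeholders.
func run(args ...string) {
	cmd := exec.Command("dsbulk", args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}

func main() {
	// Export the current rows before truncating / restoring the snapshot...
	run("unload",
		"-k", "my_keyspace", "-t", "my_table",
		"-url", "./export",
		"--datastax-java-driver.basic.request.consistency", "LOCAL_QUORUM")

	// ...and re-import them afterwards at the same consistency level.
	run("load",
		"-k", "my_keyspace", "-t", "my_table",
		"-url", "./export",
		"--datastax-java-driver.basic.request.consistency", "LOCAL_QUORUM")
}
```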
QUESTION
After my Mac upgraded to Monterey, I had to reinstall Cassandra, going from 3.x.x to 4.0.1.
I can't start Cassandra 4.0.1 using the 'cassandra -f' command. I see the following warnings/errors:
...ANSWER
Answered 2022-Jan-20 at 08:46
The error is here: Too many open files - you need to increase the limit on the number of open files. This can be done with the ulimit command and made permanent as described in this answer.
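For completeness, here is how a Go process can inspect and raise its own RLIMIT_NOFILE. This is only an illustration of the limit being adjusted per process; Cassandra itself needs the limit raised in the shell that starts it (ulimit -n, or launchctl on macOS) before the JVM launches.

```go
package main

import (
	"fmt"
	"log"
	"syscall"
)

func main() {
	// Current per-process open-file limits (soft and hard).
	var rl syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("soft=%d hard=%d\n", rl.Cur, rl.Max)

	// Raise the soft limit to a placeholder target, capped by the hard limit.
	const want = 10240
	if rl.Cur < want && uint64(want) <= rl.Max {
		rl.Cur = want
		if err := syscall.Setrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
			log.Fatal(err)
		}
	}
}
```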
QUESTION
I have a Cassandra cluster (Cassandra v3.11.11) with 3 data centers and replication factor 3. Each node has an 800GB NVMe drive, but one of the data tables is taking up 600GB of data. This results in the below output from nodetool status:
ANSWER
Answered 2022-Jan-19 at 19:15
I personally would start with checking if the whole space is occupied by actual data, and not by snapshots - use nodetool listsnapshots to list them, and nodetool clearsnapshot to remove them. If you did take a snapshot for some reason, then after compaction the snapshots keep occupying space because the original files were removed.
The next step would be to try to clean up tombstones & deleted data from the small tables using nodetool garbagecollect, or nodetool compact with the -s option to split the table into files of different sizes. For the big table I would try to use nodetool compact with the --user-defined option on the individual files (assuming that there will be enough space for them). As soon as you free more than 200GB, you can use sstablesplit (the node should be down!) to split the big SSTable into small files (~1-2GB), so when the node starts again the data will be compacted.
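A sketch that strings the snapshot check and cleanup steps together by shelling out to nodetool; the keyspace and table names are placeholders, and it assumes nodetool is on the PATH of the node being cleaned.

```go
package main

import (
	"fmt"
	"os/exec"
)

// nodetoolCmd runs a local nodetool subcommand and returns its combined output.
func nodetoolCmd(args ...string) string {
	out, err := exec.Command("nodetool", args...).CombinedOutput()
	if err != nil {
		return fmt.Sprintf("error: %v\n%s", err, out)
	}
	return string(out)
}

func main() {
	// 1. See whether snapshots are holding on to space from compacted files.
	fmt.Println(nodetoolCmd("listsnapshots"))

	// 2. Drop all snapshots (or pass "-t <tag>" to remove a specific one).
	fmt.Println(nodetoolCmd("clearsnapshot", "--all"))

	// 3. Reclaim tombstoned / deleted data from one of the smaller tables.
	//    Keyspace and table names here are placeholders.
	fmt.Println(nodetoolCmd("garbagecollect", "my_keyspace", "small_table"))
}
```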
QUESTION
I am reading about LSM indexing in Designing Data-Intensive Applications by Martin Kleppmann.
The author states:
When a write comes in, add it to an in-memory balanced tree data structure (for example, a red-black tree). This in-memory tree is sometimes called a memtable.
When the memtable gets bigger than some threshold—typically a few megabytes—write it out to disk as an SSTable file. This can be done efficiently because the tree already maintains the key-value pairs sorted by key. The new SSTable file becomes the most recent segment of the database. While the SSTable is being written out to disk, writes can continue to a new memtable instance.
In order to serve a read request, first try to find the key in the memtable, then in the most recent on-disk segment, then in the next-older segment, etc.
From time to time, run a merging and compaction process in the background to combine segment files and to discard overwritten or deleted values.
My question is: given that SSTables on disk are immutable, how is sorting guaranteed when new data comes in that can change the ordering of data in the SSTables (not the memtable, which is in memory)?
For example, suppose we have an SSTable on disk which has key-value pairs like [{1:a},{3:c},{4:d}]. The memtable in memory contains [{5:e},{6:f}] (which is sorted using an AVL/RB tree). Suppose we now get a new entry, [{2:b}], which should reside between [{1:a}] and [{3:c}]. How would this be handled, if SSTables on disk are immutable? In theory, we could create a new SSTable with [{2:b}] and compaction could later merge them, but wouldn't that break range-queries/reads that we perform before compaction takes place?
Thanks!
...ANSWER
Answered 2021-Dec-29 at 09:00
If new data comes in, it lands in new SSTables; existing ones are not modified. Each SSTable is read separately, and then data is consolidated from all SSTables and the memtable and put into the correct order in memory before sending. See this doc, for example, on how data is read.
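A toy model of that read path (not Cassandra's actual implementation): the "late" key 2 lands in the memtable (or a newer segment), and a range read merges the memtable with every immutable segment in key order, always preferring the newest value, so results stay sorted even before compaction runs.

```go
package main

import (
	"fmt"
	"sort"
)

// segment is a toy immutable key-value snapshot standing in for an SSTable;
// newer segments appear later in the store's slice.
type segment map[int]string

type store struct {
	memtable segment   // current mutable in-memory table
	segments []segment // flushed "SSTables", oldest first
}

// get returns the newest value for a key: memtable first, then segments
// from newest to oldest.
func (s *store) get(k int) (string, bool) {
	if v, ok := s.memtable[k]; ok {
		return v, true
	}
	for i := len(s.segments) - 1; i >= 0; i-- {
		if v, ok := s.segments[i][k]; ok {
			return v, true
		}
	}
	return "", false
}

// scan returns all key-value pairs in key order, consolidating every segment
// and the memtable with the newest value winning. This is why a range read
// stays correct even before compaction merges the files on disk.
func (s *store) scan() []string {
	merged := segment{}
	for _, seg := range s.segments { // oldest first...
		for k, v := range seg {
			merged[k] = v
		}
	}
	for k, v := range s.memtable { // ...memtable overrides last
		merged[k] = v
	}
	keys := make([]int, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Ints(keys)
	out := make([]string, 0, len(keys))
	for _, k := range keys {
		out = append(out, fmt.Sprintf("{%d:%s}", k, merged[k]))
	}
	return out
}

func main() {
	s := &store{
		segments: []segment{{1: "a", 3: "c", 4: "d"}}, // on-disk SSTable
		memtable: segment{5: "e", 6: "f"},
	}
	s.memtable[2] = "b" // the "late" entry goes to the memtable / a new SSTable

	fmt.Println(s.scan()) // [{1:a} {2:b} {3:c} {4:d} {5:e} {6:f}]
	v, _ := s.get(2)
	fmt.Println(v) // b
}
```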
QUESTION
I want to run some of the programs in SSTable Tools, however the doc says: "Cassandra must be stopped before these tools are executed, or unexpected results will occur. Note: the scripts do not verify that Cassandra is stopped."
I installed and started Cassandra using Docker. So how do I run something like sstableutil?
...ANSWER
Answered 2022-Jan-08 at 18:30
Something like this, but you need to make sure that you have the data on the host system, or in a Docker volume (it's a good idea anyway):
- stop the container
- execute docker run -it ...volume_config... --rm cassandra sstable_command
- start the container
P.S. It really depends on the command - I remember that some commands were documented as requiring a stop, but didn't really require it.
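Making that concrete, here is one hedged way to script it from Go against the official image; the container name, volume name, keyspace, and table below are all placeholders.

```go
package main

import (
	"log"
	"os"
	"os/exec"
)

// docker runs a docker CLI command, streaming its output.
func docker(args ...string) {
	cmd := exec.Command("docker", args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		log.Fatal(err)
	}
}

func main() {
	// 1. Stop the running node so the offline tool sees quiescent SSTables.
	docker("stop", "my-cassandra")

	// 2. Run the tool in a throwaway container that mounts the same data
	//    volume the node uses (/var/lib/cassandra in the official image).
	docker("run", "--rm",
		"-v", "cassandra_data:/var/lib/cassandra",
		"cassandra:4.0",
		"sstableutil", "my_keyspace", "my_table")

	// 3. Start the node again.
	docker("start", "my-cassandra")
}
```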
QUESTION
I am getting these two errors, Validator.java:268 - Failed creating a merkle tree for and CassandraDaemon.java:228 - Exception in thread Thread, at the exact time 0t:00:03 each hour.
ANSWER
Answered 2021-Dec-10 at 13:18
The logs show that Cassandra failed in the validation phase of the anti-entropy repair process.
As the message "Cannot start multiple repair sessions over the same sstables" indicates, there are multiple repair sessions running on the same token range at the same time.
You need to make sure that you have no repair session currently running on your cluster, and no anti-compaction.
I suggest a rolling restart in order to stop all running repairs, then try to repair node by node.
One last suggestion is to try https://github.com/thelastpickle/cassandra-reaper, which is used to run automated repairs for Cassandra.
QUESTION
Cassandra repair is failing to run with the below error on node 1. I earlier started multiple repair sessions in parallel by mistake. I found that there is a bug, https://issues.apache.org/jira/browse/CASSANDRA-11824, which has been resolved for this same scenario, but I am already using Cassandra 3.9. Please confirm whether running nodetool scrub is the only workaround. Are there any considerations that we need to keep in mind before running scrub, as I need to run this directly in Prod?
...ANSWER
Answered 2021-Nov-05 at 04:25
Nodetool tpstats revealed that there were indeed active repair jobs, but they were not actually running, and compactionstats did not show any running jobs. So I restarted just the nodes on which the repair was stuck; this cleared up those stuck repair jobs and I was able to run a fresh repair after that.
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported