bigtable | Abstraction layer for big tables
kandi X-RAY | bigtable Summary
Simple framework for implementing BigTable style data models in Java.
Community Discussions
Trending Discussions on bigtable
QUESTION
Say I have a table that contains billions of records.
- On a particular day there are only 7 records, though. If I filter by timestamp range (as described here), will it cause a full table scan?
- Only one row has a column "col36847629". If I apply a column qualifier filter, will it scan the entire table?
ANSWER
Answered 2022-Apr-11 at 15:57Any filters on Bigtable reads (other than row-key filters) will cause a full table scan unless the read is constrained to a row key or row range. Filters exist to reduce the amount of data sent over the network, for lower network costs and faster throughput, but they do not reduce the size of the scan.
If this is a common scenario you're facing, you might want to add some date or timestamp information into your rowkey as a way to filter on that and then perform the scans.
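The row-key approach suggested above can be sketched without a live cluster: Bigtable stores rows sorted lexicographically by row key, so putting a date in the key lets a prefix scan touch only that day's rows. The key format and values below are illustrative, with a sorted in-memory dict standing in for the table:

```python
from bisect import bisect_left

# Illustrative rows with a "metric#YYYY-MM-DD#sequence" row-key scheme.
table = {
    "metric#2022-04-10#001": "a",
    "metric#2022-04-11#001": "b",
    "metric#2022-04-11#002": "c",
    "metric#2022-04-12#001": "d",
}

def prefix_scan(rows, prefix):
    """Return (key, value) pairs whose key starts with prefix, visiting
    only the contiguous sorted range that can match, the way a Bigtable
    row-range read would."""
    keys = sorted(rows)
    out = []
    for k in keys[bisect_left(keys, prefix):]:
        if not k.startswith(prefix):
            break  # sorted order: once past the prefix range, stop
        out.append((k, rows[k]))
    return out

print(prefix_scan(table, "metric#2022-04-11#"))
```

With the real client the same idea is expressed as a row range, e.g. `table.read_rows(start_key=..., end_key=...)` in the Python client, so only the day's slice of the table is scanned.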
QUESTION
I am connecting to my GCP Bigtable instance using Python (the google-cloud-bigtable library), after setting up "GOOGLE_APPLICATION_CREDENTIALS" in my environment variables. I'm successful at doing this.
However, my requirement is that I want to pass the credentials during the run-time while creating the BigTable Client object as shown below:
...ANSWER
Answered 2022-Mar-21 at 11:09After 2 hours of searching I finally landed on these pages; please check them out in order:
QUESTION
I need to store user interactions for 7 days in an existing Bigtable table whose row key is the user identifier. There are two types of interactions, and we should be able to retrieve each user's interaction history in time order. It's clear that the column family should have 7 days as its TTL and that the column should contain the type of interaction.
I'm considering two options for the column: {interaction_type}:{timestamp} with only the latest cell, or {interaction_type} with multiple cells. Since the GCP Bigtable docs don't recommend too many columns in a row, the latter looks more reasonable.
However, the column has to be retrieved along with other existing columns designed under the former schema (timestamp in the column qualifier, latest cell only), so if I choose the latter, the query has to use interleaved filters because the columns hold different numbers of cells.
So I wondered which one would show better read performance, and more generally about the implications of one column with multiple cells versus multiple columns with one cell, and of chain filters versus interleave filters, for Bigtable performance.
ANSWER
Answered 2022-Mar-16 at 13:45What you are describing comes out of https://cloud.google.com/bigtable/docs/schema-design#row-keys. From what you stated, it comes down to how you design the number of columns; in general, interleaving has a performance penalty, and such queries result in additional fetches.
The best design is to determine the smallest data set that is usable: combine elements into a column so that the element has all the fields needed for a result without requiring an additional column query. This is set against the need to store common elements uniquely, i.e. not storing the same field content in multiple columns (which uses more space); there are, however, times when duplication is better, e.g. when it lets a query return a particular column without processing another one (which can be faster).
The second option is definitely better, but the answer depends on your access patterns; based purely on performance, avoiding the interleaved filters would be better.
Another consideration for your scenario is the cells-per-column limit filter: https://cloud.google.com/bigtable/docs/using-filters#cells-per-column-limit
The supporting mention of the interleave overhead is here: https://cloud.google.com/bigtable/docs/using-filters#interleave
QUESTION
I have a pretty big code base written in Java. I have a lot of integration tests with both Kafka and Bigtable using JUnits ExternalResource. I have introduced fetching of secrets from GCP Secret Manager in my code. I now want to write integration tests for that as well.
So my scenario is that I want to create a mock username/password, create a secret of that username/password in my mock GCP Secret Manager, access the secrets and then use it to connect to my mock-service that requires that username/password. So, in reality, I'm connecting to a Kafka broker in my test with SSL and I want to simulate the entire flow with fetching of secrets.
The problem is, I can't find any documentation on how to do it. Google has great documentation on how to emulate Bigtable, but I can't find anything on how to emulate/mock a Secret Manager. Has anyone run into something similar?
...ANSWER
Answered 2022-Feb-14 at 22:41"GCP doesn't have any Emulator for Secret Manager" - @Guillaume Blaquiere.
QUESTION
I tried to insert 100 values into a column test_col in the column family test_cf under the row key test-123.
The problem is that although I successfully inserted 100 values into Bigtable, the number of values in the test_col column of test_cf is less than 100, and which values survive appears random.
The code I wrote is below.
...ANSWER
Answered 2021-Dec-15 at 18:39This code is not doing what you intended. Each set_cell call is writing to the same row ("test-123"), the same column family ("test_cf"), and the same column qualifier ("test_col"). The value is different each time, but the timestamp associated with each value is the current time, which can be identical across multiple set_cell calls. Because a single cell in Bigtable is indexed by the (row, family, column, timestamp) tuple, this code can overwrite data it wrote earlier in the loop.
So it is entirely possible that the first 3 set_cell calls look like this:
QUESTION
I need to trigger a Data Fusion pipeline located in a GCP project called myDataFusionProject through a Data Fusion operator (CloudDataFusionStartPipelineOperator) inside a DAG whose Cloud Composer instance is located in another project called myCloudComposerProject.
I have used the official documentation as well as the source code to write the code that roughly resembles the below snippet:
...ANSWER
Answered 2021-Dec-09 at 13:19As a recommendation when developing operators on Airflow, we should check the classes that implement the operators, as the documentation may lack some information due to versioning.
As commented, if you check CloudDataFusionStartPipelineOperator you will find that it makes use of a hook that gets the instance based on a project-id. This project-id is optional, so you can pass your own project-id.
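A sketch of the operator configuration the answer points at, assuming the Google provider package is installed; the pipeline name, instance name, and region are placeholders, while project_id carries the cross-project target:

```python
from airflow.providers.google.cloud.operators.datafusion import (
    CloudDataFusionStartPipelineOperator,
)

# Runs inside a DAG hosted in myCloudComposerProject, but project_id
# points the hook at the Data Fusion instance in myDataFusionProject.
start_pipeline = CloudDataFusionStartPipelineOperator(
    task_id="start_datafusion_pipeline",
    pipeline_name="my_pipeline",             # placeholder
    instance_name="my-datafusion-instance",  # placeholder
    location="us-central1",                  # placeholder
    project_id="myDataFusionProject",        # cross-project target
)
```

The Composer environment's service account still needs the appropriate Data Fusion roles in myDataFusionProject for the call to succeed.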
QUESTION
I'm writing a Dataflow pipeline using Apache beam to add large batches of rows of data to Bigtable.
apache-beam==2.24.0
google-cloud-bigtable==2.4.0
I have the following method used in my pipeline to create the Bigtable row(s) prior to writing to Bigtable:
...ANSWER
Answered 2021-Dec-09 at 05:18Your google-cloud-bigtable version is too high.
There is some movement in updating the apache-beam dependencies here; they have the same issue. Can you roll back your google-cloud-bigtable version to something before 2.0? If you run this:
QUESTION
We are considering using Beam/Dataflow for stateful processing:
- Real-time aggregation of metrics on global windows (every 1 min)
- Real-time aggregation over a high number of parallel sessions (> 1 million)
Example: get the maximum article price bought by each of 1 million clients since they registered on a portal.
We would also like to access those calculated aggregates without interfering with the real-time job.
Design question: can this be covered by the current state back-end, Windmill/Persistent Disks [1], or would a database (like Bigtable) be a better fit?
Thanks !
...ANSWER
Answered 2021-Nov-25 at 14:19It is actually possible to define Bigtable connectors in Dataflow to perform reading and writing operations. Moreover, there is the projects.jobs.get method of the Dataflow API that returns an instance of a job, a JSON response that also contains the "currentState" field. You could therefore build an automation script that gets this field's value and stores it in a Bigtable database using the Bigtable connectors; however, it is a fairly complex solution and I am not sure it would be convenient.
QUESTION
I am aware that Bigtable supports append and increment operations using ReadModifyWriteRow requests, but I'm wondering if there is support, or an alternative way, to use more generic mapping functions where the value from the cell can be accessed and modified within some sort of closure; for instance, bitwise ANDing a long value in a cell:
ANSWER
Answered 2021-Oct-29 at 14:11A mapping like this is not supported by Bigtable, so here is an option you could try. It will only work with single-cluster instances because of the consistency it requires.
You could add a column to keep track of a row version (in addition to the existing cell versions), then read the data and version, modify the value in memory, and do a checkAndMutate with the version and the new value. Something like this:
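The read-modify-write loop described above can be sketched without a live cluster. Here an in-memory dict stands in for a single-cluster table, with a version column providing the predicate for the conditional write; all names are illustrative:

```python
def check_and_mutate(store, row, expected_version, new_value):
    """Apply the mutation only if the version column still matches,
    mimicking Bigtable's checkAndMutate predicate."""
    if store[row]["version"] != expected_version:
        return False  # someone else wrote first; the caller retries
    store[row]["value"] = new_value
    store[row]["version"] = expected_version + 1
    return True

def bitwise_and(store, row, mask):
    """Generic mapping: read, transform in memory, conditional write."""
    while True:
        version = store[row]["version"]
        current = store[row]["value"]
        if check_and_mutate(store, row, version, current & mask):
            return store[row]["value"]

table = {"row-1": {"value": 0b1111, "version": 0}}
print(bitwise_and(table, "row-1", 0b0110))  # → 6
```

Because two concurrent writers would race on the version column, this pattern is only safe on a single-cluster instance, exactly as the answer notes.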
QUESTION
I need to view the data in a BigTable table, but I can't find a data browser in the web console. (Dynamo has a nice browser in the AWS web console.) Is there a data browser for BigTable, or am I limited to the cbt command line?
...ANSWER
Answered 2021-Oct-21 at 12:58There is currently no data browser in the web console, so you're correct that you're limited to the cbt command line. One option that can make viewing your data easier is to query Bigtable through BigQuery. This can be great for one-off looks at your data, but be careful using it on production data, since some queries can easily do full table scans, which impacts performance.
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported