kiba | A small implementation of multithreaded Redis in Rust | Database library

by shoyo | Rust | Version: v0.1 | License: BSD-3-Clause

kandi X-RAY | kiba Summary

kiba is a Rust library typically used in Database applications. kiba has no reported bugs or vulnerabilities, carries a Permissive License, and has low support. You can download it from GitHub.

Kiba is an in-memory database that's designed to be performant and simple to use. Kiba is fundamentally a key-value store, but it supports complex value types such as lists, sets, and hashes. It exposes an API similar to Redis, with commands such as GET, SET, INCR, DECR, LPUSH, RPUSH, SADD, SREM, HSET, HGET, and more. Disclaimer: Kiba is a side project that's still very early in its development. Needless to say, it shouldn't be trusted in any remotely serious setting. I plan to continue developing its feature set and improving reliability so that it'll someday be production-ready.

Support

kiba has a low-activity ecosystem.
It has 22 stars, 1 fork, and 3 watchers.
It had no major release in the last 12 months.
There are 3 open issues and 6 closed issues. On average, issues are closed in 6 days. There is 1 open pull request and 0 closed pull requests.
It has a neutral sentiment in the developer community.
The latest version of kiba is v0.1.

Quality

              kiba has no bugs reported.

Security

              kiba has no vulnerabilities reported, and its dependent libraries have no vulnerabilities reported.

License

              kiba is licensed under the BSD-3-Clause License. This license is Permissive.
              Permissive licenses have the least restrictions, and you can use them in most projects.

Reuse

              kiba releases are available to install and integrate.
              Installation instructions are not available. Examples and code snippets are available.


            kiba Key Features

            No Key Features are available at this moment for kiba.

            kiba Examples and Code Snippets

Kiba: An in-memory, multithreaded key-value store - Examples
Rust | Lines of Code: 58 | License: Permissive (BSD-3-Clause)
            kiba> SET name "FOO BAR"
            OK
            
            kiba> GET name
            "FOO BAR"
            
            kiba> GET bar
            (nil)
            
            kiba> SET counter 9999
            OK
            
            kiba> INCR counter
            (integer) 10000
            
            kiba> DECRBY counter 3000
            (integer) 7000
            
            kiba> LPUSH letters b
            (integer) 1
            
            kiba> LPUS  
Kiba: An in-memory, multithreaded key-value store - Implementation
Rust | Lines of Code: 51 | License: Permissive (BSD-3-Clause)
[ASCII diagram, "Layers of Execution": a bytestream input (the user query) flows down through kiba's execution layers. The diagram is truncated in this extract.]
Kiba: An in-memory, multithreaded key-value store - Docker
Rust | Lines of Code: 7 | License: Permissive (BSD-3-Clause)
            % docker pull shoyo64/kiba:0.1
            
            % docker run -p 6464:6464 --name kiba shoyo64/kiba:0.1
            
            % docker start kiba
            % docker stop kiba
            
            % ./kiba-cli
            
% docker build -t <image-name> .
% docker run -p <host-port>:6464 --name <container-name> <image-name>

            Community Discussions

            QUESTION

            Understanding LinkingObjects in Realm Xcode 12, Also when to use it
            Asked 2021-Jun-13 at 15:23

In Realm, I had a problem understanding (I'm new to Realm) the implementation of LinkingObjects. Let's say a Person could have more than one Dog (a List of Dog), so I would write code such as the below:

            ...

            ANSWER

            Answered 2021-Jun-13 at 15:23

You can think of LinkingObjects almost as a computed property - it automagically creates an inverse link to the parent object when the child object is added to the parent object's List.

So when a Dog is added to a person's dogs list, a person reference is added to the Dog's walkers list. Keep in mind that it's a many-to-many relationship, so technically if Person A adds Doggo and Person B adds Doggo, Doggo's inverse relationship 'walkers' will contain both Person A and Person B.

            the app still can run normally without any diff

Which is true, it doesn't affect the operation of the app. However, the difference is that by removing the walkers LinkingObjects, there's no way to query Dogs for their Person and get Dog results (i.e. you can't traverse the graph of the relationship back to the Person).

            In other words we can query Person for kinds of dog stuff

            Source https://stackoverflow.com/questions/67950349

            QUESTION

            Reorder rows in a Kiba job
            Asked 2021-May-14 at 18:44

I have a Kiba job that takes a CSV file (with Kiba::Common::Sources::CSV), enriches its data, merges some rows (with the ChainableAggregateDestination destination described here) and saves it to another CSV file (with Kiba::Common::Destinations::CSV).

            Now, I want to sort the rows differently (based on the first column) in my destination CSV. I can't find a way to write a transform that does this. I could use post_process to reopen the destination CSV, sort it and rewrite it but I guess there is a cleaner way...

            Can someone point me in the right direction?

            ...

            ANSWER

            Answered 2021-May-14 at 18:44

            To sort rows, a good strategy is to use an "aggregating transform", as explained in this article, to store all the rows in memory (although you could do it out of memory), then at transform "close" time, sort them and re-emit them in the pipeline.

            This is the most flexible design IMO.
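The code for such an aggregating transform isn't included in this extract; the following is a minimal sketch of the idea (the class name and sort key are illustrative), relying on Kiba v3's ability to yield rows from a transform's close method:

# Illustrative aggregating transform: buffers every row in memory,
# then re-emits them sorted once the pipeline reaches "close" time.
class SortingBufferTransform
  def initialize(sort_by:)
    @sort_by = sort_by
    @buffer = []
  end

  def process(row)
    @buffer << row
    nil # swallow rows while buffering
  end

  def close
    # Kiba v3: rows yielded here continue down the pipeline to the destination
    @buffer.sort_by { |row| @sort_by.call(row) }.each { |row| yield row }
    nil
  end
end

# In the job definition, after the enrichment and merge steps:
#   transform SortingBufferTransform, sort_by: ->(row) { row[:first_column] }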

            Source https://stackoverflow.com/questions/67534932

            QUESTION

            How to structure a Kiba project that needs to do multiple HTTP calls
            Asked 2021-Mar-12 at 13:54

I'm looking at writing one of our ETL (or ETL-like) processes in Kiba and I wonder how to structure it. The main question I have is about the overall architecture. The process works roughly like this:

            1. Fetch data from an HTTP endpoint.
2. For each item returned from that API, make one more HTTP call
            3. Do some transformations for each of the items returned from step 2
            4. Send each item somewhere else

            Now my question is: Is it OK if only step one is a source and anything until the end is a transform? Or would it be better to somehow have each HTTP call be a source and then combine these somehow, maybe using multiple jobs?

            ...

            ANSWER

            Answered 2021-Mar-12 at 13:54

It is indeed best to use a single source, which you will use to fetch the main stream of the data.

            General advice: try to work in batches as much as you can (e.g. pagination in the source, but also bulk HTTP lookup if the API supports it in step 2).

            Source section

            The source in your case could be a paginating HTTP resource, for instance.

A first option to implement it would be to write a dedicated class, as explained in the documentation.

            A second option is to use Kiba::Common::Sources::Enumerable (https://github.com/thbar/kiba-common#kibacommonsourcesenumerable) like this:
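The answer's original snippet isn't reproduced in this extract; here is a rough sketch of that second option under stated assumptions (a hypothetical paginated endpoint at https://api.example.org, and Net::HTTP/JSON for the per-item lookup in step 2):

require 'kiba'
require 'kiba-common/sources/enumerable'
require 'net/http'
require 'json'

# Hypothetical helper: lazily enumerate every item across all pages
fetch_all_items = -> {
  Enumerator.new do |yielder|
    page = 1
    loop do
      body  = Net::HTTP.get(URI("https://api.example.org/items?page=#{page}"))
      items = JSON.parse(body)
      break if items.empty?
      items.each { |item| yielder << item }
      page += 1
    end
  end
}

job = Kiba.parse do
  # Step 1: a single source fetches the main stream of data
  source Kiba::Common::Sources::Enumerable, fetch_all_items

  # Step 2: one extra HTTP call per item, done in a transform
  transform do |row|
    detail = JSON.parse(Net::HTTP.get(URI("https://api.example.org/items/#{row['id']}")))
    row.merge('detail' => detail)
  end

  # Steps 3 and 4: further transforms and a destination would follow here
end

Kiba.run(job)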

            Source https://stackoverflow.com/questions/66599099

            QUESTION

            Transpose CSV rows and columns during ETL process using Kiba (or plain Ruby)
            Asked 2021-Mar-10 at 09:16

            A third party system produces an HTML table of parent teacher bookings:

            ...

            ANSWER

            Answered 2021-Mar-10 at 08:52

            Kiba author here!

            I see at least two ways of doing this (no matter if you work with plain Ruby or with Kiba):

• converting your HTML to a table, then working from that data
• working directly with the HTML table (using Nokogiri & selectors), applicable only if the HTML is mostly clean

In all cases, because you are doing some scraping, I recommend very defensive code (because the HTML can change and can contain bugs or corner cases later), e.g. strong assertions that the lines/columns contain what you expect, verifications, etc.

If you go plain Ruby, then for instance you could do something like the following (here modelling your data as comma-separated text to keep things clear):
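The answer's own snippet isn't reproduced in this extract; a rough sketch of the plain-Ruby route might look like this (the data, teacher and time columns are illustrative): parse the comma-separated text, then transpose it into one record per booking.

require 'csv'

# Illustrative input: times down the first column, teachers across the top
data = <<~TEXT
  time,Mr Smith,Ms Jones
  16:00,Alice,Bob
  16:15,,Carol
TEXT

table    = CSV.parse(data, headers: true)
teachers = table.headers - ['time']

bookings = table.flat_map do |row|
  teachers.filter_map do |teacher|
    student = row[teacher]
    next if student.nil? || student.empty?
    { time: row['time'], teacher: teacher, student: student }
  end
end

bookings.each { |booking| p booking }
# {:time=>"16:00", :teacher=>"Mr Smith", :student=>"Alice"} ... and so on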

            Source https://stackoverflow.com/questions/66559600

            QUESTION

            Is there a standard pattern for invoking related pipelines in Kiba ETL?
            Asked 2020-Sep-12 at 12:17

            I'm working on an ETL pipeline with Kiba which imports into multiple, related models in my Rails app. For example, I have records which have many images. There might also be collections which contain many records.

            The source of my data will be various, including HTTP APIs and CSV files. I would like to make the pipeline as modular and reusable as possible, so for each new type of source, I only have to create the source, and the rest of the pipeline definition is the same.

            Given multiple models in the destination, and possibly several API calls to get the data from the source, what's the standard pattern for this in Kiba?

I could create one pipeline where the destination is 'the application' and has responsibility for all these models, but this feels like the wrong approach because the destination would be responsible for saving data across different Rails models, uploading images, etc.

            Should I create one master pipeline which triggers more specific ones, passing in a specific type of data (e.g. image URLs for import)? Or is there a better approach than this?

            Thanks.

            ...

            ANSWER

            Answered 2020-Sep-12 at 12:17

            Kiba author here!

It is natural & common to look for some form of genericity, modularity and reusability in data pipelines. I would say, though, that as with regular code, it can be hard initially to figure out the correct way to get there (it will depend quite a bit on your exact situation).

            This is why my recommendation would be instead to:

            • Start simple (on one specific job)
            • Very important: make sure to implement end-to-end automated tests (use webmock or similar to stub out API requests & make tests completely isolated, create tests with 1 row from source to destination) - this will make it easy to refactor stuff later
            • Once you have that (1 pipeline with tests), you can start implementing a second one, and refactor to extract interesting patterns as reusable bits, and iterate from there

            Depending on your exact situation, maybe you will extract specific components, or maybe you will end up extracting a whole generic job, or generic families of jobs etc.

This approach works well even as you get more experience working with Kiba (this is how I gradually extracted the components that you will find in kiba-common and kiba-pro, too).
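As an illustration of the end-to-end testing advice above, here is a sketch under assumptions not taken from the original answer (RSpec, webmock, and hypothetical ApiSource/MemoryDestination classes): stub the HTTP call, run the job on a single row, and assert what reaches the destination.

require 'kiba'
require 'json'
require 'net/http'
require 'webmock/rspec'

# Hypothetical source and destination, small enough to live in the spec
class ApiSource
  def each
    body = Net::HTTP.get(URI('https://api.example.org/records'))
    JSON.parse(body).each { |item| yield item }
  end
end

class MemoryDestination
  class << self; attr_accessor :rows; end

  def write(row)
    (self.class.rows ||= []) << row
  end
end

RSpec.describe 'import job' do
  it 'carries one row from the stubbed API to the destination' do
    stub_request(:get, 'https://api.example.org/records')
      .to_return(body: '[{"id":1,"title":"Record"}]')

    job = Kiba.parse do
      source ApiSource
      transform { |row| row.merge('imported' => true) }
      destination MemoryDestination
    end

    Kiba.run(job)

    expect(MemoryDestination.rows)
      .to eq([{ 'id' => 1, 'title' => 'Record', 'imported' => true }])
  end
end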

            Source https://stackoverflow.com/questions/63842239

            QUESTION

            Is there a way to return some data at the end of a Kiba job?
            Asked 2020-Jul-03 at 12:50

            It would be great if there was a way to get some kind of return object from a Kiba ETL run so that I could use the data in there to return a report on how well the pipeline ran.

We have a job that runs every 10 minutes and processes on average 20-50k records, condensing them into summary records, some of which are created and some of which are updated. The problem is, it's difficult to know what happened without trawling through reams of log files, and obviously logs aren't useful to end users either.

Is there a way to populate some kind of outcome object with arbitrary data as the pipeline runs? For example:

            • 25.7k rows found in source
            • 782 records dropped by this transformer
            • 100 records inserted
            • 150 records updated
            • 20 records had errors (and here they are)
            • This record had the highest statistic
            • 1200 records belonged to this VIP customer
            • etc.

            And then at the end, use that data to send an email summary, populate a web page, render some console output, etc.

            Currently, the only way I can see this working right now is to send an object in during setup and mutate it when it's flowing through the sources, transformers, and destinations. Once the run is complete, check the variable afterwards and do something with the data that is now in there.

            Is this how it should be done, or is there a better way?

            EDIT

            Just want to add that I don't want to handle this in the post_process block, because the pipeline gets used via a number of different mediums, and I would want each use case to handle its own feedback mechanism. It's also cleaner (imo) for an ETL pipeline to not have to worry about where it's used, and what that usage scenario's feedback expectations are...

            ...

            ANSWER

            Answered 2020-Jul-03 at 12:50

            The answer is highly dependent on the context, but here are a few guidelines.

If the outcome object is not too large, I indeed recommend that you pass an empty outcome object in (typically a Hash), then populate it during the run (you could even use some form of middleware to track exceptions themselves).

How you will fill it will depend on the context and your actual needs, but this can be done in a fairly job-agnostic fashion (using DSL extensions, https://github.com/thbar/kiba/wiki/How-to-extend-the-Kiba-DSL, you can write fairly high-level extensions that will register the required transforms or blocks to achieve what you need).

            The object can be used as is, or could also be serialised as JSON or similar, even stored into a DB if you need to provide some rich output later (or you could use it to prepare something else).

            If needed, you could even have something fairly structured in a specific database, for that purpose (if you need an easy way to expose that to customers, for instance).

            Note that you could programmatically define a post_process without the job realising it much (without the coupling). Here is a very simple example:
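The answer's own example isn't included in this extract; a minimal sketch of the idea might look like this (the wrapper, destination, and field names are illustrative): the caller passes a Hash in, the job populates it, and a post_process is registered programmatically so the pipeline stays unaware of how the outcome is consumed.

require 'kiba'
require 'kiba-common/sources/enumerable'

# Illustrative destination: counts what it writes into the shared stats hash
class CountingDestination
  def initialize(stats)
    @stats = stats
  end

  def write(row)
    @stats[:rows_written] += 1
  end
end

# Hypothetical wrapper: builds the job and registers a post_process on top,
# so the "business" part of the job doesn't know about emails, web pages, etc.
def build_job(stats, on_complete:)
  Kiba.parse do
    source Kiba::Common::Sources::Enumerable, -> { [{ amount: 10 }, { amount: 32 }] }

    transform do |row|
      stats[:rows_read] += 1
      row
    end

    destination CountingDestination, stats

    post_process { on_complete.call(stats) }
  end
end

stats = { rows_read: 0, rows_written: 0 }
job   = build_job(stats, on_complete: ->(s) { puts "Read #{s[:rows_read]}, wrote #{s[:rows_written]} rows" })
Kiba.run(job)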

            Source https://stackoverflow.com/questions/62714214

            QUESTION

            How to filter data in extractor?
            Asked 2020-Jul-03 at 12:36

            I've got a long-running pipeline that has some failing items (items that at the end of the process are not loaded because they fail database validation or something similar).

            I want to rerun the pipeline, but only process the items that failed the import on the last run.

I have a system in place where I check each item ID (that I received from an external source). I do this check in my loader. If I already have that item ID in the database, I skip loading/inserting that item into the database.

            This works great. However, it's slow, since I do extract-transform-load for each of these items, and only then, on load, I query the database (one query per item) and compare item IDs.

I'd like to filter out these records sooner. If I do it in a transformer, I can again only do it per item. It looks like the extractor could be the place, or I could pass records to the transformer in batches and then filter and explode the items in the (first) transformer.

            What would be better approach here?

            I'm also thinking about reusability of my extractor, but I guess I could live with the fact that one extractor does both extract and filter. I think the best solution would be to be able to chain multiple extractors. Then I'd have one that extracts the data and another one that filters the data.

            EDIT: Maybe I could do something like this:

            ...

            ANSWER

            Answered 2020-Jul-03 at 12:36

            A few hints:

            1/ The higher (sooner) in the pipeline, the better. If you can find a way to filter out right from the source, the cost will be lower, because you do not have to manipulate the data at all.

2/ If your scale is small enough, you could load the full list of ids at the start in a pre_process block (mostly what you have in mind in your code sample), then compare right after the source. Obviously it doesn't scale infinitely, but it can work for a long time depending on your dataset size.

3/ If you need to operate at a larger scale, I would advise either working with a buffering transform (grouping N rows) that issues a single SQL query to verify the existence of all N row ids in the target database, or indeed working with groups of rows and then exploding them.

            Hope this helps!
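A sketch of hint 2/ under assumptions not taken from the original answer (the Sequel gem, an items table, and an external_id column are illustrative): load the already-imported ids once in a pre_process, then drop matching rows right after the source.

require 'kiba'
require 'kiba-common/sources/enumerable'
require 'set'
require 'sequel' # illustrative choice of database library

already_imported = Set.new

job = Kiba.parse do
  pre_process do
    # One query up front instead of one database query per row at load time
    db = Sequel.connect(ENV.fetch('DATABASE_URL'))
    already_imported.merge(db[:items].select_map(:external_id))
  end

  # Stand-in for your real extractor
  source Kiba::Common::Sources::Enumerable,
         -> { [{ external_id: 1 }, { external_id: 2 }] }

  # Placed right after the source so filtered rows skip the rest of the
  # pipeline; returning nil from a transform drops the row
  transform do |row|
    already_imported.include?(row[:external_id]) ? nil : row
  end

  # ... remaining transforms and the destination go here ...
end

Kiba.run(job)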

            Source https://stackoverflow.com/questions/62695849

            QUESTION

            Recommended way to achieve `rescue-ensure` kind of functionality with Kiba?
            Asked 2020-Jul-02 at 08:27

We have a Kiba pipeline where we need to do some task after the job has ended, no matter whether there were errors or not (the whole pipeline doesn't fail, we just have a couple of validation errors or similar).

            This is what the documentation says:

            :warning: Post-processors won't get called if an error occurred before them. https://github.com/thbar/kiba/wiki/Implementing-pre-and-post-processors

Would this be the recommended way to do this:

            ...

            ANSWER

            Answered 2020-Jul-02 at 08:27

            Indeed post_process will not be called in case of errors, as documented and as you pointed out!

            At this point, the best solution is to use a form of ensure statement:

            A common way to structure that is:
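The answer's own snippet isn't included in this extract; a minimal sketch of that structure (the after-run task is hypothetical): parse the job as usual, then wrap the run in begin/ensure so the task executes whether or not the run raised.

require 'kiba'

# Hypothetical after-run task (e.g. ping monitoring, send a summary email)
def notify_monitoring
  puts 'pipeline finished (with or without errors)'
end

job = Kiba.parse do
  # sources, transforms and destinations as usual
end

begin
  Kiba.run(job)
ensure
  # Unlike a post_process declared inside the job, this runs
  # even if an error was raised during the run
  notify_monitoring
end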

            Source https://stackoverflow.com/questions/62678911

            QUESTION

            Is there an obvious way to reduce rows when using Kiba?
            Asked 2020-Mar-31 at 12:56

            Firstly - Thibaut, thank you for Kiba. It goes toe-to-toe with 'enterprise' grade ETL tools and has never let me down.

I'm busy building an ETL pipeline that takes a number of rows and reduces them down into a single summary row. I get the feeling that this should be a simple thing, but I'm a little stumped on how to approach the problem.

We have a number of CDRs from a voice switch, and need to condense them under some simple criteria into a handful of summary records. So, the problem is: I have many thousands of records coming in from a Source, and need to transform them into only a few records based on some reduce criteria.

            Kiba is really simple when there's a one-to-one Source -> Destination ETL, or even a one-to-many Source -> Destination with the new enumerable exploder in V3, but I don't see a clear path to many-to-one ETL pipelines.

            Any suggestions or guidance would be greatly appreciated.

            ...

            ANSWER

            Answered 2020-Mar-31 at 12:56

            Glad you find Kiba useful! There are various solutions to this use case.

I'm making some assumptions here (if these are incorrect, solutions will still exist but will be different, e.g. boundary detection & external storage):

            • You are working with finite batches (rather than a continuous stream of updates).
            • The handful of summary records you are referring to can be held in memory.

My advice here is to leverage Kiba v3's ability to yield records in a transform's close method (described in more depth in this article):
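The article's code isn't reproduced in this extract; a sketch of the approach with illustrative field names: the transform accumulates per-key summaries in memory during process, then yields the handful of summary rows from close.

# Illustrative many-to-one transform: collapses CDR rows into one
# summary row per (account, destination) pair.
class SummarizeCdrs
  def initialize(group_by:)
    @group_by  = group_by
    @summaries = Hash.new { |h, k| h[k] = { calls: 0, duration: 0 } }
  end

  def process(row)
    key = row.values_at(*@group_by)
    @summaries[key][:calls]    += 1
    @summaries[key][:duration] += row[:duration]
    nil # swallow the detail rows
  end

  def close
    # Kiba v3: rows yielded here are passed on to the destination
    @summaries.each do |key, totals|
      yield @group_by.zip(key).to_h.merge(totals)
    end
    nil
  end
end

# In the job:
#   transform SummarizeCdrs, group_by: [:account_id, :destination_number]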

            Source https://stackoverflow.com/questions/60948636

            QUESTION

            Should I use Rails for consistency? (for ETL project)
            Asked 2020-Mar-03 at 11:03

            CONTEXT

            • I'm new to Ruby and all that jazz, but I'm not new to dev.
            • I'm taking over a project based on 2 rails/puma repositories for web & APIs.
            • I'm building a new repository for a backend data processing app, using Kiba, that will run through scheduled jobs.
            • Also, I'm to be joined by other devs later on, so I'd like to make something maintainable by design.

            MY QUESTION : Should I use Rails on that ETL project?

            Using it means we can apply the same folder structure as the other repos, use RSpec all the same etc. It also appeared to me that Rails changes the way classes like Hash act.

            At the same time, it seems to bring unnecessary complexity to a project that will run on CLI and could consist of only a dozen of files.

            ...

            ANSWER

            Answered 2020-Mar-03 at 09:08

From my point of view, using Rails for ETL projects is overhead. Take a look at dry-rb. Using https://dry-rb.org/gems/dry-system/0.12/ you can build a small application to process data. Also, there is a gem for building CLIs: https://dry-rb.org/gems/dry-cli/0.4/

            Here is a list of all dry gems https://dry-rb.org/gems/

            Source https://stackoverflow.com/questions/60503257

Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.

            Vulnerabilities

            No vulnerabilities reported

            Install kiba

            You can download it from GitHub.
Rust is installed and managed by the rustup tool. Rust has a 6-week rapid release process and supports a great number of platforms, so there are many builds of Rust available at any time. Please refer to rust-lang.org for more information.

            Support

For any new features, suggestions, or bugs, create an issue on GitHub. If you have any questions, check and ask on the Stack Overflow community page.

            CLONE
          • HTTPS

            https://github.com/shoyo/kiba.git

          • CLI

            gh repo clone shoyo/kiba

• SSH

            git@github.com:shoyo/kiba.git
