marklogic-data-hub | The MarkLogic Data Hub | Database library
kandi X-RAY | marklogic-data-hub Summary
Go from nothing to an Operational Data Hub in a matter of minutes. MarkLogic Data Hub is a data integration platform and toolset that helps you quickly and efficiently integrate data from many sources into a single MarkLogic database and then expose that data.
Top functions reviewed by kandi - BETA
- Finishes the run step.
- Converts flows and mappings.
- Initializes artifact directories.
- Builds aggregatable properties.
- Clears modules.
- Converts the entity models in the project directory to the storage directory.
- Creates a step file.
- Gets the step runner for a given flow.
- Imports one or more jobs.
- Updates the app config.
Community Discussions
Trending Discussions on marklogic-data-hub
QUESTION
I've been installing a brand-new (empty) Data Hub 4.1.1 so I can practice an upgrade (4.1.1 to 4.3.2, then up to 5.2.6).
I've been using the QuickStart instructions here https://marklogic.github.io/marklogic-data-hub/tutorial/4x/install/ to do the install, but wonder if I'm "cheating" and should instead be using the same Gradle install method as for 5.2.x.
To clarify: can you install 4.1.1 using the same method as for 5.2.x, described here https://docs.marklogic.com/datahub/5.2/projects/create-project-using-gradle.html, or do you need to follow the 4.x.x instructions only?
If we follow the 4.1.1 example with the sample data provided, it seems you initialize the project via QuickStart (which adds extra files etc. to the project directory on the local hard disk), then install the project as a discrete Data Hub into MarkLogic. Is it correct to say each project is its own Data Hub?
Thanks in advance.
ANSWER
Answered 2021-Jan-12 at 06:56
You can follow the 5.2 instructions. The 4.x instructions are not really different, as you can see here:
https://marklogic.github.io/marklogic-data-hub/project/gradle/
4.x also has scaffolding tasks. Run ./gradlew tasks and look for tasks starting with hub or ml.
HTH!
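The filtering step can be sketched as follows. In a real project you would pipe the output of ./gradlew tasks; here the task list is simulated so the filter itself can be demonstrated (hubInit and mlDeploy are real Data Hub / ml-gradle task names, while build and test stand in for unrelated Gradle tasks):

```shell
# In a real project:  ./gradlew tasks | grep -E '^(hub|ml)'
# Simulated task list so the grep filter can be shown on its own:
printf 'build\nhubInit\nmlDeploy\ntest\n' | grep -E '^(hub|ml)'
```

Only the lines beginning with hub or ml survive the filter, which is exactly the set of scaffolding tasks the answer refers to.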
QUESTION
As I understand it, both MLCP transformations and triggers can be used to modify ingested documents. The difference is that a content transformation operates on the in-memory document object during ingestion, whereas a trigger fires after a document is created.
So it seems to me there is no reason why I cannot use both of them together. My use case is that I need to update some nodes of the documents after they are ingested into the database. The reason I use a trigger is that running the same logic in an MLCP transformation using the in-mem-update module always caused ingestion failures, presumably due to the large file size and the large number of nodes I attempted to update.
2018-08-22 23:02:24 ERROR TransformWriter:546 - Exception:Error parsing HTTP headers: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
So far, I have not been able to combine content transformations and triggers. When I enabled the transformation during MLCP ingestion, the trigger was not fired. When I disabled the transformation, the trigger worked without problems.
Is there any intrinsic reason why I cannot use both of them together, or is it an issue with my configuration? Thanks!
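For reference, an MLCP import that combines a server-side transform with a collection a trigger could be scoped to might look like this. This is a hypothetical sketch: the host, port, credentials, input path, transform module, and collection name are all placeholders, not values from the question.

```shell
# Hypothetical MLCP import: applies a server-side transform during ingest
# and tags documents with a collection that a trigger's collection scope
# can match. All host/path/module values below are placeholders.
mlcp.sh import \
  -host localhost -port 8010 \
  -username admin -password admin \
  -input_file_path /data/pdfs \
  -document_type binary \
  -transform_module /transforms/extract-text.xqy \
  -output_collections raw-pdfs
```

With this setup, a post-commit trigger scoped to the raw-pdfs collection would fire for each ingested document, after the transform has run.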
Edit:
I would like to provide some context for clarification and report results based on suggestions from @ElijahBernstein-Cooper, @MadsHansen and @grtjn (thanks!). I am using the MarkLogic Data Hub Framework to ingest PDF files (some quite large) as binaries and extract the text as XML. I essentially followed this example, except that I am using xdmp:pdf-convert instead of xdmp:document-filter: https://github.com/marklogic/marklogic-data-hub/blob/master/examples/load-binaries/plugins/entities/Guides/input/LoadAsXml/content/content.xqy
While xdmp:pdf-convert seems to preserve the PDF structure better than xdmp:document-filter, it also includes some styling nodes ( and
ANSWER
Answered 2018-Aug-24 at 07:24
MLCP transforms and triggers operate independently. There is nothing in those transforms that should stop triggers from working per se.
Triggers are fired by events. I typically use both a create and a modify trigger to cover the case where I import the same files a second time (for testing purposes, for instance).
Triggers also have a scope. They are configured to look at either a directory or a collection. Make sure your MLCP configuration matches the trigger scope, and that your transform does not change the URI in such a way that it no longer matches the directory scope, if that is used.
Looking more closely at the error message, however, I'd say it is caused by a timeout. Timeouts can occur both server-side (10 minutes by default) and client-side (this might depend on client-side settings, but could be much smaller). The message basically says that the server took too long to respond, so I'd say you are facing a client-side timeout.
Timeouts can be caused by too-small time limits. You could try to increase timeout settings both server-side (xdmp:set-request-time-limit()) and client-side (not sure how to do that in Java).
It is more common, though, that you are simply trying to do too much at the same time. MLCP opens transactions and tries to execute a number of batches within each transaction, aka the transaction_size. Each batch contains a number of documents up to batch_size. By default, MLCP tries to process 10 x 100 = 1000 documents per transaction.
It also runs with 10 threads by default, so it typically opens 10 transactions at the same time and tries to run 10 threads to process 1000 docs each in parallel. With simple inserts this is just fine. With heavier processing in transforms or pre-commit triggers, this can become a bottleneck, particularly when the threads start to compete for server resources like memory and CPU.
Functions like xdmp:pdf-convert can often be fairly slow. It depends on an external plugin for starters, but also imagine it has to process a 200-page PDF. Binaries can be large, so you'll want to pace down to process them. If using -transaction_size 1 -batch_size 1 -thread_count 1 makes your transforms work, you really were facing timeouts, and may have been flooding your server. From there you can look at increasing some numbers, but binary sizes can be unpredictable, so be conservative.
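Putting those pacing flags into a full command line, a maximally conservative import might look like this. Again a hypothetical sketch: everything except the three pacing flags (host, credentials, paths, transform module) is a placeholder.

```shell
# Hypothetical paced-down MLCP import: one document per batch, one batch
# per transaction, one thread, to avoid timeouts during heavy transforms
# such as xdmp:pdf-convert. Host/path/module values are placeholders.
mlcp.sh import \
  -host localhost -port 8010 \
  -username admin -password admin \
  -input_file_path /data/pdfs \
  -document_type binary \
  -transform_module /transforms/extract-text.xqy \
  -transaction_size 1 -batch_size 1 -thread_count 1
```

Once this succeeds, the three pacing values can be raised incrementally while watching for the timeout error to return.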
It might also be worth looking at doing heavy processing asynchronously, for instance using CPF, the Content Processing Framework. It is a very robust implementation for processing content, and is designed to survive server restarts.
HTH!
Community Discussions, Code Snippets contain sources that include Stack Exchange Network
Vulnerabilities
No vulnerabilities reported