dbt-external-tables | dbt macros to stage external sources
kandi X-RAY | dbt-external-tables Summary
dbt macros to stage external sources
dbt-external-tables Key Features
dbt-external-tables Examples and Code Snippets
Community Discussions
Trending Discussions on dbt-external-tables
QUESTION
I have a simple 9-column report that I'm sideloading into BigQuery via the dbt-external-tables package.
...
ANSWER
Answered 2021-Feb-11 at 11:14: As I understand it, all BigQuery external tables pointing to Cloud Storage data have an additional pseudo-column, _FILE_NAME (docs). There's no need to include it in your external table definition; you can simply query it downstream:
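A minimal sketch of such a downstream query, assuming the external table is declared as a dbt source named raw_gcs with a table called report (both names are illustrative, not from the original question):

```sql
-- models/staging/stg_report.sql (hypothetical model)
-- _FILE_NAME is BigQuery's pseudo-column on external tables over Cloud Storage;
-- it is not returned by SELECT *, so it has to be selected and aliased explicitly.
select
    *,
    _FILE_NAME as source_file_name
from {{ source('raw_gcs', 'report') }}
```

This keeps the file name available to downstream models without touching the external table definition itself.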
QUESTION
I have the following issue:
- I have an AWS S3 pipeline that spits out a single json.gz file on a daily basis.
- I wish to take that file with dbt and load it into Snowflake (no Snowpipe use at the moment).
I have managed to do this by creating a storage integration, and with my role (used for running dbt) I manually created a schema and granted usage on it. So far so good.
Then I read about this:
https://github.com/fishtown-analytics/dbt-external-tables
The problem is that this is the only way it runs properly: I had to alter my dbt profiles.yml, set the default schema to S3_MIXPANEL with default database RAW_DEV, and run a different target and role with the --target 'ingest_dev' parameter.
I keep thinking that there should be a more sophisticated solution, where I can create schemas, query metadata, and use something like {{ source() }} so my documentation can show that this is an external source. I think dbt-external-tables is not really well explained for my case here.
Please can anyone help me and share how to create schemas and query from external stages properly, without changing the default schema macro and dbt profiles.yml each time?
I have succeeded in running the following code:
...
ANSWER
Answered 2020-Jul-31 at 15:47: As the maintainer of the dbt-external-tables package, I'll share its opinionated view. The package believes that you should stage all external sources (S3 files) as external tables or with snowpipes first, in a process that includes as little confounding logic as possible. Then you can select from them, as sources, in dbt models, alongside all requisite business logic.
If my understanding is correct, you would stage your mixpanel data as below, in a file called (e.g.) models/staging/mixpanel/src_mixpanel.yml:
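The YAML from the original answer isn't reproduced on this page, so here is a sketch of what such a file could contain, assuming an existing external stage named mixpanel_stage in RAW_DEV.S3_MIXPANEL and gzipped JSON files (the stage, table, and column names are illustrative):

```yaml
# models/staging/mixpanel/src_mixpanel.yml -- illustrative contents
version: 2

sources:
  - name: s3_mixpanel
    database: raw_dev       # database that already exists
    schema: s3_mixpanel     # schema created alongside the storage integration
    tables:
      - name: events
        external:
          # existing external stage built on the storage integration
          location: "@raw_dev.s3_mixpanel.mixpanel_stage"
          file_format: "( type = json )"
          auto_refresh: false
        columns:
          - name: event
            data_type: varchar
          - name: properties
            data_type: variant
```

Running `dbt run-operation stage_external_sources` then creates the external table, and downstream models can reference it with {{ source('s3_mixpanel', 'events') }}, which keeps the external definition out of profiles.yml entirely.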
QUESTION
I'm trying to set up a simple dbt pipeline that uses parquet tables stored on Azure Data Lake Storage and creates another table that is also going to be stored in the same location.
Under my models/ (which is defined as my sources path) I have 2 files, datalake.yml and orders.sql. datalake.yml looks like this:
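The file's contents were not preserved on this page; as a rough sketch, a dbt-external-tables source definition for parquet files on ADLS (with the dbt-spark adapter) would look something like the following, where the account, container, path, and table names are placeholders rather than the asker's original values:

```yaml
# models/datalake.yml -- illustrative reconstruction, not the original file
version: 2

sources:
  - name: datalake
    tables:
      - name: raw_orders
        external:
          # ADLS Gen2 path to the parquet files (placeholder account/container)
          location: "abfss://container@storageaccount.dfs.core.windows.net/raw/orders/"
          using: parquet
```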
ANSWER
Answered 2020-Aug-19 at 16:25: This might not yet be available. Looks like it's still an open issue with no dev work so far.
Related issue in the dbt-external-tables package repo: Support Spark external tables
Do you have the dependencies from dbt-spark installed?
Here are some relevant issues there:
Spark_connection_url do not contain workspace_id while connecting to databricks
I realize those don't exactly help with the easy dbt-external-tables use case, but it looks like development is still ongoing to support the Azure Databricks / Data Lake stack.
Gonna try to dig into this a bit more later because this is a use-case that's relevant to me also.
Community Discussions and Code Snippets contain sources that include the Stack Exchange Network.
Vulnerabilities
No vulnerabilities reported
Install dbt-external-tables
The macros assume that you:
Have already created your database's required scaffolding for external resources:
an external stage (Snowflake)
an external schema + S3 bucket (Redshift Spectrum)
an external data source and file format (Synapse)
an external data source and database-scoped credential (Azure SQL)
a Google Cloud Storage bucket (BigQuery)
an accessible set of files (Spark)
Have the appropriate permissions to create tables using that scaffolding
Have already created the database/project and/or schema/dataset in which dbt will create external tables (or snowpiped tables)
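A minimal sketch of the installation itself, via packages.yml (the package is published as dbt_external_tables, originally under the fishtown-analytics namespace and later under dbt-labs; the version range below is illustrative, so check the dbt package hub for one compatible with your dbt version):

```yaml
# packages.yml -- version range is illustrative
packages:
  - package: dbt-labs/dbt_external_tables
    version: [">=0.8.0", "<0.9.0"]
```

After `dbt deps`, sources declared with an `external:` property are created or refreshed by running the package's macro: `dbt run-operation stage_external_sources`.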