{"id":3493,"date":"2022-04-25T07:19:32","date_gmt":"2022-04-25T15:19:32","guid":{"rendered":"https:\/\/www.gudusoft.com\/?p=3493"},"modified":"2022-05-19T01:40:00","modified_gmt":"2022-05-19T09:40:00","slug":"data-lineage-create-external-table","status":"publish","type":"post","link":"https:\/\/www.gudusoft.com\/de\/data-lineage-create-external-table\/","title":{"rendered":"Datenherkunft: Externe Tabelle erstellen | Gudu SQLFlow"},"content":{"rendered":"<div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"background-color: rgba(255,255,255,0);background-position: center center;background-repeat: no-repeat;border-width: 0px 0px 0px 0px;border-color:#e8eaf0;border-style:solid;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start\" style=\"max-width:1310.4px;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column\"><div class=\"fusion-column-wrapper fusion-flex-justify-content-flex-start fusion-content-layout-column\" style=\"background-position:left top;background-repeat:no-repeat;-webkit-background-size:cover;-moz-background-size:cover;-o-background-size:cover;background-size:cover;padding: 0px 0px 0px 0px;\"><div class=\"fusion-text fusion-text-1\" style=\"line-height:26px;\"><h2><strong>Datenherkunft: Externe Tabelle erstellen | Gudu SQLFlow<\/strong><\/h2>\n<p>In the data warehouse, the original data is always from the mass storage such as Amazon S3, Google Cloud Storage, and Microsoft Azure, and those data will be loaded into the system such as BigQuery, Snowflake, Redshift, and Microsoft Azure. One of the methods used to load or use those data is using the create external table SQL statement. The data lineage from the external file to the external table can be captured easily by the <strong>Gudu SQLFlow<\/strong> by parsing the create external table SQL query. With this data on hand, <strong>Gudu SQLFlow<\/strong> greatly simplifies the ability to trace errors back to the root cause in a data analytics process.<\/p>\n<div id=\"attachment_3544\" style=\"width: 596px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-3544\" decoding=\"async\" class=\"size-full wp-image-3544\" src=\"https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/04\/Data_Lineage_Create_External_Table.png\" alt=\"Datenherkunft: Externe Tabelle erstellen\" width=\"586\" height=\"482\" srcset=\"https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/04\/Data_Lineage_Create_External_Table-200x165.png 200w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/04\/Data_Lineage_Create_External_Table-300x247.png 300w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/04\/Data_Lineage_Create_External_Table-400x329.png 400w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/04\/Data_Lineage_Create_External_Table.png 586w\" sizes=\"(max-width: 586px) 100vw, 586px\" \/><p id=\"caption-attachment-3544\" class=\"wp-caption-text\">Datenherkunft: Externe Tabelle erstellen<\/p><\/div>\n<p>In the article, we will introduce the create external table SQL statement used in BigQuery, Snowflake, Redshift, and Microsoft Azure, and see the data lineage generated by the <strong>Gudu SQLFlow<\/strong> after analyzing the SQL script.<\/p>\n<h3>BigQuery create external table<\/h3>\n<p>External tables let BigQuery query data that is stored outside of BigQuery storage. For more information about external tables, see <strong><a href=\"https:\/\/cloud.google.com\/bigquery\/external-data-cloud-storage\">introduction to external data sources<\/a><\/strong>.<\/p>\n<p><strong>Gudu SQLFlow<\/strong> can analyze BigQuery create external table statement and create the <strong><a href=\"https:\/\/www.gudusoft.com\/de\/whats-data-lineage-why-important\/\">Datenherkunft<\/a><\/strong> after analyzing this SQL statement.<\/p>\n<p>Here is a BigQuery create external table SQL query:<\/p>\n<pre class=\"wp-block-code\"><code>CREATE OR REPLACE EXTERNAL TABLE dataset.CsvTable\r\n(\r\n  x INT64,\r\n  y STRING\r\n)\r\nOPTIONS (\r\n  format = 'CSV',\r\n  uris = ['gs:\/\/bucket\/path1.csv'],\r\n  field_delimiter = '|',\r\n  max_bad_records = 5\r\n);<\/code><\/pre>\n<p>The data lineage generated for the above SQL here, as you can see the gs:\/\/bucket\/path1.csv file stored at Google Storage is treated as the source of the dataset.CsvTable external table.<\/p>\n<figure class=\"wp-block-image size-large\"><\/figure>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><a href=\"https:\/\/sqlflow.gudusoft.com\"><img decoding=\"async\" width=\"439\" height=\"87\" class=\"wp-image-3576\" src=\"https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/04\/data-lineage-bigquery-create-external-table-1.png\" alt=\"Datenherkunft: Externe Tabelle erstellen\" srcset=\"https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/04\/data-lineage-bigquery-create-external-table-1-200x40.png 200w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/04\/data-lineage-bigquery-create-external-table-1-300x59.png 300w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/04\/data-lineage-bigquery-create-external-table-1-400x79.png 400w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/04\/data-lineage-bigquery-create-external-table-1.png 439w\" sizes=\"(max-width: 439px) 100vw, 439px\" \/><\/a><figcaption>Datenherkunft: Externe Tabelle erstellen<\/figcaption><\/figure>\n<\/div>\n<h3 class=\"wp-block-heading\">Snowflake create external table<\/h3>\n<p>Snowflake creates an external table to reads data from a set of one or more files in a specified external stage and outputs the data in a single VARIANT column. <a class=\"rank-math-link\" href=\"https:\/\/docs.snowflake.com\/en\/sql-reference\/sql\/create-external-table.html\"><strong>Create external table statement<\/strong><\/a> creates a new external table in the current\/specified schema or replaces an existing external table.<\/p>\n<p>Before creating an external table, we need to <a class=\"rank-math-link\" href=\"https:\/\/docs.snowflake.com\/en\/sql-reference\/sql\/create-stage.html\"><strong>create an external stage<\/strong><\/a> for the storage location where the data files are stored.<\/p>\n<pre class=\"wp-block-code\"><code>create stage s1\r\n  url='s3:\/\/mybucket\/files\/logs\/'\r\n  ...\r\n  ;<\/code><\/pre>\n<p>Create the partitioned external table:<\/p>\n<pre class=\"wp-block-code\"><code>create external table et1(\r\n date_part date as to_date(split_part(metadata$filename, '\/', 3)\r\n   || '\/' || split_part(metadata$filename, '\/', 4)\r\n   || '\/' || split_part(metadata$filename, '\/', 5), 'YYYY\/MM\/DD'),\r\n timestamp bigint as (value:timestamp::bigint),\r\n col2 varchar as (value:col2::varchar))\r\n partition by (date_part)\r\n location=@s1\/logs\/\r\n auto_refresh = true\r\n file_format = (type = parquet)\r\n aws_sns_topic = 'arn:aws:sns:us-west-2:001234567890:s3_mybucket';<\/code><\/pre>\n<p>Gudu SQLFlow can process Snowflake create external table query statement and build a data lineage between the files in the external stage and the external table like this:<\/p>\n<figure class=\"wp-block-image size-large\"><\/figure>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter\"><a href=\"https:\/\/sqlflow.gudusoft.com\"><img decoding=\"async\" width=\"428\" height=\"108\" class=\"wp-image-3505\" src=\"https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/04\/data-lineage-snowflake-create-external-table.png\" alt=\"Datenherkunft: Externe Tabelle erstellen\" srcset=\"https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/04\/data-lineage-snowflake-create-external-table-200x50.png 200w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/04\/data-lineage-snowflake-create-external-table-300x76.png 300w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/04\/data-lineage-snowflake-create-external-table-400x101.png 400w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/04\/data-lineage-snowflake-create-external-table.png 428w\" sizes=\"(max-width: 428px) 100vw, 428px\" \/><\/a><figcaption>Datenherkunft: Externe Tabelle erstellen<\/figcaption><\/figure>\n<\/div>\n<p>In Snowflake, data from external files can also be moved to the table using <strong><a class=\"rank-math-link\" href=\"https:\/\/github.com\/sqlparser\/sqlflow_public\/tree\/master\/databases\/snowflake\/copy-from-json\">copy into<\/a><\/strong> SQL statement and Gudu SQLFlow can also detect the data lineage in the copy into SQL query statement.<\/p>\n<h2 class=\"wp-block-heading\">SQL Server and Azure Synapse Analytics<\/h2>\n<p>Create external table command creates an external table for PolyBase to access data stored in a Hadoop cluster or Azure blob storage PolyBase external table that references data stored in a Hadoop cluster or Azure blob storage.<\/p>\n<p>In Azure Synapse Analytics, the created external table:<\/p>\n<ul>\n<li>Query Hadoop or Azure blob storage data with Transact-SQL statements.<\/li>\n<li>Import and store data from Hadoop or Azure blob storage.<\/li>\n<li>Import and store data from Azure Data Lake Store.<\/li>\n<\/ul>\n<p>In order to create an external table, an external data source must be created at first:<\/p>\n<pre class=\"wp-block-code\"><code>CREATE EXTERNAL DATA SOURCE AzureDataLakeStore\r\nWITH (TYPE = HADOOP,\r\n      LOCATION = 'abfss:\/\/data@pbasetr.azuredatalakestore.net'\r\n)<\/code><\/pre>\n<p>Then, create an external table using this SQL query:<\/p>\n<pre class=\"wp-block-code\"><code>CREATE EXTERNAL TABLE [dbo].[DimProductexternal]\r\n( [ProductKey] [int] NOT NULL,\r\n  [ProductLabel] nvarchar NULL,\r\n  [ProductName] nvarchar NULL )\r\nWITH\r\n(\r\n    LOCATION='\/DimProduct\/' ,\r\n    DATA_SOURCE = AzureDataLakeStore ,\r\n    FILE_FORMAT = TextFileFormat ,\r\n    REJECT_TYPE = VALUE ,\r\n    REJECT_VALUE = 0\r\n) ;<\/code><\/pre>\n<p>The data lineage generated by Gudu SQLFlow after parsing the create external table statement is:<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large is-resized\"><a href=\"https:\/\/sqlflow.gudusoft.com\"><img decoding=\"async\" class=\"wp-image-3752\" src=\"https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/05\/data-lineage-sqlserver-create-external-table-1-1024x119.png\" alt=\"\" width=\"899\" height=\"104\" srcset=\"https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/05\/data-lineage-sqlserver-create-external-table-1-200x23.png 200w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/05\/data-lineage-sqlserver-create-external-table-1-300x35.png 300w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/05\/data-lineage-sqlserver-create-external-table-1-400x46.png 400w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/05\/data-lineage-sqlserver-create-external-table-1-600x69.png 600w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/05\/data-lineage-sqlserver-create-external-table-1-768x89.png 768w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/05\/data-lineage-sqlserver-create-external-table-1-800x93.png 800w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/05\/data-lineage-sqlserver-create-external-table-1-1024x119.png 1024w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/05\/data-lineage-sqlserver-create-external-table-1.png 1080w\" sizes=\"(max-width: 899px) 100vw, 899px\" \/><\/a><figcaption>SQL Server create exernal table data lineage<\/figcaption><\/figure>\n<\/div>\n<h2 class=\"wp-block-heading\">AWS Redshift create external table<\/h2>\n<p>You can create an external table in Amazon Redshift, AWS Glue, Amazon Athena, or an Apache Hive metastore.\u00a0If your external table is defined in AWS Glue, Athena, or a Hive metastore, you first create an external schema that references the external database. Then you can reference the external table in your SELECT statement by prefixing the table name with the schema name, without needing to create the table in Amazon Redshift.\u00a0<\/p>\n<p>The following example creates a table named SALES in the Amazon Redshift external schema named\u00a0<code>spectrum<\/code>. The data is in tab-delimited text files.<\/p>\n<pre class=\"wp-block-code\"><code>create external table spectrum.sales(\r\nsalesid integer,\r\nlistid integer,\r\nsellerid integer,\r\nbuyerid integer,\r\neventid integer,\r\ndateid smallint,\r\nqtysold smallint,\r\npricepaid decimal(8,2),\r\ncommission decimal(8,2),\r\nsaletime timestamp)\r\nrow format delimited\r\nfields terminated by '\\t'\r\nstored as textfile\r\nlocation 's3:\/\/awssampledbuswest2\/tickit\/spectrum\/sales\/'\r\ntable properties ('numRows'='172000');<\/code><\/pre>\n<p>The data lineage generated by Gudu SQLFlow after parsing the above SQL is:<\/p>\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-large\"><a href=\"https:\/\/sqlflow.gudusoft.com\"><img decoding=\"async\" width=\"644\" height=\"216\" class=\"wp-image-3799\" src=\"https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/05\/data-lineage-redshift-create-external-table.png\" alt=\"\" srcset=\"https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/05\/data-lineage-redshift-create-external-table-200x67.png 200w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/05\/data-lineage-redshift-create-external-table-300x101.png 300w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/05\/data-lineage-redshift-create-external-table-400x134.png 400w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/05\/data-lineage-redshift-create-external-table-600x201.png 600w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/05\/data-lineage-redshift-create-external-table.png 644w\" sizes=\"(max-width: 644px) 100vw, 644px\" \/><\/a><figcaption>Redshift Create External Table Data Lineage<\/figcaption><\/figure>\n<\/div>\n<h2 class=\"wp-block-heading\">Abschluss\u00a0<\/h2>\n<p>Thank you for reading our article and if it can help you to have a better understanding of <strong>how does data lineage create external table<\/strong>, we would be very happy. If you want to find more about <strong>data lineage create external table<\/strong>, we would like to advise you to visit our website <strong><a href=\"https:\/\/sqlflow.gudusoft.com\">Gudu SQLFlow<\/a><\/strong> for more information. (Edited by Ryan on Apr 25, 2022)<\/p>\n<\/div><\/div><\/div><style type=\"text\/css\">.fusion-body .fusion-builder-column-0{width:100% !important;margin-top : 0px;margin-bottom : 0px;}.fusion-builder-column-0 > .fusion-column-wrapper {padding-top : 0px !important;padding-right : 0px !important;margin-right : 1.92%;padding-bottom : 0px !important;padding-left : 0px !important;margin-left : 1.92%;}@media only screen and (max-width:1024px) {.fusion-body .fusion-builder-column-0{width:100% !important;}.fusion-builder-column-0 > .fusion-column-wrapper {margin-right : 1.92%;margin-left : 1.92%;}}@media only screen and (max-width:640px) {.fusion-body .fusion-builder-column-0{width:100% !important;}.fusion-builder-column-0 > .fusion-column-wrapper {margin-right : 1.92%;margin-left : 1.92%;}}<\/style><\/div><style type=\"text\/css\">.fusion-body .fusion-flex-container.fusion-builder-row-1{ padding-top : 0px;margin-top : 0px;padding-right : 0px;padding-bottom : 0px;margin-bottom : 0px;padding-left : 0px;}<\/style><\/div>","protected":false},"excerpt":{"rendered":"","protected":false},"author":27,"featured_media":3544,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[31],"tags":[81,83,79,55,75,82,80],"_links":{"self":[{"href":"https:\/\/www.gudusoft.com\/de\/wp-json\/wp\/v2\/posts\/3493"}],"collection":[{"href":"https:\/\/www.gudusoft.com\/de\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gudusoft.com\/de\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gudusoft.com\/de\/wp-json\/wp\/v2\/users\/27"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gudusoft.com\/de\/wp-json\/wp\/v2\/comments?post=3493"}],"version-history":[{"count":49,"href":"https:\/\/www.gudusoft.com\/de\/wp-json\/wp\/v2\/posts\/3493\/revisions"}],"predecessor-version":[{"id":3898,"href":"https:\/\/www.gudusoft.com\/de\/wp-json\/wp\/v2\/posts\/3493\/revisions\/3898"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.gudusoft.com\/de\/wp-json\/wp\/v2\/media\/3544"}],"wp:attachment":[{"href":"https:\/\/www.gudusoft.com\/de\/wp-json\/wp\/v2\/media?parent=3493"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gudusoft.com\/de\/wp-json\/wp\/v2\/categories?post=3493"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gudusoft.com\/de\/wp-json\/wp\/v2\/tags?post=3493"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}