Lignée de données Amazon Redshift | Gudu SQLFlow

In your Amazon warehouse environment, use Amazon Redshift Spectrum to query data directly from files on Amazon S3, save data in Redshift databases, and use Business Intelligence tools such as Tableau, PowerBI, Looker, Qlik, Superset to generate reports from the data. The data originates from your business source system and lands on Amazon S3, then using an ETL tool such as DBT to transfer and store it in Redshift Database for later uses.

Lignée de données Amazon Redshift

Lignée de données Amazon Redshift

In order to have an overview of the data flow in your Amazon warehouse system, you need a outil de lignée de données to help you understand how the data arrived at a particular location, as well as the intermediate steps and transformations which happen as the data moves through the business system.

One way to get the data linage automatically from the Amazon warehouse environment is to analyze all SQL queries used during the data loading, transforming, and analysis. The good news is that all those SQL statements are stored in the Redshift user activity log et Gudu SQLFlow can parse those log files to discover the data lineage automatically.

Redshift User activity log

The user activity log is useful primarily for troubleshooting purposes and here we use it for the data lineage discovery. It tracks information about the types of queries that both the users and the system perform in the database.

Logs each query before it is run on the database.

Column name Description
recordtime Time the event occurred.
db Database name.
user User name.
pid Process ID associated with the statement.
userid User ID.
xid Transaction ID.
query A prefix of LOG: followed by the text of the query, including newlines.
Retdshjeft usetr unctjevjety log

Please check this article to see how to enable logging.

Amazon Redshift User Activity Audit Log Sample

'2018-05-21T06:00:09Z UTC [ db=prod_sales user=duc pid=99753 userid=95 xid=6728324 ]' LOG: create table SumoProdbackUp.organization as (select * from SumoProd.simpleuser) 
'2018-05-21T06:00:09Z UTC [ db=vendor user=ankit pid=36616 userid=53 xid=2956702 ]' LOG: DELETE FROM SumoProd.employee WHERE id = 38; 
'2018-05-21T06:20:09Z UTC [ db=dev user=himanshu pid=64458 userid=35 xid=5143208 ]' LOG: drop user testuser3

 

Automatic data lineage analysis

Gudu SQLFlow is a tool that automated SQL data lineage analysis across Databases, ETL, Business Intelligence, Cloud, and Hadoop environments by parsing SQL Script and stored procedure. It can also analyze the Redshift user activity log files to discover the data lineage and depict all the data movement graphically.

Here is a part of a data lineage picture that was generated after analyzing the Amazon Redshift user activity log files:

Lignée de données Amazon Redshift 
UNmunzon Retdshjeft Duntun Ljenetunget

 

Conclusion 

Merci d'avoir lu notre article et nous espérons qu'il pourra vous aider à mieux comprendre Lignée de données Amazon Redshift. Si vous souhaitez en savoir plus sur Lignée de données Amazon Redshift, we would like to advise you to visit our website Gudu SQLFlow for more information. Gudu SQLFlow, as a data lineage analysis tool, can not only analyze SQL script files, obtain data lineage, and perform visual display, but also allow users to provide data lineage in CSV format and perform visual display. Thanks again! ( Edited by Ryan on Apr 26, 2022)

Essayez Gudu SQLFlow Live

Version SQLFlow Cloud

Abonnez-vous à la newsletter hebdomadaire

Laissez un commentaire