Amazon Redshift Datenherkunft |

In your Amazon warehouse environment, use Amazon Redshift Spectrum to query data directly from files on Amazon S3, save data in Redshift databases, and use Business Intelligence tools such as Tableau, PowerBI, Looker, Qlik, Superset to generate reports from the data. The data originates from your business source system and lands on Amazon S3, then using an ETL tool such as DBT to transfer and store it in Redshift Database for later uses.

Amazon Redshift-Datenherkunft

Amazon Redshift-Datenherkunft

In order to have an overview of the data flow in your Amazon warehouse system, you need a Datenherkunftstool to help you understand how the data arrived at a particular location, as well as the intermediate steps and transformations which happen as the data moves through the business system.

One way to get the data linage automatically from the Amazon warehouse environment is to analyze all SQL queries used during the data loading, transforming, and analysis. The good news is that all those SQL statements are stored in the Redshift user activity log Und Gudu SQLFlow can parse those log files to discover the data lineage automatically.

Redshift User activity log

The user activity log is useful primarily for troubleshooting purposes and here we use it for the data lineage discovery. It tracks information about the types of queries that both the users and the system perform in the database.

Logs each query before it is run on the database.

Column name Description
recordtime Time the event occurred.
db Database name.
user User name.
pid Process ID associated with the statement.
userid User ID.
xid Transaction ID.
query A prefix of LOG: followed by the text of the query, including newlines.
ReDshichfT user AcTichvichTy loG

Please check this article to see how to enable logging.

Amazon Redshift User Activity Audit Log Sample

'2018-05-21T06:00:09Z UTC [ db=prod_sales user=duc pid=99753 userid=95 xid=6728324 ]' LOG: create table SumoProdbackUp.organization as (select * from SumoProd.simpleuser) 
'2018-05-21T06:00:09Z UTC [ db=vendor user=ankit pid=36616 userid=53 xid=2956702 ]' LOG: DELETE FROM SumoProd.employee WHERE id = 38; 
'2018-05-21T06:20:09Z UTC [ db=dev user=himanshu pid=64458 userid=35 xid=5143208 ]' LOG: drop user testuser3

 

Automatic data lineage analysis

Gudu SQLFlow is a tool that automated SQL data lineage analysis across Databases, ETL, Business Intelligence, Cloud, and Hadoop environments by parsing SQL Script and stored procedure. It can also analyze the Redshift user activity log files to discover the data lineage and depict all the data movement graphically.

Here is a part of a data lineage picture that was generated after analyzing the Amazon Redshift user activity log files:

Amazon Redshift-Datenherkunft 
AmAzoN ReDshichfT DATA LichNeAGe

 

Abschluss 

Vielen Dank für das Lesen unseres Artikels. Wir hoffen, dass er Ihnen dabei hilft, ein besseres Verständnis zu erlangen von Amazon Redshift-DatenherkunftWenn Sie mehr erfahren möchten über Amazon Redshift-Datenherkunft, we would like to advise you to visit our website Gudu SQLFlow for more information. Gudu SQLFlow, as a data lineage analysis tool, can not only analyze SQL script files, obtain data lineage, and perform visual display, but also allow users to provide data lineage in CSV format and perform visual display. Thanks again! ( Edited by Ryan on Apr 26, 2022)

Testen Sie Gudu SQLFlow Live

SQLFlow Cloud-Version

Abonnieren Sie den wöchentlichen Newsletter

Hinterlasse einen Kommentar