SQLFlow can discover data lineage of SQL queries in a CSV file.
CSV file format
In order to discover data lineage from SQL queries, at least one column that includes the SQL query must be presented in the CSV file.
Other columns such as database, schema, object type, and object name are helpful to help SQLFlow to make a more precise data lineage analysis. But there are not mandatory.
Here is the common structure of a CSV file that may submit to SQLFlow for the data lineage analysis.
Your CSV file may have a different structure, as long as the column incudes the SQL query exits, and you tell the SQLFlow the column position of this SQL query, then it can be analyzed correctly.
- database： database that this SQL query will run against.
- schema： schema that includes database objects in the SQL query.
- objectType： type of the SQL query such as procedure, function, trigger, view.
- objectName： name of the SQL query.
- objectCode： the SQL query code.
- Notes： comments for this SQL query.
Position of the columns in the CSV file
In order to help SQLFlow understand your CSV file correctly, you need to tell the column position of each predefined column through the UI interface of the SQLFlow.
Please note that the column name needn’t be the exact same as the predefined one, the SQLFlow only cares about the position.
The default columns required is in a format like this:
if your CSV file has the same format, you may tell SQLFlow that
Database is in column 1 Schema is in column 2 ObjectType is in column 3 ObjectName is in column 4 ObjectCode is in column 5 Notes is in column 6
If you have a CSV file in format like this:
|database1||schema1||procedure||“create procedure |
As you can see, the catalog (AKA database) is at the first column; the schema is at the second column, object type is at the third column, and query code is at the fourth column.
In this CSV file, other columns such as ObjectType, Notes in the default CSV file format don’t exist.
ObjectCode column includes the SQL query. Usually, the SQL query will be in multi lines, So it should be enclosed with an enclosed char, which is
" by default.
If the SQL query is enclosed with other char, then it must be set using
objectCodeEncloseChar itself inside the SQL query should be escaped by
objectCodeEscapeChar. The default
enclosed char of other columns
Besides the objectCode column, if other column is enclosed by the
objectCodeEncloseChar as well, SQLFlow will remove this char automatically.
Set the CSV format
- Upload a CSV file by creating a SQLFlow job.
- After upload the CSV file, click “This is a csv file” checkbox.
- Tells the SQLFlow column position. 0 means this columns doesn’t exist in CSV file.