1. Visualize data lineage discovered in a stored procedure

    Since SQLFlow version 4.9.5 (2022-02-13), it is quite easy to visualize a SQL Server stored procedure exported from a database: just select the procedure and click the visualize menu item in the popup menu.

    Analyze the data lineage of your database in one of two ways (a sample grabit configuration is sketched below):

    1. Connect to the database in SQLFlow. After you specify the database parameters, SQLFlow connects to the database, fetches the DDL automatically, and analyzes the data lineage.

    2. Use grabit to connect to the database. You may also use the grabit tool to do this job automatically.
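    A rough sketch of a grabit configuration for a database source is shown below; it follows the JSON layout of the Snowflake example in item 10. The values are placeholders, and the password field and the sqlserver value for databaseType are assumptions, so please check the grabit documentation for the exact keys supported for your database:

        {
          "databaseServer":{
            "hostname":"127.0.0.1",
            "port":"1433",
            "username":"USERNAME",
            "password":"PASSWORD",
            "database":"DATABASE"
          },
          "SQLFlowServer":{
            "server":"http://127.0.0.1",
            "serverPort":"8081",
            "userId":"gudu|0123456789",
            "userSecret":""
          },
          "SQLScriptSource":"database",
          "databaseType":"sqlserver"
        }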

    2022/02/14 SQLFlow

  2. Continue processing a SQL file even if a parsing error occurs

    If there are multiple SQL statements in a SQL file, General SQL Parser (GSP) will check the syntax of the SQL statements one by one and continue processing the file even if a parsing error occurs. If a SQL file that includes multiple SQL statements contains several syntax errors, GSP will identify all of them so we can fix them all at once. Take this SQL file for example:

        CREATE VIEW vsal as select * from emp;

        -- syntax error
        SELECT Count(*) total_count, SUM(sal) total_sal
        FROMs scott.emp
        WHERE city = 'NYC';

        SELECT deptno, Count(*) num_emp, SUM(sal) sal_sum
        FROM scott.emp
        WHERE city = 'NYC'
        GROUP BY deptno;

        -- syntax error
        SELECT deptno, Count(*) num_emp, SUM(sal) sal_sum
        FROM scott.emp
        WHEREs city = 'NYC'
        GROUP BY deptno;

        SELECT a.deptno "Department",
               a.num_emp / b.total_count "Employees",
               a.sal_sum / b.total_sal "Salary"
        FROM (SELECT deptno, Count(*) num_emp, SUM(sal) sal_sum
              FROM scott.emp WHERE city = 'NYC' GROUP BY deptno) a,
             (SELECT Count(*) total_count, SUM(sal) total_sal
              FROM scott.emp WHERE city = 'NYC') b

    Here is the Java code that illustrates how to check the syntax of each SQL statement and report all syntax errors at once:

        static TStatementList parse1(EDbVendor db, String file) {
            int ret = 0;
            TGSqlParser sqlparser = new TGSqlParser(db);
            System.out.println("Parsing with " + db + " ...");
            sqlparser.sqlfilename = file;
            ret = sqlparser.parse();
            int totalError = 0;
            for (int j = 0; j < sqlparser.sqlstatements.size(); j++) {
                String syntaxSign = "Success\t";
                if (sqlparser.sqlstatements.get(j).getSyntaxErrors().size() > 0) {
                    syntaxSign = "Failed\t";
                    totalError++;
                }
                System.out.print(syntaxSign + "sql type: " + sqlparser.sqlstatements.get(j).sqlstatementtype
                        + "(" + sqlparser.sqlstatements.get(j).getClass().toString() + ")"
                        + ", " + sqlparser.sqlstatements.get(j).getStartToken().toString()
                        + "(" + sqlparser.sqlstatements.get(j).getStartToken().lineNo
                        + "," + sqlparser.sqlstatements.get(j).getStartToken().columnNo + ")");
                System.out.print("\n");
                for (int i = 0; i < sqlparser.sqlstatements.get(j).getSyntaxHints().size(); i++) {
                    System.out.println(String.format("syntax hint:%s(%d,%d)"
                            , sqlparser.sqlstatements.get(j).getSyntaxHints().get(i).tokentext
                            , sqlparser.sqlstatements.get(j).getSyntaxHints().get(i).lineNo
                            , sqlparser.sqlstatements.get(j).getSyntaxHints().get(i).columnNo)
                            + sqlparser.sqlstatements.get(j).getSyntaxHints().get(i).hint);
                }
            }
            if (ret == 0) {
                System.out.println("sqls: " + sqlparser.sqlstatements.size());
                return sqlparser.sqlstatements;
            } else {
                System.out.println("error: " + sqlparser.getErrormessage());
                System.out.println("syntax arrays: " + sqlparser.getSyntaxErrors().size());
                System.out.println("total errors: " + totalError + "/" + sqlparser.sqlstatements.size());
                return null;
            }
        }

    Report generated after parsing the SQL file:

        Success sql type: sstcreateview(class gudusoft.gsqlparser.stmt.TCreateViewSqlStatement), CREATE(1,1)
        Failed  sql type: sstselect(class gudusoft.gsqlparser.stmt.TSelectSqlStatement), SELECT(5,1)
        Success sql type: sstselect(class gudusoft.gsqlparser.stmt.TSelectSqlStatement), SELECT(11,1)
        Failed  sql type: sstselect(class gudusoft.gsqlparser.stmt.TSelectSqlStatement), SELECT(19,1)
        Success sql type: sstselect(class gudusoft.gsqlparser.stmt.TSelectSqlStatement), SELECT(27,1)
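    For completeness, below is a minimal sketch of how the helper above might be invoked. The vendor constant (EDbVendor.dbvoracle) and the file name are assumptions chosen for illustration, and parse1 is the method listed above, assumed to be defined in the same class:

        import gudusoft.gsqlparser.EDbVendor;
        import gudusoft.gsqlparser.TStatementList;

        public class ParseFileDemo {
            public static void main(String[] args) {
                // "queries.sql" is a hypothetical multi-statement SQL file.
                TStatementList stmts = parse1(EDbVendor.dbvoracle, "queries.sql");
                if (stmts == null) {
                    System.out.println("The file contains syntax errors; see the report above.");
                }
            }

            // parse1(...) is the helper method shown above, placed in this class.
        }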

    2022/02/02 GSP

  3. How to use SQLEnv in the General SQL Parser Java version to build the correct relationship between unqualified columns and tables

    What’s SQLEnv? SQLEnv is a Java class in the General SQL Parser (GSP) library that is used to store database metadata. With the metadata in SQLEnv, GSP can build the correct relationship between an unqualified column and its table. Take this SQL for example:

        select Quantity1, Quantity2 from aTab, bTab

    Without the help of metadata, GSP doesn’t know which table the columns Quantity1 and Quantity2 belong to. We can create a SQLEnv instance that provides the metadata to GSP like this:

        class TSQLServerEnv extends TSQLEnv {
            public TSQLServerEnv() {
                super(EDbVendor.dbvmssql);
                initSQLEnv();
            }

            @Override
            public void initSQLEnv() {
                // add a new database: master
                TSQLCatalog sqlCatalog = createSQLCatalog("master");
                // add a new schema: dbo
                TSQLSchema sqlSchema = sqlCatalog.createSchema("dbo");
                // add a new table: aTab
                TSQLTable aTab = sqlSchema.createTable("aTab");
                aTab.addColumn("Quantity1");
                // add a new table: bTab
                TSQLTable bTab = sqlSchema.createTable("bTab");
                bTab.addColumn("Quantity2");
            }
        }

    and then assign this SQLEnv to GSP before parsing the SQL query:

        sqlparser.setSqlEnv(new TSQLServerEnv());

    Database without schema

    Some databases, such as MySQL and Hive, don’t have schemas. When creating a SQLEnv for this kind of database, the schema object in the SQLEnv should be set to default. Take this Hive SQL for example:

        select column1, column2 from mydatabase.mytable1, mydatabase.mytable2

    Here is an example of a SQLEnv that stores the Hive metadata:

        class THiveEnv extends TSQLEnv {
            public THiveEnv() {
                super(EDbVendor.dbvhive);
                initSQLEnv();
            }

            @Override
            public void initSQLEnv() {
                // add a new database
                TSQLCatalog sqlCatalog = createSQLCatalog("mydatabase");
                // hive doesn't have a schema, so we use a default schema
                TSQLSchema sqlSchema = sqlCatalog.createSchema("default");
                // add a new table: mytable1
                TSQLTable mytable1 = sqlSchema.createTable("mytable1");
                mytable1.addColumn("column1");
                // add a new table: mytable2
                TSQLTable mytable2 = sqlSchema.createTable("mytable2");
                mytable2.addColumn("column2");
            }
        }

    Load metadata from a database instance

    You can connect to the database instance to fetch the metadata and save it in a SQLEnv that GSP can use while parsing the SQL script:

        TSQLDataSource datasource = createSQLDataSource(vendor, host, port, user, password, database, schema);
        sqlenv = TSQLEnv.valueOf(datasource);

    Build the table and column relationship without metadata

    If there is no metadata available for SQL like this:

        select Quantity1, Quantity2 from aTab, bTab

    the columns Quantity1 and Quantity2 will be saved in TCustomSqlStatement.getOrphanColumns(). You may fetch those columns from this property for further processing as you like. Below is the Java code in the gettablecolumns demo that links each orphan column to the first table in the from clause.

        if (stmt.getOrphanColumns().size() > 0) {
            infos.append(numberOfSpace(pNest+1) + " orphan columns:" + newline);
            String oc = "";
            for (int k = 0; k < stmt.getOrphanColumns().size(); k++) {
                TInfoRecord columnRecord = new TInfoRecord(EDbObjectType.column);
                columnRecord.setColumn(stmt.getOrphanColumns().getObjectName(k));
                columnRecord.setFileName(this.sqlFileName);
                infoList.add(columnRecord);
                oc = stmt.getOrphanColumns().getObjectName(k).getColumnNameOnly(); // stmt.getOrphanColumns().getObjectName(k).toString();
                if (showColumnLocation) {
                    infos.append(numberOfSpace(pNest+3) + oc + "(" + stmt.getOrphanColumns().getObjectName(k).getLocation() + ")" + newline);
                } else {
                    infos.append(numberOfSpace(pNest+3) + oc + newline);
                }
                if ((linkOrphanColumnToFirstTable) && (stmt.getFirstPhysicalTable() != null)) {
                    if ((listStarColumn) || (!(oc.equalsIgnoreCase("*"))))
                        fieldlist.add(stmt.getFirstPhysicalTable().toString() + dotChar + oc);
                    columnRecord.setTable(stmt.getFirstPhysicalTable());
                } else {
                    fieldlist.add("missed" + dotChar + oc + "(" + stmt.getOrphanColumns().getObjectName(k).coordinate() + ")");
                }
                tableColumnList.append(",missed" + dotChar + oc + newline);
            }
        }

    For the detailed Java demo, please check it here.
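    To tie these pieces together, here is a minimal sketch (the query text is taken from the example above) that parses with the TSQLServerEnv class defined earlier and then checks whether any column was left unresolved:

        import gudusoft.gsqlparser.EDbVendor;
        import gudusoft.gsqlparser.TGSqlParser;

        public class SqlEnvDemo {
            public static void main(String[] args) {
                TGSqlParser sqlparser = new TGSqlParser(EDbVendor.dbvmssql);
                // Provide the metadata so unqualified columns can be linked to their tables.
                sqlparser.setSqlEnv(new TSQLServerEnv());
                sqlparser.sqltext = "select Quantity1, Quantity2 from aTab, bTab";
                if (sqlparser.parse() == 0) {
                    // With the metadata in place, no column should end up as an orphan column.
                    System.out.println("orphan columns: "
                            + sqlparser.sqlstatements.get(0).getOrphanColumns().size());
                } else {
                    System.out.println(sqlparser.getErrormessage());
                }
            }
        }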

    2021/12/28 GSP

  4. General SQL Parser Java version benchmark

    General SQL Parser benchmark

    This benchmark measures the General SQL Parser Java version in single-thread mode and in multi-thread mode. General SQL Parser is not guaranteed to be thread safe, so you need to create a new instance of General SQL Parser in each thread (see the sketch after the results).

    Machine: Win 10, Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz, 32G memory.
    Software: JDK 1.8, General SQL Parser Java version 2.3.6.5

    Result of General SQL Parser Java version:

        =======================Benchmark 1k size file=============================================
        Main thread: load sql parser 1.029 seconds
        Parsing query of size 1k, repeated 100 times,...
        Main thread: Takes 0.014 seconds to parse query of 1k,
        Total times: 1.432 seconds for query of 1k, repeated 100 times,.

        Multi thread mode:
        Thread 4, parse query of size 1k, takes 0.002 seconds, repeated in 100 times, takes 0.204 seconds
        Thread 2, parse query of size 1k, takes 0.002 seconds, repeated in 100 times, takes 0.214 seconds
        Thread 3, parse query of size 1k, takes 0.003 seconds, repeated in 100 times, takes 0.279 seconds
        Thread 1, parse query of size 1k, takes 0.003 seconds, repeated in 100 times, takes 0.283 seconds
        Thread 5, parse query of size 1k, takes 0.003 seconds, repeated in 100 times, takes 0.306 seconds

        =======================Benchmark 10k size file=============================================
        Main thread: load sql parser 0.000 seconds
        Parsing query of size 11k, repeated 100 times,...
        Main thread: Takes 0.009 seconds to parse query of 11k,
        Total times: 0.856 seconds for query of 11k, repeated 100 times,.

        Multi thread mode:
        Thread 2, parse query of size 11k, takes 0.011 seconds, repeated in 100 times, takes 1.070 seconds
        Thread 1, parse query of size 11k, takes 0.011 seconds, repeated in 100 times, takes 1.095 seconds
        Thread 4, parse query of size 11k, takes 0.011 seconds, repeated in 100 times, takes 1.055 seconds
        Thread 5, parse query of size 11k, takes 0.010 seconds, repeated in 100 times, takes 1.022 seconds
        Thread 3, parse query of size 11k, takes 0.012 seconds, repeated in 100 times, takes 1.155 seconds

        =======================Benchmark 100k size file=============================================
        Main thread: load sql parser 0.000 seconds
        Parsing query of size 100k, repeated 100 times,...
        Main thread: Takes 0.028 seconds to parse query of 100k,
        Total times: 2.841 seconds for query of 100k, repeated 100 times,.

        Multi thread mode:
        Thread 4, parse query of size 100k, takes 0.036 seconds, repeated in 100 times, takes 3.622 seconds
        Thread 5, parse query of size 100k, takes 0.036 seconds, repeated in 100 times, takes 3.641 seconds
        Thread 1, parse query of size 100k, takes 0.036 seconds, repeated in 100 times, takes 3.644 seconds
        Thread 2, parse query of size 100k, takes 0.037 seconds, repeated in 100 times, takes 3.669 seconds
        Thread 3, parse query of size 100k, takes 0.037 seconds, repeated in 100 times, takes 3.684 seconds

    Source code of this benchmark tool: Source code
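    As the note above says, the parser instance must not be shared across threads. Below is a minimal sketch of that per-thread pattern, not the actual benchmark tool; the query text, vendor constant, and thread count are assumptions chosen for illustration:

        import gudusoft.gsqlparser.EDbVendor;
        import gudusoft.gsqlparser.TGSqlParser;

        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.TimeUnit;

        public class ParserThreadDemo {
            public static void main(String[] args) throws InterruptedException {
                final String sql = "SELECT deptno, COUNT(*) num_emp FROM scott.emp GROUP BY deptno";
                ExecutorService pool = Executors.newFixedThreadPool(5);
                for (int t = 1; t <= 5; t++) {
                    final int threadNo = t;
                    pool.submit(() -> {
                        // Each thread creates its own parser; instances are never shared between threads.
                        TGSqlParser parser = new TGSqlParser(EDbVendor.dbvoracle);
                        long start = System.currentTimeMillis();
                        for (int i = 0; i < 100; i++) {
                            parser.sqltext = sql;
                            parser.parse();
                        }
                        System.out.println("Thread " + threadNo + ", repeated 100 times, takes "
                                + (System.currentTimeMillis() - start) / 1000.0 + " seconds");
                    });
                }
                pool.shutdown();
                pool.awaitTermination(10, TimeUnit.MINUTES);
            }
        }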

    2021/11/04 GSP

  5. Apply for a SQLFlow premium account for free

    Apply for a 30-day SQLFlow Cloud premium account:

    1. Register a free account if you don’t have one.

    2. Once your account is activated, visit this page to apply for a free coupon. In about 5 minutes, you will receive an email including the coupon.

    3. Log in here: https://sqlflow.gudusoft.com and click the user icon at the upper right.

    4. Click the Account menu item, then click the “activate by entering a coupon” button and enter the coupon.

    2021/10/19 SQLFlow

  6. SQLFlow widget - package the data lineage into a standalone web app

    SQLFlow is a powerful web application that automates SQL data lineage analysis across Databases, ETL, Business Intelligence, Cloud and Hadoop environments by parsing SQL scripts and stored procedures. It depicts all the data movement graphically.

    2021/10/06 SQLFlow Widget Newsletter

  7. SQLFlow widget - embed the data lineage graph of a specific table/column on your web page

    Now you have successfully set up the SQLFlow on-premise version on your server. It periodically fetches SQL scripts from the database instance or a git server and generates the data lineage on the backend, and you can view the data lineage graphically in your browser.

    2021/10/02 SQLFlow Widget Newsletter

  8. General SQL Parser library used with Java 9, missing JAXB files

    The General SQL Parser library Java version works with Java 1.8 or higher, for both the open-source OpenJDK and Oracle Java.

    2021/10/01 General SQL Parser

  9. SQLFlow Widget - a JavaScript library that enables your web page to visualize the data lineage instantly

    SQLFlow Widget is a JavaScript library that enables your web page to visualize the data lineage instantly.

    2021/09/13 SQLFlow Widget Newsletter

  10. Grabit connects to Snowflake using key pair authentication

    Snowflake supports key pair authentication as a more secure alternative to basic authentication (i.e. username and password), and grabit supports key pair authentication as well. In order to connect to Snowflake using key pair authentication, you need to configure key pair authentication first.

    Configure grabit

    After that, you may use a grabit config file like this to connect to Snowflake:

        {
          "databaseServer":{
            "hostname":"127.0.0.1",
            "port":"433",
            "username":"USERNAME",
            "privateKeyFile":"/data/rsa_key.p8",
            "privateKeyFilePwd":"123456",
            "database":"DATABASE",
            "enableQueryHistory":true,
            "queryHistoryBlockOfTimeInMinutes":30,
            "queryHistorySqlType":"SELECT,UPDATE",
            "snowflakeDefaultRole":"custom_role"
          },
          "SQLFlowServer":{
            "server":"http://127.0.0.1",
            "serverPort":"8081",
            "userId":"gudu|0123456789",
            "userSecret":""
          },
          "SQLScriptSource":"database",
          "databaseType":"snowflake"
        }

    privateKeyFile: the file that stores the private key.
    privateKeyFilePwd: the password used when generating the key; leave it empty if not used.
    For the other options, please check this document for a detailed explanation.

    Fetch SQL from query history

    In order to fetch SQL from the query history, we need to create a role that is able to access the SNOWFLAKE database and the warehouse. For example, we create a role custom_role and assign the privileges of test_warehouse to this role like this:

        use role accountadmin;
        grant imported privileges on database snowflake to role custom_role;
        use role custom_role;

        use role sysadmin;
        GRANT ALL PRIVILEGES ON WAREHOUSE test_warehouse TO role custom_role;

    Official Document about QUERY_HISTORY
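    On the Snowflake side, configuring key pair authentication comes down to generating an RSA key pair and attaching the public key to the user that grabit connects as. Below is a minimal sketch of that last step; the user name and key value are placeholders, and the full procedure (including key generation) is described in the Snowflake documentation:

        -- Attach the generated public key (the contents of rsa_key.pub without the PEM
        -- header and footer) to the Snowflake user; user name and key are placeholders.
        ALTER USER grabit_user SET RSA_PUBLIC_KEY = 'MIIBIjANBgkqh...';

        -- Verify the registered key by checking the RSA_PUBLIC_KEY_FP property of the user.
        DESC USER grabit_user;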

    2021/09/07 Grabit SQLFlow