1. General SQL Parser Java version benchmark

    The benchmark of the General SQL Parser Java version in single-thread mode and multi-thread mode. General SQL Parser is not guaranteed to be thread safe, so you need to create a new instance of General SQL Parser in each thread.

    Machine: Win 10, Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz, 32G memory.
    Software: JDK 1.8, General SQL Parser Java version 2.3.6.5

    Result of General SQL Parser Java version:

    ======================= Benchmark 1k size file =======================
    Main thread: load sql parser 1.029 seconds
    Parsing query of size 1k, repeated 100 times...
    Main thread: takes 0.014 seconds to parse query of 1k; total time: 1.432 seconds for query of 1k, repeated 100 times.
    Multi-thread mode:
    Thread 4, parse query of size 1k, takes 0.002 seconds; repeated 100 times, takes 0.204 seconds
    Thread 2, parse query of size 1k, takes 0.002 seconds; repeated 100 times, takes 0.214 seconds
    Thread 3, parse query of size 1k, takes 0.003 seconds; repeated 100 times, takes 0.279 seconds
    Thread 1, parse query of size 1k, takes 0.003 seconds; repeated 100 times, takes 0.283 seconds
    Thread 5, parse query of size 1k, takes 0.003 seconds; repeated 100 times, takes 0.306 seconds

    ======================= Benchmark 10k size file =======================
    Main thread: load sql parser 0.000 seconds
    Parsing query of size 11k, repeated 100 times...
    Main thread: takes 0.009 seconds to parse query of 11k; total time: 0.856 seconds for query of 11k, repeated 100 times.
    Multi-thread mode:
    Thread 2, parse query of size 11k, takes 0.011 seconds; repeated 100 times, takes 1.070 seconds
    Thread 1, parse query of size 11k, takes 0.011 seconds; repeated 100 times, takes 1.095 seconds
    Thread 4, parse query of size 11k, takes 0.011 seconds; repeated 100 times, takes 1.055 seconds
    Thread 5, parse query of size 11k, takes 0.010 seconds; repeated 100 times, takes 1.022 seconds
    Thread 3, parse query of size 11k, takes 0.012 seconds; repeated 100 times, takes 1.155 seconds

    ======================= Benchmark 100k size file =======================
    Main thread: load sql parser 0.000 seconds
    Parsing query of size 100k, repeated 100 times...
    Main thread: takes 0.028 seconds to parse query of 100k; total time: 2.841 seconds for query of 100k, repeated 100 times.
    Multi-thread mode:
    Thread 4, parse query of size 100k, takes 0.036 seconds; repeated 100 times, takes 3.622 seconds
    Thread 5, parse query of size 100k, takes 0.036 seconds; repeated 100 times, takes 3.641 seconds
    Thread 1, parse query of size 100k, takes 0.036 seconds; repeated 100 times, takes 3.644 seconds
    Thread 2, parse query of size 100k, takes 0.037 seconds; repeated 100 times, takes 3.669 seconds
    Thread 3, parse query of size 100k, takes 0.037 seconds; repeated 100 times, takes 3.684 seconds

    The source code of this benchmark tool is available.
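    Below is a minimal sketch of how such a multi-thread run can be driven. It assumes the TGSqlParser API of the General SQL Parser Java library (package and member names per its public examples); it is not the exact harness used above, which is in the linked source code.

    import gudusoft.gsqlparser.EDbVendor;
    import gudusoft.gsqlparser.TGSqlParser;

    public class ParserBenchmark {
        static final int REPEAT = 100;

        public static void main(String[] args) {
            final String sql = "SELECT a, b FROM t WHERE c > 1"; // stand-in for the 1k/10k/100k query
            // The parser is not guaranteed to be thread safe, so each
            // thread creates its own TGSqlParser instance.
            for (int i = 1; i <= 5; i++) {
                final int threadNo = i;
                new Thread(() -> {
                    TGSqlParser parser = new TGSqlParser(EDbVendor.dbvoracle);
                    long start = System.currentTimeMillis();
                    for (int n = 0; n < REPEAT; n++) {
                        parser.sqltext = sql;
                        if (parser.parse() != 0) { // 0 means the parse succeeded
                            System.err.println(parser.getErrormessage());
                        }
                    }
                    System.out.printf("Thread %d, repeated %d times, takes %.3f seconds%n",
                            threadNo, REPEAT, (System.currentTimeMillis() - start) / 1000.0);
                }).start();
            }
        }
    }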

    2021/11/04 GSP

  2. SQLFlow: apply for a premium account for free

    Apply for a 30-day SQLFlow Cloud premium account. First, please register a free account if you don’t have one. Once your account is activated, please visit this page to apply for a free coupon. In about 5 minutes, you will receive an email including the coupon. Log in at https://sqlflow.gudusoft.com and click the user icon at the upper right. Click the Account menu item, then click the “activate by entering a coupon” button and enter the coupon.

    2021/10/19 SQLFlow

  3. SQLFlow widget - package the data lineage into a standalone web app

    SQLFlow is a powerful web application that automates SQL data lineage analysis across Databases, ETL, Business Intelligence, Cloud and Hadoop environments by parsing SQL scripts and stored procedures. It depicts all the data movement graphically.

    2021/10/06 SQLFlow Widget Newsletter

  4. SQLFlow widget - embed a specific table/column data lineage graphics on your web page

    Now that you have successfully set up the SQLFlow on-premise version on your server, it periodically fetches SQL scripts from a database instance or a git server and generates the data lineage on the backend, and you can view the data lineage graphically in your browser.

    2021/10/02 SQLFlow Widget Newsletter

  5. General SQL Parser library used with Java 9, missing jaxb files

    The General SQL Parser library Java version works with Java 1.8 or higher, for both the open-source OpenJDK and the Oracle JDK. Note that starting with Java 9 the JAXB modules (java.xml.bind) are no longer resolved by default, and they were removed from the JDK entirely in Java 11, so if the library reports missing jaxb classes, the JAXB dependencies must be added to your project separately.

    2021/10/01 General SQL Parser

  6. SQLFlow Widget - a JavaScript library that enables your web page to visualize data lineage instantly

    SQLFlow Widget is a JavaScript library that enables your web page to visualize data lineage instantly.

    2021/09/13 SQLFlow Widget Newsletter

  7. Grabit connects to Snowflake using key pair authentication

    Snowflake supports key pair authentication for enhanced security as an alternative to basic authentication (i.e. username and password), and grabit supports key pair authentication as well. In order to connect to Snowflake using key pair authentication, you need to configure Key Pair Authentication first.

    Configure the grabit

    After that, you may use a grabit config file like this to connect to Snowflake:

    {
      "databaseServer":{
        "hostname":"127.0.0.1",
        "port":"433",
        "username":"USERNAME",
        "privateKeyFile":"/data/rsa_key.p8",
        "privateKeyFilePwd":"123456",
        "database":"DATABASE",
        "enableQueryHistory":true,
        "queryHistoryBlockOfTimeInMinutes":30,
        "queryHistorySqlType":"SELECT,UPDATE",
        "snowflakeDefaultRole":"custom_role"
      },
      "SQLFlowServer":{
        "server":"http://127.0.0.1",
        "serverPort":"8081",
        "userId":"gudu|0123456789",
        "userSecret":""
      },
      "SQLScriptSource":"database",
      "databaseType":"snowflake"
    }

    privateKeyFile: the file that stores the private key.
    privateKeyFilePwd: the passphrase used when generating the key; leave it empty if not used.
    For other options, please check this document for a detailed explanation.

    Fetch SQL from query history

    In order to fetch SQL from the query history, we need to create a role that is able to access the SNOWFLAKE database, and this role must be able to access the warehouse as well. For example, we create a role custom_role and assign the privileges of test_warehouse to it like this:

    use role accountadmin;
    grant imported privileges on database snowflake to role custom_role;
    use role custom_role;

    use role sysadmin;
    GRANT ALL PRIVILEGES ON WAREHOUSE test_warehouse TO role custom_role;

    Official Document about QUERY_HISTORY
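    Under the hood, this corresponds to the key pair support in the Snowflake JDBC driver. Below is a minimal sketch of the equivalent direct connection, assuming the snowflake-jdbc driver and its documented private_key_file / private_key_file_pwd connection properties; the account URL and role are placeholders, not values taken from grabit itself.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.util.Properties;

    public class SnowflakeKeyPairDemo {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("user", "USERNAME");
            // Point the driver at the encrypted private key instead of a password.
            props.put("private_key_file", "/data/rsa_key.p8");
            props.put("private_key_file_pwd", "123456"); // omit if the key is unencrypted
            props.put("db", "DATABASE");
            props.put("role", "custom_role");

            // "myaccount" is a placeholder Snowflake account identifier.
            String url = "jdbc:snowflake://myaccount.snowflakecomputing.com/";
            try (Connection conn = DriverManager.getConnection(url, props)) {
                System.out.println("Connected: " + !conn.isClosed());
            }
        }
    }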

    2021/09/07 Grabit SQLFlow

  8. Discover data lineage of SQL in a CSV file

    SQLFlow can discover the data lineage of SQL queries in a CSV file.

    CSV file format

    In order to discover data lineage from SQL queries, at least one column that includes the SQL query must be present in the CSV file. Other columns such as database, schema, object type, and object name help SQLFlow make a more precise data lineage analysis, but they are not mandatory. Here is the common structure of a CSV file that may be submitted to SQLFlow for data lineage analysis:

    Database,Schema,ObjectType,ObjectName,ObjectCode,Notes

    Your CSV file may have a different structure; as long as the column that includes the SQL query exists and you tell SQLFlow the position of that column, it can be analyzed correctly.

    Column information

    database: the database that this SQL query will run against.
    schema: the schema that includes the database objects in the SQL query.
    objectType: the type of the SQL query, such as procedure, function, trigger, view.
    objectName: the name of the SQL query.
    objectCode: the SQL query code.
    notes: comments for this SQL query.

    Position of the columns in the CSV file

    In order to help SQLFlow understand your CSV file correctly, you need to tell it the position of each predefined column through the SQLFlow UI. Please note that the column names need not be exactly the same as the predefined ones; SQLFlow only cares about the positions. The default required columns are in a format like this:

    Database,Schema,ObjectType,ObjectName,ObjectCode,Notes

    If your CSV file has the same format, you may tell SQLFlow that Database is in column 1, Schema is in column 2, ObjectType is in column 3, ObjectName is in column 4, ObjectCode is in column 5, and Notes is in column 6.

    If you have a CSV file in a format like this:

    catalog,schema,type,code
    database1,schema1,procedure,"create procedure test_db.test_schema.PRC_STG(IN register_batch_id INT64)
    BEGIN
    # STEP 1
    # Declare variables
    --DECLARE date_to_query DATE;
    --DECLARE hour_to_query INT64;
    DECLARE v_insert_time DATETIME;
    ..."

    then the catalog (AKA database) is in the first column, the schema is in the second column, the object type is in the third column, and the query code is in the fourth column. In this CSV file, other columns of the default format, such as ObjectName and Notes, don't exist.

    objectCode column

    The ObjectCode column includes the SQL query. Usually the SQL query spans multiple lines, so it should be wrapped in an enclosing character, which is " by default. If the SQL query is enclosed with another character, that character must be set using objectCodeEncloseChar. The objectCodeEncloseChar itself, when it appears inside the SQL query, should be escaped by objectCodeEscapeChar. The default objectCodeEscapeChar is ".

    Enclosing character of other columns

    Besides the objectCode column, if another column is enclosed by the objectCodeEncloseChar as well, SQLFlow will remove this character automatically.

    Set the CSV format

    Upload a CSV file by creating a SQLFlow job. After uploading the CSV file, click the "This is a csv file" checkbox, then tell SQLFlow the position of each column. 0 means the column doesn't exist in the CSV file. A sketch of how such a file can be read programmatically follows below.
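    To make the enclose/escape-char mechanics concrete, here is a minimal, hypothetical Java sketch that extracts the SQL column from such a file. It is not SQLFlow's implementation; the column position, enclose char, and escape char simply mirror the options described above, and queries.csv is a placeholder file name.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.List;

    public class CsvObjectCodeReader {
        // Assumed defaults, mirroring the options described above.
        static final char ENCLOSE_CHAR = '"';   // objectCodeEncloseChar
        static final char ESCAPE_CHAR  = '"';   // objectCodeEscapeChar
        static final int  CODE_COLUMN  = 5;     // 1-based position of the ObjectCode column

        /** Splits one logical CSV record into fields, honoring the enclose/escape chars. */
        static List<String> splitRecord(String record) {
            List<String> fields = new ArrayList<>();
            StringBuilder field = new StringBuilder();
            boolean enclosed = false;
            for (int i = 0; i < record.length(); i++) {
                char c = record.charAt(i);
                if (enclosed) {
                    if (c == ESCAPE_CHAR && i + 1 < record.length()
                            && record.charAt(i + 1) == ENCLOSE_CHAR) {
                        field.append(ENCLOSE_CHAR); // escaped enclose char inside the SQL
                        i++;
                    } else if (c == ENCLOSE_CHAR) {
                        enclosed = false;           // end of the enclosed field
                    } else {
                        field.append(c);
                    }
                } else if (c == ENCLOSE_CHAR) {
                    enclosed = true;
                } else if (c == ',') {
                    fields.add(field.toString());
                    field.setLength(0);
                } else {
                    field.append(c);
                }
            }
            fields.add(field.toString());
            return fields;
        }

        public static void main(String[] args) throws IOException {
            // For brevity this treats each line as one record; a real ObjectCode
            // value spans multiple lines, so records would need to be re-joined
            // until the enclosing quotes balance.
            for (String line : Files.readAllLines(Paths.get("queries.csv"))) {
                List<String> fields = splitRecord(line);
                if (fields.size() >= CODE_COLUMN) {
                    System.out.println("SQL: " + fields.get(CODE_COLUMN - 1));
                }
            }
        }
    }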

    2021/09/05 SQLFlow