{"id":5110,"date":"2022-07-14T07:53:44","date_gmt":"2022-07-14T15:53:44","guid":{"rendered":"https:\/\/www.gudusoft.com\/?p=5110"},"modified":"2022-07-15T05:12:40","modified_gmt":"2022-07-15T13:12:40","slug":"best-open-source-data-lineage-tools","status":"publish","type":"post","link":"https:\/\/www.gudusoft.com\/pt\/best-open-source-data-lineage-tools\/","title":{"rendered":"5 Best Open Source Data Lineage Tools to Consider in 2022"},"content":{"rendered":"<div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"background-color: rgba(255,255,255,0);background-position: center center;background-repeat: no-repeat;border-width: 0px 0px 0px 0px;border-color:#e8eaf0;border-style:solid;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start\" style=\"max-width:1310.4px;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column\"><div class=\"fusion-column-wrapper fusion-flex-justify-content-flex-start fusion-content-layout-column\" style=\"background-position:left top;background-repeat:no-repeat;-webkit-background-size:cover;-moz-background-size:cover;-o-background-size:cover;background-size:cover;padding: 0px 0px 0px 0px;\"><div class=\"fusion-text fusion-text-1\" style=\"line-height:26px;\"><h2>5 Best Open Source Data Lineage Tools to Consider in 2022<\/h2>\n<p>The essence of <a href=\"https:\/\/www.gudusoft.com\/pt\/o-que-e-governanca-de-dados\/\"><strong>data governance<\/strong><\/a> is to help businesses create data policies and ensure that people can comply with them. 
These policies address a range of data-related processes, including guidelines for data protection, verification, and use. <a href=\"https:\/\/www.gudusoft.com\/pt\/administradores-de-dados\/\"><strong>Data stewards<\/strong><\/a> must solicit data requirements from business users and work with members of the data governance council to agree on common data definitions, specify <a href=\"https:\/\/www.gudusoft.com\/pt\/como-melhorar-a-qualidade-dos-dados\/\"><strong>data quality<\/strong><\/a> metrics, articulate relevant policies, and develop methods for measuring compliance.<\/p>\n<div id=\"attachment_5114\" style=\"width: 919px\" class=\"wp-caption alignnone\"><img aria-describedby=\"caption-attachment-5114\" decoding=\"async\" class=\"size-full wp-image-5114\" src=\"https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/07\/Best_Open_Source_Data_Lineage_Tools.png\" alt=\"Best open source data lineage tools\" width=\"909\" height=\"521\" srcset=\"https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/07\/Best_Open_Source_Data_Lineage_Tools-200x115.png 200w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/07\/Best_Open_Source_Data_Lineage_Tools-300x172.png 300w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/07\/Best_Open_Source_Data_Lineage_Tools-400x229.png 400w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/07\/Best_Open_Source_Data_Lineage_Tools-600x344.png 600w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/07\/Best_Open_Source_Data_Lineage_Tools-768x440.png 768w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/07\/Best_Open_Source_Data_Lineage_Tools-800x459.png 800w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/07\/Best_Open_Source_Data_Lineage_Tools.png 909w\" sizes=\"(max-width: 909px) 100vw, 909px\" \/><p id=\"caption-attachment-5114\" 
class=\"wp-caption-text\">Best open source data lineage tools<\/p><\/div>\n<p>However, bridging the gap between defining data governance policies and implementing them is often a formidable challenge. The purpose of these policies is to control and monitor the quality of data assets across business workflows, but data stewards with key data quality management responsibilities are often not properly trained or qualified. This is where a <a href=\"https:\/\/www.dpriver.com\/blog\/2022\/05\/11\/best-data-lineage-tools\/\"><strong>data lineage tool<\/strong><\/a> comes in. In this article, we introduce the <strong>5 best open source data lineage tools<\/strong> available in 2022.<\/p>\n<h3>Best Open Source Data Lineage Tools &#8211; 1. Tokern<\/h3>\n<p><strong>Tokern Overview:<\/strong><\/p>\n<p>Tokern is built for cloud <strong><a href=\"https:\/\/www.gudusoft.com\/pt\/o-que-e-um-data-warehouse\/\">data warehouses<\/a><\/strong> and <strong><a href=\"https:\/\/www.gudusoft.com\/pt\/o-que-e-um-data-lake\/\">data lakes<\/a><\/strong>, and takes a dedicated approach to obtaining column-level data lineage from databases and data warehouses hosted on Google BigQuery, AWS Redshift, and Snowflake. Support for more sources, such as SparkSQL, AWS Athena, and Presto, is in development. Tokern has considerable integration capabilities because it works well with most open source data catalogs and ETL frameworks.<\/p>\n<p><strong>Tokern Data Lineage Features: <\/strong><\/p>\n<p>Tokern is a fairly recent project and takes into account the latest data engineering and design patterns. 
One such example is that, in addition to building <a href=\"https:\/\/www.gudusoft.com\/pt\/whats-data-lineage-why-important\/\"><strong>data lineage<\/strong><\/a> from DBCAT (its data catalog), Tokern also lets you build data lineage from query history or ETL scripts, making it well suited to BI and ETL tool integration. Tokern stores the data catalog and lineage in a PostgreSQL database. Users can query this database with SQL for further analysis, or feed it into other visualization and analysis engines.<\/p>\n<p>The visualization engine Kedro-Viz and the network graph analysis library NetworkX are behind Tokern&#8217;s visualization and analysis capabilities. These libraries help you track, visualize, and analyze column-level lineage data. You can also interact with lineage data using Tokern&#8217;s SDK or API.<\/p>\n<p>In addition to its data lineage capabilities, Tokern uses PIICatcher to detect PII (Personally Identifiable Information) and PHI (Protected Health Information). The built-in tool combines regular expressions with several standard NLP libraries for PII detection, such as spaCy and Stanford NER.<\/p>\n<h3>Best Open Source Data Lineage Tools &#8211; 2. Egeria<\/h3>\n<p><strong>Egeria Overview:<\/strong><\/p>\n<p>Described as the world&#8217;s first open source metadata standard, Egeria provides a way to integrate data engineering tools for a reliable and consistent view of metadata. In addition to cataloging and searching metadata, the standard enables users to build more advanced solutions for data lineage tracing, data quality checking, PII identification, and more.<\/p>\n<p>Many data engineering architectures involve a great deal of avoidable chatter between various data tools. Egeria moves away from this and instead adopts a hub-and-spoke model, where everything goes through Egeria. 
In this way, each tool needs to talk only to Egeria.<\/p>\n<p><strong>Egeria Data Lineage Features:<\/strong><\/p>\n<p>Data lineage in Egeria uses a well-known open standard called OpenLineage to capture and store lineage. OpenLineage also gives you greater insight into your data by providing both horizontal and vertical lineage.<\/p>\n<p>Egeria listens for Kafka events emitted by the source systems to capture data lineage information. After obtaining this information, Egeria asks lineage managers to match and link the lineage graphs that it could not link itself. After that, the lineage is ready for business consumption.<\/p>\n<p>The data lineage capabilities in Egeria are well aligned with its capabilities for data discovery and management, metadata provenance, and so on. These capabilities, together with Egeria&#8217;s lineage design and architecture, make it a compelling and well-thought-out data governance and data lineage tool.<\/p>\n<h3>Best Open Source Data Lineage Tools &#8211; 3. Pachyderm<\/h3>\n<p><strong>Pachyderm Overview:<\/strong><\/p>\n<p>Like Tokern, discussed above, Pachyderm is another specialized data lineage tool. Rather than focusing on cloud data warehouses, it aims to enable developers to build machine learning pipelines in a language- and framework-independent way.<\/p>\n<p>It implements a version control system, similar to lakeFS or Git, to maintain the lineage of data objects. Changes to these objects (think commits) are captured and stored by Pachyderm to maintain a complete and immutable audit trail of events. 
Audit trails give you a data lineage map for viewing and analysis, and let you reproduce data and code at any point in time for debugging or compliance reasons.<\/p>\n<p><strong>Pachyderm Data Lineage Features:<\/strong><\/p>\n<p>To achieve seamless data lineage tracking and data versioning, Pachyderm uses a central repository backed by an object store, such as AWS S3, through a custom file system called PFS (Pachyderm File System). PFS helps your object store (such as S3) become the single source of truth for your data, with its complete history.<\/p>\n<p>Pachyderm also enforces immutability in your data source, which allows it to assign global IDs to lineage events and data objects. Pachyderm lets you view the immutable data lineage graph as a DAG in the UI. Both of these features are beneficial when you work with ML pipelines and want to trace results back to their inputs.<\/p>\n<p>Pachyderm integrates with the most widely used databases, data warehouses, and data lakes. In addition, you can import data from any database into Pachyderm using an SQL-based ingestion tool. However, Pachyderm has limitations as a general-purpose data lineage tool, which is why most of Pachyderm&#8217;s enterprise customers use it to handle MLOps, unstructured data ETL, and NLP workloads.<\/p>\n<h3>Best Open Source Data Lineage Tools &#8211; 4. OpenLineage<\/h3>\n<p><strong>OpenLineage Overview:<\/strong><\/p>\n<p>OpenLineage was founded by DataKin, the company that took over Marquez&#8217;s development after WeWork open-sourced it. DataKin turned the OpenLineage project over to the Linux Foundation as a sandbox project in mid-2021. Heavily inspired by OpenTelemetry, which is ubiquitous in the field of observability, OpenLineage aims to establish an open standard for data lineage collection and analysis.<\/p>\n<p><strong>OpenLineage Features:<\/strong><\/p>\n<p>Integration is central to OpenLineage&#8217;s design and mission. 
It integrates with ETL frameworks, data orchestration engines, metadata catalogs, data quality engines, and data lineage tools. OpenLineage uses JSON Schema as its API definition and supports various languages and frameworks. Egeria is one of the popular data tools whose core metadata layer is built on OpenLineage.<\/p>\n<p>WeWork&#8217;s Marquez is also at the heart of OpenLineage&#8217;s architecture: Marquez provides the UI and metadata repository, while the metadata collection API comes from OpenLineage. OpenLineage is also exposed via GraphQL and REST APIs.<\/p>\n<p>OpenLineage is an attractive choice because it works with most existing data engineering stacks and provides a wide range of features for collecting, tracking, and analyzing data lineage.<\/p>\n<h3>Best Open Source Data Lineage Tools &#8211; 5. TrueDat<\/h3>\n<p><strong>TrueDat Overview:<\/strong><\/p>\n<p>As a complete data governance solution, TrueDat enables you to categorize, search, and track data in detail. With its data lineage capabilities, TrueDat can also help you visualize the entire life cycle of your data, giving you insight into the journey of your data over time.<\/p>\n<p>TrueDat was built by BlueTab (an IBM company) in 2017 and has been in active development since then, with its latest version, V4.39, released in March 2022.<\/p>\n<p><strong>TrueDat Data Lineage Features:<\/strong><\/p>\n<p>TrueDat lets you use data lineage to analyze the impact of database changes and better understand the business logic behind your reports. It lets you trace the lineage of a data object with point-in-time visibility. For advanced analysis, you can also apply filters to lineage objects to examine specific parts of the lineage diagram. In addition to the graphical representation in the UI, you can download the collected data lineage information as a CSV file. 
Because TrueDat provides a strong set of data governance and lineage capabilities, it is a real contender for solving your data lineage problems.<\/p>\n<h3>Conclusion<\/h3>\n<p>Thank you for reading our article. We hope it helps you find the <strong>best open source data lineage tools<\/strong>. If you want to learn more about data lineage, we recommend visiting <a href=\"https:\/\/www.gudusoft.com\/pt\/\"><strong>Gudu SQLFlow<\/strong><\/a> for more information.<\/p>\n<p>As one of the\u00a0<strong>best data lineage tools<\/strong>\u00a0available on the market today, Gudu SQLFlow can not only analyze SQL script files, obtain\u00a0<strong>data lineage<\/strong>, and display it visually, but also let users provide data lineage in CSV format and display it visually.\u00a0<strong>(Published by Ryan on Jul 14, 2022)<\/strong><\/p>\n<\/div><\/div><\/div><style type=\"text\/css\">.fusion-body .fusion-builder-column-0{width:100% !important;margin-top : 0px;margin-bottom : 0px;}.fusion-builder-column-0 > .fusion-column-wrapper {padding-top : 0px !important;padding-right : 0px !important;margin-right : 1.92%;padding-bottom : 0px !important;padding-left : 0px !important;margin-left : 1.92%;}@media only screen and (max-width:1024px) {.fusion-body .fusion-builder-column-0{width:100% !important;}.fusion-builder-column-0 > .fusion-column-wrapper {margin-right : 1.92%;margin-left : 1.92%;}}@media only screen and (max-width:640px) {.fusion-body .fusion-builder-column-0{width:100% !important;}.fusion-builder-column-0 > .fusion-column-wrapper {margin-right : 1.92%;margin-left : 1.92%;}}<\/style><\/div><style type=\"text\/css\">.fusion-body .fusion-flex-container.fusion-builder-row-1{ padding-top : 0px;margin-top : 0px;padding-right : 0px;padding-bottom : 0px;margin-bottom : 0px;padding-left : 
0px;}<\/style><\/div>","protected":false},"excerpt":{"rendered":"","protected":false},"author":27,"featured_media":5118,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[178],"tags":[286,137,155,55,139,285,136,59,210,288,75,290,289,287,291],"_links":{"self":[{"href":"https:\/\/www.gudusoft.com\/pt\/wp-json\/wp\/v2\/posts\/5110"}],"collection":[{"href":"https:\/\/www.gudusoft.com\/pt\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gudusoft.com\/pt\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gudusoft.com\/pt\/wp-json\/wp\/v2\/users\/27"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gudusoft.com\/pt\/wp-json\/wp\/v2\/comments?post=5110"}],"version-history":[{"count":10,"href":"https:\/\/www.gudusoft.com\/pt\/wp-json\/wp\/v2\/posts\/5110\/revisions"}],"predecessor-version":[{"id":5123,"href":"https:\/\/www.gudusoft.com\/pt\/wp-json\/wp\/v2\/posts\/5110\/revisions\/5123"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.gudusoft.com\/pt\/wp-json\/wp\/v2\/media\/5118"}],"wp:attachment":[{"href":"https:\/\/www.gudusoft.com\/pt\/wp-json\/wp\/v2\/media?parent=5110"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gudusoft.com\/pt\/wp-json\/wp\/v2\/categories?post=5110"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gudusoft.com\/pt\/wp-json\/wp\/v2\/tags?post=5110"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}