{"id":5124,"date":"2022-07-15T20:26:50","date_gmt":"2022-07-16T04:26:50","guid":{"rendered":"https:\/\/www.gudusoft.com\/?p=5124"},"modified":"2022-07-15T20:26:50","modified_gmt":"2022-07-16T04:26:50","slug":"best-open-source-data-catalog-tools","status":"publish","type":"post","link":"https:\/\/www.gudusoft.com\/fr\/best-open-source-data-catalog-tools\/","title":{"rendered":"The 4 Best Open Source Data Catalog Tools to Consider in 2022"},"content":{"rendered":"<div class=\"fusion-fullwidth fullwidth-box fusion-builder-row-1 fusion-flex-container nonhundred-percent-fullwidth non-hundred-percent-height-scrolling\" style=\"background-color: rgba(255,255,255,0);background-position: center center;background-repeat: no-repeat;border-width: 0px 0px 0px 0px;border-color:#e8eaf0;border-style:solid;\" ><div class=\"fusion-builder-row fusion-row fusion-flex-align-items-flex-start\" style=\"max-width:1310.4px;margin-left: calc(-4% \/ 2 );margin-right: calc(-4% \/ 2 );\"><div class=\"fusion-layout-column fusion_builder_column fusion-builder-column-0 fusion_builder_column_1_1 1_1 fusion-flex-column\"><div class=\"fusion-column-wrapper fusion-flex-justify-content-flex-start fusion-content-layout-column\" style=\"background-position:left top;background-repeat:no-repeat;-webkit-background-size:cover;-moz-background-size:cover;-o-background-size:cover;background-size:cover;padding: 0px 0px 0px 0px;\"><div class=\"fusion-text fusion-text-1\" style=\"line-height:26px;\"><h2>4 Best Open Source Data Catalog Tools in 2022<\/h2>\n<p>Fundamentally, any data-driven organization needs <a href=\"https:\/\/www.gudusoft.com\/fr\/meilleurs-outils-de-catalogue-de-donnees\/\"><strong>data catalog tools<\/strong><\/a>. 
<strong><a href=\"https:\/\/www.gudusoft.com\/fr\/quest-ce-que-le-catalogue-de-donnees-pourquoi-les-donnees-datalog\/\">Data catalogs<\/a><\/strong> help create a single environment from which all of an organization&#8217;s data, and the context about that data, can be accessed, so that organizations can reduce their time to insight and quickly make high-quality, data-driven business decisions. If you are looking for the <strong>best open source data catalog tools<\/strong>, you&#8217;ve come to the right place. In this post, we&#8217;ve compiled a list of the <strong>best open source data catalog tools in 2022<\/strong> to make your life easier.<\/p>\n<div id=\"attachment_5129\" style=\"width: 828px\" class=\"wp-caption aligncenter\"><img aria-describedby=\"caption-attachment-5129\" decoding=\"async\" class=\"size-full wp-image-5129\" src=\"https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/07\/Best_Open_Source_Data_Catalog_Tools.png\" alt=\"Best Open Source Data Catalog Tools\" width=\"818\" height=\"471\" srcset=\"https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/07\/Best_Open_Source_Data_Catalog_Tools-200x115.png 200w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/07\/Best_Open_Source_Data_Catalog_Tools-300x173.png 300w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/07\/Best_Open_Source_Data_Catalog_Tools-400x230.png 400w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/07\/Best_Open_Source_Data_Catalog_Tools-600x345.png 600w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/07\/Best_Open_Source_Data_Catalog_Tools-768x442.png 768w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/07\/Best_Open_Source_Data_Catalog_Tools-800x461.png 800w, https:\/\/www.gudusoft.com\/wp-content\/uploads\/2022\/07\/Best_Open_Source_Data_Catalog_Tools.png 818w\" sizes=\"(max-width: 818px) 100vw, 818px\" \/><p id=\"caption-attachment-5129\" class=\"wp-caption-text\">Best Open Source Data Catalog Tools<\/p><\/div>\n<h3>Best 
Open Source Data Catalog Tools &#8211; 1. Apache Atlas<\/h3>\n<p>An open source <strong><a href=\"https:\/\/www.gudusoft.com\/fr\/quest-ce-que-la-gestion-des-metadonnees\/\">metadata management<\/a><\/strong> tool and governance platform, Apache Atlas was incubated by Hortonworks under the umbrella of the Data Governance Initiative.<\/p>\n<p>It joined the Apache Incubator in 2015 and graduated to a top-level project in 2017. Apache Atlas is widely recognized as one of the building blocks of modern data platforms because of its early vision of using metadata to solve the challenges of data cataloging, sorting, discovery, governance, and collaboration.<\/p>\n<p>Main capabilities of Apache Atlas:<\/p>\n<ol>\n<li><strong>Metadata classification:<\/strong> Apache Atlas enables you to automatically classify PII and other sensitive data. Data assets can be associated with multiple classifications. Classifications are also propagated through lineage to ensure that derived data inherits the same classification and security controls.<\/li>\n<li><strong>Metadata types and instances:<\/strong> According to the Apache documentation, a &#8220;type&#8221; is a definition of how a specific type of metadata object is stored and accessed in Atlas, which enables <a href=\"https:\/\/www.gudusoft.com\/fr\/gestionnaires-de-donnees\/\"><strong>data stewards<\/strong><\/a> to define technical and business metadata.<\/li>\n<li><strong>Search and lineage:<\/strong> The intuitive UI in Apache Atlas supports predefined and ad hoc exploration of data assets by type, classification, attribute value, or free text. 
In addition, it retains the lineage of how a data source or a specific data set was constructed and how it evolved over time.<\/li>\n<li><strong>Security and data masking:<\/strong> As primarily a data governance tool, Apache Atlas provides fine-grained security for metadata access, enabling access controls on entity instances and on operations such as adding, updating, or removing classifications.<\/li>\n<\/ol>\n<h3>Best Open Source Data Catalog Tools &#8211; 2. Lyft Amundsen<\/h3>\n<p>An open source data catalog platform originally built by Lyft&#8217;s engineering team, Amundsen was open-sourced in October 2019, a year after launching for internal use.<\/p>\n<p>Amundsen has a cohesive community of contributors and users, and many other organizations have adopted and built on this open source data catalog tool to advance their data democratization, governance, and metadata service initiatives.<\/p>\n<p><strong>Main Capabilities of Amundsen:<\/strong><\/p>\n<ol>\n<li><strong>Easy to find trusted data:<\/strong> Amundsen helps find data from a variety of sources with simple text searches, and the search results even display inline metadata.<\/li>\n<li><strong>Automated and curated metadata:<\/strong> When a user clicks a data asset, Amundsen shows its detailed description, which is manually curated, and its behavior, which is automatically generated.<\/li>\n<li><strong>Ability to share context with colleagues:<\/strong> Descriptions of data assets can be updated, reducing the need for colleagues to go back and forth looking for more context on a particular data asset.<\/li>\n<li><strong>Learn from data usage:<\/strong> Users can see which data assets their colleagues frequently use, own, or have bookmarked, and can learn what the most common queries for a table look like by viewing the dashboards built on it.<\/li>\n<\/ol>\n<h3>Best Open Source Data Catalog Tools &#8211; 3. 
LinkedIn DataHub<\/h3>\n<p>An open source metadata management platform developed by LinkedIn&#8217;s engineering team, DataHub is LinkedIn&#8217;s second attempt to address the challenges of data cataloging, discovery, observability, and lineage.<\/p>\n<p>Prior to DataHub, the team built an open source data catalog tool called WhereHows back in 2016. DataHub was announced in 2019 and open-sourced in 2020. As a result, LinkedIn maintains two different versions of DataHub: one for internal use and an open source version that others can build on.<\/p>\n<p><strong>Main Capabilities of DataHub:<\/strong><\/p>\n<ol>\n<li><strong>Automated metadata ingestion:<\/strong> In LinkedIn DataHub, metadata is ingested from different sources, pushed via an API or a Kafka stream.<\/li>\n<li><strong>Easy data discovery:<\/strong> At the highest level for end users, the DataHub front end supports three types of interaction: search, browse, and view\/edit metadata.<\/li>\n<li><strong>Understand data through context:<\/strong> Each data entity on DataHub comes with a profile page that displays all metadata associated with that entity, giving the user the information needed to understand it in context.<\/li>\n<\/ol>\n<h3>Best Open Source Data Catalog Tools &#8211; 4. Netflix Metacat<\/h3>\n<p>A federated metadata management service built by Netflix and open-sourced in June 2018, Metacat aims to simplify the sorting, discovery, processing, and management of data.<\/p>\n<p>Metacat primarily serves as a single point of access to all of Netflix&#8217;s data assets. 
While Metacat is an open source data catalog, there appears to be little public documentation that would help others effectively use and extend its schema.<\/p>\n<p><strong>Main Capabilities of Metacat:<\/strong><\/p>\n<ol>\n<li><strong>Data abstraction and interoperability:<\/strong> Metacat forms a common abstraction layer, so datasets can be accessed across Netflix&#8217;s multiple query engines.<\/li>\n<li><strong>Business and user-defined metadata stores:<\/strong> Metacat stores business and user-defined metadata about data assets, giving data users more information about those assets along with standard rules for how to handle them.<\/li>\n<li><strong>Data discovery:<\/strong> Metacat publishes schema metadata and business and user-defined metadata to Elasticsearch, which enables full-text search over data assets.<\/li>\n<li><strong>Data change audit and notification:<\/strong> Metacat captures any metadata changes or updates and can push notifications for events that might require the user&#8217;s attention.<\/li>\n<\/ol>\n<h3>What are the benefits of open source data catalog tools?<\/h3>\n<p>High-quality data catalogs not only let you categorize all your data properly, they also let you track data flows between different data types and can even reveal flaws in those flows that you can fix.<\/p>\n<p>Another benefit is that sensitive data can be managed too: the tool can identify where sensitive data is exposed most often, which reduces the risk of leakage. Some high-end data catalog tools even offer machine learning capabilities that learn how you manage your data and help you process large amounts of it. 
But why use an open source data catalog tool?<\/p>\n<p>Open source data catalog tools are high-quality software that is inexpensive and sometimes even free. They also scale well, offer many customization options, and can run without usage limits, which makes them well suited to high data volumes.<\/p>\n<p>Beyond that, as a business or organization, you don&#8217;t have to worry about relying on a single developer for updates, because you can hire developers to extend the open source software or customize it to suit your needs.<\/p>\n<h3>Conclusion<\/h3>\n<p>Thank you for reading our article. We hope it helps you find the <strong>best open source data catalog tools<\/strong> in 2022. If you want more information about open source data catalog tools, we recommend visiting <a href=\"https:\/\/www.gudusoft.com\/fr\/\"><strong>Gudu SQLFlow<\/strong><\/a>.<\/p>\n<p>As one of the\u00a0<strong><a href=\"https:\/\/www.dpriver.com\/blog\/2022\/05\/11\/best-data-lineage-tools\/\" target=\"_blank\" rel=\"noopener noreferrer\">best data lineage tools<\/a><\/strong>\u00a0available on the market today, Gudu SQLFlow can not only analyze SQL script files, extract\u00a0<a href=\"https:\/\/www.gudusoft.com\/fr\/quest-ce-que-la-lignee-des-donnees-pourquoi-est-elle-importante\/\"><strong>data lineage<\/strong><\/a>, and display it visually, but also lets users supply data lineage in CSV format and display it visually.\u00a0<strong>(Published by Ryan on July 16, 2022)<\/strong><\/p>\n<\/div><\/div><\/div><style type=\"text\/css\">.fusion-body .fusion-builder-column-0{width:100% !important;margin-top : 0px;margin-bottom : 0px;}.fusion-builder-column-0 > .fusion-column-wrapper {padding-top : 0px !important;padding-right : 0px 
!important;margin-right : 1.92%;padding-bottom : 0px !important;padding-left : 0px !important;margin-left : 1.92%;}@media only screen and (max-width:1024px) {.fusion-body .fusion-builder-column-0{width:100% !important;}.fusion-builder-column-0 > .fusion-column-wrapper {margin-right : 1.92%;margin-left : 1.92%;}}@media only screen and (max-width:640px) {.fusion-body .fusion-builder-column-0{width:100% !important;}.fusion-builder-column-0 > .fusion-column-wrapper {margin-right : 1.92%;margin-left : 1.92%;}}<\/style><\/div><style type=\"text\/css\">.fusion-body .fusion-flex-container.fusion-builder-row-1{ padding-top : 0px;margin-top : 0px;padding-right : 0px;padding-bottom : 0px;margin-bottom : 0px;padding-left : 0px;}<\/style><\/div>","protected":false},"excerpt":{"rendered":"","protected":false},"author":27,"featured_media":5141,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[178],"tags":[295,294,292,98,296,151,297,293],"_links":{"self":[{"href":"https:\/\/www.gudusoft.com\/fr\/wp-json\/wp\/v2\/posts\/5124"}],"collection":[{"href":"https:\/\/www.gudusoft.com\/fr\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.gudusoft.com\/fr\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.gudusoft.com\/fr\/wp-json\/wp\/v2\/users\/27"}],"replies":[{"embeddable":true,"href":"https:\/\/www.gudusoft.com\/fr\/wp-json\/wp\/v2\/comments?post=5124"}],"version-history":[{"count":18,"href":"https:\/\/www.gudusoft.com\/fr\/wp-json\/wp\/v2\/posts\/5124\/revisions"}],"predecessor-version":[{"id":5144,"href":"https:\/\/www.gudusoft.com\/fr\/wp-json\/wp\/v2\/posts\/5124\/revisions\/5144"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.gudusoft.com\/fr\/wp-json\/wp\/v2\/media\/5141"}],"wp:attachment":[{"href":"https:\/\/www.gudusoft.com\/fr\/wp-json\/wp\/v2\/media?parent=5124"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.gudusoft.com\/fr
\/wp-json\/wp\/v2\/categories?post=5124"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.gudusoft.com\/fr\/wp-json\/wp\/v2\/tags?post=5124"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}