11 Dark Secrets of Data Management
Some call data the “new oil,” while others call it the “new gold.” Leaving aside the validity of these metaphors, there’s no question that organizing and analyzing data is a vital job for any business looking to deliver on the promise of data-driven decision-making. To this end, a solid data management strategy is key. This includes data governance, data operations, data warehousing, data engineering, data analytics, data science, etc. Data management, when done right, can provide a competitive advantage to businesses in every industry. In this article, we’ll introduce 11 dark secrets of data management.
Dark Secrets of Data Management
Dark Secrets of Data Management – 1. Unstructured data is difficult to analyze
An estimated 80% to 90% of enterprise data is unstructured, and the volume is growing rapidly as digital transformation moves into its later, harder stages. This data is scattered across the enterprise as documents, images, audio, video, and other files. Divided by department, application, architecture, and cloud environment, it forms unstructured data islands that are hard to share, hard to use, and hard to mine for value, which seriously slows enterprise digital transformation.
Dark Secrets of Data Management – 2. Even structured data is often unstructured
Good data scientists and database administrators give a database its shape by specifying the type and structure of each field. Sometimes, in the name of more structure, they restrict the value of a field to an integer in a specific range or to one of a set of predefined choices. Even so, the people filling out the forms that feed the database find ways to make things difficult.
When they think a question doesn’t apply, some leave the field empty; others enter a dash or the abbreviation “n.a.”. A good developer can catch some of these issues through validation, and a good data scientist can reduce the uncertainty through cleaning. But it’s maddening that even the most structured tables contain suspicious entries that can introduce unknowns or even errors into the analysis.
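As a minimal sketch of the cleaning step described above, the Python snippet below maps common “not applicable” placeholders to a single missing value. The placeholder list and the function name `normalize_missing` are assumptions made for illustration; a real dataset will need its own list.

```python
# Hypothetical placeholder spellings; extend this set for your own data.
MISSING_TOKENS = {"", "-", "--", "n.a.", "n/a", "na", "none", "null"}


def normalize_missing(value):
    """Return None for placeholder entries, otherwise the stripped string."""
    if value is None:
        return None
    text = str(value).strip()
    if text.lower() in MISSING_TOKENS:
        return None
    return text


raw_answers = ["42", "", " - ", "N.A.", "n/a", "Toronto"]
cleaned = [normalize_missing(v) for v in raw_answers]
print(cleaned)
```

Collapsing every placeholder into one missing value makes the gaps countable, but it cannot tell you whether the person skipped the question or genuinely had nothing to report.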
Dark Secrets of Data Management – 3. Data schemas are either too strict or too loose
No matter how hard the data team tries to articulate schema constraints, the final schema that defines the values allowed in each data field ends up either too strict or too loose. If the data team adds strict constraints, users complain that their answers are not in the limited list of acceptable values. If the schema is too permissive, users add odd values with little consistency.
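The trade-off can be illustrated with a small Python sketch. The job-title field, the allowed list, and both validator functions are hypothetical and exist only to show the two failure modes side by side.

```python
from collections import Counter

# Hypothetical list of allowed job titles for a strict schema.
ALLOWED_TITLES = {"engineer", "manager", "analyst"}


def strict_accepts(value):
    """Strict schema: only exact matches from the predefined list pass."""
    return value in ALLOWED_TITLES


def loose_accepts(value):
    """Loose schema: any non-empty string passes."""
    return isinstance(value, str) and value.strip() != ""


submissions = ["engineer", "Engineer", "eng.", "data engineer", "manager"]

# Answers the strict schema turns away.
rejected_by_strict = [s for s in submissions if not strict_accepts(s)]
# Distinct spellings the loose schema ends up storing.
distinct_under_loose = Counter(s for s in submissions if loose_accepts(s))

print(rejected_by_strict)
print(len(distinct_under_loose))
```

With these sample inputs, the strict validator rejects three of the five answers, while the loose one stores five distinct spellings for what may be only two or three real job titles.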
Dark Secrets of Data Management – 4. Data laws are very strict
Privacy and data protection laws are strict and will only get stricter. With more than a dozen regulations in force, such as GDPR and HIPAA, collecting data can be very difficult, and holding it becomes even more dangerous if you are hacked. In many cases, hiring a lawyer costs far more than hiring a programmer or data scientist. These headaches are why some companies dispose of data as soon as it has been processed.
Dark Secrets of Data Management – 5. The cost of data cleaning is huge
Data cleansing is the process of correcting and removing inaccurate data records from a database or data table. Broadly speaking, data cleansing includes identifying and replacing incomplete, inaccurate, irrelevant or problematic data and records.
Many data scientists admit that 90% of their work is simply collecting data, putting it into a consistent form, and dealing with endless gaps and errors. The people who hold the data will always say, “Everything is in CSV (Comma Separated Values, a common, relatively simple file format) and ready to use.” But they don’t mention the blank fields or the erroneous entries. Cleaning data for a data science project can take up to 10 times as long as running the routines in R or Python that actually perform the statistical analysis.
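Before trusting a CSV file that is supposedly ready to use, a quick profile of its blank fields can show how much cleaning lies ahead. The sketch below uses only Python’s standard `csv` module; the sample data and the function name `count_blank_fields` are hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical sample standing in for a file that is "ready to use".
SAMPLE_CSV = """customer_id,city,age
1,Toronto,34
2,,41
3,Ottawa,
4,  ,n.a.
"""


def count_blank_fields(csv_text):
    """Count empty or whitespace-only cells per column."""
    blanks = Counter()
    reader = csv.DictReader(io.StringIO(csv_text))
    for row in reader:
        for column, value in row.items():
            if column is None:
                continue  # extra cells beyond the header are ignored here
            if value is None or value.strip() == "":
                blanks[column] += 1
    return dict(blanks)


print(count_blank_fields(SAMPLE_CSV))
```

Note that this count only sees cells that are literally blank. An entry such as “n.a.” in the age column passes as data, so a blank-field count is a lower bound on the cleaning work.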
Dark Secrets of Data Management – 6. Users are increasingly suspicious of your data practices
End users and customers are increasingly suspicious of companies’ data management practices. AI algorithms and the ways they are used have only heightened those fears, leaving more and more people deeply uneasy about the very act of capturing their data. These concerns are driving the regulatory process and often send companies into public relations crises. On top of that, people deliberately sabotage data collection with fake values or wrong answers. Sometimes half the job is dealing with malicious partners and customers.
Dark Secrets of Data Management – 7. Integrating external data can pay off, but it can also spell disaster
It’s one thing for companies to own the data they collect; it’s another for them to combine their own local information with third-party data and the vast amount of personal information available on the Internet. Some tools openly promise to collect data on each customer in order to build a personalized profile with every purchase. That’s right: to track your fast food purchases and credit score, they use the same language that spy agencies use when tracking terrorists. No wonder people are worried and panicked!
Dark Secrets of Data Management – 8. Regulators are cracking down on data use
No one knows exactly when savvy data analysis crosses the line, but when it does, regulators step in. In a recent case in Canada, a government investigation found that a doughnut chain was tracking customers who also shopped at its competitors.
According to a newly issued press release, the investigation found that Tim Hortons’ contract with a third-party location service provider in the United States contained language so vague and permissive that it would have allowed the provider to sell “de-identified” location data for its own purposes. For what? Selling more doughnuts? Either way, regulators are clearly paying more and more attention to anything involving personal information.
Dark Secrets of Data Management – 9. Your data plan may not be worth it
We imagine that a great algorithm can make everything more efficient and profitable. Sometimes such an algorithm really is possible, but the price may be too high. For example, consumers (and even companies) are increasingly questioning the value of the targeted marketing that elaborate data management schemes produce. Some point out that we often see ads for things we have already bought, because the ad trackers haven’t figured out that we no longer need them.
The same fate often befalls other plans. Sometimes rigorous data analysis identifies the worst-performing factory, but that doesn’t matter because the company signed a 30-year lease on the building. Companies need to be prepared for the possibility that all that data science genius will produce an answer they cannot act on.
Dark Secrets of Data Management – 10. In the end, data decisions are often just subjective judgments
Numbers offer precision, but what often matters is how humans interpret them. After all the data analysis and AI processing, most algorithms still have to decide whether a value falls above or below a threshold. Sometimes scientists want p-values below 0.05; other times, police issue tickets to cars going 20 percent over the speed limit. These thresholds are usually just arbitrary values. For all the science and math that can be applied to data, many “data-driven” processes have more grey areas than we think, and even when companies pour resources into their data management practices, decisions still depend heavily on intuition and subjective judgment.
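A short Python sketch shows how much the chosen cutoff shapes the outcome. The p-values below are hypothetical, and `significant` is an illustrative helper, not part of any statistics library.

```python
# Hypothetical p-values from five experiments.
p_values = [0.012, 0.049, 0.051, 0.08, 0.2]


def significant(values, threshold):
    """Return the p-values that fall strictly below the chosen threshold."""
    return [p for p in values if p < threshold]


# The same data yields different "findings" under different cutoffs.
for threshold in (0.05, 0.06, 0.01):
    print(threshold, significant(p_values, threshold))
```

Nothing in the data says whether 0.049 and 0.051 deserve different treatment. The threshold does that, and a person picked the threshold.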
Dark Secrets of Data Management – 11. Data storage costs are exploding
Disk drives are getting bigger and the price per terabyte is falling, but programmers are collecting data faster than prices are dropping. Internet of Things (IoT) devices upload data constantly, and users expect to be able to browse that rich collection of bytes forever. At the same time, compliance officers and regulators keep demanding that more and more data be retained in case of future audits. It would be one thing if anyone actually looked at some of this data, but there are only so many hours in a day. The percentage of data that is ever revisited keeps shrinking, yet the cost of storing the ever-expanding pile keeps going up.
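The arithmetic behind this squeeze is simple to sketch. In the Python example below, every figure is hypothetical: 100 TB stored at a price of 20 per terabyte per year, with data volume growing 40% a year while the price per terabyte falls 15% a year. The function name `yearly_storage_cost` is likewise an assumption for illustration.

```python
def yearly_storage_cost(start_tb, price_per_tb, growth_rate, price_decline, years):
    """Project the yearly storage bill under constant growth and price decline."""
    costs = []
    volume, price = start_tb, price_per_tb
    for _ in range(years):
        costs.append(volume * price)
        volume *= 1 + growth_rate    # data volume grows each year
        price *= 1 - price_decline   # price per terabyte falls each year
    return costs


# Hypothetical inputs: 100 TB, 20 per TB, +40% data, -15% price, 5 years.
costs = yearly_storage_cost(100, 20.0, 0.40, 0.15, 5)
print([round(c) for c in costs])
```

Under these assumptions the bill grows by a factor of 1.40 × 0.85 = 1.19 each year, so cheaper disks do not stop the total from climbing about 19% annually.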
Thank you for reading our article; we hope you enjoyed it. If you want to learn more about data management, we recommend visiting Gudu SQLFlow for more information.
As one of the best data lineage tools available on the market today, Gudu SQLFlow can not only analyze SQL script files, obtain data lineage, and perform visual display, but also allow users to provide data lineage in CSV format and perform visual display. (Published by Ryan on Aug 27, 2022)