Differences Between Text and Big Data

Text and Big Data are not synonymous, no matter how often they are treated as if they were.

Vendors love to blur the line between Big Data and text, and the blurring confuses people. The confusion becomes apparent when the question “Isn’t text the same thing as Big Data?” arises.

The short answer is that text can be placed inside Big Data, but Big Data and text are essentially different things.

Figure 1 shows the starting point for the confusion.

Figure 1: Text as a Subset of Big Data

As Figure 1 shows, text may be considered a subset of Big Data. Indeed, text can be found inside Big Data. But Big Data contains far more than just text: structured data, machine-generated data, log tape data, archival data, and so on. So it may be true that text can be found inside Big Data, but under no circumstances are Big Data and text synonymous.

Now it is true that both Big Data and text support analytical processing. However, the technology and techniques used for analytical processing of the two types of data are entirely different. Figure 2 shows the technology required to perform analytical processing on each type of data.

Figure 2: Text and Big Data Technologies

Textual disambiguation is as different from Big Data management tools as horses are different from roller skates.  There simply is no basis for comparison between the two.

Textual disambiguation is designed to address the nuances and vagaries of text and to restructure text into a form suitable for analysis. Big Data management tools are designed to address the storage and access of large amounts of data. The problems that these two technologies address are entirely different from each other.
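To make the idea of restructuring text into a form suitable for analysis a little more concrete, here is a toy sketch in Python. It is not the actual textual disambiguation technology and not any vendor's implementation. The `TAXONOMY` dictionary, the `Row` type, and the `text_to_rows` function are all hypothetical names invented for this illustration. The sketch only shows the general shape of the problem: free text goes in, and rows that an analytical tool could query come out.

```python
import re
from typing import List, NamedTuple

# Hypothetical mini-taxonomy mapping words to categories (illustrative only).
TAXONOMY = {
    "aspirin": "medication",
    "ibuprofen": "medication",
    "headache": "symptom",
    "fever": "symptom",
}


class Row(NamedTuple):
    doc_id: str    # identifier of the source document
    offset: int    # character position of the word in the source text
    word: str      # the word, lowercased
    category: str  # category assigned by the taxonomy


def text_to_rows(doc_id: str, text: str) -> List[Row]:
    """Turn free text into structured rows for words found in TAXONOMY."""
    rows = []
    for match in re.finditer(r"[A-Za-z]+", text):
        word = match.group().lower()
        category = TAXONOMY.get(word)
        if category is not None:
            rows.append(Row(doc_id, match.start(), word, category))
    return rows


rows = text_to_rows("note-1", "Patient reports headache; took Aspirin.")
for row in rows:
    print(row)
```

Notice that nothing in this sketch is concerned with storing or accessing large volumes of data. That is the other problem, and it belongs to the Big Data management tools.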

So, the next time a vendor tries to confuse you by saying that text and Big Data are the same thing, find a more enlightened vendor.


Bill Inmon

Bill Inmon is best known as the “Father of Data Warehousing” and of textual data integration. He has become the most prolific and well-known author worldwide in the data warehousing and business intelligence arena, and has opened the field of textual data integration. In addition to authoring more than 50 books and 650 articles, Bill lectures on data warehousing, textual data integration, and related topics. Bill consults with a large number of Fortune 1000 clients, and supports IT executives on data warehousing, business intelligence, and database management issues around the world.

© Since 1997 to the present – Enterprise Warehousing Solutions, Inc. (EWSolutions). All Rights Reserved
