Text and Big Data are not synonymous, despite attempts to treat them as the same thing.
Vendors love to mix Big Data and text together, and to confuse people in the process. The confusion becomes apparent when the question “Isn’t text the same thing as Big Data?” arises.
The short answer is that text can be placed inside Big Data, but Big Data and text are essentially different things. Both should be managed by an enterprise data management program, but they are managed in different ways.
Figure 1 shows the starting point for the confusion.
Figure 1: Text as a Subset of Big Data
Figure 1 shows that text may be considered a subset of Big Data. Indeed, text can be found inside Big Data. But Big Data contains far more than just text: structured data, machine-generated data, log tape data, archival data, and so on. So it may be true that text can be found inside Big Data, but under no circumstances are Big Data and text synonymous.
Now it is true that both Big Data and text support analytical processing. However, the technology and the techniques used for analytical processing of the two types of data are entirely different. Figure 2 shows the technology required to perform analytical processing for each type of data.
Figure 2: Text and Big Data Technologies
Textual disambiguation is as different from Big Data management tools as horses are different from roller skates. There simply is no basis for comparison between the two.
Textual disambiguation is designed to address the nuances and vagaries of text and to restructure text into a form suitable for analysis. Big Data management tools are designed to address the storage and access of large amounts of data. The problems these two technologies address are entirely different.
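To make the contrast concrete, here is a toy sketch in Python of what "restructuring text into a form suitable for analysis" can look like. It is not any vendor's textual disambiguation product. The SENSES table, the cue words, and the disambiguate function are all hypothetical, invented purely for illustration. The sketch resolves an ambiguous abbreviation from the words around it and emits a structured row that an ordinary analytical tool could query.

```python
import re

# Hypothetical lookup table (an assumption for this sketch): each ambiguous
# term maps to candidate meanings, each with cue words suggesting that meaning.
SENSES = {
    "ha": {
        "heart attack": {"cardiologist", "chest", "cardiac"},
        "headache": {"migraine", "aspirin", "neurologist"},
    },
}


def disambiguate(doc_id, text):
    """Return one structured row per ambiguous term found in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    context = set(tokens)
    rows = []
    for position, token in enumerate(tokens):
        senses = SENSES.get(token)
        if senses is None:
            continue
        # Choose the meaning whose cue words overlap most with the document.
        best = max(senses, key=lambda meaning: len(senses[meaning] & context))
        # With no supporting cue words at all, leave the term unresolved.
        meaning = best if senses[best] & context else "unresolved"
        rows.append(
            {"doc_id": doc_id, "term": token, "meaning": meaning, "position": position}
        )
    return rows


if __name__ == "__main__":
    note = "Patient saw the cardiologist after an HA with chest pain."
    for row in disambiguate(1, note):
        print(row)
```

Nothing in this sketch concerns how much data there is or where it is stored. That is the job of Big Data management tools, and it is a separate problem from working out what the text means.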
So, the next time a vendor tries to confuse you by saying that text and Big Data are the same thing, find a more enlightened vendor.