Variations on Textual Data - EWSOLUTIONS


Variations on Textual Data

09 January, 2018 | William H. Inmon | Textual Data Analysis

Textual data comes in a variety of formats, which information technology (IT) specialists and end-users must be aware of and address when developing and using textual solutions.

When you ask a person about processing text, the usual reaction is simply, “Well, you just read text…” Most people do not give a second thought to reading text. They think that text is just text and that is all there is to it.

But that is like saying that a person is just a person. When you get down to it, there are many different types of people. There are old people and there are young people. There are men and there are women. There are short people and there are tall people. There are slender people and there are husky people. There are educated people and there are less educated people. There are people from the city and there are people from the countryside. So it is not enough simply to say that people are people. To make sense of what you are saying, you need to specify what kind of person you are talking about.

The same is true of text. There are as many varieties of text as there are varieties of people. And when you get ready to do text processing, you need to be prepared to handle all kinds of text.

So what exactly are the different kinds of text? What are some of the variations? As described in the book “Turning Text into Gold”, some of the different kinds of text include:

  • Formal text, where words and thoughts are expressed according to a well-defined grammar. Formal text is what your teacher gives you an A for in school.
  • Informal text, where words include slang and street expressions. In this case, the text does not follow any particular form. Informal text may be what a street gangster says to a fellow gang member.
  • Unpredictable text, where there is no prescribed order from one word to the next. A classic example of unpredictable text is email. A person can write anything they want in an email, in any order they desire.
  • Predictable text, where there is a predictable, prescribed order to the text. A classic example of predictable text is legal “boilerplate”, where the same contract (or a very similar contract) appears repeatedly. A second example is laboratory results, where a hospital describes the results of tests made on a patient.
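To make the predictable/unpredictable distinction a little more concrete, here is a minimal sketch in Python. It is not taken from the book: the template sentence, the 0.8 threshold, and the function names are all assumptions chosen for illustration. It uses the standard-library difflib module to measure how closely the words of a document follow a known piece of boilerplate.

```python
from difflib import SequenceMatcher

# Hypothetical boilerplate template; a real contract would be far longer.
TEMPLATE = (
    "This agreement is made between the landlord and the tenant "
    "for the lease of the premises."
)


def similarity_to_template(text: str, template: str = TEMPLATE) -> float:
    """Return a 0..1 ratio of how closely the word order of `text` follows `template`."""
    template_words = template.lower().split()
    text_words = text.lower().split()
    return SequenceMatcher(None, template_words, text_words).ratio()


def looks_predictable(text: str, threshold: float = 0.8) -> bool:
    """Assumed rule of thumb: text that stays close to a known template is 'predictable'."""
    return similarity_to_template(text) >= threshold


# A near-copy of the boilerplate versus a free-form email.
contract = (
    "This agreement is made between the landlord and the tenant "
    "for the lease of the apartment."
)
email = "Hey, running late today, can we move the meeting to Thursday?"

print(looks_predictable(contract))  # near-copy of the template
print(looks_predictable(email))     # free-form wording
```

A single template and a fixed threshold are far too crude for real work, but the sketch shows why the two kinds of text call for different handling: predictable text can be compared against a known pattern, while unpredictable text offers no such pattern to lean on.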


And this is just the tip of the iceberg. There are LOTS of other variations of text.

So when you get ready to read and process text, you need to be prepared to handle ALL these variations. If you are going to be serious about text processing, handling just some kinds of text is not enough.

It is debatable: are there more variations in people or more variations in text? That is an imponderable for which there is probably no answer.

Text is not text is not text.



