Tables

About

As David McCandless demonstrated in his collections of graphics, Information Is Beautiful (2012) and Knowledge Is Beautiful (2014), data visualization does not have to be complex to be significant. Perhaps one of the most “simple” visualizations, but one that still contains complexity, is the table. Along with graphs, tables are one of two fundamental ways of presenting quantitative information (Few, 2012). Tables are now ubiquitous because they are so easy to create and navigate (Dobrin, Keller, & Weisser, 2010); because they are so common, audiences now expect them in many fields (McKenna, 2018). Although tables are often overlooked, they serve an important function in data visualization. The table is adaptable, and both amateurs and experts can use it: a novice can use it as a basic visual, while experts can create more complex tables (Sherman & Johnson, 1990). Because tables are closely related to graphs, researchers must often choose whether to present their information as a table or as a graph; while graphs are effective for presenting smaller data sets, tables provide detailed information about larger data sets in a concise manner (Duquia et al., 2014; Krawiec, 1995). And, importantly, tables represent a first step in creating additional visuals for data analysis (e.g., graphs).

Tables have many benefits: they can contain a lot of numbers and can summarize, rank, and order those numbers. Because they contain numerical information, tables provide an exactness that can be missing from other data visualization tools; it is easy to look up individual values in tables, and tables are effective at displaying simple relationships between quantitative values and categorical items (Few, 2012). Despite the amount of information they can contain, tables are often easier to interpret and analyze than other visuals (Houp et al., 2002, p. 285). Tables are also an effective way to present frequency distributions, both absolute and relative. Absolute frequency is the number of times that a value appears, while relative frequency results from dividing the absolute frequency by the total number of data points (“Absolute, Relative, Cumulative,” n.d.). Tables are also more forthright and simpler than other forms of data visualization. For Andrew Gelman (2011), tables are a form of unadorned numbers that represent a summary of the data. And, because a table is not a narrative, the numbers are the focus of the visual. William Schroeder and Kenneth Martin (2005) noted that the “table should accentuate important features while minimizing less important or extraneous details” (p. 7). Therefore, a table does not need to contain all of the information, only that which is most important.
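The distinction between absolute and relative frequency can be sketched in a few lines of Python. This is only an illustration: the token list is invented and is not drawn from the corpus discussed in this section.

```python
from collections import Counter

# Hypothetical tokens, invented for illustration only.
tokens = ["thesis", "grammar", "thesis", "ideas", "grammar", "thesis"]

# Absolute frequency: the number of times each value appears.
absolute = Counter(tokens)

# Relative frequency: absolute frequency divided by the total number of data points.
total = sum(absolute.values())
relative = {term: count / total for term, count in absolute.items()}

# Print a small table, most frequent term first.
for term, count in absolute.most_common():
    print(f"{term:<8} {count:>3} {relative[term]:.3f}")
```

Because every relative frequency is a share of the same total, the relative column always sums to 1.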

Tables are not without their issues. First and foremost, a table is a static visualization, as opposed to an interactive visualization (Sinclair, Ruecker, & Radzikowska, 2013). Because a table is static, users are limited to the single view that the table presents. Tables are also a basic visualization that lacks the aesthetic appeal of other visuals like word clouds, collocate graphs, and word trees. Additionally, tables can put the responsibility of understanding the data on the reader when not enough explanation is provided (McKenna, 2018). And, although tables are easy to create, they can be difficult to use effectively.

The “Document Terms” tool in Voyant presents its data as a table with a variety of optional columns. This section uses three of those columns:

  • Term: the document term
  • Count: the raw frequency of the term in the document
  • Relative: the relative frequency (per 1 million words) of the term in the document

The tool can also display the following additional columns:

  • #: the position of the term’s document in the corpus
  • Trends: a sparkline graph that shows the distribution of the term within the segments of the document
  • Significance: a TF-IDF score, a common way of expressing how important a term is in a document relative to the rest of the corpus
  • Z-Score: a normalized value for the term’s raw frequency compared to other term frequencies in the same document
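The Significance and Z-Score columns can be approximated with common textbook definitions. The sketch below is an assumption about the general technique, not a reproduction of Voyant's implementation, whose exact formulas may differ; the function names and the counts are hypothetical.

```python
import math
from statistics import mean, pstdev

def z_scores(counts):
    """Standardize each term's raw count against the other counts in one document."""
    values = list(counts.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    return {term: (n - mu) / sigma for term, n in counts.items()}

def tf_idf(term, doc_counts, corpus):
    """Textbook TF-IDF: relative term frequency times log(N / document frequency)."""
    tf = doc_counts[term] / sum(doc_counts.values())
    df = sum(1 for doc in corpus if doc.get(term, 0) > 0)
    return tf * math.log(len(corpus) / df)

# Hypothetical term counts for three small documents.
corpus = [
    {"thesis": 6, "grammar": 2, "ideas": 1},
    {"grammar": 4, "ideas": 4},
    {"grammar": 3, "session": 5},
]
doc = corpus[0]

print(z_scores(doc))
print({term: round(tf_idf(term, doc, corpus), 3) for term in doc})
```

In this toy corpus, "grammar" appears in every document, so its TF-IDF score is zero, while "thesis" appears in only one document and scores highest. Note that the z-score function assumes the counts are not all identical; otherwise the standard deviation is zero.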

Examples

The following three tables provide examples of data visualization that include both the individual and combined data from Michigan State University, The Ohio State University, University of Michigan, and Texas A&M University.

Table 1, below, is generated from the combined institutional documents comprising 1,997,453 total words and 23,649 unique words.

Table 1. Reproduction of table from Voyant showing count and relative frequency of terms
Term         Count     Relative frequency (per 1 million words)
worked       10,075    5,044
grammar      8,929     4,470
talked       8,590     4,300
read         6,915     3,462
discussed    6,827     3,418
wanted       6,708     3,358
ideas        6,635     3,322
make         6,237     3,122
session      6,112     3,060
thesis       5,557     2,782
sentences    5,551     2,779
work         5,536     2,772
help         5,024     2,515
sentence     4,801     2,404
sure         4,541     2,273
paragraph    4,427     2,216

Table 1 applies the automatic stopword list generated by Voyant as well as the additional stopwords paper, writing, client, and essay. Although it might be tempting to draw conclusions about the field of writing center studies from this data (because it comprises nearly two million words from four institutions), this visual functions instead as a form of what Colin Ware (2004) called hypothesis formation, that is, a starting point. Specifically, it gives us a glimpse of what consultants do during consultations: talked, read, discussed, make, and help. “Talked” was the third most common term in the aggregated institutional data, highlighting the oral nature of sessions. Meanwhile, the common use of the word “discussed” demonstrates the dialogic and participatory nature of sessions for both the consultant and the client.
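The relative frequencies in Table 1 appear consistent with scaling each raw count to occurrences per 1 million words, using the corpus size reported above. A minimal sketch of that arithmetic follows; the helper name per_million is hypothetical, and only the first three rows of Table 1 are checked.

```python
# Figures taken from Table 1 and the corpus size reported in the text.
TOTAL_WORDS = 1_997_453
counts = {"worked": 10_075, "grammar": 8_929, "talked": 8_590}

def per_million(count, total_words):
    """Scale a raw count to occurrences per 1 million words."""
    return count / total_words * 1_000_000

for term, count in counts.items():
    print(f"{term:<8} {count:>6,} {per_million(count, TOTAL_WORDS):>6,.0f}")
```

Rounded to whole numbers, these three results match the values listed in Table 1 (5,044, 4,470, and 4,300).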

Table 2. Key terms: Comparison between the combined list of terms and the terms from Michigan State University's corpus
Combined       MSU
Worked         Worked
Grammar        Grammar
Talked         Discussed
Read           Talked
Discussed      Ideas
Wanted         Structure
Ideas          Read
Make           Make
Session        Sentence(s)
Thesis         Sure
Sentence(s)    Clarity
Help           Thesis

Table 2 lists the top 12 terms (with stopwords excluded) for the combined data and for the MSU data. When viewed as a table, this data helps us see both similarities and differences between the combined data set and MSU’s data set; furthermore, it provides a sense of what might be valued by one institution when compared to the combined institutions. There are many similarities, including worked, grammar, talked, read, discussed, ideas, make, thesis, and sentence(s). There are also some key differences, including wanted, session, help, structure, sure, and clarity. Two words in particular stand out for the Michigan State University Writing Center that are not on the combined list: “sure” and “clarity.” Sure can be used to mark confidence (e.g., she is right for sure) or, more likely, as part of an idiom (i.e., to make sure) that means to be or become absolutely certain. Clarity, meanwhile, functions as a way to move toward a better understanding. Both represent language used to seek or determine a better understanding.
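The similarities and differences read off Table 2 can also be derived mechanically. The sketch below transcribes the two columns of Table 2 (lowercased) and separates shared terms from those unique to each list; the variable names are illustrative.

```python
# Top-12 lists transcribed from Table 2 (lowercased).
combined = ["worked", "grammar", "talked", "read", "discussed", "wanted",
            "ideas", "make", "session", "thesis", "sentence(s)", "help"]
msu = ["worked", "grammar", "discussed", "talked", "ideas", "structure",
       "read", "make", "sentence(s)", "sure", "clarity", "thesis"]

# List comprehensions keep the rank order of the source list.
shared = [term for term in combined if term in msu]
combined_only = [term for term in combined if term not in msu]
msu_only = [term for term in msu if term not in combined]

print("Shared:", ", ".join(shared))
print("Combined only:", ", ".join(combined_only))
print("MSU only:", ", ".join(msu_only))
```

Nine terms are shared; wanted, session, and help appear only in the combined list, and structure, sure, and clarity appear only in the MSU list.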

Table 3, below, provides a series of key terms and individual institutions by relative frequency.

Table 3. Comparison between institutions: relative frequency of terms per 1 million words
Word           MSU      OSU      UM       A&M
Grammar        8,771    2,675    1,819    4,569
Talked         5,992    4,500    3,357    3,364
Discussed      6,327    3,746    1,491    2,126
Read           4,508    3,568    2,428    3,348
Thesis         3,396    2,757    3,369    1,611
Sentence(s)    7,550    3,658    4,548    4,977

Tables such as Table 3, drawn from each institution’s individual corpus, are useful for comparing information (Sherman & Johnson, 1990). As noted earlier, tables are also effective in presenting precise numbers. Because of the numerical details included in it, this type of visualization prompts a series of questions. The first is why the relative frequencies of terms like grammar, discussed, and sentences are much higher for MSU than for other institutions. Is it because there are more consultations represented in the terms at MSU than at other institutions? Or is there something else going on there? The second is the low relative frequency at University of Michigan for “grammar.” Is that because there are fewer consultations represented in the data? Or is it because sessions are focused on more global concerns? Likewise, MSU has a higher relative frequency for “discussed” than other institutions. Is this due to the number of consultations? Is it a function of tutor training at the institution? And is it telling or important? These are questions that other data visualizations and other methods of analysis can help to answer.
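The comparisons that prompt these questions can be made explicit with a short script. This sketch transcribes Table 3 and, for each term, reports which institution has the highest and lowest relative frequency and the ratio between them. It only organizes the numbers already in the table; it does not explain why the differences exist.

```python
# Relative frequencies (per 1 million words) transcribed from Table 3.
table3 = {
    "grammar":     {"MSU": 8771, "OSU": 2675, "UM": 1819, "A&M": 4569},
    "talked":      {"MSU": 5992, "OSU": 4500, "UM": 3357, "A&M": 3364},
    "discussed":   {"MSU": 6327, "OSU": 3746, "UM": 1491, "A&M": 2126},
    "read":        {"MSU": 4508, "OSU": 3568, "UM": 2428, "A&M": 3348},
    "thesis":      {"MSU": 3396, "OSU": 2757, "UM": 3369, "A&M": 1611},
    "sentence(s)": {"MSU": 7550, "OSU": 3658, "UM": 4548, "A&M": 4977},
}

# For each term, record the institutions with the highest and lowest value.
summary = {
    term: (max(row, key=row.get), min(row, key=row.get))
    for term, row in table3.items()
}

for term, (highest, lowest) in summary.items():
    ratio = table3[term][highest] / table3[term][lowest]
    print(f"{term:<12} highest: {highest:<4} lowest: {lowest:<4} ratio: {ratio:.1f}")
```

For every term in Table 3, MSU has the highest relative frequency, and University of Michigan has the lowest value for "grammar."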

Implications

An advantage of using tables is that the best practices for them are easy to learn, understand, and put to use (Few, 2012). But, to mitigate any shortcomings with tables, there are a few things to remember when using tables for data visualization:

  • Use a title and headings to accurately describe the table’s contents.
  • Use consistent formatting.
  • Avoid arbitrary ordering along the first column (Dunn, 2010).
  • Avoid excessive grid lines (Dunn, 2010).
  • Don’t make the reader do all the work; describe what the data reveals about the topic (McKenna, 2018).
  • Reference the table in text.

The table functions as a good starting point for data visualization; it can be used by a broad range of researchers, from novices to experts, and most people are familiar with it. Tables can also serve as the end point for data analysis, and they can prompt questions and promote inquiry about data, serving as a form of hypothesis formation. And tables are a precursor to more complex visualizations like Cirrus, Collocates, and Word Trees, included in this site.