Graphically Displaying Text
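This paper presents a visualization technique that maps lines of text to colored rows according to their position, indentation, and length, with color representing a statistic associated with each line. It can display tens of thousands of lines of text simultaneously and applies to settings such as literary corpora and source code history analysis.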
There are many examples of text databases, including literary corpora and computer source code, where statistics are associated with each line. A visualization technique for this class of data represents the text lines as thin colored rows within columns. The position, length, and indentation of each row correspond to those of the text. The color of each row is determined by a statistic associated with each line. The display looks like a miniature picture of the text, with the color showing the spatial distribution of the statistic within the text. Using this technique, SeeSoft™, a dynamic graphics software tool, can easily display 50,000 lines of text simultaneously on a high-resolution monitor.

This class of data consists of files containing lines of text that have values associated with each line. The "lines" may be literally the lines of text or logical entities such as sentences, verses, or records. For a work of literature, the statistics of interest may include word usages or the locations of items in an index. In analyzing a paper jointly written by several authors, it might be helpful to know which author was responsible for a given sentence and the revision number in which it first appeared. The statistics associated with text may be continuous, categorical, or binary. For a line in a computer program, when it was written is a continuous statistic, who wrote it is a categorical statistic, and whether or not the line executed during a regression test is a binary statistic.

The motivation for this technique comes from studying the change history of the computer source code from AT&T's 5ESS® switch. This system contains several million lines of code, written by thousands of programmers over the past decade. The complete history of every change has been captured in a version management database (Rochkind, 1975 and Tichy, 1985). For each modification, the database contains the modified code lines, the reason for the change, the pertinent release, the responsible developer, the type of change (bug fix or new feature), etc.

To keep a large software system functioning, the code must be maintained. This involves reorganizing the code, fixing bugs, and adding new functionality. Analysts involved with code restructuring studies and code archeology must determine when the files have become too complex and should be rewritten. Programmers working on the code would like to know whether the code has been stable, since instability and frequent changes indicate the need for caution. Much of the software maintenance effort involves code discovery, where programmers rediscover how the code actually works. A visual method showing the code history can be a useful tool for helping programmers understand code.

A graphical technique for displaying text represents each file as a vertical column and each line as a color-coded row within the column (see Figure 1). The row indentation and length track the corresponding text, and the row color is tied to a statistic. If the row tracking is literal, as with computer source code, the display looks as if the text had been printed in color and then photo-reduced to fit on a single figure. The spatial pattern of color shows the distribution of the statistic within the text. SeeSoft is a system that implements this graphical technique and applies dynamic graphics (Becker and Cleveland 1987), particularly brushing and linking, and high-interaction (Shneiderman 1983) to increase its effectiveness. Note that color is critical for this method, and the intent is that this paper be read with reference to the accompanying color figures.

The following sections describe this graphical method and display manipulation techniques in more detail. For concreteness, Sections 3, 4, 5, and 6 apply it to visualizing the change history of source code, while Section 7 is general and Section 8 describes SeeSoft's implementation.
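
The row mapping described above, in which each text line becomes a thin row whose offset, length, and color follow the line's indentation, length, and associated statistic, can be illustrated with a minimal Python sketch. This is not SeeSoft's implementation: the names `Row`, `stat_to_color`, and `layout_file`, the one-pixel-per-character scale, and the blue-to-red color ramp are all assumptions made for illustration, and only a continuous statistic is handled.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Color = Tuple[int, int, int]


@dataclass
class Row:
    """One text line reduced to a thin colored row (hypothetical structure)."""
    x: int        # left edge: column offset plus scaled indentation
    y: int        # top edge: line index times row height
    width: int    # scaled length of the non-blank part of the line
    color: Color  # RGB color encoding the line's statistic


def stat_to_color(value: float, lo: float, hi: float) -> Color:
    """Map a continuous statistic linearly onto an assumed blue-to-red ramp."""
    t = 0.0 if hi == lo else (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))  # clamp values outside [lo, hi]
    return (round(255 * t), 0, round(255 * (1 - t)))


def layout_file(
    lines: Sequence[str],
    stats: Sequence[float],
    column_x: int = 0,
    px_per_char: int = 1,
    row_height: int = 1,
    tab_width: int = 8,
) -> List[Row]:
    """Lay out one file as a vertical column of rows, one row per text line."""
    if len(lines) != len(stats):
        raise ValueError("each line needs exactly one statistic")
    if not lines:
        return []
    lo, hi = min(stats), max(stats)
    rows = []
    for i, (text, value) in enumerate(zip(lines, stats)):
        # Expand tabs so indentation is measured in character cells.
        expanded = text.expandtabs(tab_width).rstrip()
        body = expanded.lstrip()
        indent = len(expanded) - len(body)
        rows.append(
            Row(
                x=column_x + indent * px_per_char,
                y=i * row_height,
                width=len(body) * px_per_char,
                color=stat_to_color(value, lo, hi),
            )
        )
    return rows


if __name__ == "__main__":
    source = ["int main() {", "    return 0;", "}"]
    line_age = [0.0, 5.0, 10.0]  # made-up continuous statistic per line
    for row in layout_file(source, line_age):
        print(row)
```

Several files could be shown side by side by calling `layout_file` once per file with a different `column_x`. A categorical or binary statistic would need a discrete palette in place of the linear ramp assumed here.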