The blog and me


This blog will be erratic and seldom follow themes. It will make no claims to being structured or logical. It will, I hope, be fun and occasionally insightful. I do still publish more coherent work (though in economics, and in very strange places) but that may take some believing after reading these pages. I've a PhD in economics/economic history from Cambridge and I've taught in several universities (and still do, when I get the chance), but I now focus my energy and attention on commercialization for a large London university, and on dealing with the daily commute.

Friday 9 January 2015

A very, very few words on 'sensemaking' and historical text-based data

Information scientists, and in particular those with an interest in visualisation, have generated a number of different approaches to recovering or discovering meaning in pooled experience data (including text data). Literally 'making sense of' a variety of stimuli (sounds, images, text, interactions, feelings, impressions, memories) is something we all do in our everyday lives. What computer and information scientists within two distinct but complementary traditions (human-computer interaction [HCI] on the one hand and information science on the other) have tried to do is to create model schemes for the effective construction of meaning from these data via software tools. Some of these tools create data taxonomies and representations of meaning, some create only the latter. All are trying, though, to automate the process of making sense of data - and in a world of significant 'big data' sources, such automation is vitally important for users as diverse as security agencies, criminal investigators, librarians and, potentially, historians too.

If 'sensemaking' sounds like an attempt to supplant the interpretative power of historical method, historians should rest assured it isn't. Sensemaking, when deployed effectively, is a means of revealing patterns, relations and clusters of meaning in data; it does not project onto those data anything other than the assumptions necessary to 'make sense'. It does not, for example, deduce significance. Moreover, in the HCI tradition of sensemaking in particular [summarised in 1], it is essential that the 'point of view' which gives rise to the clustering is informed by expertise - and here the historian can find scope for working with the software engineer to develop tools, not merely deploy the ones already developed. Sensemaking software development in this sense can be creative and, with historical data, can involve considerable forensic and situational historical knowledge; one useful summary by one of the leading exponents of this approach [2] shows how software developers can, and indeed must, work with, and transfer control to, domain specialists to make 'sensemaking' work.

One example of such a 'sensemaking' tool in the public domain is Invisque [3], a simple tool for visualising the 'sense made' of large collections of text data, resulting from a project funded by Jisc. As a video of the software in use shows [4], it is possible to use the HCI approach to sensemaking to 'make sense' of loosely categorised text data, or plain text sources. In an example given by Wong et al [3], a PhD student could use the sensemaking approach and the visualisation tools to shape a detailed literature search. Equally, however, the same student could use the technology to make sense of analogous corpora of data - such as texts from the BL Incunabula Short Title Catalogue [5]. Other sensemaking tools, such as Stasko et al's Jigsaw [6], allow sensemaking among large corpora of text documents, where finding and mapping the connections between data can help identify patterns in text records. Text, let it be remembered, is not the only material for sensemaking tools like Invisque. In my own university for example, where the software tool originated, we have been discussing potential applications within the extensive artefact and image collections of the university's Museum of Domestic Design and Architecture (MODA) [7]. Yet greater potential exists in the ability of sensemaking tools to seek coherence among a disparate collection of sources (for example lyrics in Victorian popular songs, Victorian newspaper poetry, manuscript poetry in private archives).

[Image: The Invisque tool]

There are methodological issues at the heart of this approach that will be of concern to the historian, of course, and they will already have occurred to the patient scholar. First, in J.H. Hexter's terms, the gestalt implied by sensemaking is very much more 'lumper' than 'splitter' in character [8, p. 242] - sensemaking allows one to frame connections in order to make sense of a whole corpus. 'Lumping' has consequences (ignoring 'blindspots' in the record, eliding meaning etc.). That, however, is more than made up for by the explanatory potential of (in particular, visualised) sensemaking to show in sharp relief just how clustered meaning might be framed even where the complexity of a text suggests layered or textured, or even hidden, meanings. Secondly, sensemaking seeks meaning in data by inferring potential clustered association, but does not impute - only suggests - causation within it. If, for example, one were to frame a sensemaking analysis of single author texts such as The Entring Book of Roger Morrice (a very possible project given the scale of the text corpus and the sophistication of layered meanings within it), or multiple author documents such as selected texts from the Patrologia Latina, the historian would be left to hypothesize, weigh and assess causal theories behind the clustered meanings. Moreover, one must be careful to recognise the generality of the claim made by sensemaking: it seeks to make 'senses' of data, not one - for example 'historians'-type - sense. Sensemaking techniques could be used to interrogate the texts of the Patrologia Latina for meaning in schemes not related to historical questions but to questions of an etymological kind.

What sensemaking may offer historians is a vehicle for investigating - 'trying out', if you will - meaningful clustering schemes in textual data. It will certainly do more, for digital text files, than might be accomplished by text analysis tools alone, which seek patterns in word representations, not intuitive 'sense'.
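
For readers who want to see what 'patterns in word representations' means in practice, here is a minimal sketch of the kind of word-based comparison a plain text analysis tool might make. It is an illustration only: the three toy texts are invented, the function names are my own, and nothing here describes how Invisque or Jigsaw actually work.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Invented toy texts, standing in for a disparate collection of sources.
DOCS = {
    "song": "the sailor sings of the sea and home",
    "poem": "the sea calls the sailor home",
    "ledger": "rent paid for the house in march",
}


def word_counts(text):
    """Reduce a text to a bag of lower-cased word counts."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count bags (0.0 if either is empty)."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar_pair(docs):
    """Return the names of the two texts that share the most vocabulary."""
    bags = {name: word_counts(text) for name, text in docs.items()}
    return max(
        combinations(bags, 2),
        key=lambda pair: cosine_similarity(bags[pair[0]], bags[pair[1]]),
    )


if __name__ == "__main__":
    print(most_similar_pair(DOCS))
```

The comparison pairs the song with the poem because they share words, not because it grasps that both are about a sailor longing for home. Swap 'sailor' for 'mariner' in one of them and the measured similarity drops, although the sense has not changed. That gap is what the expert 'point of view' in sensemaking is meant to fill.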

[1] Pirolli, Peter, and Daniel M. Russell. "Introduction to this special issue on sensemaking," Human–Computer Interaction 26.1-2, 2011, pp. 1-8.

[2] Kang, Youn-ah, and John Stasko. "Examining the Use of a Visual Analytics System for Sensemaking Tasks: Case Studies with Domain Experts," IEEE Transactions on Visualization and Computer Graphics 18.12, 2012, pp. 2869-78.

[3] Wong, BL William, et al. "Invisque: technology and methodologies for interactive information visualization and analytics in large library collections," Research and Advanced Technology for Digital Libraries. Springer Berlin Heidelberg, 2011, pp. 227-235.

[4] https://www.youtube.com/watch?v=FDmswS6cceg (accessed 8th January 2015)

[5] http://www.bl.uk/catalogues/istc/index.html (accessed 8th January 2015)

[6] Stasko, John, Carsten Görg, and Zhicheng Liu. "Jigsaw: supporting investigative analysis through interactive visualization," Information Visualization 7.2, 2008, pp. 118-132.

[7] http://www.moda.mdx.ac.uk/home (accessed 8th January 2015)

[8] Hexter, J.H., On Historians: Reappraisals of the Masters of Modern History. Harvard University Press, 1979.

1 comment:

  1. http://history-lab.org/ is the beginning of the quest to put big data analytics into historical study
