"Data Visualization is a Halfway House"

I've said this several times over the years.  Mike Driscoll just wrote a blog post reflecting on this thought.  

Mike captured the intent of my quote quite well, but I want to add a bit more context.

Right now, we have data visualization apps that are expected to serve consumers.  We look at crime maps, at various news infographics, at interfaces that help us to understand data.  But if you think about the arc of data services like maps, and follow the path from the paper map through interactive services like Google Maps giving you maps and directions on your phone, through turn-by-turn navigation, all the way to the Google self-driving car, you can see how eventually the information is submerged into the service.  The car just gets you where you want to go.  The visualization is a kind of checkpoint, perhaps, but the human decision maker is not in the actual processing loop.

Applying this idea to business-oriented visualization, you can see how, as Mike points out, machine learning algorithms eventually are trained to do the right thing.  At this point, visualization is a bit like debugging - a way for humans to get insight into the operation of the algorithm.  It's most useful during development, and less useful once the algorithm has been perfected.
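To make that "debugging" role concrete, here's a rough sketch (the data, model, and decision threshold are invented purely for illustration): during development a human looks at a predicted-vs-actual plot to see where the model goes wrong; once the model is trusted, the decision it supports gets made without anyone looking at a chart.

    # Rough sketch: visualization as a debugging aid during model development.
    # The data, model, and decision threshold here are hypothetical.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    X = rng.uniform(0, 10, size=(200, 1))         # invented input feature
    y = 3 * X[:, 0] + rng.normal(0, 2, size=200)  # invented outcome

    model = LinearRegression().fit(X, y)
    predicted = model.predict(X)

    # Development: a human inspects this plot to judge where the model fails.
    plt.scatter(y, predicted, s=10)
    plt.plot([y.min(), y.max()], [y.min(), y.max()], color="red")
    plt.xlabel("actual")
    plt.ylabel("predicted")
    plt.title("Debugging view: predicted vs. actual")
    plt.show()

    # Production: the model simply acts; nobody looks at the chart.
    decision = "approve" if model.predict([[7.5]])[0] > 20 else "review"
    print(decision)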

There's an interesting gray area.  This model assumes that a given domain is adequately captured by an algorithm.  There will be many areas where visualization interfaces enable exception handling.  But more and more, expect services that used to require you to look at a screen and make a decision to just make that decision for you.

(In that regard, see my post on Square and Uber:  Software Above the Level of a Single Device.  http://radar.oreilly.com/2012/11/square-wallet-the-apple-store-and-uber-software-above-the-level-of-a-single-device.html )

There's a great deal of interesting work yet to be done on the proper role of visualization in man-machine interfaces; but one thing is clear: we need fresh thinking about just where the algorithm ends and the human interface begins.
18 comments
 
A thought-provoking piece. It's worth reminding ourselves that visualization is one way to communicate about data, and that the reason we communicate about data is to change something -- be it a belief or a behavior or a system. 
j. gill
 
You're right.   As a consumer of processed data I expect to be able to "see" what data was processed and to drill down into exceptions.   This is already in play with my gas and electric service.   At home I look at this with my bandwidth usage.   Anywhere else I'll want the same capability.  +Michael Driscoll tries to make this point with the scale example, but would he be looking for a 20-lb. spike over three days (maybe at the end of November?)
 
Cognitional Theorist Perspective
Let me begin by stating that I have minimal background in programming and a bit more in math; my expertise is in cognitional theory.  From that point of view, there is a meaningful difference between understanding something and judging whether that something, as understood, is true or false, real or not.

Concerning the topic at hand, I would think that the cutting edge occurs in the operations we perform once we have achieved some level of understanding [which, again, I would think might in fact be the limit of algorithms].  In common language, judging is described as weighing up the evidence, but that is more metaphor than a technical account of what we do when we judge, and a technical language would only have meaning for those who have appropriated their own interiority, that is, who know the operations that move us from sense experience, through questioning, to understanding, to expressing in words what we understand, to wondering whether what we have understood is true, and to determining whether we have met the demands of our rational consciousness.

I certainly would be interested in any feedback from those who have a grasp of this topic in the concrete.
 
While visualization may indeed be an interim tool, it will continue to provide value by allowing our human visual cortex to understand non-human, binary algorithms and patterns.  Say the machine provides us all services that do the "right thing": who and what will evaluate them to understand whether they are actually better for the larger ecosystem?

Humans will still need an interface to these vast data oceans in order to build insight and keep us in check. Just because we can does not mean we should. We learned this over several generations, as shown by past human mistakes made when trusting technology as an unflappable beacon. The right thing now can easily be the wrong thing tomorrow.

The Square and Uber examples are only about making transactions easier or more convenient. This selling of magic can be a slippery slope; it is not about creating a better human ecosystem. Convenience can often inversely affect conservation.

Self-navigating cars are an application of advanced cartography, not the end of maps as useful tools. The massive amount of data generated by self-navigating cars will need to be quantified and qualified, and that is where data viz as a methodology will be part of an ongoing loop and not a halfway house.

The arc from paper cartography to a self-driving car that you laid out is easy to follow, yet it does not take into account that each technology creates its own massive data sets that will need to be parsed and interpreted. I don't see a point where we will be able to predictively define every type of data set generated by our own inventions. Such inventions are also not flawless: when one is led astray by the latency of a database update to new road construction or a detour, one must consult some form of map. The map then returns to being the means for the human to control the loop, and not a checkpoint alone.

Instead of data visualization being thought of as a halfway house, where one goes to prevent relapse, we need to think of it as a safe house, where one goes to find protection and sanctuary from the machine-driven world and to re-evaluate what the "right thing" is.
 
I like some of the points made above...visualizations are only as good as the underlying logic, and for most visualization tools that means logic implemented using SQL queries (primarily) with some hooks into other sources. SQL is ok for asking many questions but not great for statistical analysis, predictive modeling, or optimization. In other words, don't just show me what has already happened in an intuitive way...show me what may happen in the future and what the best choice is, right now, based upon all the data ("big data"...LOL) at my disposal.
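To make that contrast concrete, here's a rough sketch (the table, columns, and figures are invented): the SQL query only summarizes what has already happened, while the last few lines fit a simple trend to suggest what may happen next - the part that dashboards typically leave to the human.

    # Rough sketch: descriptive SQL vs. a predictive step.
    # The table, columns, and numbers are hypothetical.
    import sqlite3
    import numpy as np

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (month INTEGER, revenue REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)",
                     [(1, 100.0), (2, 110.0), (3, 125.0), (4, 138.0)])

    # Descriptive: what has already happened (the logic behind most dashboards).
    total = conn.execute("SELECT SUM(revenue) FROM sales").fetchone()[0]
    print("revenue to date:", total)

    # Predictive: what may happen next (a simple trend fit, beyond plain SQL).
    months, revenue = zip(*conn.execute("SELECT month, revenue FROM sales"))
    slope, intercept = np.polyfit(months, revenue, 1)
    print("forecast for month 5:", slope * 5 + intercept)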
 
There is a "feeling" that somewhere, something is not right with skipping the data visualization..
But is it, really, not right? At "ground level" it feels this should be the way things work, this is what our technology should do: provide solutions, not problems.
What, then, makes us uncomfortable?
And why, the more "technically literate" we are, the more uncomfortable we get?
Maybe knowing in detail how many corners are being cut and how much quality always gets relegated behind time-to-market help us feel that way?!?
I am your poster case of "jack-of-all-trades-and-master-of-none" and, definitely, not a "data architect/scientist". But I would like to ask (maybe I missed it): has any significant progress been made in automated data consistency/coherence checking, lately?
We always had integrity checks for whatever data storage solutions (Flat Files, CODASYL databases, RDBMSs, NoSQL databases, Network databases... etc), which made sure the data storing systems/algorithms worked the way they should, in a verifiable way.
Maybe I'm completely ignorant, but how do we make sure there is no "garbage" (as in GiGo), especially no subtle or intentional garbage, in those databases ?
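For what it's worth, the most basic form of that kind of automated checking is still just explicit rules run against the data before anything downstream consumes it. A rough sketch, with made-up field names and rules:

    # Rough sketch of automated data-quality checks, run before the data is used.
    # The field names, ranges, and rules are hypothetical.
    def check_records(records):
        problems = []
        seen_ids = set()
        for i, r in enumerate(records):
            if r.get("kwh") is None or r["kwh"] < 0:
                problems.append((i, "missing or negative usage"))
            if r.get("customer_id") in seen_ids:
                problems.append((i, "duplicate customer_id"))
            seen_ids.add(r.get("customer_id"))
        return problems

    records = [
        {"customer_id": 1, "kwh": 320.5},
        {"customer_id": 2, "kwh": -4.0},   # garbage in: should be flagged
        {"customer_id": 1, "kwh": 298.0},  # duplicate id: should be flagged
    ]
    print(check_records(records))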

From this, the implications and possible discussions are endless... And, for some of us, the subliminal dangers are emphasized by the awareness that the vast majority of technology consumers are not "technically literate" (nor should they be) and data visualization will fade away by simple lack of demand.
(just my 0.1e-3 cents)
 
Generally right but there's a flaw in Driscoll's thinking here

> But data visualizations still require human analysts to react and kick off another action, if they are to be useful.

Not totally correct; i.e., it depends on the problem and how actionable the info is.
 
Use Tools.  Build Tools.  Do Not Be Tools.  THAT is my impression from what you shared.  Thank you.
 
Data visualisation is an API for devs whose apps require high bandwidth connections to meat.
 
Also important: Pretty much everything (at least everything even remotely non-trivial) involves a whole hell of a lot more exception handling than we are apt to notice - one might in fact argue that that's what makes things non-trivial... just something to remember.
 
Dashboards and visualizations are blunt tools.  Generally they've been one-size-fits-all, but when it comes to data analysis there is a big gap between the "data experts" (those who can understand visualizations) and the "data novices" (those who can't).

Dashboards are a tool to help someone "do" analysis, but they are not good for communicating the results of the analysis. In many cases you can automate the Data Analyst job completely and let software "do" analysis and communicate the results at the same time.

I recently wrote about how we need to move away from Dashboards as the primary tool for communicating analysis and move to automated analysis:
http://blog.automatedinsights.com/post/39923823985/dashboards-arent-the-answer

The ideal scenario is you provide the right tool for the job. When it comes to Data Analysts/Scientists, they may always want to navigate the data on their own. Dashboards are fine for them. But for the vast majority of users, providing insights in plain English is the better option. Now technology exists to do just that.
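As a toy illustration of what "insights in plain English" can mean in practice (the metric names and the 10% threshold are made up), the software computes the change and writes the sentence itself, so nobody has to read a chart:

    # Rough sketch: turning a metric change into a plain-English insight.
    # Metric names and the "interesting" threshold are hypothetical.
    def describe_change(metric, last_week, this_week):
        change = (this_week - last_week) / last_week * 100
        direction = "rose" if change > 0 else "fell"
        note = " -- worth a closer look" if abs(change) > 10 else ""
        return f"{metric} {direction} {abs(change):.1f}% week over week{note}."

    print(describe_change("Signups", 1200, 1380))
    print(describe_change("Churn", 200, 170))

This prints sentences like "Signups rose 15.0% week over week -- worth a closer look."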
 
The big problem with automation remains that in automating this kind of decision-making process, you're codifying and even canonizing the supporting processes -- turning them into culture, as it were, and worse yet, into submerged, invisible culture.

I take the view that this kind of progression is inevitable. So what I'd like to see is better, more process-fundamental ways to continually expose the value judgements that went into the automation process. 
DrWex
 
+Tim O'Reilly I'd be interested in your thoughts on the notion that the current generation of map/GPS applications is warping our historical sense of what a map is. In particular, prior generations saw maps as external visualizations whereas the current generation of maps always places us at the center of the visualization 'universe'.

My personal belief is that this changes things, but I don't have a good handle on HOW it changes things.
 
+Eric Scoles This reminds me a good deal of human behavior: the challenge of evaluating why we do specific things that seem like they are not decisions, our semiautomatic behaviors, the unconscious as it were. While the human brain is more complex than most data-processing systems, the challenge is the same. The distance between the 'intelligence' and the underlying process means that simplification is necessary to separate the mechanics from the decisions, and since some of the business decisions are implicit in the underlying architecture, how do you create an explicit process that is open and expressible to humans without making choices that are themselves implicit (for example, who defines the difference between value-oriented choices and architectural decisions? Is there one?) Even narrative and documentation are flawed.

I spend most of my time working to elicit underlying decision-making from legacy systems, and even when I have the people who wrote the code and made the requests, they may not remember, or agree on, why certain things are done in a certain way. Over time, systems are modified.
I can imagine a system in which every component that changes or creates data is well defined and modular and allows itself to be evaluated and compared to other methods/models/decisions; however, in practice such a system gets modified when performance is at risk. We are still not at the ideal point where computers can efficiently translate logical design into code in an optimal way. When it comes to data analysis, it is very challenging to collect and store data based not merely on what is needed now, but on what will be needed in the future. The effort to collect data based on unknown decisions is inefficient. The effort to request decisions based on unknown data often requires a specific talent or art. I have seen systems in which 90% of the data does not get used because 20 years earlier someone thought they knew what would be important.
Do you have any resources on the topic of how to express value judgments effectively in the process of a system rather than in its documentation? I would be interested in seeing what solutions look like.