Wednesday, November 26, 2014

No search challenge today... but a question!


It's just about Thanksgiving here, and the next few days are going to be a bit crazy. Thanksgiving is tomorrow, my birthday and my son's birthday are in the next couple of days, and people are visiting from out of town... it'll be great, but also very busy.

I'm also hosting the Program Committee meeting for the Learning at Scale 2015 conference on Monday--so it's really going to be busy.  

But if you really want to do a challenge, here's an old one about Thanksgiving (On the origins of cranberry sauce?) from a couple of years ago.  Interestingly, the answer given back then no longer works (Google deprecated that particular timeline tool a while back).  So feel free to find a new path, and let us know!  

Nevertheless, I still have a question for you.  

Last week's Challenge was a little more difficult than the ordinary ones. So I was wondering how many people would be interested in attending a Google Hangout so I could go over the answer with you. I was thinking about showing the whole process, from finding the data, through cleaning it, to creating the final charts.

Interested?  If so, please fill out the form here: CLICK ME!   Results next week.  If we can round up a quorum, we'll do a live Hangout and have a good time.  

Happy Thanksgiving, Happy Birthday, or Happy whatever-your-local-festival-is!  

Search on! 

Tuesday, November 25, 2014

How much snow was there? Part 2--Mapping it out with choropleths and heatmaps



I thought I'd try something a little more sophisticated this morning with the snowfall data.  

So I took all the data I got from the NOAA site for every weather reporting station in their list and extracted just the data for cities in New York between Nov 2 and Nov 22. I dumped this into a table like this:



This is pretty much just copy/pasted from their CSV file into my text editor.  

A small but important point to notice here: I wanted the first column ("Location") to be a comma-separated Lat/Long pair. The easiest way to convert from a CSV file (where Lat and Long were in different columns) into a TSV file was to import everything into a spreadsheet, write a small converter function that would concatenate Lat + "," + Long into a single cell, and then save everything as a TAB-separated file (TSV). That makes column one a handy lat/long pair (like the first one: 42.74, -73.81 is the lat/long for Albany, NY). I also added a sum column at the end (that is, =SUM(A2:A26), which adds up all the snowfalls for the entire period), so the last column is total snowfall.
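If you'd rather script the spreadsheet step, here is a minimal Python sketch of the same conversion. It assumes (hypothetically) that the CSV has columns named Lat and Long followed only by one snowfall column per day; those column names, the function name, and treating blanks and -9999 as missing are my assumptions, not NOAA's documented format.

```python
import csv
import io

MISSING = -9999.0  # assumed "no reading" sentinel, skipped in the total

def csv_to_location_tsv(csv_text, lat_col="Lat", long_col="Long"):
    """Turn a CSV with separate Lat/Long columns into a TSV whose first
    column is a "lat, long" pair and whose last column is total snowfall.

    Assumes every column other than lat_col/long_col is a daily value.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    day_cols = [c for c in reader.fieldnames if c not in (lat_col, long_col)]

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Location"] + day_cols + ["Total"])
    for row in reader:
        # Same idea as the spreadsheet converter: Lat + ", " + Long in one cell.
        location = f"{row[lat_col]}, {row[long_col]}"
        values = [row[c] for c in day_cols]
        # Row total, skipping blank cells and the missing-value sentinel.
        total = sum(
            float(v) for v in values
            if v.strip() and float(v) != MISSING
        )
        writer.writerow([location] + values + [round(total, 2)])
    return out.getvalue()
```

To use it on a file, read the whole CSV into a string (for example with `open(...).read()`), pass it in, and write the returned text to a .tsv file.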

Once I have this file, I can create a map of the total snowfall. 

I did the obvious search:  

     [ heat map tool ] 

and found the OpenHeatMap site.  It's really very easy to use--just upload your data, and you can make this map (showing the location of each of the snowfall summary rows, distributed on the map by the lat/long in column 1).  


Figure 1.  A choropleth map of NY state cumulative snowfall, first 2 weeks of November, 2014. Data from NOAA.

Strictly speaking, this is NOT a heat map (they admit as much on their website).  A heat map is a visualization of a matrix where each point in the array is color-coded.  

This figure above is actually a choropleth map. Say that 5 times fast to get it firmly lodged in your brain. Here's the difference: A heat map is a colorized regular grid. A choropleth map is a colorized symbol (like the circles above) or a symbolic element (such as a state in a colored state map) where the size or color indicates the variable's value.

In Figure 1 above, the circles vary in both size and color--that makes it a choropleth map. When you click on one of the circles, you can see the value of that location.  Example: 


  

BUT... One of the options in OpenHeatMap is to blur/fuzz the circles. This operation makes it LOOK like a heat map, but it's really just a fuzzy choropleth. You have to be careful not to overinterpret what you're seeing. The snow doesn't taper off smoothly between stations the way this chart suggests. Remember that ALL the data we have is total snowfall at particular points. This kind of chart suggests much more than is really there.
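To see why blurring oversells the data, here's a toy Python sketch of one common way to blur point data: each station spreads its value outward with a Gaussian falloff, and the result is evaluated on a regular grid. This is an assumption about how such blurring typically works, not OpenHeatMap's actual algorithm, and it treats lat/long as flat x/y coordinates for simplicity.

```python
import math

def blurred_value(x, y, stations, radius=1.0):
    """Gaussian-blurred value at (x, y) from point measurements.

    stations is a list of (x, y, value) tuples. Each station contributes
    value * exp(-d^2 / (2 * radius^2)), where d is its distance to (x, y).
    """
    total = 0.0
    for sx, sy, value in stations:
        d2 = (x - sx) ** 2 + (y - sy) ** 2
        total += value * math.exp(-d2 / (2 * radius ** 2))
    return total

def blurred_grid(stations, xs, ys, radius=1.0):
    """Evaluate the blur on a regular grid, i.e., turn points into a heat map."""
    return [[blurred_value(x, y, stations, radius) for x in xs] for y in ys]
```

Evaluating `blurred_value` between stations returns a nonzero number even though nobody measured snow there, which is exactly the overinterpretation to watch for. Note also that `blurred_grid` yields a regular grid, which is what a true heat map shows.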


Figure 2. The same map as in Figure 1, just blurred a bit and using a blue-to-red rainbow color scheme.

Naturally, I wanted to see how to do the same thing in Fusion Tables. So I created a new fusion table and imported the data into it. After playing around with the values for a bit, I created this view.


Figure 3.  A Fusion Table view of the same data. 

There you go, a slightly more sophisticated version of the snowfall map.  

If anyone is up for a Challenge over the Thanksgiving holiday, try this:  Can you make an animated version of the snowfall chart?  That is, could you make something like this?  

If so, Search On!  




Monday, November 24, 2014

Answer: How snowy is it this week?



I've been enjoying watching as SearchResearchers tackle this Challenge.  It's a bit trickier than I thought, so let me walk you through my (relatively) simple solution.  

To begin with, I searched for a government organization that I thought would have the snowfall data for November.  My query was: 

     [ NOAA snow depth data ]   -- I added “daily” to make sure I found complete data.  

This led me to the NOAA.gov site for snow tracking.

Which leads to:  



And from there to the data file, http://www1.ncdc.noaa.gov/pub/data/snowmonitoring/fema/11-2014-dlysnfl.txt This is just a plain text file, so I opened it in my favorite boring text editor (I use TextWrangler because it has lots of nice features, such as grep, for data wrangling). I pulled out the data for Buffalo, Rochester and Oswego for November 2014. (Actually, I pulled out the data for Pulaski, which is very near Oswego--the other rows for Oswego had too many missing values, indicated by -9999.00.)
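Here's a rough Python sketch of that grep-and-extract step. I haven't reproduced the exact column layout of 11-2014-dlysnfl.txt here, so it assumes (hypothetically) that each station's daily values are the last n_days whitespace-separated fields of its row; check that against the file's header before trusting the output. Only the -9999.00 missing-value marker comes from the file itself; the function names are illustrative.

```python
MISSING = -9999.0  # the file's "no reading" marker (-9999.00)

def grep_stations(lines, names):
    """Keep rows that mention any of the given station names
    (case-insensitive), much like running grep in a text editor."""
    wanted = [n.upper() for n in names]
    return [ln for ln in lines if any(w in ln.upper() for w in wanted)]

def parse_daily_row(line, n_days):
    """Read the last n_days whitespace-separated fields as daily values.

    ASSUMPTION: the daily readings are the final n_days columns of each row.
    The -9999.00 sentinel becomes None so it can't sneak into a sum.
    """
    values = []
    for field in line.split()[-n_days:]:
        v = float(field)
        values.append(None if v == MISSING else v)
    return values

def missing_fraction(values):
    """Share of days with no reading, handy for picking Pulaski over Oswego."""
    return sum(v is None for v in values) / len(values)
```

An open file object can be passed straight to `grep_stations`, since iterating over it yields lines; comparing `missing_fraction` across candidate rows is a scripted version of the "too many -9999.00 values" judgment above.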


For Canada data, I did the simple query: 

     [ Canada snow data daily ]

Which led to the official government site, http://climate.weather.gc.ca/index_e.html



On this interface to their data I selected: “daily” data values, for “November”  - then downloaded London International Airport data and the downtown Toronto data.  Here's what the London data looks like.  This is from the London International Airport data set.  



Again, I copied and pasted this into my text file and then converted it all into a CSV file.

I uploaded that CSV into my spreadsheet and did a quick conversion of the US data (from inches into cm).

And now I have a spreadsheet that looks like this. Each column is one day (from Nov 1 - Nov 22), and each cell is the amount of snowfall on that day (in cm).
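The inch-to-centimeter conversion and the weekly total the Challenge asked for (Nov 12 - 19) are easy to do in a spreadsheet, but here's the arithmetic as a small Python sketch. It assumes each city's row holds one value per day starting Nov 1, with None for a missing reading; the function names are illustrative.

```python
INCH_TO_CM = 2.54  # exact: 1 inch is defined as 2.54 cm

def inches_to_cm(values):
    """Convert a row of US readings from inches to cm; None stays None."""
    return [None if v is None else round(v * INCH_TO_CM, 2) for v in values]

def window_total(daily, first_day, last_day, start_day=1):
    """Total snowfall for days first_day..last_day (inclusive).

    daily[0] is the reading for start_day (Nov 1 here). Missing readings
    (None) are skipped, so the result is a lower bound when data is missing.
    """
    lo = first_day - start_day
    hi = last_day - start_day + 1
    return sum(v for v in daily[lo:hi] if v is not None)
```

For a US city, `window_total(inches_to_cm(row), 12, 19)` gives the Nov 12 - 19 accumulation in cm; Canadian rows are already metric and skip the conversion.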


Now, I need to visualize it. There are lots of things one could do here, but first I did the simplest thing I could imagine. I made a line graph for each of the cities over time. It looks like this:


While that shows what happened and when, it's not all that great for showing spatial distribution. So here's another simple version of the chart (remember that you can click on the chart to see it at full size).


I didn't do anything fancy here--I just took a screenshot of the map, then laid the charts for each location on top of it. This is a visualization technique known as "small multiples" (that is, a small number of repeated charts all of the same type)--but one of the things to know is that they all have to show the same thing on the same axes, or all bets are off.

Note that I DID have to set the maximum Y-axis value on every chart to 19. Left to their own devices, the charts would each pick a different Y-axis range. I wanted them to be comparable, so I had to manually set them all to 19. If I were producing hundreds of charts, I would have written a piece of code to do this... but this data set was small enough to do by hand.
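Here's a small, stdlib-only Python sketch of the "same axes" rule: every panel is scaled against one shared maximum (or a fixed one, like the 19 used above) instead of its own. It renders text bars rather than real charts, and the function names are hypothetical; in a charting library you'd get the same effect by fixing each chart's Y-axis limits.

```python
def shared_max(series_by_city):
    """Largest reading across every city (None = missing, ignored)."""
    return max(v for series in series_by_city.values()
               for v in series if v is not None)

def text_small_multiples(series_by_city, y_max=None, height=4):
    """Render each city as a tiny text column chart on ONE shared scale.

    y_max=19 mimics the fixed axis described above; y_max=None uses the
    largest value in the data. Returns {city: [row strings, top to bottom]}.
    """
    if y_max is None:
        y_max = shared_max(series_by_city)
    panels = {}
    for city, series in series_by_city.items():
        rows = []
        for level in range(height, 0, -1):  # top row first
            # A cell is filled when the value reaches the middle of that band.
            threshold = y_max * (level - 0.5) / height
            rows.append("".join(
                "#" if v is not None and v >= threshold else "."
                for v in series
            ))
        panels[city] = rows
    return panels
```

With a shared scale, a hypothetical 5 cm day draws as a short bar next to a 19 cm day; with per-panel autoscaling, both would fill their panels and look identical. Print each panel's rows joined by newlines to see the charts.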

As I worked through this data collection and plotting task, the biggest challenge for me was just keeping track of where my data came from and how it was transformed from one data source to another. As I worked, I kept backtracking and checking my data. (This is where having a buddy to work with is a great idea--looking at you, Ann & Debbie.)

I also really like your comments about wanting to create contour maps (or heat maps, although maybe we should call them cold maps).  

If I get the chance, I'll write that up tomorrow.  

Search/Sensemaking Lessons:  In the meantime, it's useful to see that sometimes the simplest approach is best.  (See my queries above.)  

The hard part is knowing which of the many data sources will work out.  

My approach is always to grab a small data set (3 or 4 cells) and then work through the whole process from beginning data download to visualization.  It's a mistake to try and do the entire data manipulation process on the full data set at the beginning.  Use a small sample and work UP to the full dataset.  Trust me, I've wasted many hours on wrangling data, only to find out, when I got to the end, that the whole thing was broken.  Better to find that out on a small subset of the data than the entire thing.  

Because sometimes you'll get halfway through an analysis and realize that everything you're doing is wrong.  Or that the data doesn't make sense.  Or it's too full of errors, or missing data points. 
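A minimal Python sketch of that "small sample first" habit: run every step on a handful of rows, let any check fail there, and only then run the full set. The step functions and the dict-per-row format are hypothetical stand-ins for whatever cleaning your own data needs.

```python
MISSING = -9999.0  # assumed "no reading" sentinel

def drop_missing(rows):
    """Discard stations with any missing reading."""
    return [r for r in rows if MISSING not in r["values"]]

def check_nonnegative(rows):
    """Snowfall can't be negative; a negative value means a parsing bug."""
    bad = [r["name"] for r in rows if any(v < 0 for v in r["values"])]
    if bad:
        raise ValueError(f"negative snowfall in {bad}")
    return rows

def add_total(rows):
    return [{**r, "total": sum(r["values"])} for r in rows]

STEPS = [("drop_missing", drop_missing),
         ("check_nonnegative", check_nonnegative),
         ("add_total", add_total)]

def run_pipeline(rows, steps=STEPS):
    """Apply each (name, function) step in order, failing loudly on empties."""
    for name, step in steps:
        rows = step(rows)
        if not rows:
            raise ValueError(f"step {name!r} left no rows")
    return rows

def pilot_then_full(rows, sample_size=4, steps=STEPS):
    """Run the whole pipeline on a tiny sample first, then on everything."""
    run_pipeline(rows[:sample_size], steps)  # fails fast on a small subset
    return run_pipeline(rows, steps)
```

The order of `STEPS` matters: dropping -9999 rows before the non-negative check keeps the sentinel from being reported as a bug.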

More comments tomorrow... but this was my quick answer for today.  

We'll be talking more about this in the future! 

In the meantime, Keep Searching On! 




Friday, November 21, 2014

Working on the snowfall challenge over the weekend...



Folks... 

Yesterday Regular Reader Ramón made a great suggestion: Let's not solve the problem today, but work on it over the weekend and give everyone some extra time.

That's a great idea.  I'm traveling at the moment (hence, the question) and I could use the extra time as well!  

As I mentioned, the first part of this problem is to find a data set (or sets) that will hold the data we need.  

There are lots of ways to do this.  Here are a couple of options: 

1.  Use a commercial weather provider.  Wunderground often has data sets available from individual weather stations.  Look for the "weekly" data in a table, and download that.  (I already knew that Wunderground had such data, so I just did a search for [Wunderground] and went from there.) 

2.  Use a government agency.  The US National Weather Service has snowfall data that they publish.  Example data for Nov 13th.  Canada has a similar service with data for provincial weather stations as well.  To find these data sources, I searched for [ snowfall data US weather service ]  and [ snowfall data Canada provincial weather service ] 


Now you can just scoop up the data from these sites and fast forward to figuring out the visualization! 


Snow on! 


Wednesday, November 19, 2014

Search Challenge (11/19/14): How snowy is it this week?

SORRY about being late with the Challenge today.  



As you might have guessed, I'm traveling in the Northeast part of the United States, and let me tell you... it's COLD out there!  

But I'm not complaining; some places have been having a much more interesting time of it than I have. Buffalo, NY, as you've probably seen on the news, is getting dumped on at a phenomenal rate.

Of course, that makes me wonder...  And you know where this is going... 

This week's Search Challenge is another in our Data-Driven series.  Can you answer me this? 


1.  Can you make a map showing how much snow has fallen this week in the Northeast of the US and the Southeast of Canada? I'd love to see a map of roughly this region:



... showing (in whatever format you like) the total snow accumulation for the past week. Let's pick the week of November 12 - 19, 2014. At a minimum, we'll want to have snowfall data for Buffalo, Toronto, London (Ontario), Rochester, and Oswego.

To make this Challenge interesting, I'm not going to specify HOW you should show the data--I'll leave that to your design sensibility and inventiveness.  

When you send in your answer, be sure to include a link to your chart / graph / map.  I'll show my top three picks for best graphic on Friday.  (Or Saturday, if we get a bunch of them.)  

Your chart can be a simple histogram, or an interactive visualization with sliders, and everything.  It's up to you.  

Obviously, Step 1 will be finding the data source.  Once you've found that data, what you do with it is up to you.  

To keep the charts comparable, let's use metric measurements for the snow depth.   

I'm really curious to see what you'll come up with!  

Search on! 


Friday, November 14, 2014

Answer: Digging deeply



In this week's Challenge I asked you to look a little into the present and past of one such well-known company. Imagine that you're a reporter and you need to fill in a few of the missing parts of your story. Can you do this on deadline?


1.  Is it possible for you to find a relatively recent organization chart for the Xerox company?  (Say, within the past year.)
2.  If so, where does Steve Hoover sit in this organization?  
3.  How many people directly report to him?   
4.  What boards does he sit on?  
5.  What was MY (Dan's) first job after getting my PhD?  
6.  Did Steve Hoover and I ever work at the same place at the same time? 

These questions aren't hard by themselves, but they might require a bit of looking in non-standard places for the answers.  


1.  Org chart? This isn't hard, but there are lots of ratholes and dead ends. You'd think that the company would provide this in their year-end financial statements, but other than the top-level execs, they typically don't.

So to find this I backed up and searched for:

     [ org charts company ] 

which took me to a few sites, none of which were perfect. The closest one I found was Cogmap.com, but their org chart for Xerox was missing a lot of content. Still, it was close, so in this case I changed my search to look for web sites that were similar to Cogmap. I did this with the related: operator.

     [ related:cogmap.com ] 

The first hit in the related-sites list is TheOfficialBoard.com, which had a much more complete org chart. In particular, I searched for [ PARC ] on their website:

and found this.


You can also find a much more extensive org chart for Xerox (not ALL of it, but much larger pieces of the puzzle). And if you want to spend money, you can get much more detail. But even with just these clues, searching for the individual VPs can get you pretty much the entire upper level.



As you can see, Steve Hoover is the CEO of PARC, a wholly owned research arm of Xerox.

From this chart we can see that at least 8 people report to him. Note that since this chart doesn't come from the company itself, there could well be other people reporting to him who aren't listed.

Now, to get the details of his working career, I wanted to check out LinkedIn--a social network that's commonly used by people in Silicon Valley for professional associations. 

My query for that looked like this: 


I've noticed before that profile pages on LinkedIn have this www.linkedin.com/pub/dir/ structure. That's a perfect clue for using the inurl: operator, which looks for this string in URLs. In this case it turns up his background page, which reveals that he joined PARC in 2011, but also worked at the Xerox Webster Research Center from 2006-2009. (Remember this.)

The background page also lists three boards he's on: Infotonics Technology Center, Rochester Museum & Science Center, and the Rochester Engineering Society.

Now, can we do the same thing for me?  

The same inurl: trick will work for me, but an easier search might be: 



Why this particular query?  Because in academic circles, a resume is also known as a "curriculum vitae," commonly abbreviated as CV.  Google synonym expansion might have gotten it, but I put in the OR just to be sure.  

I added PARC and Google into the search query because I knew I'd worked both places, and this is a great way to reduce the clutter of spurious Dan Russells.  I have a common name, so any trick you can do to eliminate some of the "off-topic" Dan Russells is a good thing.  

Surprisingly, there are TWO CVs for Dan.  (I honestly had forgotten about one!)  I CMD+Clicked them both so I could see them side-by-side, and noticed that one was last updated in 2008, and the other in 2011.  

But it's clear from the more recent CV that I'd joined Google in 2005, and I've been there ever since (as you well know), so I couldn't have overlapped with Steve at PARC.  

On the other hand, I DID work at the Webster Research Center.  As it says in the older CV, 

 "Prior to PARC, Dr. Russell worked in the Xerox Webster Research Center gaining practical experience in printing systems and computer architecture."  

Which means I worked at Webster before moving to PARC in 1982.  So there's no overlap there either.  



Search Lessons:  

1.  related: Knowing when and how to use related: can be a real power tool for a researcher. I used it here when I wanted another site that did more-or-less the same thing (i.e., collected org-charts). You can also use it to find additional sites that have very similar content (e.g., comic-book collections).

2.  Sites for everything! There actually ARE sites that collect org-charts. Who knew? Before doing this problem, I had no idea such things existed, even though I should have realized that "this is the Internet... for every thing there is a group of impassioned collectors of those things..." Org-charts are no different.

3.  inurl: If you know the internal structure of the URLs used on a website (e.g., www.linkedin.com/pub/dir/), you can use it with the inurl: operator to zoom in on the parts of the site you particularly want to explore. This is an example from LinkedIn--other large sites (Amazon, Facebook, etc.) all have similar structures that you can extract and use to focus your search.

4.  OR  While synonym expansion is great, you won't hurt anything by adding in exactly the synonyms you want.  cv OR resume is a good one to use.  

5.  Look at all the things.  In just a simple scan for my CVs, we found TWO of them from different eras.  One was obviously forgotten (not updated in 6 years??).  Finding things like this is often a gold-mine because you can look at them side-by-side and see what has changed.  In my case, a few things were dropped, mostly as a reflection that those parts of my life weren't as salient any longer to what I was trying to do.  Different people might have different stories, and having multiple versions to compare and contrast can be immensely useful. 

Search on! 


Wednesday, November 12, 2014

Search challenge (11/12/14): Digging deeply



Sometimes research projects require that you go a little more deeply.  This is especially true when you're searching out information about a company, its people, and their connections.  For obvious reasons, sometimes this information can be closely held. 

In this week's Challenge I ask you to look a little into the present and past of one such well-known company. Imagine that you're a reporter and you need to fill in a few of the missing parts of your story. Can you do this on deadline?

1.  Is it possible for you to find a relatively recent organization chart for the Xerox company?  (Say, within the past year.)
2.  If so, where does Steve Hoover sit in this organization?  
3.  How many people directly report to him?   
4.  What boards does he sit on?  
5.  What was MY (Dan's) first job after getting my PhD?  
6.  Did Steve Hoover and I ever work at the same place at the same time? 

These questions aren't hard by themselves, but they might require a bit of looking in non-standard places for the answers.  

Be sure to tell us HOW you found these answers. I'm really interested in the places where you found the solutions.  (I'm pretty sure they're not all in the same place.) 

Search on! 

Friday, November 7, 2014

Answer: Plausible or not, and why...?

OUR CHALLENGE  this week was pretty straightforward, but perhaps not simple... 

Here they are:  Can you determine if these stories make sense or not? 


From: The Pirates Own Book, Or Authentic Narratives of the Lives, Exploits, and Executions of the Most Celebrated Sea Robbers (1837)

1.  Pirate eyepatches:  In a lecture this week I heard the speaker say that "...the reason pirates are often depicted wearing an eyepatch over one eye is that they'd keep that eye dark-adapted in case they were going to jump aboard a ship they'd captured and needed to instantly be able to see in the depths of a dark ship interior.  Remember, in those days there was no on-ship interior lighting, so you needed to be able to flip up the eyepatch and have one dark adapted eye ready-to-see."   Really?  
2.  Rocker sunglasses: Recently, the rock star Bono announced that he's been wearing sunglasses constantly for the past twenty years to help with his glaucoma. I immediately wondered--does that make sense, is it a plausible story? How can sunglasses help with glaucoma? Is it really a treatment? (Or is it a ruse to give him a plausible reason to look cool?)


Are these stories plausible?  If so, why? If not, why not?  



Pirate eyepatches:  I have to admit to being initially skeptical.  Sure, we know that pirates wore eyepatches.  But is this the reason why?  And... wait a second... how do we know pirates wore eyepatches? 

The core skill of critical thinking is to ask the fundamental question: How do you know? What is the source of your knowledge?

That's partly why I included the illustration from "The Pirates Own Book" of 1837 (above).  

First, note that when I asked the question about "pirates," I didn't specify pirates of a particular era. Still, the natural place to start is the Golden Age of piracy (roughly the 100 years from the mid-17th century to the mid-18th century), so I did a quick browse of illustrations of pirates from that era, poking through several books I found on Google Books (such as Piracy: The Complete History by Angus Konstam, and Pirates of Colonial Virginia by Lloyd Haynes Williams).

In none of these books could I find a contemporary illustration of a pirate with an eye patch.  That strikes me as odd, but maybe I just missed something.    



As Ramón pointed out in his search, 

     [Pirate eye patches origin]

leads to a nice episode of MythBusters (the television show that specializes in testing various legends and myths) about commonly held beliefs about pirates. In this episode, they report on their test: they became dark-adapted and tried some tasks in the dark, then repeated the tasks with light-adapted eyes. No question, the dark adaptation was useful.

I was inspired by this, so I repeated the experiment myself in the dark of the early morning.  

At 4AM I put on my eye patch (see below), and let my left eye become dark adapted for 30 minutes.  
The eye patch is an adapted eye mask that was piratically appropriated from a commercial airline. Arrgh!

I then entered a small, hold-like room in my house (okay... a bathroom with a very dim skylight, but no windows), switched off the light at 4:30AM and took off the eyepatch.

I was amazed at how well I could see with my left eye.  I could count fingers held up at arm's length, and even read large printed text (1 cm high on white paper at arm's length).  

What really surprised me was that my right eye (the non-dark-adapted one) was completely blind. It was as though I'd suddenly changed the eyepatch from one eye to the other. It was SO striking that I reached up to touch my eyelid to be sure it was open. Yes, it was... but it was also completely black--my right eye was terrifyingly blind.

This brings up a good point that Marian raised in the comments:  When you wear an eyepatch, you've compromised your depth perception AND taken away all vision over half your visual field.  It's a HUGE disadvantage in a fight.  Likewise in the dark of the hold--if there's any threat there (including things you might walk into, like cannons or other crewmen defending their ship), you're literally missing half the picture.  

It took about 20 minutes for my right eye to adapt to the change in lighting.

What really surprised me (but in retrospect should not have) was that my right eye vision returned in the periphery first, slowly progressing towards the center.  It's really odd and interesting to be able to see perfectly on the left (with my left eye), and all the way on the right (in the periphery of my right eye), but not in the middle of the right.  To get the effect, hold your right hand over your right eye with the edge of your hand touching your nose.  You can see on the left, and a bit on the far right... but nowhere else.  

In any case, based on my personal experience with an eye patch, it seems unlikely that you'd wear this thing constantly for a 20 minute advantage whenever you boarded a ship.  If you were going in and out of holds all the time, maybe... otherwise, it would be a hassle.  

Ramón discovered that Howard Pyle (the 19th-century illustrator) apparently created the common image of the pirate (large gold earring, head scarves, broad sashes and belts, etc.) by adapting the Spanish peasant dress of his own time (the late 1800s) in his pictures. (As documented in the book Pirates: The Golden Age, by David Rickham and Angus Konstam.)

In any case, given the lack of contemporary evidence, and the dysfunction of wearing an eyepatch most of the time, I'm going to say that this seems implausible to me.  Possible, but unlikely.  

 
Bono's sunglasses:  It's not hard to find that Bono has glaucoma, but it's a bit more difficult to find any support for wearing sunglasses as a therapy for glaucoma.  The obvious search in Google Scholar: 

     [ glaucoma sunglasses ] 

yields rather little of use.  There are lots of hits, but when you read the documents, you'll see that glaucoma is usually referred to separately, as another condition, and not in conjunction with sunglasses as a therapy.  

This is when you want to use the AROUND operator.

     [ glaucoma AROUND(5) sunglasses ] 

This gives you somewhat better results (on Google Web search) because it looks for the terms near each other, as opposed to "glaucoma" appearing in some distant, unrelated part of the page.
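To make the proximity idea concrete, here's a toy Python version. It counts "within n words" as at most n intervening words, which is my assumption about how AROUND(n) counts; Google's real tokenization and ranking are not public, so treat this as an illustration of the concept only.

```python
import re

def within_n_words(text, term_a, term_b, n):
    """True if term_a and term_b appear with at most n words between them.

    A toy model of proximity matching in the spirit of AROUND(n); the
    "n intervening words" rule is an assumption.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == term_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == term_b.lower()]
    # Positions i and j have abs(i - j) - 1 words between them.
    return any(abs(i - j) - 1 <= n for i in pos_a for j in pos_b)
```

A page that mentions glaucoma in one paragraph and sunglasses several paragraphs later matches the plain query [ glaucoma sunglasses ] but fails this check, which is the difference the AROUND query exploits.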

But after reading through the top ten results, it becomes clear that sunglasses help with glare and being sensitive to light, but they don't actually act as a treatment or therapy for glaucoma.  

Just to check on this, I also did a search on PubMed (the National Library of Medicine's scholarly medical collection). My search was:

     [ glaucoma sunglasses ] 

If you use PubMed's Advanced Search tool, you'll see they have 57K hits for glaucoma alone, but only 2 (not 2K, just two) hits for glaucoma and sunglasses.

I found that pretty interesting.  There are 28K results for [ glaucoma treatment ] but only 2 for [ glaucoma sunglasses ] 
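If you want to re-check counts like these without clicking through Advanced Search, NCBI's E-utilities esearch service can return just the number of matching PubMed records. Here's a minimal sketch that builds the request and parses the JSON response. Expect today's numbers to be much larger than the 2014 counts above, and note that PubMed's automatic term mapping may interpret a query differently than a plain keyword match.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def pubmed_count_url(term):
    """URL for an esearch request that returns only the hit count as JSON."""
    params = {"db": "pubmed", "term": term, "rettype": "count", "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"

def parse_count(json_text):
    """Pull the record count out of an esearch JSON response."""
    return int(json.loads(json_text)["esearchresult"]["count"])

def pubmed_count(term):
    """Fetch the live count (needs network access; NCBI rate-limits clients)."""
    with urlopen(pubmed_count_url(term)) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

Calling `pubmed_count` for "glaucoma", "glaucoma treatment", and "glaucoma sunglasses" reproduces the comparison above; keep requests to a few per second unless you register an NCBI API key.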

Don't get me wrong--wearing sunglasses is a great idea if you're sensitive to bright lights and often exposed to flashes indoors (as Bono is). But the sunglasses provide comfort, not treatment.


Search Lessons:  

1. Sometimes it's good to do the experiment yourself.  My eyepatch experiment was really interesting.  I had no idea that my dark vision would return from the periphery to the center, nor that it would work as well as it did.  That experience also pointed out the difficulties of wearing an eye patch for any length of time when you didn't need to.  (My right shoulder now has a bruise where I hit the door jamb because I couldn't see it...)  

2. The LACK of information is often a signal in-and-of itself.  After looking at dozens and dozens of contemporary illustrations of pirates--and not finding a single eyepatch--I'm dubious that this was a standard practice.  To be sure, this isn't conclusive, but it's pretty interesting...  especially when.... 

3.  Very authoritative resources sometimes have (nearly) no results when they have nothing to say.  If you can't find more than 2 articles on Pubmed for a treatment mode (e.g., sunglasses), you can be pretty sure that the silence is telling you something.  


Great work this week--as always.  

Search on!