Archive for January, 2017

Countdown to Star Trek: Discovery: ??? days to go

January 20th, 2017

Today’s news has two parts:

Star Trek: Discovery, CBS’ upcoming addition to the long-running sci-fi series, has been delayed again.

There’s now no release date for the show beyond ‘when it’s ready’.

And then:

CBS confirmed today that James Frain would play Spock’s father.

(I’ll leave aside that Mark Lenard was 43 when he portrayed the 2260s Sarek, while James Frain, who will portray the 2250s Sarek, is 48. The character is meant to be a 90-to-100-year-old Vulcan, so there’s room for a little creative licence there.)

Put together, I’m officially downgrading from ‘cautiously optimistic’ to ‘worried’.

Sarek first (and why do they have to say ‘Spock’s father’ here – anyone paying attention to casting news for a new Star Trek TV show knows who Sarek is). Canonically, Spock is in his mid-20s by this point; indeed, he has already begun to serve on the Enterprise. So this is a Sarek who doesn’t much like Starfleet, given his famous disapproval of Spock’s decision to join it. He’s an ambassador, so perhaps some role in negotiations with the Klingons makes sense. I can’t really say this is a big stretch, but something about it feels wrong to me. Perhaps it’s that they’re trying too hard to shoehorn in links to the Original Series; maybe it’s the idea of recasting Mark Lenard.

But the indefinite delay is more worrying. Not because the show won’t be out soon (although it was originally supposed to premiere next week), but because it suggests they’re frantically reworking everything to create a story and a plan, which means they’re junking whatever they had to start with. I think starting something like this without a clear idea of the story you want to tell isn’t going to end well, and it’s beginning to look like something of a death spiral.


Pretty Mapping of Australia using R

January 5th, 2017

I’ve recently been experimenting with data visualisation in R. As part of that, I’ve put together a little bit of (probably error-ridden and redundant) code to help with mapping Australia.

First, my code is built on a foundation from Luke’s guide to building maps of Australia in R, and this guide to making pretty maps in R.

The problem is that a lot of datasets, particularly administrative ones, come with the postcode as their only geographic information. And postcodes aren’t a very useful geographic unit – there’s no defined aggregation structure, they’re inconsistent in size, and their boundaries are heavily dependent on history.

For instance, a postcode-level map of Australia looks like this:

Way too messy to be useful.
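(If you want to reproduce that messy map yourself, the sketch below is roughly how I’d draw it. It’s only a sketch: it assumes you’ve downloaded the ABS postal area boundaries and unzipped them as POA_2016_AUST.shp, and it uses the sf and ggplot2 packages, which isn’t necessarily the approach taken in the guides linked above.)

## Rough sketch: plot raw postcode (POA) boundaries.
## Assumes the ABS POA shapefile has been unzipped as POA_2016_AUST.shp.
library(sf)
library(ggplot2)

poa <- st_read("POA_2016_AUST.shp")

## One polygon per postcode - far too much detail to read anything from.
ggplot(poa) +
  geom_sf(fill = "grey90", colour = "grey40") +
  theme_void()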

The ABS has a nice set of statistical geography that will let me fix this problem by changing the aggregation level, but first I need to convert the data onto that geography.

Fortunately, the ABS also publishes concordances between postcodes and the Statistical Geography, so all I need to do is take those concordances and use them to mangle my data lightly. First, I used them to make some CSV input files:

Concordance from Postcode to Statistical Area 2 level (2011)

Concordance from SA2 (2011) to SA2 (2016)

Statistical Geography hierarchy to convert to SA3 and SA4
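(One quick sanity check before going further: in a concordance like this, the coverage ratios for each postcode should sum to roughly one. The snippet below is just a sketch; it assumes the postcode-to-SA2 file has POSTCODE and Ratio columns, the same names the main script below relies on.)

## Sanity check: each postcode's coverage ratios should sum to ~1.
## Assumes PCODE_SA2.csv has POSTCODE and Ratio columns.
library(dplyr)

ratio_check <- read.csv("PCODE_SA2.csv", stringsAsFactors = FALSE) %>%
  group_by(POSTCODE) %>%
  summarise(total_ratio = sum(Ratio)) %>%
  filter(abs(total_ratio - 1) > 0.01)

## If this prints any rows, those postcodes aren't fully covered by the concordance.
ratio_check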

Then a little R coding. First, convert from postcode to SA2 (2011); SA2 regions are around the same level of detail as postcodes, so the conversion won’t lose much accuracy. Then convert to the 2016 boundaries and add the rest of the geography:

## Convert postcode-level data to the ABS Statistical Geography hierarchy
## Quick hack job, January 2017
## Robert Ewing

require(dplyr)

## Read in the original data file, clean as needed.
## This data file is expected to have a variable 'post' for the postcode,
## and a data series called 'smsf' for the numbers.
data_PCODE <- read.csv("SMSF2.csv", stringsAsFactors = FALSE)

## This code reads in only one data series; if you need more than one,
## you'll need to change the aggregate() calls below.
## Change the next line to reflect the name of the data series in your file.
data_PCODE$x <- as.numeric(data_PCODE$smsf)
data_PCODE$x[is.na(data_PCODE$x)] <- 0
data_PCODE$POA_CODE16 <- sprintf("%04d", data_PCODE$post)

## Read in the concordance from postcode to SA2 (2011)
concordance <- read.csv("PCODE_SA2.csv", stringsAsFactors = FALSE)
concordance$POA_CODE16 <- sprintf("%04d", concordance$POSTCODE)

## Join the files
working_data <- concordance %>% left_join(data_PCODE)
working_data$x[is.na(working_data$x)] <- 0

## Adjust for partial coverage ratios
working_data$x_adj <- working_data$x * working_data$Ratio

## And produce the SA2 (2011) version of the dataset. Data is in x.
data_SA2_2011 <- aggregate(working_data$x_adj,
                           list(SA2_MAINCODE_2011 = working_data$SA2_MAINCODE_2011),
                           sum)

## Now read in the concordance from SA2 (2011) to SA2 (2016)
concordance <- read.csv("SA2_2011_2016.csv", stringsAsFactors = FALSE)

## Join it
working_data <- concordance %>% left_join(data_SA2_2011)
working_data$x[is.na(working_data$x)] <- 0

## Adjust for partial coverage ratios
working_data$x_adj <- working_data$x * working_data$Ratio

## And produce the aggregate in SA2 (2016)
data_SA2_2016 <- aggregate(working_data$x_adj,
                           list(SA2_MAINCODE_2016 = working_data$SA2_MAINCODE_2016),
                           sum)

## And finally join the SA2 data with the rest of the hierarchy to allow
## on-the-fly aggregation to SA3 and SA4.
statgeo <- read.csv("SA2_3_4.csv", stringsAsFactors = FALSE)
data_SA2_2016 <- data_SA2_2016 %>% left_join(statgeo)

The end result is a data set that can be aggregated up to higher levels of the statistical geography. Here's the map from above, this time using SA3 regions rather than postcodes:
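(For completeness, that last step looks roughly like the sketch below. The SA3_CODE_2016 column name, the SA3_CODE16 field in the shapefile and the SA3_2016_AUST.shp file name are all assumptions about what comes out of the ABS hierarchy and boundary downloads, and again I've used sf and ggplot2 for the plotting.)

## Rough sketch: aggregate the SA2 (2016) data up to SA3 and map it.
## Assumes the SA2_3_4.csv hierarchy file supplied an SA3_CODE_2016 column,
## and that the ABS SA3 boundaries are unzipped as SA3_2016_AUST.shp.
library(dplyr)
library(sf)
library(ggplot2)

## Aggregate the SA2-level figures up to SA3
data_SA3 <- data_SA2_2016 %>%
  group_by(SA3_CODE_2016) %>%
  summarise(x = sum(x))

## Join onto the SA3 boundaries and draw a simple choropleth.
## SA3_CODE16 is assumed to be the code field in the ABS shapefile.
sa3 <- st_read("SA3_2016_AUST.shp") %>%
  mutate(SA3_CODE_2016 = as.numeric(SA3_CODE16)) %>%
  left_join(data_SA3)

ggplot(sa3) +
  geom_sf(aes(fill = x), colour = NA) +
  theme_void()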