As an information designer, I’m charged with summarizing data. But even the simplest of questions, like “How big is a typical case?”, presents a choice about what kind of summary to use. An “average” is supposed to describe something like a typical case, or the “central tendency” of the data. But there are many kinds of averages, as you might know. Here I’ll give a quick overview of two familiar averages, the median and the arithmetic mean, and compare them to a third, the geometric mean, which I think should get a lot more use than it does.
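To make the comparison concrete, here is a minimal Python sketch of the three averages using the standard library's statistics module. The sample values are made up for illustration and do not come from the article, and statistics.geometric_mean assumes Python 3.8 or later.

```python
import statistics

# Hypothetical, right-skewed sample (invented for illustration only).
case_sizes = [1, 2, 4, 8, 1000]

# Median: the middle value once the data are sorted.
median = statistics.median(case_sizes)

# Arithmetic mean: the sum divided by the count.
arithmetic_mean = statistics.mean(case_sizes)

# Geometric mean: the n-th root of the product of n positive values,
# equivalently exp(mean(log(x))). All values must be positive.
geometric_mean = statistics.geometric_mean(case_sizes)

print("median:", median)
print("arithmetic mean:", arithmetic_mean)
print("geometric mean:", round(geometric_mean, 2))
```

On a skewed sample like this one, the arithmetic mean is pulled toward the single large value, the median ignores magnitudes entirely, and the geometric mean lands between the two.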
People obsess over what bootcamps to join, the hottest ML algorithms, and which SaaS products to use. But the technicalities of data science/analysis are just one piece of the work.
I would list at least these six skills:
Nice write-up of this position, Erik. I'm of the same persuasion.
I've seen the same thing in the R world, between different dialects. I even wrote a piece comparing one approach (the “tidyverse”) to a kitchen full of gadgets, versus using a chef's knife (data.table).
I also saw something similar when I worked in graphic design. There were tool people, who loved add-ons for Photoshop that would save them 10 seconds each, but cost them 2 hours to update and implement in total; versus the people who would just wing it with the tools included, and not worry.
There are some interesting implications for teaching / learning languages too. I think the simplicity approach is far better -- you can learn concepts then, and not baroque tools that you might never need in your particular career.
Are dashboards dead? Kind of. Which I think makes them undead.
No, they're not dead: because people keep asking for them, and other people keep making them.
Yes, they're dead, because they rarely get used: most dashboards are requested, made, published, and then never accessed again.
Users love the idea of dashboards: that they'll have control, and can get all kinds of insights. But most data are too complex to understand without analysis -- more than you can do in an interactive dash. So to answer any serious question, you're back to ad-hoc reporting. Which is fine. Dashboards have their uses -- just far fewer than we sometimes imagine.
Flexibility with constraints. That’s what I find fun, and what generates the most creative ideas. So I’ve created a generic system for fantasy classes. (Just remember, half the fun of old D&D games was making your character!)
Each character has three slots to use, assigning each a class archetype from the nine below. How you interpret these combinations is where the fun comes in. There are a lot of possibilities, and I’ve provided class specializations to shade each selection as well (each slot gets its own specialization). …
If you haven’t done a lot of programming, learning R can be pretty intimidating.
But it’s easier if you focus on fundamentals, and slowly build up your skills through practice. Here I’ll give a short lesson on the most basic things you can do in R.
Let’s start by looking at the very basics of how we enter commands in R, to tell R what we want to do. …
Last year I started teaching a six-week R programming course at a university, and I have another available online on Udemy, So You Need to Learn R. I thought long and hard about how to teach R. Here’s what I came up with.
I wanted a course that beginners could take. Most people learning R via a course aren’t computer scientists or experienced hackers, but doctoral students in the sciences, analysts who currently use Excel for their work, or those just starting to pursue a career in data science. They might have basic coding experience, but maybe on shaky foundations.
Let me tell you about how to succeed. Obviously, I haven’t yet done this myself, or I’d have better things to do than write about it. Unless you pay me — my speaking rates are on the high end of very affordable, I assure you. But I digress. What was I talking about? Success! Because what else is there in life, or at least on Medium, anyway?
This particular brand of success will give you ten orders of magnitude better results than your current brand. 99.9% of all successful people do it. I have, of course, defined success to be…
You find your data, load it, model it, and get garbage out. Must have been garbage in, as they say. Of course, you made sure to have a nice rectangular file with consistently named columns, and you deleted the random comments people typed into the Excel document. So what gives?
Here are six common, real-world data irregularities you will have to deal with sooner or later.
One expects one record per row. And in 95% of a file, that’s what you have. But then suddenly a join gives a warning, and you realize there are extras. Why? …
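As a rough illustration of how such extras might be caught before a join, here is a minimal Python sketch that counts how often each key value appears. The helper name duplicate_keys, the customer_id column, and the sample rows are all hypothetical and invented for this example; they are not taken from the article.

```python
from collections import Counter

def duplicate_keys(rows, key):
    """Return {key_value: count} for key values appearing in more than one row."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}

# Hypothetical records; customer_id is assumed to identify one record per row.
rows = [
    {"customer_id": 1, "city": "Oslo"},
    {"customer_id": 2, "city": "Lima"},
    {"customer_id": 2, "city": "Lima"},
    {"customer_id": 3, "city": "Pune"},
]

dupes = duplicate_keys(rows, "customer_id")
print(dupes)
```

An empty result means the key is unique and a join on it should not multiply rows; anything else is worth inspecting before merging.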
There is now a major dialect of R, loudly proclaimed and apparently in the ascendant: the Tidyverse, promulgated by RStudio and largely the effort of one man, Hadley Wickham. Should we adopt it? Should students learn it? Is even having R dialects a good idea?
I have some strong opinions on this, being someone who’s used R extensively in academic and professional contexts (for everything from scientific research and simulation modeling to custom visualization and predictive modeling) and who now teaches a course in R.
Mostly, I am skeptical of this “tidy” fad: it does not sit well with…