06.17.13

My First Brush w/Open Data – Hospital Charges

Posted in Data Visualization at 4:35 am by Auro Tripathy

Curious about what a medical procedure may cost you? Then, read on…

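Recent data on the top 100 inpatient medical procedures, ranked by how often they are billed, is available from the Centers for Medicare & Medicaid Services (CMS); the download link is in the code comments below. The government will soon release data on another 30 procedures. The box plot produced by the code below shows the bewildering nationwide variation in inpatient charges for these procedures.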
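To put a number on that variation, here is a minimal sketch (not part of the original script) that computes, for each procedure, the lowest, median, and highest average covered charge across hospitals, plus the ratio of highest to lowest. It assumes the `DRG.Definition` and `Average.Covered.Charges` column names used by the script further down. The helper name `charge_spread` and the `toy` data frame are hypothetical, and the numbers are made up for illustration, not taken from the CMS file.

```r
# Hypothetical helper: per-procedure spread of average covered charges.
# Assumes the column names used in the CMS inpatient file read below.
charge_spread <- function(df) {
  charges <- df$Average.Covered.Charges
  drg <- df$DRG.Definition
  # tapply() applies a summary function to the charges within each DRG group
  lo  <- tapply(charges, drg, min)
  med <- tapply(charges, drg, median)
  hi  <- tapply(charges, drg, max)
  data.frame(drg = names(lo),
             min = as.vector(lo),
             median = as.vector(med),
             max = as.vector(hi),
             ratio = as.vector(hi / lo),  # highest charge / lowest charge
             stringsAsFactors = FALSE)
}

# Toy data: hypothetical values, not from the CMS file
toy <- data.frame(
  DRG.Definition = c("DRG A", "DRG A", "DRG A", "DRG B", "DRG B"),
  Average.Covered.Charges = c(10000, 25000, 40000, 20000, 30000),
  stringsAsFactors = FALSE
)

spread <- charge_spread(toy)
# Procedures whose charges vary most between hospitals come first
print(spread[order(spread$ratio, decreasing = TRUE), ])
```

On the real data, `charge_spread(hospital.charges)` should give the same kind of table for all 100 procedures, assuming the charge column is read as numeric.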

The R code below can be run without changes to download the data and generate the plot.

You can also use OpenRefine to discover that the medical procedure with code 207 can cost up to a million dollars!

# Author Auro Tripathy, auro@shatterline.com
# The box plot code, written in R, is reproducible and is licensed under Creative Commons Attribution-NonCommercial-ShareAlike (CC BY-NC-SA)

# Medicare Provider Charge Data: Inpatient
# The data provided here include hospital-specific charges for the more than 3,000 U.S. hospitals 
# that receive Medicare Inpatient Prospective Payment System (IPPS) payments for the top 100 most 
# frequently billed discharges, paid under Medicare based on a rate per discharge using the 
# Medicare Severity Diagnosis Related Group (MS-DRG) for Fiscal Year (FY) 2011. These DRGs 
# represent almost 7 million discharges or 60 percent of total Medicare IPPS discharges.

# Read further and get the data from the link below
# http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Inpatient.html

rm(list=ls())
temp.zipped <- tempfile()
#mode = "wb" forces a binary transfer, which the zipped file needs on Windows
download.file("http://www.cms.gov/Research-Statistics-Data-and-Systems/Statistics-Trends-and-Reports/Medicare-Provider-Charge-Data/Downloads/IPPS_DRG_CSV.zip",
              temp.zipped, mode = "wb")
hospital.charges <- read.csv(unz(temp.zipped, "Medicare_Provider_Charge_Inpatient_DRG100_FY2011.csv"), header=TRUE)
unlink(temp.zipped)
dim(hospital.charges)

#min/max needed to bound the plot
max <- max(hospital.charges$Average.Covered.Charges)
min <- min(hospital.charges$Average.Covered.Charges)

#to study the data further, use OpenRefine (formerly Google Refine)
colnames(hospital.charges)
unique(hospital.charges$DRG.Definition)
unique(hospital.charges$Provider.Zip.Code)
unique(hospital.charges$Provider.Name)
unique(hospital.charges$Provider.City)

procedures <- unique(hospital.charges$DRG.Definition)

#one vector of average covered charges per procedure, in the same order as 'procedures'
procedure.charges.array <- vector("list", length(procedures))

for (i in seq_along(procedures)) {

  procedure.charges <- hospital.charges[which(hospital.charges$DRG.Definition == procedures[i]), ]
  print(nrow(procedure.charges))

  #used in the box-and-whiskers plot below
  procedure.charges.array[[i]] <- procedure.charges$Average.Covered.Charges

}

#retain the three-digit medical code (substr() is vectorised, so no loop is needed)
procedures.labels <- substr(as.character(procedures), 1, 3)

#boxplot to show the median, the quartiles, and the outliers
boxplot(x = procedure.charges.array, main="Boxplot Showing Variation in In-Patient Cost for Medical Procedures Nationwide",
        xlab="Medical Procedure Code",
        col = c("lightgreen", "brown2", "cyan4"),
        ylim=c(min, max), yaxt="n", col.ticks = "red", col.axis = "azure4", names=procedures.labels, las=2)

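#label the y axis in dollars with thousands separators, e.g. $200,000
axis(2, at = axTicks(2), labels = paste0("$", formatC(axTicks(2), format = "d", big.mark = ",")), las = 1)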
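The grouping step in the script above can also be written with a single `split()` call. The sketch below is an alternative, not the author's code; the `toy` data frame and its procedure names are hypothetical. Passing `factor(..., levels = procedures)` keeps the groups in the same first-appearance order that `unique()` produces, so the three-digit labels from `substr()` line up with the boxes.

```r
# Toy data: hypothetical values and procedure names, not from the CMS file
toy <- data.frame(
  DRG.Definition = c("057 - PROC B", "039 - PROC A", "057 - PROC B",
                     "039 - PROC A", "003 - PROC C"),
  Average.Covered.Charges = c(20000, 10000, 30000, 40000, 90000),
  stringsAsFactors = FALSE
)

# First-appearance order, as unique() gives in the script
procedures <- unique(toy$DRG.Definition)

# split() groups the charges by procedure in one call; fixing the factor
# levels to 'procedures' keeps the groups aligned with the labels below
charges.by.procedure <- split(toy$Average.Covered.Charges,
                              factor(toy$DRG.Definition, levels = procedures))

# Vectorised: first three characters of every procedure name
procedure.labels <- substr(procedures, 1, 3)

boxplot(charges.by.procedure, names = procedure.labels, las = 2)
```

Because `boxplot()` accepts a list of numeric vectors, the result of `split()` can be passed to it directly.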

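To see the numbers behind each box, `boxplot.stats()` returns the statistics that `boxplot()` draws: the lower whisker, lower hinge, median, upper hinge, and upper whisker, plus any outliers. With the default `coef = 1.5`, the whiskers stop at the most extreme values lying within 1.5 times the box length of the box, and anything farther out is drawn as a separate point. The hinges are close to, but not always identical with, the first and third quartiles. The charge values below are hypothetical.

```r
# Hypothetical average covered charges (in dollars) for one procedure
# across ten hospitals; the last value is deliberately extreme
charges <- c(1:9, 100) * 1000

# boxplot.stats() returns what boxplot() draws:
#   $stats: lower whisker, lower hinge, median, upper hinge, upper whisker
#   $out:   points more than coef * (box length) beyond the box
bs <- boxplot.stats(charges, coef = 1.5)
print(bs$stats)
print(bs$out)
```

Here the box spans 3,000 to 8,000 dollars, so anything above 8,000 + 1.5 × 5,000 = 15,500 dollars is drawn as an outlier, which is what happens to the 100,000-dollar value.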

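The post uses OpenRefine to spot the million-dollar charges for code 207. The same kind of lookup can be done in R by sorting on `Average.Covered.Charges`. In this sketch, the helper `top_charges` and the `toy` data frame are hypothetical; on the real data, `top_charges(hospital.charges)` would return the hospital and procedure rows with the highest average covered charges, assuming the column names used in the script above.

```r
# Hypothetical helper: the n rows with the highest average covered charges
top_charges <- function(df, n = 5) {
  # order() with decreasing = TRUE puts the largest charges first
  ranked <- df[order(df$Average.Covered.Charges, decreasing = TRUE), ]
  head(ranked, n)
}

# Toy data: hypothetical values and names, not from the CMS file
toy <- data.frame(
  DRG.Definition = c("DRG A", "DRG B", "DRG B", "DRG C"),
  Provider.Name  = c("Hospital W", "Hospital X", "Hospital Y", "Hospital Z"),
  Average.Covered.Charges = c(30000, 95000, 60000, 12000),
  stringsAsFactors = FALSE
)

top <- top_charges(toy, n = 2)
print(top)
```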