Importing data is an important step. Although it is usually easy, it sometimes gives people a headache. We will go through a few import methods for several common data formats and discuss the issues you might encounter.
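Before importing anything, check where R is looking for files: getwd() returns your current working directory, and list.files() will list the files available in your working directory.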
In case you want to change your working directory, use setwd():
setwd("~/Desktop/")  # Set the working directory to your desktop
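As a small illustration of the three functions above, the sketch below switches to another directory, lists its contents, and switches back. The temporary directory and the file name example.txt are assumptions made only for this demo; in practice you would point setwd() at the folder that holds your data.

```r
# Remember where we started so we can return afterwards
old_dir <- getwd()

# tempdir() stands in for a real data folder (assumption for this demo)
setwd(tempdir())
file.create("example.txt")   # hypothetical file, created only for the demo

# list.files() now reports the contents of the new working directory
files_here <- list.files()
print("example.txt" %in% files_here)

# Restore the original working directory
setwd(old_dir)
```

Saving the old directory first makes it easy to undo the change once you are done.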
The read.table function is usually used for reading data in txt format. To import the data without errors, we need to know which separator symbol the file uses. If you look at text1.txt and text2.txt, you will find that the first one is separated by spaces, while the second uses ; as the delimiter. In this case we import them differently:
setwd("~/angerhang.github.io/statsWithR/src/chaper1/")
table1 <- read.table("text1.txt", header = TRUE)  # whitespace-separated, first row holds the column names
print(table1)
##    make   model mpg weight price
## 1   AMC Concord  22   2930  4099
## 2   AMC   Pacer  17   3350  4749
## 3   AMC  Spirit  22   2640  3799
## 4 Buick Century  20   3250  4816
## 5 Buick Electra  15   4080  7827
table2 <- read.table("text2.txt", sep = ";", header = TRUE)  # semicolon-separated
print(table2)
##    make   model mpg weight price
## 1   AMC Concord  22   2930  4099
## 2   AMC   Pacer  17   3350  4749
## 3   AMC  Spirit  22   2640  3799
## 4 Buick Century  20   3250  4816
## 5 Buick Electra  15   4080  7827
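To see the effect of the sep argument without needing text1.txt and text2.txt, here is a minimal sketch that writes the same small table to two temporary files, one per separator, and reads both back. The temporary files and the object names (cars, space_file, semi_file) are assumptions introduced for illustration.

```r
# A small table, written twice with different separators
# (temporary files stand in for text1.txt and text2.txt)
cars <- data.frame(make  = c("AMC", "Buick"),
                   model = c("Concord", "Century"),
                   mpg   = c(22L, 20L))
space_file <- tempfile(fileext = ".txt")
semi_file  <- tempfile(fileext = ".txt")
write.table(cars, space_file, sep = " ", row.names = FALSE, quote = FALSE)
write.table(cars, semi_file,  sep = ";", row.names = FALSE, quote = FALSE)

# The default sep = "" splits fields on whitespace
from_space <- read.table(space_file, header = TRUE)

# The semicolon file needs an explicit separator
from_semi <- read.table(semi_file, sep = ";", header = TRUE)

# Forgetting sep leaves each line as a single field
wrong <- read.table(semi_file, header = TRUE)

print(identical(from_space, from_semi))
print(ncol(wrong))
```

If an import unexpectedly yields a single column, a mismatched separator is a likely cause.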
If a file has no header row, omit the header argument in the function call. The column names will then be generated automatically as V1, V2, V3, and so on. text3.txt is such a file that has no header.
table3 <- read.table("text3.txt")
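The automatic V1, V2, V3 naming can be checked with a file that has no header row. The sketch below uses a temporary file in place of text3.txt; the column names passed to col.names are assumptions chosen for this example.

```r
# A headerless file (temporary, standing in for text3.txt)
no_header_file <- tempfile(fileext = ".txt")
writeLines(c("AMC Concord 22", "Buick Century 20"), no_header_file)

# Without a header, read.table generates the column names itself
auto_named <- read.table(no_header_file)
print(names(auto_named))

# col.names supplies more meaningful names at import time
named <- read.table(no_header_file,
                    col.names = c("make", "model", "mpg"))
print(names(named))
```

Supplying col.names up front is one option; renaming the columns afterwards with names() works as well.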
There are two functions for importing files with the .csv extension, namely read.csv() and read.csv2(). The former imports CSV files (for example, ones exported from Excel) that use a comma , as the separator symbol, whereas the latter imports those that use a semicolon ; as the separator symbol and a comma as the decimal mark.
test_csv <- read.csv("test.csv")   # use this if test.csv is comma-separated
test_csv <- read.csv2("test.csv")  # use this if test.csv is semicolon-separated
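The two conventions can be compared side by side. In this sketch the same data are written once in each style to temporary files (the file contents and object names are assumptions for illustration) and read with the matching function.

```r
# Comma-separated, with "." as the decimal mark
comma_csv <- tempfile(fileext = ".csv")
writeLines(c("make,mpg", "AMC,22.5", "Buick,20.1"), comma_csv)

# Semicolon-separated, with "," as the decimal mark
semi_csv <- tempfile(fileext = ".csv")
writeLines(c("make;mpg", "AMC;22,5", "Buick;20,1"), semi_csv)

from_comma <- read.csv(comma_csv)   # sep = ",", dec = "."
from_semi  <- read.csv2(semi_csv)   # sep = ";", dec = ","

# Both imports should describe the same table
print(all.equal(from_comma, from_semi))
print(from_semi$mpg)
```

Opening the file in a text editor first is a quick way to see which of the two functions applies.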
To import SPSS data files, which have the extension sav, we need to use the foreign package, which comes with a set of functions for importing data stored by other statistical software:
# install foreign package if you don't have it
install.packages("foreign")
library(foreign)
spss_data = read.spss("test", to.data.frame = TRUE)
# SAS data files can be read with the sas7bdat package (install it first if needed)
library(sas7bdat)
sas_data = read.sas7bdat("test.sas7bdat")
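After importing, a handy function for inspecting the result is str, which shows the number of observations and variables as well as the type of each column.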
str(table1)
## 'data.frame': 5 obs. of 5 variables:
## $ make : Factor w/ 2 levels "AMC","Buick": 1 1 1 2 2
## $ model : Factor w/ 5 levels "Century","Concord",..: 2 4 5 1 3
## $ mpg : int 22 17 22 20 15
## $ weight: int 2930 3350 2640 3250 4080
## $ price : int 4099 4749 3799 4816 7827
Finally, remember to check the header and the number of rows and columns, because those are the places where things normally go wrong. For example, if you import a data set that has a header as if it had none, your data set will contain an extra first row holding the original header.
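The header pitfall described above is easy to reproduce. The sketch below reads one temporary file twice, once with and once without header = TRUE, and compares the results; the file and object names are assumptions made for the demo.

```r
# A small file that does have a header row
with_header <- tempfile(fileext = ".txt")
writeLines(c("make model mpg",
             "AMC Concord 22",
             "Buick Century 20"), with_header)

good <- read.table(with_header, header = TRUE)
bad  <- read.table(with_header, header = FALSE)  # header treated as data

# Quick checks after import: dimensions, names, and column types
print(dim(good))
print(dim(bad))
print(names(bad))
str(bad)
```

In the bad import the header text ends up in the first row, so the mpg column can no longer be read as numbers. Calling dim(), names(), and str() right after importing catches this early.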