How to Read Text File in Rstudio
Importing Data into R
A tutorial about data analysis using R
Dr Jon Yearsley (School of Biology and Environmental Science, UCD)
- Objectives
- Organise yourself!
- Data Workflow
- Format your data (tidy data)
- Data frames
- Importing spreadsheet data
- Summary of the topics covered
- Further Reading
How to Read this Tutorial
This tutorial is a mixture of R code chunks and explanations of the code. The R code chunks will appear in boxes.
Below is an example of a chunk of R code:
# This is a chunk of R code. All text after a # symbol is a comment
# Set working directory using setwd() function
setwd('Enter the path to my working directory')
# Clear all variables in R's memory
rm(list=ls()) # Standard code to clear R's memory
Sometimes the output from running this R code will be displayed after the chunk of code. R output will be preceded by ##.
Here is a chunk of code followed by the R output
2 + 4 # Use R to add two numbers
## [1] 6
Objectives
The objectives of this tutorial are:
- Demonstrate good practice in data organisation
- Introduce plain text file formats for data
- Explain data import into R
Organise yourself!
Before you start importing data into R you should take time to organise your workspace on your computer:
- Create a folder on your computer to contain all your work for this particular project (e.g. a folder called DataModule)
- Inside this project folder create another folder called data. This will hold all the raw data files. These raw data files should not be changed.
- Inside this project folder create a text file called MyFirstScript.R. You can use RStudio for this (File->New File->R Script menu option) or any basic text editor (e.g. Notepad, TextEdit, gedit, emacs). This file will be your R script that will contain all the commands for R. The .r or .R suffix is the standard suffix for an R script.
- If you are starting a large project consider creating separate folders for: R scripts, figures, output from the R script
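The folder layout described above can also be created from within R. The following is a minimal sketch, not part of the original tutorial: the names DataModule, data and MyFirstScript.R come from the text, while the use of a temporary directory as the parent location is an assumption made so the example is safe to run anywhere.

```r
# Sketch: create the project layout described above.
# ASSUMPTION: tempdir() stands in for wherever you keep your own work.
project_dir <- file.path(tempdir(), "DataModule")

# Create the project folder and the 'data' folder inside it
dir.create(file.path(project_dir, "data"), recursive = TRUE, showWarnings = FALSE)

# Create an empty R script in the project folder (if it is not there already)
script_path <- file.path(project_dir, "MyFirstScript.R")
if (!file.exists(script_path)) file.create(script_path)

# List what was created
list.files(project_dir, recursive = TRUE, include.dirs = TRUE)
```

In your own work you would replace tempdir() with a permanent location, for example your home directory.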
Your first R script
Now you have created the file MyFirstScript.R you should put some header text at the start of the file to explain what the R script will do. This was described in tutorial 1.
Video Tutorial: Creating a new R script with RStudio (1 min)
The text should have a short explanation of the R script followed by your name and the date you wrote the R script. Each line should start with a # so that the text is not interpreted by R (this text is for humans so they understand what the file is intended to do). Here is an example,
# ********** Start of header **************
# Title: <The title of your R script>
#
# Add a short description of the R script here.
#
# Author: <your name> (email address)
# Date: <today's date>
#
# *********** End of header ****************
# Two common commands at the start of an R script are:
rm(list=ls()) # Clear R's memory
setwd('~/DataModule') # Set the working directory
# Replace '~/DataModule' with the name of your own directory
# ******************************************
# Write your commands below.
# Remember to use comments to explain your commands
Writing clear R scripts
An R script isn't just telling the computer how to perform calculations on your data. It is also explaining your working to other human beings.
"Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do." – Donald E. Knuth
To make your R scripts usable by humans they must be clearly commented (using the # symbol to start a comment) and clearly organised.
As you write an R script consider these questions:
- Does your R script look well organised (e.g. is it well spaced, are lines indented logically)?
- Could someone else read the R script and understand the basic idea?
- Could someone else modify your R script relatively easily?
- In a couple of months' time could you quickly read and edit your own R script?
Professional data analysts take clarity very seriously. Here are some links to R coding style guides:
- Google's style guide, https://google.github.io/styleguide/Rguide.xml
- Hadley Wickham's style guide, http://adv-r.had.co.nz/Style.html
- http://www.stat.ubc.ca/~jenny/STAT545A/block19_codeFormattingOrganization.html
- http://nicercode.github.io/blog/2013-04-05-why-nice-code/
Data Workflow
Below is a schematic of the workflow for handling data.
In this tutorial we will consider formatting data, in the next tutorial we'll discuss importing data, and then we'll start to consider exploring the data using graphics and numerical summaries.
Format your data (tidy data)
The workflow starts long before you analyse your data. It starts even before you have your data in some computer software.
Organising your data should follow tidy data guidelines (see below) and be planned before you collect your data. The format of the data should be finalised before importing the data into R. It is often easiest to tidy your data using a spreadsheet program before you import the data into R.
Well organised data from the start will make your life a lot easier and your data import as painless as possible.
Six guidelines for tidy data
When tidying your data you should ensure that:
- each variable has its own column
- each row is an observation
- the top of each column contains the name of the variable
- there are no blank columns or blank rows between data
- all data in a column has the same type (e.g. it is all numerical data, or it is all text data)
- data are consistent (e.g. if a binary variable can take values 'Yes' or 'No' then only these two values are allowed, with no alternatives such as 'Y' and 'N')
PDF Summary: This PDF document reiterates the concept of tidy data
The link to the PDF is: http://www.ucd.ie/ecomodel/pdf/TidyData.pdf
Poorly vs well formatted data
The data set shown in the figure below is an example of poorly formatted data. The data set contains data on the lead concentrations (ppm) from three species of fish (whitefish, sucker and trout). Two types of sample were collected: samples from fillets of fish and from whole fish. The data has three variables: lead concentration, species of fish and type of fish sample.
How would you improve the format of the poorly formatted data shown in the figure? (Hint: use the six guidelines above)
The second figure shows some well formatted data that follows the tidy data guidelines: each column represents a single variable and each row an observation.
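To make the contrast concrete, here is a small sketch of the same idea in R. The figures are not reproduced here, so the untidy layout (one column per species) is only an assumption about what poorly formatted data might look like, and all lead concentrations are made-up placeholder values, not the real data.

```r
# ASSUMPTION: made-up lead concentrations (ppm) in an untidy layout,
# with one column per species so that 'species' is hidden in the column names.
untidy <- data.frame(
  sample_type = c("fillet", "whole"),
  whitefish   = c(0.1, 0.2),
  sucker      = c(0.3, 0.4),
  trout       = c(0.5, 0.6)
)

# Tidy layout: one column per variable, one row per observation
tidy <- data.frame(
  lead_ppm    = c(untidy$whitefish, untidy$sucker, untidy$trout),
  species     = rep(c("whitefish", "sucker", "trout"), each = nrow(untidy)),
  sample_type = rep(untidy$sample_type, times = 3)
)

tidy
```

The tidy version has exactly the three variables named in the text (lead concentration, species and type of sample), each in its own column.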
Data frames
A data frame is R's name for spreadsheet data (e.g. data organised in a grid, like Excel). R stores the vast majority of data as a data frame and uses data frames when analysing data.
A data frame forces the data to be well organised.
- Each column is a variable. The name of this variable becomes the name of the column.
- Each row corresponds to an observation. This means that values in the same row are data collected about the same object. Rows can also have names.
Below is an example of a data frame (called airquality) that contains data on the air quality in New York from May - September 1973 (this is a data set that is built in to R).
# The airquality data is a built-in dataset
# First 10 rows of the airquality data frame
head(airquality, n=10)
##    Ozone Solar.R Wind Temp Month Day
## 1     41     190  7.4   67     5   1
## 2     36     118  8.0   72     5   2
## 3     12     149 12.6   74     5   3
## 4     18     313 11.5   62     5   4
## 5     NA      NA 14.3   56     5   5
## 6     28      NA 14.9   66     5   6
## 7     23     299  8.6   65     5   7
## 8     19      99 13.8   59     5   8
## 9      8      19 20.1   61     5   9
## 10    NA     194  8.6   69     5  10
You can type ?airquality to display the help file for this data set. The data frame has 153 rows (observations) and 6 columns (variables measured). The 6 columns contain data on: ozone concentrations (parts per billion), solar radiation, wind speed, air temperature, month and day of observation. You can see that each column has a name corresponding to the data for that column.
The structure of the data frame can be viewed using the str() function
# Display the structure of the airquality data frame
str(airquality)
## 'data.frame':    153 obs. of  6 variables:
##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
The str() function shows that this is a data frame with 153 observations (rows) and six variables (columns). It also shows the data types of the variables: Wind is a numerical variable (i.e. continuous) and the other variables are all integers (i.e. whole numbers).
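Besides str(), a few other base R functions are handy for checking the shape and column types of a data frame. The sketch below uses only the built-in airquality data, so it needs no files; which functions to highlight is our choice, not something prescribed by the tutorial.

```r
# airquality ships with R, so no import is needed
dim(airquality)            # number of rows and number of columns
names(airquality)          # column (variable) names
sapply(airquality, class)  # data type of each column
summary(airquality$Wind)   # numerical summary of a single column
```

Running these checks straight after an import is a quick way to confirm that every column was read with the type you expected.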
Tidy data in R is described in more detail on this web page: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
Tibbles
A recent development (circa 2016) is an improved data frame called a tibble. We will not discuss these new data frame objects here, but you can read about them at https://cran.r-project.org/web/packages/tibble/vignettes/tibble.html.
Don't Panic! Tibbles are very similar to data frames.
The important point to know is that if you use RStudio's GUI to import data then your data will be stored in a tibble, not a data frame.
Importing spreadsheet data
To start working with data in R you need to import your data into R. You are aiming to have a data frame that contains your data.
The simplest way to import data into R is from a text file (https://en.wikipedia.org/wiki/Text_file). Text files (sometimes called flat files) can be read by any computer operating system and by many different statistical programs. Saving data as a simple text file makes your data highly transportable.
Importing data from software specific formats (e.g. Excel's .XLSX format, Minitab's .MTW format, SPSS's .SAV format or SAS's .SAS format) is possible (e.g. using RStudio's Import Dataset GUI). If you want your data to be easily shared with other people then use a text file to store your data.
We advise you to:
- save your data as a text file (software, such as Excel, often has an option to save data as plain text)
- organize data with columns corresponding to different variables before exporting to the text file
- use a visible text character to delimit each column (usually a comma or semi-colon). Using an invisible character (e.g. a space or a TAB) is not recommended because these characters all look the same at first glance.
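If your data are already in R, the same advice can be followed with write.csv(). The sketch below is an illustration only: it writes the built-in airquality data to a temporary file (an assumption, so that nothing on your disk is overwritten) and reads it back to check that nothing was lost.

```r
# Save a data frame as a comma-delimited text file, then read it back.
# ASSUMPTION: a temporary file is used so nothing on disk is overwritten.
csv_path <- tempfile(fileext = ".csv")

# row.names=FALSE stops R adding an extra unnamed column of row numbers
write.csv(airquality, file = csv_path, row.names = FALSE)

# Re-import the file; missing values were written as NA
air2 <- read.table(csv_path, header = TRUE, sep = ",")

# Compare the re-imported data with the original
all.equal(airquality, air2)
```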
General advice on importing data into R can be found at https://cran.r-project.org/doc/manuals/r-release/R-data.html
Converting data to a CSV text file
A comma separated values file (CSV file) is the most common format for a text file that contains data.
Here are a few video tutorials on converting data into a CSV text file so that it is suitable for import into R.
Video Tutorial: Converting data from EXCEL to a CSV format (3 mins)
Video Tutorial: Converting data from Google Sheets to a CSV format (1 min)
Viewing text files
Before importing a text file into any software package it is a huge help if you can look at it in a text editor. Text files can contain characters that are normally invisible (e.g. spaces, tabs and end of line markers). If a text editor is going to be of use it must be able to display all the characters in a file.
Three text editors that can do this are:
notepad++ is a free program for Windows operating systems
BBEdit is a free program for Mac OSX operating systems
emacs is a GNU opensource program primarily for Linux operating systems.
On Linux systems the cat -A command from the terminal is also useful.
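You can also peek at the raw contents of a text file from inside R before importing it. This is a minimal sketch using base R's readLines(); the small TAB delimited file is made up for illustration and written to a temporary location.

```r
# ASSUMPTION: a small made-up temporary file stands in for your own data file
txt_path <- tempfile(fileext = ".txt")
writeLines(c("id\tmass", "1\t2.5", "2\t3.1"), txt_path)

# Read the first few lines as raw text, without interpreting columns
first_lines <- readLines(txt_path, n = 3)

# print() displays each TAB as the escape \t, making the delimiter visible
print(first_lines)
```

Seeing the delimiter this way tells you which value of sep to pass to the import function.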
Here are two video tutorials on this topic
Video Tutorial: Viewing data in a text file before importing into R (4 mins)
Video Tutorial: An overview of the common data text file formats (3 mins)
Data import examples
The data we'll be importing are described at http://www.ucd.ie/ecomodel/Resources/datasets_WebVersion.html
The files are:
- WOLF.CSV: This file is a text file of comma separated values.
- HEIGHT.CSV: This file is a text file of comma separated values.
- INSECT.TXT: This file is a text file of TAB delimited values.
- BEEKEEPER.TXT: This file is a text file with blank space delimiting the values.
- MALIN_HEAD.TXT: This file is a text file with TAB delimited values.
All these data files are simple text files that differ in the character used to distinguish columns of data.
Comma delimited files (CSV files)
CSV stands for comma separated values (note that semi-colons are sometimes used in place of commas because some countries use the comma in place of the decimal point).
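As a rough sketch of the semi-colon case, the read.table() function (introduced just below) accepts a dec argument for the decimal mark alongside sep. The file contents here are made up for illustration.

```r
# ASSUMPTION: a tiny made-up file using ';' between columns
# and ',' as the decimal mark
semi_path <- tempfile(fileext = ".csv")
writeLines(c("site;depth", "A;1,5", "B;2,25"), semi_path)

# Tell read.table() about both the column delimiter and the decimal mark
semi <- read.table(semi_path, header = TRUE, sep = ";", dec = ",")

# depth should have been read as a numerical column
str(semi)
```

Base R also provides read.csv2(), which uses sep=';' and dec=',' as its defaults.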
The read.table() function is a flexible function for importing text data
Video Tutorial: Importing a CSV file into R using read.table() (5 mins)
# Import WOLF.CSV file using read.table function
wolf = read.table('WOLF.CSV', header=TRUE, sep=',')
The wolf variable contains the imported data. It is called a data frame.
The ideal arrangement of a data frame is for each row to be an observation of some object and each column a variable that measures some property of the object. For example, each row of wolf is an observation of one individual wolf and each column of wolf gives information about where the wolf was observed and the data collected from its hair sample.
The HEIGHT.CSV file also contains comma separated values. Here is the read.table() command to read in this file
# Import HEIGHT.CSV file using read.table function
human = read.table('HEIGHT.CSV', header=TRUE, sep=',')
Note: The function read.csv() is a special case of the read.table() function.
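To illustrate the note above, the sketch below imports one small file both ways and compares the results. The file contents and the column names (name, height) are made up for this example; they are not the contents of HEIGHT.CSV.

```r
# ASSUMPTION: a tiny made-up CSV file written to a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,height", "Ann,1.62", "Bob,1.80"), csv_path)

# Import with read.table(), stating the header and delimiter explicitly
via_table <- read.table(csv_path, header = TRUE, sep = ",")

# Import with read.csv(), which uses header=TRUE and sep=',' by default
via_csv <- read.csv(csv_path)

# Compare the two data frames
all.equal(via_table, via_csv)
```

read.csv() also changes a few other defaults (for example how quotes and incomplete rows are handled), which can matter for messier files.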
Use the R help pages to learn more about these functions
?read.table # Display help page on read.table function
TAB delimited files (TXT files)
The INSECT.TXT data set is a text file where variables are delimited by a TAB. In addition the first three lines contain a data description that we do not want to import.
The read.table() function can be used to import this file. The argument skip=3 is used to ignore the first three lines. The argument sep='\t' specifies a TAB as the variable delimiter
# Import INSECT.TXT file using read.table function (TAB delimited)
# skipping the first three lines (skip=3)
insect = read.table('INSECT.TXT', header=T, skip=3, sep='\t')
The MALIN_HEAD.TXT file also contains TAB delimited data. Here is the read.table() command to read in this file
# Import MALIN_HEAD.TXT file using read.table function (TAB delimited)
rainfall = read.table('MALIN_HEAD.TXT', header=T, sep='\t')
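If you do not have the tutorial's data files to hand, the effect of skip and sep='\t' can be tried on a small stand-in file. Everything in the file below (description lines, column names, values) is made up for illustration and is not the content of INSECT.TXT.

```r
# ASSUMPTION: a made-up TAB delimited file with three description lines
tab_path <- tempfile(fileext = ".txt")
writeLines(c("Made-up example file",
             "Columns: plot number and insect count",
             "Values are for illustration only",
             "plot\tcount",
             "1\t12",
             "2\t7"), tab_path)

# skip=3 ignores the description lines; the 4th line is then the header
tab_data <- read.table(tab_path, header = TRUE, skip = 3, sep = "\t")

tab_data
```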
Blank space delimited files
The BEEKEEPER.TXT data set uses white space to delimit the variables. The first six lines of the file contain a description of the data
Using read.table() with the argument sep='' will interpret any white space as a variable delimiter.
# Import BEEKEEPER.TXT file using read.table function (white space delimited)
# skipping the first 6 lines (skip=6)
bees = read.table('BEEKEEPER.TXT', header=T, skip=6, sep='')
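With sep='' any run of spaces or TABs counts as a single delimiter, so columns do not need to line up exactly. The sketch below shows this on a made-up file (the column names hive and honey and all values are placeholders, not the content of BEEKEEPER.TXT).

```r
# ASSUMPTION: a made-up file with uneven spacing, including a TAB
ws_path <- tempfile(fileext = ".txt")
writeLines(c("hive   honey",
             "1      10.5",
             "2  \t  8.25"), ws_path)

# sep='' treats one or more spaces or TABs as a single delimiter
ws_data <- read.table(ws_path, header = TRUE, sep = "")

ws_data
```

One consequence is that values containing spaces (such as a two-word name) would be split across columns, which is another reason to prefer a visible delimiter.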
Summary of import commands
Type of text file | R Command |
---|---|
Comma delimited (.CSV) | read.table(<filename>, header=T, sep=',') |
TAB delimited (.TXT) | read.table(<filename>, header=T, sep='\t') |
Blank space (.TXT) | read.table(<filename>, header=T, sep='') |
# Comma separated values
wolf = read.table('WOLF.CSV', header=TRUE, sep=',')
human = read.table('HEIGHT.CSV', header=TRUE, sep=',')
# TAB delimited values
insect = read.table('INSECT.TXT', header=T, skip=3, sep='\t')
rainfall = read.table('MALIN_HEAD.TXT', header=T, sep='\t')
# White space delimited values
bees = read.table('BEEKEEPER.TXT', header=T, skip=6, sep='')
Importing data using RStudio
RStudio has its own data import functionality. To use this you will need to install the R package readr. For more information about this see RStudio's guide: https://support.rstudio.com/hc/en-us/articles/218611977-Importing-Data-with-RStudio
Video Tutorial: Importing a CSV file into R using RStudio's GUI (3 mins 13 secs)
Importing data using RStudio will save the data as a modified data frame, called a tibble (tibbles are briefly discussed above).
Importing using fread()
fread() is a powerful data import function that is similar to read.table() but faster. It is part of the data.table package, which you will need to install.
You should only have to give fread() the name of the file you want to import, and fread() will try to work out the appropriate way to import the data. Try some examples and compare them with the examples above
# ******************************************
# Other packages for importing data --------
# The data.table package
library(data.table) # Load the data.table package
# Import a CSV file
wolf2 = fread('WOLF.CSV')
human2 = fread('HEIGHT.CSV')
# Import TAB delimited file
insect2 = fread('INSECT.TXT')
rainfall2 = fread('MALIN_HEAD.TXT')
# Import white space delimited file
bees2 = fread('BEEKEEPER.TXT')
The fread() command is simpler to use because it tries to guess the format of the data in the file.
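One thing to be aware of is that fread() returns a data.table object rather than a plain data frame. The sketch below, which only runs if the data.table package is installed, shows this on a made-up file and uses fread()'s data.table=FALSE argument to ask for an ordinary data frame instead.

```r
# Only run the example if the data.table package is available
if (requireNamespace("data.table", quietly = TRUE)) {
  # ASSUMPTION: a made-up TAB delimited file written to a temporary location
  fread_path <- tempfile(fileext = ".txt")
  writeLines(c("plot\tcount", "1\t12", "2\t7"), fread_path)

  # fread() works out the delimiter itself and returns a data.table
  dt <- data.table::fread(fread_path)
  print(class(dt))

  # data.table=FALSE asks fread() for a plain data frame instead
  df <- data.table::fread(fread_path, data.table = FALSE)
  print(class(df))
}
```

A data.table behaves like a data frame in most everyday uses, so either form should work with the rest of this tutorial.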
Summary of the topics covered
- Organizing your files on your computer
- Best practice for formatting data
- Reading in spreadsheet data
- Data frames
Further Reading
All these books can be found in UCD's library
- Andrew P. Beckerman and Owen L. Petchey, 2012 Getting Started with R: An introduction for biologists (Oxford University Press, Oxford) [Chapter 2, 3]
- Mark Gardner, 2012 Statistics for Ecologists Using R and Excel (Pelagic, Exeter)
- Michael J. Crawley, 2015 Statistics : an introduction using R (John Wiley & Sons, Chichester) [Chapter 2]
- Tenko Raykov and George A Marcoulides, 2013 Basic statistics: an introduction with R (Rowman and Littlefield, Plymouth)
Source: https://www.ucd.ie/ecomodel/Resources/Sheet2a_data_import_WebVersion.html