How to Read Text File in Rstudio

Importing Data into R

A tutorial about data analysis using R

Dr Jon Yearsley (School of Biology and Environmental Science, UCD)

Objectives
Organise yourself!
Data Workflow
Format your data (tidy data)
Data frames
Importing spreadsheet data
Summary of the topics covered
Further Reading

How to Read this Tutorial

This tutorial is a mixture of R code chunks and explanations of the code. The R code chunks will appear in boxes.

Below is an example of a chunk of R code:

    # This is a chunk of R code. All text after a # symbol is a comment
    # Set working directory using setwd() function
    setwd('Enter the path to my working directory')
    # Clear all variables in R's memory
    rm(list=ls())    # Standard code to clear R's memory

Sometimes the output from running this R code will be displayed after the chunk of code. R output will be preceded by ##.

Here is a chunk of code followed by the R output

    2 + 4    # Use R to add two numbers

    ## [1] 6

Objectives

The objectives of this tutorial are:

Demonstrate good practice in data organisation
Introduce plain text file formats for data
Explain data import into R

Organise yourself!

Before you start importing data into R you should take time to organise your workspace on your computer:

Create a folder on your computer to contain all your work for this particular project (e.g. a folder called DataModule)
Inside this project folder create another folder called data. This will hold all the raw data files. These raw data files should not be changed.
Inside this project folder create a text file called MyFirstScript.R. You can use RStudio for this (use the File->New File->R Script menu option) or any basic text editor (e.g. Notepad, TextEdit, gedit, emacs). This file will be your R script and will contain all the commands for R. The .r or .R suffix is the standard suffix for an R script.
If you are starting a large project consider creating separate folders for: R scripts, figures, output from the R script
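The folder layout described above can also be created from within R. The following is a minimal sketch, assuming the project folder is placed inside R's temporary directory so that nothing on your computer is overwritten; the variable names project_dir and script_file are chosen here for illustration, and in practice you would replace the path with your own.

```r
# Create the project folder inside a temporary directory
# (an assumption made so the example is safe to run; use your own path)
project_dir <- file.path(tempdir(), "DataModule")
dir.create(project_dir, showWarnings = FALSE)

# Create a sub-folder to hold the raw data files
dir.create(file.path(project_dir, "data"), showWarnings = FALSE)

# Create an empty R script file inside the project folder
script_file <- file.path(project_dir, "MyFirstScript.R")
file.create(script_file)

# List the contents of the project folder
list.files(project_dir)
```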

Your first R script

Now that you have created the file MyFirstScript.R you should put some header text at the start of the file to explain what the R script will do. This was described in tutorial 1.

Video Tutorial: Creating a new R script with RStudio (1 min)

The text should have a short explanation of the R script followed by your name and the date you wrote the R script. Each line should start with a # so that the text is not interpreted by R (this text is for humans so they understand what the file is intended to do). Here is an example,

    # ********** Start of header **************
    # Title: <The title of your R script>
    #
    # Add a short description of the R script here.
    #
    # Author: <your name>  (email address)
    # Date: <today's date>
    #
    # *********** End of header ****************

    # Two common commands at the start of an R script are:
    rm(list=ls())         # Clear R's memory

    setwd('~/DataModule') # Set the working directory
    # Replace '~/DataModule' with the name of your own directory

    # ******************************************
    # Write your commands below.
    # Remember to use comments to explain your commands

Writing clear R scripts

An R script isn't just telling the computer how to perform calculations on your data. It is also explaining your working to other human beings.

"Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do." – Donald E. Knuth

To make your R scripts usable by humans they must be clearly commented (using the # symbol to start a comment) and clearly organised.

As you write an R script consider these questions:

Does your R script look well organised (e.g. is it well spaced, are lines indented logically)?
Could someone else read the R script and understand the basic idea?
Could someone else modify your R script relatively easily?
In a couple of months' time could you quickly read and edit your own R script?

Professional data analysts take clarity very seriously. Here are some links to R coding style guides:

Google's style guide, https://google.github.io/styleguide/Rguide.xml
Hadley Wickham's style guide, http://adv-r.had.co.nz/Style.html
http://www.stat.ubc.ca/~jenny/STAT545A/block19_codeFormattingOrganization.html
http://nicercode.github.io/blog/2013-04-05-why-nice-code/

Data Workflow

Below is a schematic of the workflow for handling information.

Figure: The workflow to follow when handling data.

In this tutorial we will consider formatting data; in the next tutorial we'll discuss importing data, and then we'll start to consider exploring the data using graphics and numerical summaries.

Format your data (tidy data)

The workflow starts long before you analyse your data. It starts even before you have your data in some computer software.

Organising your data should follow tidy data guidelines (see below) and be planned before you collect your data. The format of the data should be finalised before importing the data into R. It is often easiest to tidy your data using a spreadsheet program before you import the data into R.

Well organised data from the start will make your life a lot easier and your data import as painless as possible.

Six guidelines for tidy data

When tidying your data you should ensure that:

each variable has its own column
each row is an observation
the top of each column contains the name of the variable
there are no blank columns or blank rows between data
all data in a column has the same type (e.g. it is all numerical data, or it is all text data)
data are consistent (e.g. if a binary variable can take values 'Yes' or 'No' then only these two values are allowed, with no alternatives such as 'Y' and 'N')
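Once data are in R, the last two guidelines can be checked with a few base R commands. The sketch below uses a small made-up data frame called survey (the values and variable names are hypothetical and are not part of the tutorial's data sets) in which one entry breaks the consistency guideline.

```r
# A small, made-up data frame (hypothetical values for illustration only)
survey <- data.frame(
  height_cm = c(172.5, 160.1, 181.0, 168.4),
  smoker    = c("Yes", "No", "Y", "No")
)

# Guideline: all data in a column has the same type
sapply(survey, class)

# Guideline: data are consistent. List the distinct values of the variable
unique(survey$smoker)

# Flag rows whose value is not one of the two allowed values
allowed <- c("Yes", "No")
inconsistent <- !(survey$smoker %in% allowed)
survey[inconsistent, ]
```

Checks like these only report problems. The raw data file itself should still be corrected (or the correction recorded in your R script) so that the fix is reproducible.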

PDF Summary: This PDF document reiterates the concept of tidy data

The link to the PDF is: http://www.ucd.ie/ecomodel/pdf/TidyData.pdf

Poorly vs well formatted data

The data set shown in the figure below is an example of poorly formatted data. The data set contains data on the lead concentrations (ppm) from three species of fish (whitefish, sucker and trout). Two types of sample were collected: samples from fillets of fish and from whole fish. The data has three variables: lead concentration, species of fish and type of fish sample.

Figure: A poorly formatted data set. This file would be hard to import and analyse in this format.

How would you improve the format of the poorly formatted data shown in the figure? (Hint: use the six guidelines above)

The second figure shows some well formatted data that follows the tidy data guidelines: each column represents a single variable and each row an observation.

Figure: A well formatted data set. This file would be easy to import and analyse in this format. One column contains the data for one variable. These data are the worldwide occurrences of Covid-19, downloaded from the European Centre for Disease Prevention and Control, https://www.ecdc.europa.eu/en

Data frames

A data frame is R's name for spreadsheet data (e.g. data organised in a grid, like Excel). R stores the vast majority of data as a data frame and uses data frames when analysing data.

A data frame forces the data to be well organised.

Each column is a variable. The name of this variable becomes the name of the column.
Each row corresponds to an observation. This means that values in the same row are data collected about the same object. Rows can also have names.

Below is an example of a data frame (called airquality) that contains data on the air quality in New York from May - September 1973 (this is a data set that is built in to R).

    # The airquality data is a built-in dataset
    # First 10 rows of the airquality data frame
    head(airquality, n=10)

    ##    Ozone Solar.R Wind Temp Month Day
    ## 1     41     190  7.4   67     5   1
    ## 2     36     118  8.0   72     5   2
    ## 3     12     149 12.6   74     5   3
    ## 4     18     313 11.5   62     5   4
    ## 5     NA      NA 14.3   56     5   5
    ## 6     28      NA 14.9   66     5   6
    ## 7     23     299  8.6   65     5   7
    ## 8     19      99 13.8   59     5   8
    ## 9      8      19 20.1   61     5   9
    ## 10    NA     194  8.6   69     5  10

You can type ?airquality to display the help file for this data set. The data frame has 153 rows (observations) and 6 columns (variables measured). The 6 columns contain data on: ozone concentrations (parts per billion), solar radiation, wind speed, air temperature, month and day of observation. You can see that each column has a name corresponding to the data for that column.

The structure of the data frame can be viewed using the str() function

    # Display the structure of the airquality data frame
    str(airquality)

    ## 'data.frame':    153 obs. of  6 variables:
    ##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
    ##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
    ##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
    ##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
    ##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
    ##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...

The str() function shows that this is a data frame with 153 observations (rows) and six variables (columns). It also shows the data types of the variables: Wind is a numerical variable (i.e. continuous) and the other variables are all integers (i.e. whole numbers).
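A few other base R functions are useful for inspecting a data frame. The sketch below applies them to the built-in airquality data frame; the variable names wind and first_obs are introduced here for illustration only.

```r
# Number of rows (observations) and columns (variables)
dim(airquality)

# Names of the variables (the column names)
names(airquality)

# Extract one variable (column) by name using the $ operator
wind <- airquality$Wind
class(wind)

# Extract a single observation (row 1) across all variables
first_obs <- airquality[1, ]
first_obs
```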

Tidy data in R is described in more detail on this web page: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html

Tibbles

A recent development (circa 2016) is an improved data frame called a tibble. We will not discuss these new data frame objects here, but you can read about them at https://cran.r-project.org/web/packages/tibble/vignettes/tibble.html.

Don't Panic! Tibbles are very similar to data frames.

The important point to know is that if you use RStudio's GUI to import data then your data will be stored in a tibble, not a data frame.

Importing spreadsheet data

To start working with data in R you need to import your data into R. You are aiming to have a data frame that contains your data.

The simplest way to import data into R is from a text file (https://en.wikipedia.org/wiki/Text_file). Text files (sometimes called flat files) can be read by any computer operating system and by many different statistical programs. Saving data as a simple text file makes your data highly portable.

Importing data from software-specific formats (e.g. Excel's .XLSX format, Minitab's .MTW format, SPSS's .SAV format or SAS's .SAS format) is possible (e.g. using RStudio's Import Dataset GUI). If you want your data to be easily shared with other people then use a text file to store your data.

We advise you to:

save your data as a text file (software such as Excel often has an option to save data as plain text)
organise data with columns corresponding to different variables before exporting to the text file
use a visible text character to delimit each column (usually a comma or a semi-colon). Using an invisible character (e.g. a space or a TAB) is not recommended because these characters all look the same at first glance.
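If your data are already in R, the same advice applies when exporting them. The following is a minimal sketch, assuming you want a comma delimited text file; it writes the built-in airquality data frame to a temporary file (csv_file and air2 are names chosen for this example) and reads it back to confirm nothing was lost.

```r
# Write the built-in airquality data frame to a CSV text file.
# A temporary file is used here; in practice give your own file name.
csv_file <- tempfile(fileext = ".csv")
write.csv(airquality, file = csv_file, row.names = FALSE)

# Look at the first three lines of the text file
readLines(csv_file, n = 3)

# Read the file back in to check the data survived the round trip
air2 <- read.table(csv_file, header = TRUE, sep = ",")
dim(air2)
```

Setting row.names = FALSE stops R adding an extra, unnamed column of row numbers to the file.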

General advice on importing data into R can be found at https://cran.r-project.org/doc/manuals/r-release/R-data.html

Converting data to a CSV text file

A comma separated values file (CSV file) is the most common format for a text file that contains data.

Here are a few video tutorials on converting data into a CSV text file so that it is suitable for import into R.

Video Tutorial: Converting data from EXCEL to a CSV format (3 mins)

Video Tutorial: Converting data from Google Sheets to a CSV format (1 min)

Viewing text files

Before importing a text file into any software package it is a huge help if you can look at it in a text editor. Text files can contain characters that are normally invisible (e.g. spaces, tabs and end of line markers). If a text editor is going to be of use it must be able to display all the characters in a file.

Three text editors that can do this are:

Notepad++ is a free program for Windows operating systems

BBEdit is a free program for Mac OS X operating systems

emacs is a GNU open-source program primarily for Linux operating systems.

On Linux systems the cat -A command from the terminal is also useful.
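R itself can also reveal invisible characters, because print() shows a TAB inside a text string as the escape sequence \t. The sketch below assumes a small made-up TAB delimited file written to a temporary location; txt_file and raw_lines are names chosen for this example.

```r
# Write a small TAB delimited file (made-up values) to a temporary file
txt_file <- tempfile(fileext = ".txt")
writeLines(c("site\tcount", "A\t12", "B\t7"), txt_file)

# Read the raw lines of text without interpreting them as data
raw_lines <- readLines(txt_file)

# print() displays each TAB as the escape sequence \t
print(raw_lines)

# Count the number of TAB characters on each line
lengths(regmatches(raw_lines, gregexpr("\t", raw_lines)))
```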

Here are two video tutorials on this topic

Video Tutorial: Viewing data in a text file before importing into R (4 mins)

Video Tutorial: An overview of the common data text file formats (3 mins)

Data import examples

The data we'll be importing are described at http://www.ucd.ie/ecomodel/Resources/datasets_WebVersion.html

The files are:

WOLF.CSV: This file is a text file of comma separated values.
HEIGHT.CSV: This file is a text file of comma separated values.
INSECT.TXT: This file is a text file of TAB delimited values.
BEEKEEPER.TXT: This file is a text file with blank space delimiting the values.
MALIN_HEAD.TXT: This file is a text file with TAB delimited values.

All these data files are simple text files that differ in the character used to distinguish columns of data.

Comma delimited files (CSV files)

CSV stands for comma separated values (note that semi-colons are sometimes used in place of commas because some countries use the comma in place of the decimal point).
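For files of the semi-colon kind, both the column delimiter and the decimal mark need to be stated on import. Here is a minimal sketch using read.table() (the function described in the next paragraph), assuming a small made-up file; csv2_file and masses are names chosen for this example.

```r
# Write a small semi-colon delimited file with decimal commas
# (made-up values) to a temporary file
csv2_file <- tempfile(fileext = ".csv")
writeLines(c("site;mass", "A;1,5", "B;2,25"), csv2_file)

# sep=';' sets the column delimiter, dec=',' sets the decimal mark
masses <- read.table(csv2_file, header = TRUE, sep = ";", dec = ",")
str(masses)
```

Base R also provides read.csv2(), which uses these two settings by default.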

The read.table() function is a flexible function for importing text data

Video Tutorial: Importing a CSV file into R using read.table() (5 mins)

    # Import WOLF.CSV file using read.table function
    wolf = read.table('WOLF.CSV', header=TRUE, sep=',')

The wolf variable contains the imported data. It is called a data frame.

The ideal arrangement of a data frame is for each row to be an observation of some object and each column a variable that measures some property of the object. For instance, each row of wolf is an observation of one individual wolf and each column of wolf gives information about where the wolf was observed and the data collected from its hair sample.

The HEIGHT.CSV file also contains comma separated values. Here is the read.table() command to read in this file

    # Import HEIGHT.CSV file using read.table function
    human = read.table('HEIGHT.CSV', header=TRUE, sep=',')

Note: The function read.csv() is a special case of the read.table() function.
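The note above can be checked on a simple file. The sketch below assumes a small made-up CSV file (height_file, via_table and via_csv are names chosen for this example) and imports it with both functions.

```r
# Write a small CSV file (made-up values) to a temporary file
height_file <- tempfile(fileext = ".csv")
writeLines(c("id,height", "1,172.5", "2,160.1"), height_file)

# Import with read.table(), stating the header and delimiter explicitly
via_table <- read.table(height_file, header = TRUE, sep = ",")

# Import with read.csv(), which sets header=TRUE and sep=',' by default
via_csv <- read.csv(height_file)

# Compare the two data frames
identical(via_table, via_csv)
```

The two functions differ in some other defaults (for example how quotes and comment characters are handled), so results can differ for less regular files.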

Use the R help pages to learn more about these functions

    ?read.table    # Display help page on read.table function

TAB delimited files (TXT files)

The INSECT.TXT data set is a text file where variables are delimited by a TAB. In addition the first three lines contain a data description that we do not want to import.

The read.table() function can be used to import this file. The argument skip=3 is used to ignore the first three lines. The argument sep='\t' specifies a TAB as the variable delimiter

    # Import INSECT.TXT file using read.table function (TAB delimited)
    # skipping the first three lines (skip=3)
    insect = read.table('INSECT.TXT', header=T, skip=3, sep='\t')

The MALIN_HEAD.TXT file also contains TAB delimited data. Here is the read.table() command to read in this file

    # Import MALIN_HEAD.TXT file using read.table function (TAB delimited)
    rainfall = read.table('MALIN_HEAD.TXT', header=T, sep='\t')
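To see the effect of skip and sep without the tutorial's data files, the sketch below builds a small made-up TAB delimited file with three description lines at the top. The file contents and the names tab_file and insect_demo are hypothetical.

```r
# Write a made-up TAB delimited file with three description lines
tab_file <- tempfile(fileext = ".txt")
writeLines(c("Description line 1",
             "Description line 2",
             "Description line 3",
             "species\tcount",
             "beetle\t4",
             "moth\t9"), tab_file)

# Skip the three description lines, then read the header and the data
insect_demo <- read.table(tab_file, header = TRUE, skip = 3, sep = "\t")
insect_demo
```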

Blank space delimited files

The BEEKEEPER.TXT data set uses white space to delimit the variables. The first six lines of the file contain a description of the data

Using read.table() with the argument sep='' will interpret any white space (one or more spaces or tabs) as a variable delimiter.

    # Import BEEKEEPER.TXT file using read.table function (white space delimited)
    # skipping the first 6 lines (skip=6)
    bees = read.table('BEEKEEPER.TXT', header=T, skip=6, sep='')
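The sketch below illustrates how sep='' copes with irregular spacing. It assumes a small made-up file in which the columns are separated by a varying number of spaces and, on the last line, by a TAB; ws_file and hives are names chosen for this example.

```r
# Write a made-up file with irregular white space between columns
ws_file <- tempfile(fileext = ".txt")
writeLines(c("hive   honey_kg",
             "1 12.5",
             "2     9.0",
             "3\t14.2"), ws_file)

# sep='' treats any run of spaces or tabs as a single delimiter
hives <- read.table(ws_file, header = TRUE, sep = "")
hives
```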

Summary of import commands

Type of text file	R Command
Comma delimited (.CSV)	`read.table(<filename>, header=T, sep=',')`
TAB delimited (.TXT)	`read.table(<filename>, header=T, sep='\t')`
Blank space (.TXT)	`read.table(<filename>, header=T, sep='')`

    # Comma separated values
    wolf = read.table('WOLF.CSV', header=TRUE, sep=',')
    human = read.table('HEIGHT.CSV', header=TRUE, sep=',')

    # TAB delimited values
    insect = read.table('INSECT.TXT', header=T, skip=3, sep='\t')
    rainfall = read.table('MALIN_HEAD.TXT', header=T, sep='\t')

    # White space delimited values
    bees = read.table('BEEKEEPER.TXT', header=T, skip=6, sep='')

Importing data using RStudio

RStudio has its own data import functionality. To use this you will need to install the R package readr. For more information about this see RStudio's guide: https://support.rstudio.com/hc/en-us/articles/218611977-Importing-Data-with-RStudio

Video Tutorial: Importing a CSV file into R using RStudio's GUI (3 mins 13 secs)

Importing data using RStudio will save the data as a modified data frame, called a tibble (tibbles are briefly discussed above).

Importing using `fread()`

fread() is a powerful data import function that is similar to read.table() but faster. It is part of the data.table package, which you will need to install.

You should only have to give fread() the name of the file you want to import, and fread() will try to work out the appropriate way to import the data. Try some examples and compare them with the examples above

    # ******************************************
    # Other packages for importing data --------
    # The data.table package

    library(data.table)    # Load the data.table package

    # Import a CSV file
    wolf2 = fread('WOLF.CSV')
    human2 = fread('HEIGHT.CSV')

    # Import TAB delimited file
    insect2 = fread('INSECT.TXT')
    rainfall2 = fread('MALIN_HEAD.TXT')

    # Import white space delimited file
    bees2 = fread('BEEKEEPER.TXT')

The fread() command is simpler to use because it tries to guess the format of the data in the file.
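One point to be aware of is that fread() returns a data.table object, which is a modified data frame, rather than a plain data frame. The sketch below assumes a small made-up CSV file and only runs if the data.table package is installed; demo_file, demo_dt and demo_df are names chosen for this example.

```r
# Only run the example if the data.table package is installed
if (requireNamespace("data.table", quietly = TRUE)) {
  # Write a small CSV file (made-up values) to a temporary file
  demo_file <- tempfile(fileext = ".csv")
  writeLines(c("id,height", "1,172.5", "2,160.1"), demo_file)

  # fread() guesses the delimiter and whether there is a header line
  demo_dt <- data.table::fread(demo_file)
  print(class(demo_dt))

  # Convert to an ordinary data frame if required
  demo_df <- as.data.frame(demo_dt)
  print(class(demo_df))
}
```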

Summary of the topics covered

Organising your files on your computer
Best practice for formatting data
Reading in spreadsheet data
Data frames
