data.table Examples
Author: | Mitch Richling |
Updated: | 2022-06-04 16:17:46 |
Copyright 2020-2021 Mitch Richling. All rights reserved.
Table of Contents
1. Metadata
The home for this HTML file is: https://richmit.github.io/ex-R/dataTable.html
Files related to this document may be found on github: https://github.com/richmit/ex-R
Directory contents:
src |
- | The org-mode file that generated this HTML document |
docs |
- | This html document |
data |
- | Data files |
tangled |
- | Tangled R code from this document |
2. Load the library
library(data.table)
3. Example Data
mtcars
mpg cyl disp hp drat wt qsec vs am gear carb Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
4. data.frame to data.table
One can directly coerce a data.frame into a data.table; however, row names will be lost. We can do something like this to keep them:
mtcarsDT <- data.table(mtcars, names=rownames(mtcars))
mtcarsDT
mpg cyl disp hp drat wt qsec vs am gear carb names 1: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 2: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4 Wag 3: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 Datsun 710 4: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive 5: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Hornet Sportabout 6: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 Valiant 7: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 Duster 360 8: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 Merc 240D 9: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 Merc 230 10: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Merc 280 11: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 Merc 280C 12: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 Merc 450SE 13: 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 Merc 450SL 14: 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 Merc 450SLC 15: 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 Cadillac Fleetwood 16: 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 Lincoln Continental 17: 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Chrysler Imperial 18: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 Fiat 128 19: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 Honda Civic 20: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 Toyota Corolla 21: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 Toyota Corona 22: 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 Dodge Challenger 23: 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 AMC Javelin 24: 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 Camaro Z28 25: 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Pontiac Firebird 26: 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 Fiat X1-9 27: 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 Porsche 914-2 28: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 Lotus Europa 29: 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 Ford Pantera L 30: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 Ferrari Dino 31: 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 Maserati Bora 32: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 Volvo 142E mpg cyl disp hp drat wt qsec vs am gear carb names
5. Indexing
data.tables are indexed with three arguments (as in: a[1,2,3]
).
6. Indexing: First Argument
In general the first index argument is used to select rows.
6.1. An integer (row number)
mtcarsDT[2,]
mpg cyl disp hp drat wt qsec vs am gear carb names 1: 21 6 160 110 3.9 2.875 17.02 0 1 4 4 Mazda RX4 Wag
Note that when we only have one index, we don't need the comma!!
mtcarsDT[2]
mpg cyl disp hp drat wt qsec vs am gear carb names 1: 21 6 160 110 3.9 2.875 17.02 0 1 4 4 Mazda RX4 Wag
6.2. Numeric Range (row numbers)
Get rows 2 through 5
mtcarsDT[2:5]
mpg cyl disp hp drat wt qsec vs am gear carb names 1: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4 Wag 2: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Datsun 710 3: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive 4: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 Hornet Sportabout
6.3. Boolean (select matching rows)
Get rows where cyl is equal to 6
mtcarsDT[cyl==6]
mpg cyl disp hp drat wt qsec vs am gear carb names 1: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 2: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4 Wag 3: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive 4: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 Valiant 5: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Merc 280 6: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 Merc 280C 7: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 Ferrari Dino
6.4. Strings
6.4.1. Key column Indexing
If you have a "key column" set, you can index with strings – note that this will change the order of the table to store on the key.
setkey(mtcarsDT,names)
mtcarsDT["Valiant"]
mpg cyl disp hp drat wt qsec vs am gear carb names 1: 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1 Valiant
6.4.2. Key column need not be unique
mtcarsDT$cylNames <- paste("cyl", mtcarsDT$cyl, sep='=') setkey(mtcarsDT,cylNames) mtcarsDT["cyl=6"]
mpg cyl disp hp drat wt qsec vs am gear carb names cylNames 1: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 Ferrari Dino cyl=6 2: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive cyl=6 3: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 cyl=6 4: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4 Wag cyl=6 5: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Merc 280 cyl=6 6: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 Merc 280C cyl=6 7: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 Valiant cyl=6
6.4.3. Get just one match
mtcarsDT["cyl=6",mult="first"]
mpg cyl disp hp drat wt qsec vs am gear carb names cylNames 1: 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6 Ferrari Dino cyl=6
7. Indexing: The Second Argument
7.1. The data.frame way
Grab column "cyl" the data.frame way.
mtcarsDT$cyl
[1] 4 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 8 8 8 8 8 8 8 8 8 8 8 8 8 8
7.2. Column Name Expression
The second index of a data.table is an expression on the rows selected by the first index.
Here we just have an expresion with a single column name. As this will evaluate to the cyl column, that is what is returned.
mtcarsDT[,cyl]
[1] 4 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 8 8 8 8 8 8 8 8 8 8 8 8 8 8
Here we just have an arithmatic expresion in two column names.
mtcarsDT[,disp/cyl]
[1] 27.00000 19.67500 19.75000 18.92500 23.77500 35.20000 36.67500 30.07500 17.77500 30.02500 30.25000 24.16667 43.00000 26.66667 26.66667 27.93333 27.93333 37.50000 38.00000 59.00000 43.75000 [22] 55.00000 39.75000 45.00000 43.87500 45.00000 57.50000 37.62500 34.47500 34.47500 34.47500 50.00000
7.3. Assignment
We can create new columns by assigning an expression in the second argument to a new name.
mtcarsDT[,cylTimesGear:=cyl*gear] mtcarsDT
mpg cyl disp hp drat wt qsec vs am gear carb names cylNames cylTimesGear 1: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 Datsun 710 cyl=4 16 2: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 Fiat 128 cyl=4 16 3: 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 Fiat X1-9 cyl=4 16 4: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 Honda Civic cyl=4 16 5: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 Lotus Europa cyl=4 20 6: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 Merc 230 cyl=4 16 7: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 Merc 240D cyl=4 16 8: 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 Porsche 914-2 cyl=4 20 9: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 Toyota Corolla cyl=4 16 10: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 Toyota Corona cyl=4 12 11: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 Volvo 142E cyl=4 16 12: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 Ferrari Dino cyl=6 30 13: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive cyl=6 18 14: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 cyl=6 24 15: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4 Wag cyl=6 24 16: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Merc 280 cyl=6 24 17: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 Merc 280C cyl=6 24 18: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 Valiant cyl=6 18 19: 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 AMC Javelin cyl=8 24 20: 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 Cadillac Fleetwood cyl=8 24 21: 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 Camaro Z28 cyl=8 24 22: 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Chrysler Imperial cyl=8 24 23: 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 Dodge Challenger cyl=8 24 24: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 Duster 360 cyl=8 24 25: 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 Ford Pantera L cyl=8 40 26: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Hornet Sportabout cyl=8 24 27: 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 Lincoln Continental cyl=8 24 28: 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 Maserati Bora cyl=8 40 29: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 Merc 450SE cyl=8 24 30: 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 Merc 450SL cyl=8 24 31: 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 Merc 450SLC cyl=8 24 32: 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Pontiac Firebird cyl=8 24 mpg cyl disp hp drat wt qsec vs am gear carb names cylNames cylTimesGear
7.4. List
When the second index is a list, then a new data.table will be returned instead of a simple vector. For example we can extract a new data.table with three columns (two of which we rename)
mtcarsDT[,list(cylinder=cyl, weight=wt, gear)]
cylinder weight gear 1: 4 2.320 4 2: 4 2.200 4 3: 4 1.935 4 4: 4 1.615 4 5: 4 1.513 5 6: 4 3.150 4 7: 4 3.190 4 8: 4 2.140 5 9: 4 1.835 4 10: 4 2.465 3 11: 4 2.780 4 12: 6 2.770 5 13: 6 3.215 3 14: 6 2.620 4 15: 6 2.875 4 16: 6 3.440 4 17: 6 3.440 4 18: 6 3.460 3 19: 8 3.435 3 20: 8 5.250 3 21: 8 3.840 3 22: 8 5.345 3 23: 8 3.520 3 24: 8 3.570 3 25: 8 3.170 5 26: 8 3.440 3 27: 8 5.424 3 28: 8 3.570 5 29: 8 4.070 3 30: 8 3.730 3 31: 8 3.780 3 32: 8 3.845 3 cylinder weight gear
7.5. Aggregation
That second expression can do more than transform columns into new columns; it aggregate things.
mtcarsDT[,sum(wt)]
[1] 102.952
As before, if the second expression is a list we get a new data.table. If the second expression computes aggregates, then we get more than one aggregation.
mtcarsDT[,list(daSum=sum(wt), daSd=sd(wt))]
daSum daSd 1: 102.952 0.9784574
8. Indexing: The third argument
8.1. Aggregation by factors
The third argument is used to group data.
mtcarsDT[,sum(wt),cyl]
cyl V1 1: 4 25.143 2: 6 21.820 3: 8 55.989
The groups can contain more than one factor if we use a list. We can also name the aggregate column if we put that in a list.
mtcarsDT[,list(sumWt=sum(wt)),list(cyl, gear)]
cyl gear sumWt 1: 4 4 19.025 2: 4 5 3.653 3: 4 3 2.465 4: 6 5 2.770 5: 6 3 6.675 6: 6 4 12.375 7: 8 3 49.249 8: 8 5 6.740
8.2. Aggregation and new columns
We can aggregate by groups and put it back into the data.table!!
mtcarsDT[,sumWtByCyl:=sum(wt),cyl] mtcarsDT
mpg cyl disp hp drat wt qsec vs am gear carb names cylNames cylTimesGear sumWtByCyl 1: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 Datsun 710 cyl=4 16 25.143 2: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 Fiat 128 cyl=4 16 25.143 3: 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 Fiat X1-9 cyl=4 16 25.143 4: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 Honda Civic cyl=4 16 25.143 5: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 Lotus Europa cyl=4 20 25.143 6: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 Merc 230 cyl=4 16 25.143 7: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 Merc 240D cyl=4 16 25.143 8: 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 Porsche 914-2 cyl=4 20 25.143 9: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 Toyota Corolla cyl=4 16 25.143 10: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 Toyota Corona cyl=4 12 25.143 11: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 Volvo 142E cyl=4 16 25.143 12: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 Ferrari Dino cyl=6 30 21.820 13: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive cyl=6 18 21.820 14: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 cyl=6 24 21.820 15: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4 Wag cyl=6 24 21.820 16: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Merc 280 cyl=6 24 21.820 17: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 Merc 280C cyl=6 24 21.820 18: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 Valiant cyl=6 18 21.820 19: 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 AMC Javelin cyl=8 24 55.989 20: 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 Cadillac Fleetwood cyl=8 24 55.989 21: 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 Camaro Z28 cyl=8 24 55.989 22: 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Chrysler Imperial cyl=8 24 55.989 23: 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 Dodge Challenger cyl=8 24 55.989 24: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 Duster 360 cyl=8 24 55.989 25: 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 Ford Pantera L cyl=8 40 55.989 26: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Hornet Sportabout cyl=8 24 55.989 27: 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 Lincoln Continental cyl=8 24 55.989 28: 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 Maserati Bora cyl=8 40 55.989 29: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 Merc 450SE cyl=8 24 55.989 30: 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 Merc 450SL cyl=8 24 55.989 31: 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 Merc 450SLC cyl=8 24 55.989 32: 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Pontiac Firebird cyl=8 24 55.989 mpg cyl disp hp drat wt qsec vs am gear carb names cylNames cylTimesGear sumWtByCyl
9. Joining
Merge is much like with data.frames. One nice feature is that key columns will be used for a merge automatically if they are set.
crbn <- data.table(carb=c(1,2,3), carbs=c("one", "two", "three")) setkey(mtcarsDT, carb) setkey(crbn, carb) merge(mtcarsDT, crbn, all.x=TRUE)
carb mpg cyl disp hp drat wt qsec vs am gear names cylNames cylTimesGear sumWtByCyl carbs 1: 1 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 Datsun 710 cyl=4 16 25.143 one 2: 1 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 Fiat 128 cyl=4 16 25.143 one 3: 1 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 Fiat X1-9 cyl=4 16 25.143 one 4: 1 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 Toyota Corolla cyl=4 16 25.143 one 5: 1 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 Toyota Corona cyl=4 12 25.143 one 6: 1 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 Hornet 4 Drive cyl=6 18 21.820 one 7: 1 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 Valiant cyl=6 18 21.820 one 8: 2 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 Honda Civic cyl=4 16 25.143 two 9: 2 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 Lotus Europa cyl=4 20 25.143 two 10: 2 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 Merc 230 cyl=4 16 25.143 two 11: 2 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 Merc 240D cyl=4 16 25.143 two 12: 2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 Porsche 914-2 cyl=4 20 25.143 two 13: 2 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 Volvo 142E cyl=4 16 25.143 two 14: 2 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 AMC Javelin cyl=8 24 55.989 two 15: 2 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 Dodge Challenger cyl=8 24 55.989 two 16: 2 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 Hornet Sportabout cyl=8 24 55.989 two 17: 2 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 Pontiac Firebird cyl=8 24 55.989 two 18: 3 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 Merc 450SE cyl=8 24 55.989 three 19: 3 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 Merc 450SL cyl=8 24 55.989 three 20: 3 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 Merc 450SLC cyl=8 24 55.989 three 21: 4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 Mazda RX4 cyl=6 24 21.820 <NA> 22: 4 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 Mazda RX4 Wag cyl=6 24 21.820 <NA> 23: 4 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 Merc 280 cyl=6 24 21.820 <NA> 24: 4 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 Merc 280C cyl=6 24 21.820 <NA> 25: 4 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 Cadillac Fleetwood cyl=8 24 55.989 <NA> 26: 4 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 Camaro Z28 cyl=8 24 55.989 <NA> 27: 4 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 Chrysler Imperial cyl=8 24 55.989 <NA> 28: 4 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 Duster 360 cyl=8 24 55.989 <NA> 29: 4 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 Ford Pantera L cyl=8 40 55.989 <NA> 30: 4 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 Lincoln Continental cyl=8 24 55.989 <NA> 31: 6 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 Ferrari Dino cyl=6 30 21.820 <NA> 32: 8 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 Maserati Bora cyl=8 40 55.989 <NA> carb mpg cyl disp hp drat wt qsec vs am gear names cylNames cylTimesGear sumWtByCyl carbs
10. Reading data from the outside world
10.1. Read a CSV or TSV file
Note this is much faster than read.csv
and read.table
.
df1 <- fread("../data/dataTable_f1.csv") df1
name age weight 1: babar 25 900 2: grumpy 2 3
10.2. Compressed files are decompressed automatically
df2 <- fread("../data/dataTable_f2.csv.gz") df2
name age weight 1: kong 12 700 2: george 7 15
10.3. Reading data from a URL
df1u <- fread('https://github.com/richmit/ex-R/raw/master/data/dataTable_f2.csv.gz') df1u
name age weight 1: kong 12 700 2: george 7 15
10.4. Read output from a command (in this case compressing with gunzip and filtering with awk)
df2 <- fread("gunzip < ../data/dataTable_f2.csv.gz | awk -F, 'NR==1 || $3<100 { print $0 }'") df2
Empty data.table (0 rows and 3 cols): name,age,weight
10.5. Read a colon (:) separated file with extra whitespace and no column names
df3 <- fread('dataTable_f3.txt', sep=':', header=FALSE, strip.white=TRUE, col.names=c('name', 'age', 'weight')) df3
Error in fread("dataTable_f3.txt", sep = ":", header = FALSE, strip.white = TRUE, : File 'dataTable_f3.txt' does not exist or is non-readable. getwd()=='c:/Users/richmit/Documents/world/my_prog/learn/R/src' Error: object 'df3' not found