data.table Examples
| Author: | Mitch Richling |
| Updated: | 2022-06-04 16:17:46 |
Copyright 2020-2021 Mitch Richling. All rights reserved.
Table of Contents
1. Metadata
The home for this HTML file is: https://richmit.github.io/ex-R/dataTable.html
Files related to this document may be found on github: https://github.com/richmit/ex-R
Directory contents:
src |
- | The org-mode file that generated this HTML document |
docs |
- | This html document |
data |
- | Data files |
tangled |
- | Tangled R code from this document |
2. Load the library
library(data.table)
3. Example Data
mtcars
mpg cyl disp hp drat wt qsec vs am gear carb Mazda RX4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 Wag 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Datsun 710 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 Hornet 4 Drive 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Valiant 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 Merc 240D 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 Merc 230 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 Merc 280 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Merc 280C 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Fiat 128 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 Toyota Corolla 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 Toyota Corona 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 Dodge Challenger 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 AMC Javelin 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Fiat X1-9 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 Volvo 142E 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2
4. data.frame to data.table
One can directly coerce a data.frame into a data.table; however, row names will be lost. We can do something like this to keep them:
mtcarsDT <- data.table(mtcars, names=rownames(mtcars))
mtcarsDT
mpg cyl disp hp drat wt qsec vs am gear carb names
1: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4
2: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4 Wag
3: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 Datsun 710
4: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive
5: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Hornet Sportabout
6: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 Valiant
7: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 Duster 360
8: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 Merc 240D
9: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 Merc 230
10: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Merc 280
11: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 Merc 280C
12: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 Merc 450SE
13: 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 Merc 450SL
14: 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 Merc 450SLC
15: 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 Cadillac Fleetwood
16: 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 Lincoln Continental
17: 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Chrysler Imperial
18: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 Fiat 128
19: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 Honda Civic
20: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 Toyota Corolla
21: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 Toyota Corona
22: 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 Dodge Challenger
23: 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 AMC Javelin
24: 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 Camaro Z28
25: 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Pontiac Firebird
26: 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 Fiat X1-9
27: 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 Porsche 914-2
28: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 Lotus Europa
29: 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 Ford Pantera L
30: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 Ferrari Dino
31: 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 Maserati Bora
32: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 Volvo 142E
mpg cyl disp hp drat wt qsec vs am gear carb names
5. Indexing
data.tables are indexed with three arguments (as in: a[1,2,3]).
6. Indexing: First Argument
In general the first index argument is used to select rows.
6.1. An integer (row number)
mtcarsDT[2,]
mpg cyl disp hp drat wt qsec vs am gear carb names 1: 21 6 160 110 3.9 2.875 17.02 0 1 4 4 Mazda RX4 Wag
Note that when we only have one index, we don't need the comma!!
mtcarsDT[2]
mpg cyl disp hp drat wt qsec vs am gear carb names 1: 21 6 160 110 3.9 2.875 17.02 0 1 4 4 Mazda RX4 Wag
6.2. Numeric Range (row numbers)
Get rows 2 through 5
mtcarsDT[2:5]
mpg cyl disp hp drat wt qsec vs am gear carb names 1: 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4 Wag 2: 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1 Datsun 710 3: 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive 4: 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2 Hornet Sportabout
6.3. Boolean (select matching rows)
Get rows where cyl is equal to 6
mtcarsDT[cyl==6]
mpg cyl disp hp drat wt qsec vs am gear carb names 1: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 2: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4 Wag 3: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive 4: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 Valiant 5: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Merc 280 6: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 Merc 280C 7: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 Ferrari Dino
6.4. Strings
6.4.1. Key column Indexing
If you have a "key column" set, you can index with strings – note that this will change the order of the table to store on the key.
setkey(mtcarsDT,names)
mtcarsDT["Valiant"]
mpg cyl disp hp drat wt qsec vs am gear carb names 1: 18.1 6 225 105 2.76 3.46 20.22 1 0 3 1 Valiant
6.4.2. Key column need not be unique
mtcarsDT$cylNames <- paste("cyl", mtcarsDT$cyl, sep='=') setkey(mtcarsDT,cylNames) mtcarsDT["cyl=6"]
mpg cyl disp hp drat wt qsec vs am gear carb names cylNames 1: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 Ferrari Dino cyl=6 2: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive cyl=6 3: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 cyl=6 4: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4 Wag cyl=6 5: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Merc 280 cyl=6 6: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 Merc 280C cyl=6 7: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 Valiant cyl=6
6.4.3. Get just one match
mtcarsDT["cyl=6",mult="first"]
mpg cyl disp hp drat wt qsec vs am gear carb names cylNames 1: 19.7 6 145 175 3.62 2.77 15.5 0 1 5 6 Ferrari Dino cyl=6
7. Indexing: The Second Argument
7.1. The data.frame way
Grab column "cyl" the data.frame way.
mtcarsDT$cyl
[1] 4 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 8 8 8 8 8 8 8 8 8 8 8 8 8 8
7.2. Column Name Expression
The second index of a data.table is an expression on the rows selected by the first index.
Here we just have an expresion with a single column name. As this will evaluate to the cyl column, that is what is returned.
mtcarsDT[,cyl]
[1] 4 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 8 8 8 8 8 8 8 8 8 8 8 8 8 8
Here we just have an arithmatic expresion in two column names.
mtcarsDT[,disp/cyl]
[1] 27.00000 19.67500 19.75000 18.92500 23.77500 35.20000 36.67500 30.07500 17.77500 30.02500 30.25000 24.16667 43.00000 26.66667 26.66667 27.93333 27.93333 37.50000 38.00000 59.00000 43.75000 [22] 55.00000 39.75000 45.00000 43.87500 45.00000 57.50000 37.62500 34.47500 34.47500 34.47500 50.00000
7.3. Assignment
We can create new columns by assigning an expression in the second argument to a new name.
mtcarsDT[,cylTimesGear:=cyl*gear] mtcarsDT
mpg cyl disp hp drat wt qsec vs am gear carb names cylNames cylTimesGear
1: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 Datsun 710 cyl=4 16
2: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 Fiat 128 cyl=4 16
3: 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 Fiat X1-9 cyl=4 16
4: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 Honda Civic cyl=4 16
5: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 Lotus Europa cyl=4 20
6: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 Merc 230 cyl=4 16
7: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 Merc 240D cyl=4 16
8: 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 Porsche 914-2 cyl=4 20
9: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 Toyota Corolla cyl=4 16
10: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 Toyota Corona cyl=4 12
11: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 Volvo 142E cyl=4 16
12: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 Ferrari Dino cyl=6 30
13: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive cyl=6 18
14: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 cyl=6 24
15: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4 Wag cyl=6 24
16: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Merc 280 cyl=6 24
17: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 Merc 280C cyl=6 24
18: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 Valiant cyl=6 18
19: 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 AMC Javelin cyl=8 24
20: 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 Cadillac Fleetwood cyl=8 24
21: 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 Camaro Z28 cyl=8 24
22: 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Chrysler Imperial cyl=8 24
23: 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 Dodge Challenger cyl=8 24
24: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 Duster 360 cyl=8 24
25: 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 Ford Pantera L cyl=8 40
26: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Hornet Sportabout cyl=8 24
27: 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 Lincoln Continental cyl=8 24
28: 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 Maserati Bora cyl=8 40
29: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 Merc 450SE cyl=8 24
30: 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 Merc 450SL cyl=8 24
31: 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 Merc 450SLC cyl=8 24
32: 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Pontiac Firebird cyl=8 24
mpg cyl disp hp drat wt qsec vs am gear carb names cylNames cylTimesGear
7.4. List
When the second index is a list, then a new data.table will be returned instead of a simple vector. For example we can extract a new data.table with three columns (two of which we rename)
mtcarsDT[,list(cylinder=cyl, weight=wt, gear)]
cylinder weight gear
1: 4 2.320 4
2: 4 2.200 4
3: 4 1.935 4
4: 4 1.615 4
5: 4 1.513 5
6: 4 3.150 4
7: 4 3.190 4
8: 4 2.140 5
9: 4 1.835 4
10: 4 2.465 3
11: 4 2.780 4
12: 6 2.770 5
13: 6 3.215 3
14: 6 2.620 4
15: 6 2.875 4
16: 6 3.440 4
17: 6 3.440 4
18: 6 3.460 3
19: 8 3.435 3
20: 8 5.250 3
21: 8 3.840 3
22: 8 5.345 3
23: 8 3.520 3
24: 8 3.570 3
25: 8 3.170 5
26: 8 3.440 3
27: 8 5.424 3
28: 8 3.570 5
29: 8 4.070 3
30: 8 3.730 3
31: 8 3.780 3
32: 8 3.845 3
cylinder weight gear
7.5. Aggregation
That second expression can do more than transform columns into new columns; it aggregate things.
mtcarsDT[,sum(wt)]
[1] 102.952
As before, if the second expression is a list we get a new data.table. If the second expression computes aggregates, then we get more than one aggregation.
mtcarsDT[,list(daSum=sum(wt), daSd=sd(wt))]
daSum daSd 1: 102.952 0.9784574
8. Indexing: The third argument
8.1. Aggregation by factors
The third argument is used to group data.
mtcarsDT[,sum(wt),cyl]
cyl V1 1: 4 25.143 2: 6 21.820 3: 8 55.989
The groups can contain more than one factor if we use a list. We can also name the aggregate column if we put that in a list.
mtcarsDT[,list(sumWt=sum(wt)),list(cyl, gear)]
cyl gear sumWt 1: 4 4 19.025 2: 4 5 3.653 3: 4 3 2.465 4: 6 5 2.770 5: 6 3 6.675 6: 6 4 12.375 7: 8 3 49.249 8: 8 5 6.740
8.2. Aggregation and new columns
We can aggregate by groups and put it back into the data.table!!
mtcarsDT[,sumWtByCyl:=sum(wt),cyl] mtcarsDT
mpg cyl disp hp drat wt qsec vs am gear carb names cylNames cylTimesGear sumWtByCyl
1: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1 Datsun 710 cyl=4 16 25.143
2: 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1 Fiat 128 cyl=4 16 25.143
3: 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 1 Fiat X1-9 cyl=4 16 25.143
4: 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2 Honda Civic cyl=4 16 25.143
5: 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 2 Lotus Europa cyl=4 20 25.143
6: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2 Merc 230 cyl=4 16 25.143
7: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2 Merc 240D cyl=4 16 25.143
8: 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 2 Porsche 914-2 cyl=4 20 25.143
9: 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1 Toyota Corolla cyl=4 16 25.143
10: 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 1 Toyota Corona cyl=4 12 25.143
11: 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 2 Volvo 142E cyl=4 16 25.143
12: 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6 Ferrari Dino cyl=6 30 21.820
13: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1 Hornet 4 Drive cyl=6 18 21.820
14: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4 Mazda RX4 cyl=6 24 21.820
15: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4 Mazda RX4 Wag cyl=6 24 21.820
16: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4 Merc 280 cyl=6 24 21.820
17: 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 4 Merc 280C cyl=6 24 21.820
18: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1 Valiant cyl=6 18 21.820
19: 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 2 AMC Javelin cyl=8 24 55.989
20: 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4 Cadillac Fleetwood cyl=8 24 55.989
21: 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4 Camaro Z28 cyl=8 24 55.989
22: 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4 Chrysler Imperial cyl=8 24 55.989
23: 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 2 Dodge Challenger cyl=8 24 55.989
24: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4 Duster 360 cyl=8 24 55.989
25: 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4 Ford Pantera L cyl=8 40 55.989
26: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2 Hornet Sportabout cyl=8 24 55.989
27: 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4 Lincoln Continental cyl=8 24 55.989
28: 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8 Maserati Bora cyl=8 40 55.989
29: 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3 Merc 450SE cyl=8 24 55.989
30: 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3 Merc 450SL cyl=8 24 55.989
31: 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3 Merc 450SLC cyl=8 24 55.989
32: 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2 Pontiac Firebird cyl=8 24 55.989
mpg cyl disp hp drat wt qsec vs am gear carb names cylNames cylTimesGear sumWtByCyl
9. Joining
Merge is much like with data.frames. One nice feature is that key columns will be used for a merge automatically if they are set.
crbn <- data.table(carb=c(1,2,3), carbs=c("one", "two", "three")) setkey(mtcarsDT, carb) setkey(crbn, carb) merge(mtcarsDT, crbn, all.x=TRUE)
carb mpg cyl disp hp drat wt qsec vs am gear names cylNames cylTimesGear sumWtByCyl carbs
1: 1 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 Datsun 710 cyl=4 16 25.143 one
2: 1 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 Fiat 128 cyl=4 16 25.143 one
3: 1 27.3 4 79.0 66 4.08 1.935 18.90 1 1 4 Fiat X1-9 cyl=4 16 25.143 one
4: 1 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 Toyota Corolla cyl=4 16 25.143 one
5: 1 21.5 4 120.1 97 3.70 2.465 20.01 1 0 3 Toyota Corona cyl=4 12 25.143 one
6: 1 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 Hornet 4 Drive cyl=6 18 21.820 one
7: 1 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 Valiant cyl=6 18 21.820 one
8: 2 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 Honda Civic cyl=4 16 25.143 two
9: 2 30.4 4 95.1 113 3.77 1.513 16.90 1 1 5 Lotus Europa cyl=4 20 25.143 two
10: 2 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 Merc 230 cyl=4 16 25.143 two
11: 2 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 Merc 240D cyl=4 16 25.143 two
12: 2 26.0 4 120.3 91 4.43 2.140 16.70 0 1 5 Porsche 914-2 cyl=4 20 25.143 two
13: 2 21.4 4 121.0 109 4.11 2.780 18.60 1 1 4 Volvo 142E cyl=4 16 25.143 two
14: 2 15.2 8 304.0 150 3.15 3.435 17.30 0 0 3 AMC Javelin cyl=8 24 55.989 two
15: 2 15.5 8 318.0 150 2.76 3.520 16.87 0 0 3 Dodge Challenger cyl=8 24 55.989 two
16: 2 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 Hornet Sportabout cyl=8 24 55.989 two
17: 2 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 Pontiac Firebird cyl=8 24 55.989 two
18: 3 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 Merc 450SE cyl=8 24 55.989 three
19: 3 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 Merc 450SL cyl=8 24 55.989 three
20: 3 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 Merc 450SLC cyl=8 24 55.989 three
21: 4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 Mazda RX4 cyl=6 24 21.820 <NA>
22: 4 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 Mazda RX4 Wag cyl=6 24 21.820 <NA>
23: 4 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 Merc 280 cyl=6 24 21.820 <NA>
24: 4 17.8 6 167.6 123 3.92 3.440 18.90 1 0 4 Merc 280C cyl=6 24 21.820 <NA>
25: 4 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 Cadillac Fleetwood cyl=8 24 55.989 <NA>
26: 4 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 Camaro Z28 cyl=8 24 55.989 <NA>
27: 4 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 Chrysler Imperial cyl=8 24 55.989 <NA>
28: 4 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 Duster 360 cyl=8 24 55.989 <NA>
29: 4 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 Ford Pantera L cyl=8 40 55.989 <NA>
30: 4 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 Lincoln Continental cyl=8 24 55.989 <NA>
31: 6 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 Ferrari Dino cyl=6 30 21.820 <NA>
32: 8 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 Maserati Bora cyl=8 40 55.989 <NA>
carb mpg cyl disp hp drat wt qsec vs am gear names cylNames cylTimesGear sumWtByCyl carbs
10. Reading data from the outside world
10.1. Read a CSV or TSV file
Note this is much faster than read.csv and read.table.
df1 <- fread("../data/dataTable_f1.csv") df1
name age weight 1: babar 25 900 2: grumpy 2 3
10.2. Compressed files are decompressed automatically
df2 <- fread("../data/dataTable_f2.csv.gz") df2
name age weight 1: kong 12 700 2: george 7 15
10.3. Reading data from a URL
df1u <- fread('https://github.com/richmit/ex-R/raw/master/data/dataTable_f2.csv.gz') df1u
name age weight 1: kong 12 700 2: george 7 15
10.4. Read output from a command (in this case compressing with gunzip and filtering with awk)
df2 <- fread("gunzip < ../data/dataTable_f2.csv.gz | awk -F, 'NR==1 || $3<100 { print $0 }'") df2
Empty data.table (0 rows and 3 cols): name,age,weight
10.5. Read a colon (:) separated file with extra whitespace and no column names
df3 <- fread('dataTable_f3.txt', sep=':', header=FALSE, strip.white=TRUE, col.names=c('name', 'age', 'weight')) df3
Error in fread("dataTable_f3.txt", sep = ":", header = FALSE, strip.white = TRUE, :
File 'dataTable_f3.txt' does not exist or is non-readable. getwd()=='c:/Users/richmit/Documents/world/my_prog/learn/R/src'
Error: object 'df3' not found