UP | HOME

data.table Examples

Author: Mitch Richling
Updated: 2022-06-04 16:17:46

Copyright 2020-2021 Mitch Richling. All rights reserved.

Table of Contents

1. Metadata

The home for this HTML file is: https://richmit.github.io/ex-R/dataTable.html

Files related to this document may be found on github: https://github.com/richmit/ex-R

Directory contents:

src - The org-mode file that generated this HTML document
docs - This html document
data - Data files
tangled - Tangled R code from this document

2. Load the library

library(data.table)

3. Example Data

mtcars
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2

4. data.frame to data.table

One can directly coerce a data.frame into a data.table; however, row names will be lost. We can do something like this to keep them:

mtcarsDT <- data.table(mtcars, names=rownames(mtcars))
mtcarsDT
     mpg cyl  disp  hp drat    wt  qsec vs am gear carb               names
 1: 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4           Mazda RX4
 2: 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4       Mazda RX4 Wag
 3: 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1          Datsun 710
 4: 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1      Hornet 4 Drive
 5: 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2   Hornet Sportabout
 6: 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1             Valiant
 7: 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4          Duster 360
 8: 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2           Merc 240D
 9: 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2            Merc 230
10: 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4            Merc 280
11: 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4           Merc 280C
12: 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3          Merc 450SE
13: 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3          Merc 450SL
14: 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3         Merc 450SLC
15: 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4  Cadillac Fleetwood
16: 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4 Lincoln Continental
17: 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4   Chrysler Imperial
18: 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1            Fiat 128
19: 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2         Honda Civic
20: 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1      Toyota Corolla
21: 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1       Toyota Corona
22: 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2    Dodge Challenger
23: 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2         AMC Javelin
24: 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4          Camaro Z28
25: 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2    Pontiac Firebird
26: 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1           Fiat X1-9
27: 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2       Porsche 914-2
28: 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2        Lotus Europa
29: 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4      Ford Pantera L
30: 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6        Ferrari Dino
31: 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8       Maserati Bora
32: 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2          Volvo 142E
     mpg cyl  disp  hp drat    wt  qsec vs am gear carb               names

5. Indexing

data.tables are indexed with three arguments (as in: a[1,2,3]).

6. Indexing: First Argument

In general the first index argument is used to select rows.

6.1. An integer (row number)

mtcarsDT[2,]
   mpg cyl disp  hp drat    wt  qsec vs am gear carb         names
1:  21   6  160 110  3.9 2.875 17.02  0  1    4    4 Mazda RX4 Wag

Note that when we only have one index, we don't need the comma!!

mtcarsDT[2]
   mpg cyl disp  hp drat    wt  qsec vs am gear carb         names
1:  21   6  160 110  3.9 2.875 17.02  0  1    4    4 Mazda RX4 Wag

6.2. Numeric Range (row numbers)

Get rows 2 through 5

mtcarsDT[2:5]
    mpg cyl disp  hp drat    wt  qsec vs am gear carb             names
1: 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4     Mazda RX4 Wag
2: 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1        Datsun 710
3: 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1    Hornet 4 Drive
4: 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2 Hornet Sportabout

6.3. Boolean (select matching rows)

Get rows where cyl is equal to 6

mtcarsDT[cyl==6]
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb          names
1: 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4      Mazda RX4
2: 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  Mazda RX4 Wag
3: 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 Hornet 4 Drive
4: 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1        Valiant
5: 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4       Merc 280
6: 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4      Merc 280C
7: 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6   Ferrari Dino

6.4. Strings

6.4.1. Key column Indexing

If you have a "key column" set, you can index with strings – note that this will change the order of the table to store on the key.

setkey(mtcarsDT,names)
mtcarsDT["Valiant"]
    mpg cyl disp  hp drat   wt  qsec vs am gear carb   names
1: 18.1   6  225 105 2.76 3.46 20.22  1  0    3    1 Valiant

6.4.2. Key column need not be unique

mtcarsDT$cylNames <- paste("cyl", mtcarsDT$cyl, sep='=')
setkey(mtcarsDT,cylNames)
mtcarsDT["cyl=6"]
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb          names cylNames
1: 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6   Ferrari Dino    cyl=6
2: 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1 Hornet 4 Drive    cyl=6
3: 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4      Mazda RX4    cyl=6
4: 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4  Mazda RX4 Wag    cyl=6
5: 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4       Merc 280    cyl=6
6: 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4      Merc 280C    cyl=6
7: 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1        Valiant    cyl=6

6.4.3. Get just one match

mtcarsDT["cyl=6",mult="first"]
    mpg cyl disp  hp drat   wt qsec vs am gear carb        names cylNames
1: 19.7   6  145 175 3.62 2.77 15.5  0  1    5    6 Ferrari Dino    cyl=6

7. Indexing: The Second Argument

7.1. The data.frame way

Grab column "cyl" the data.frame way.

mtcarsDT$cyl
[1] 4 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 8 8 8 8 8 8 8 8 8 8 8 8 8 8

7.2. Column Name Expression

The second index of a data.table is an expression on the rows selected by the first index.

Here we just have an expresion with a single column name. As this will evaluate to the cyl column, that is what is returned.

mtcarsDT[,cyl]
[1] 4 4 4 4 4 4 4 4 4 4 4 6 6 6 6 6 6 6 8 8 8 8 8 8 8 8 8 8 8 8 8 8

Here we just have an arithmatic expresion in two column names.

mtcarsDT[,disp/cyl]
 [1] 27.00000 19.67500 19.75000 18.92500 23.77500 35.20000 36.67500 30.07500 17.77500 30.02500 30.25000 24.16667 43.00000 26.66667 26.66667 27.93333 27.93333 37.50000 38.00000 59.00000 43.75000
[22] 55.00000 39.75000 45.00000 43.87500 45.00000 57.50000 37.62500 34.47500 34.47500 34.47500 50.00000

7.3. Assignment

We can create new columns by assigning an expression in the second argument to a new name.

mtcarsDT[,cylTimesGear:=cyl*gear]
mtcarsDT
     mpg cyl  disp  hp drat    wt  qsec vs am gear carb               names cylNames cylTimesGear
 1: 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1          Datsun 710    cyl=4           16
 2: 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1            Fiat 128    cyl=4           16
 3: 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1           Fiat X1-9    cyl=4           16
 4: 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2         Honda Civic    cyl=4           16
 5: 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2        Lotus Europa    cyl=4           20
 6: 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2            Merc 230    cyl=4           16
 7: 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2           Merc 240D    cyl=4           16
 8: 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2       Porsche 914-2    cyl=4           20
 9: 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1      Toyota Corolla    cyl=4           16
10: 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1       Toyota Corona    cyl=4           12
11: 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2          Volvo 142E    cyl=4           16
12: 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6        Ferrari Dino    cyl=6           30
13: 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1      Hornet 4 Drive    cyl=6           18
14: 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4           Mazda RX4    cyl=6           24
15: 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4       Mazda RX4 Wag    cyl=6           24
16: 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4            Merc 280    cyl=6           24
17: 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4           Merc 280C    cyl=6           24
18: 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1             Valiant    cyl=6           18
19: 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2         AMC Javelin    cyl=8           24
20: 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4  Cadillac Fleetwood    cyl=8           24
21: 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4          Camaro Z28    cyl=8           24
22: 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4   Chrysler Imperial    cyl=8           24
23: 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2    Dodge Challenger    cyl=8           24
24: 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4          Duster 360    cyl=8           24
25: 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4      Ford Pantera L    cyl=8           40
26: 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2   Hornet Sportabout    cyl=8           24
27: 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4 Lincoln Continental    cyl=8           24
28: 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8       Maserati Bora    cyl=8           40
29: 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3          Merc 450SE    cyl=8           24
30: 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3          Merc 450SL    cyl=8           24
31: 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3         Merc 450SLC    cyl=8           24
32: 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2    Pontiac Firebird    cyl=8           24
     mpg cyl  disp  hp drat    wt  qsec vs am gear carb               names cylNames cylTimesGear

7.4. List

When the second index is a list, then a new data.table will be returned instead of a simple vector. For example we can extract a new data.table with three columns (two of which we rename)

mtcarsDT[,list(cylinder=cyl, weight=wt, gear)]
    cylinder weight gear
 1:        4  2.320    4
 2:        4  2.200    4
 3:        4  1.935    4
 4:        4  1.615    4
 5:        4  1.513    5
 6:        4  3.150    4
 7:        4  3.190    4
 8:        4  2.140    5
 9:        4  1.835    4
10:        4  2.465    3
11:        4  2.780    4
12:        6  2.770    5
13:        6  3.215    3
14:        6  2.620    4
15:        6  2.875    4
16:        6  3.440    4
17:        6  3.440    4
18:        6  3.460    3
19:        8  3.435    3
20:        8  5.250    3
21:        8  3.840    3
22:        8  5.345    3
23:        8  3.520    3
24:        8  3.570    3
25:        8  3.170    5
26:        8  3.440    3
27:        8  5.424    3
28:        8  3.570    5
29:        8  4.070    3
30:        8  3.730    3
31:        8  3.780    3
32:        8  3.845    3
    cylinder weight gear

7.5. Aggregation

That second expression can do more than transform columns into new columns; it aggregate things.

mtcarsDT[,sum(wt)]
[1] 102.952

As before, if the second expression is a list we get a new data.table. If the second expression computes aggregates, then we get more than one aggregation.

mtcarsDT[,list(daSum=sum(wt), daSd=sd(wt))]
     daSum      daSd
1: 102.952 0.9784574

8. Indexing: The third argument

8.1. Aggregation by factors

The third argument is used to group data.

mtcarsDT[,sum(wt),cyl]
   cyl     V1
1:   4 25.143
2:   6 21.820
3:   8 55.989

The groups can contain more than one factor if we use a list. We can also name the aggregate column if we put that in a list.

mtcarsDT[,list(sumWt=sum(wt)),list(cyl, gear)]
   cyl gear  sumWt
1:   4    4 19.025
2:   4    5  3.653
3:   4    3  2.465
4:   6    5  2.770
5:   6    3  6.675
6:   6    4 12.375
7:   8    3 49.249
8:   8    5  6.740

8.2. Aggregation and new columns

We can aggregate by groups and put it back into the data.table!!

mtcarsDT[,sumWtByCyl:=sum(wt),cyl]
mtcarsDT
     mpg cyl  disp  hp drat    wt  qsec vs am gear carb               names cylNames cylTimesGear sumWtByCyl
 1: 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1          Datsun 710    cyl=4           16     25.143
 2: 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1            Fiat 128    cyl=4           16     25.143
 3: 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1           Fiat X1-9    cyl=4           16     25.143
 4: 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2         Honda Civic    cyl=4           16     25.143
 5: 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2        Lotus Europa    cyl=4           20     25.143
 6: 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2            Merc 230    cyl=4           16     25.143
 7: 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2           Merc 240D    cyl=4           16     25.143
 8: 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2       Porsche 914-2    cyl=4           20     25.143
 9: 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1      Toyota Corolla    cyl=4           16     25.143
10: 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1       Toyota Corona    cyl=4           12     25.143
11: 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2          Volvo 142E    cyl=4           16     25.143
12: 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6        Ferrari Dino    cyl=6           30     21.820
13: 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1      Hornet 4 Drive    cyl=6           18     21.820
14: 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4           Mazda RX4    cyl=6           24     21.820
15: 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4       Mazda RX4 Wag    cyl=6           24     21.820
16: 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4            Merc 280    cyl=6           24     21.820
17: 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4           Merc 280C    cyl=6           24     21.820
18: 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1             Valiant    cyl=6           18     21.820
19: 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2         AMC Javelin    cyl=8           24     55.989
20: 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4  Cadillac Fleetwood    cyl=8           24     55.989
21: 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4          Camaro Z28    cyl=8           24     55.989
22: 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4   Chrysler Imperial    cyl=8           24     55.989
23: 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2    Dodge Challenger    cyl=8           24     55.989
24: 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4          Duster 360    cyl=8           24     55.989
25: 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4      Ford Pantera L    cyl=8           40     55.989
26: 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2   Hornet Sportabout    cyl=8           24     55.989
27: 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4 Lincoln Continental    cyl=8           24     55.989
28: 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8       Maserati Bora    cyl=8           40     55.989
29: 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3          Merc 450SE    cyl=8           24     55.989
30: 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3          Merc 450SL    cyl=8           24     55.989
31: 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3         Merc 450SLC    cyl=8           24     55.989
32: 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2    Pontiac Firebird    cyl=8           24     55.989
     mpg cyl  disp  hp drat    wt  qsec vs am gear carb               names cylNames cylTimesGear sumWtByCyl

9. Joining

Merge is much like with data.frames. One nice feature is that key columns will be used for a merge automatically if they are set.

crbn <- data.table(carb=c(1,2,3), carbs=c("one", "two", "three"))
setkey(mtcarsDT, carb)
setkey(crbn, carb)
merge(mtcarsDT, crbn, all.x=TRUE)
    carb  mpg cyl  disp  hp drat    wt  qsec vs am gear               names cylNames cylTimesGear sumWtByCyl carbs
 1:    1 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4          Datsun 710    cyl=4           16     25.143   one
 2:    1 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4            Fiat 128    cyl=4           16     25.143   one
 3:    1 27.3   4  79.0  66 4.08 1.935 18.90  1  1    4           Fiat X1-9    cyl=4           16     25.143   one
 4:    1 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4      Toyota Corolla    cyl=4           16     25.143   one
 5:    1 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3       Toyota Corona    cyl=4           12     25.143   one
 6:    1 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3      Hornet 4 Drive    cyl=6           18     21.820   one
 7:    1 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3             Valiant    cyl=6           18     21.820   one
 8:    2 30.4   4  75.7  52 4.93 1.615 18.52  1  1    4         Honda Civic    cyl=4           16     25.143   two
 9:    2 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5        Lotus Europa    cyl=4           20     25.143   two
10:    2 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4            Merc 230    cyl=4           16     25.143   two
11:    2 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4           Merc 240D    cyl=4           16     25.143   two
12:    2 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5       Porsche 914-2    cyl=4           20     25.143   two
13:    2 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4          Volvo 142E    cyl=4           16     25.143   two
14:    2 15.2   8 304.0 150 3.15 3.435 17.30  0  0    3         AMC Javelin    cyl=8           24     55.989   two
15:    2 15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    Dodge Challenger    cyl=8           24     55.989   two
16:    2 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3   Hornet Sportabout    cyl=8           24     55.989   two
17:    2 19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    Pontiac Firebird    cyl=8           24     55.989   two
18:    3 16.4   8 275.8 180 3.07 4.070 17.40  0  0    3          Merc 450SE    cyl=8           24     55.989 three
19:    3 17.3   8 275.8 180 3.07 3.730 17.60  0  0    3          Merc 450SL    cyl=8           24     55.989 three
20:    3 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3         Merc 450SLC    cyl=8           24     55.989 three
21:    4 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4           Mazda RX4    cyl=6           24     21.820  <NA>
22:    4 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4       Mazda RX4 Wag    cyl=6           24     21.820  <NA>
23:    4 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4            Merc 280    cyl=6           24     21.820  <NA>
24:    4 17.8   6 167.6 123 3.92 3.440 18.90  1  0    4           Merc 280C    cyl=6           24     21.820  <NA>
25:    4 10.4   8 472.0 205 2.93 5.250 17.98  0  0    3  Cadillac Fleetwood    cyl=8           24     55.989  <NA>
26:    4 13.3   8 350.0 245 3.73 3.840 15.41  0  0    3          Camaro Z28    cyl=8           24     55.989  <NA>
27:    4 14.7   8 440.0 230 3.23 5.345 17.42  0  0    3   Chrysler Imperial    cyl=8           24     55.989  <NA>
28:    4 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3          Duster 360    cyl=8           24     55.989  <NA>
29:    4 15.8   8 351.0 264 4.22 3.170 14.50  0  1    5      Ford Pantera L    cyl=8           40     55.989  <NA>
30:    4 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3 Lincoln Continental    cyl=8           24     55.989  <NA>
31:    6 19.7   6 145.0 175 3.62 2.770 15.50  0  1    5        Ferrari Dino    cyl=6           30     21.820  <NA>
32:    8 15.0   8 301.0 335 3.54 3.570 14.60  0  1    5       Maserati Bora    cyl=8           40     55.989  <NA>
    carb  mpg cyl  disp  hp drat    wt  qsec vs am gear               names cylNames cylTimesGear sumWtByCyl carbs

10. Reading data from the outside world

10.1. Read a CSV or TSV file

Note this is much faster than read.csv and read.table.

df1 <- fread("../data/dataTable_f1.csv")
df1
     name age weight
1:  babar  25    900
2: grumpy   2      3

10.2. Compressed files are decompressed automatically

df2 <- fread("../data/dataTable_f2.csv.gz")
df2
     name age weight
1:   kong  12    700
2: george   7     15

10.3. Reading data from a URL

df1u <- fread('https://github.com/richmit/ex-R/raw/master/data/dataTable_f2.csv.gz')
df1u
     name age weight
1:   kong  12    700
2: george   7     15

10.4. Read output from a command (in this case compressing with gunzip and filtering with awk)

df2 <- fread("gunzip < ../data/dataTable_f2.csv.gz | awk -F, 'NR==1 || $3<100 { print $0 }'")
df2
Empty data.table (0 rows and 3 cols): name,age,weight

10.5. Read a colon (:) separated file with extra whitespace and no column names

df3 <- fread('dataTable_f3.txt', sep=':', header=FALSE, strip.white=TRUE, col.names=c('name', 'age', 'weight'))
df3
Error in fread("dataTable_f3.txt", sep = ":", header = FALSE, strip.white = TRUE,  : 
  File 'dataTable_f3.txt' does not exist or is non-readable. getwd()=='c:/Users/richmit/Documents/world/my_prog/learn/R/src'
Error: object 'df3' not found