
4.1.4  Reading CSV data

CSV data is textual data formatted into m lines separated by a character linesep, each of which contains n elements separated by a character sep. Usually, linesep is the newline character and sep is the comma or tab character.

The csv2gen command converts CSV data from files and strings to Xcas matrices.

Examples

To convert Matlab array syntax to a giac matrix:

csv2gen("1 2 3; 4 5 6"," ",";",string)
     


     1  2  3
     4  5  6


          
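The row-and-element structure described above can be spelled out independently of Xcas. The following is a minimal Python sketch, not the way csv2gen is implemented: the helper names parse_csv_text and convert are hypothetical, and dropping empty fields is an assumption made here so that the space after the semicolon in the Matlab-style string does not produce a spurious element.

```python
def convert(field):
    """Return an int or float when the field is numeric, else the raw string."""
    for cast in (int, float):
        try:
            return cast(field)
        except ValueError:
            pass
    return field


def parse_csv_text(text, sep=",", linesep="\n"):
    """Split text into rows on linesep and each row into elements on sep."""
    rows = []
    for line in text.split(linesep):
        # Assumption of this sketch: surrounding whitespace is trimmed and
        # empty fields (e.g. caused by a leading separator) are dropped.
        fields = [f.strip() for f in line.split(sep)]
        fields = [f for f in fields if f]
        if not fields:
            continue  # skip blank lines
        rows.append([convert(f) for f in fields])
    return rows


matrix = parse_csv_text("1 2 3; 4 5 6", sep=" ", linesep=";")
print(matrix)  # [[1, 2, 3], [4, 5, 6]]
```

The sketch returns a list of m rows with n elements each, which corresponds to the matrix shown above.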

Assuming that the file hooke.csv (containing Hooke’s Law demo data) has been downloaded from here to the Downloads folder, you can load it by entering, for example:

hooke:=csv2gen("/home/luka/Downloads/hooke.csv",",")
     














“Index”   “Mass (kg)”   “Spring 1 (m)”   “Spring 2 (m)”
1         0.0           0.05             0.05
2         0.49          0.066            0.066
3         0.98          0.087            0.08
4         1.47          0.116            0.108
5         1.96          0.142            0.138
6         2.45          0.166            0.158
7         2.94          0.193            0.174
8         3.43          0.204            0.192
9         3.92          0.226            0.205
10        4.41          0.238            0.232














          

The command

tail(hooke)

is a convenient way to obtain the table with the header row removed (see Section 6.1.5).

Application: loading and sorting real-world data

Assume that you need global annual mean temperature anomaly data for the last couple of centuries (it has been recorded since 1880). The corresponding CSV file can be found here. The file can be imported into Xcas by entering:

data:=csv2gen("/home/luka/Downloads/annual_csv.csv",","):; header:=data[0]
     

“Source”,“Year”,“Mean”
          

There are three columns in the obtained table: Source, Year, and Mean. The last column contains the mean anomalies in degrees Celsius. To collect the distinct data sources, enter:

sources:=set[op(tail(col(data,0)))]
     
“GCAG”,“GISTEMP”          

There are two sources of data, GCAG and GISTEMP, and the corresponding entries are interleaved. To group the data by source and sort each group by year, enter:

t:=table():; for src in sources do t[src]=<sort(tran([col(select(r->r[0]==src,data),1..2)])); od:;

Here, select picks the rows of data whose first element is the source src; col returns a sequence containing the second and third columns, which the [] delimiters turn into a two-row matrix; tran transposes that matrix, giving the desired list of (year, mean) pairs; finally, sort orders the list lexicographically, which in effect sorts it by the first column, i.e. the time axis.
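The same select, col, tran, sort pipeline can be traced step by step outside Xcas. The Python sketch below is only an illustration under stated assumptions: the rows are made-up placeholder values laid out like the imported table, not real temperature anomalies, and the variable names are chosen here for readability.

```python
# Made-up placeholder rows in the same layout as the imported table
# (header first, entries of the two sources interleaved and unsorted).
data = [
    ["Source", "Year", "Mean"],
    ["GCAG", 1882, -0.2],
    ["GISTEMP", 1882, -0.3],
    ["GCAG", 1880, -0.1],
    ["GISTEMP", 1880, -0.4],
    ["GCAG", 1881, 0.0],
    ["GISTEMP", 1881, -0.5],
]

# Distinct sources, taken from the first column without the header row.
sources = sorted({row[0] for row in data[1:]})

t = {}
for src in sources:
    # select: keep the rows whose first element equals src
    # (the header row never matches, so it drops out here).
    selected = [row for row in data if row[0] == src]
    # col: extract the second and third columns as two separate lists.
    years = [row[1] for row in selected]
    means = [row[2] for row in selected]
    # tran: turn the two rows into a list of [year, mean] pairs.
    pairs = [list(pair) for pair in zip(years, means)]
    # sort: lexicographic order, i.e. effectively by year.
    t[src] = sorted(pairs)

print(t["GCAG"])  # [[1880, -0.1], [1881, 0.0], [1882, -0.2]]
```

Because each year occurs only once per source, the lexicographic sort never has to compare the second element of a pair, so the result is simply ordered by year.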

For instance, to plot the GCAG data, enter:

gcag:=t["GCAG"]:; labels=["year","°C"]; title="Annual mean temperature anomalies [°C]"; listplot(gcag)
