Plotting with Matplotlib
------------------------

Also creating a presentation with rst2pdf
=========================================

Data Structures
---------------

Favour simpler data structures if they do what you need. In order of
preference:

#. Built-in Lists

   - 2xN data or simpler
   - Can't install system dependencies

#. Numpy arrays

   - 2 (or higher) dimensional data
   - Lots of numerical calculations

#. Pandas series/dataframes

   - 'Data Wrangling', reshaping, merging, sorting, querying
   - Importing from complex formats

Shamelessly stolen from https://stackoverflow.com/a/45288000

Loading Data from Disk
----------------------

Natively
========

.. code-block:: python

    >>> import csv
    >>> with open('eggs.csv', newline='') as csvfile:
    ...     spam = csv.reader(csvfile,
    ...                       delimiter=' ',
    ...                       quotechar='|')
    ...     for row in spam:
    ...         # Do things
    ...         pass

Loading Data from Disk
----------------------

Numpy
=====

.. code-block:: python

    >>> import numpy
    >>> spam = numpy.genfromtxt('eggs.csv',
    ...                         delimiter=' ',
    ...                         dtype=None)  # No error handling!
    >>> for row in spam:
    ...     # Do things
    ...     pass

``numpy.genfromtxt`` will try to infer the datatype of each column if
``dtype=None`` is set.

``numpy.loadtxt`` is generally faster, provided your data is well formatted
(no missing values; only numerical data or constant-length strings).

Loading Data from Disk
----------------------

Numpy NB.
=========

**Remind me to look at some actual numpy usage at the end**

- NumPy coerces values to a common type when creating an array (e.g. mixing
  ints and floats gives an array of floats).
- Arrays created by ``numpy.genfromtxt`` cannot, in general, be indexed like
  ``data[xstart:xend, ystart:yend]``: if the columns have different types,
  ``dtype=None`` produces a 1-D structured array with one record per row.
- Data of unequal types are problematic! Pandas *may* be a better choice in
  that case.
- In practice you will probably need to specify ``dtype`` explicitly:
  https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html

Loading Data from Disk
----------------------

Pandas
======

.. code-block:: python

    >>> import pandas
    >>> # dtype=None is the default: column types are inferred
    >>> spam = pandas.read_csv('eggs.csv',
    ...                        delimiter=' ',
    ...                        header=None)
    >>> # Iterating a DataFrame directly yields column labels, not rows
    >>> for row in spam.itertuples(index=False):
    ...     # Do things
    ...     pass

``header=None`` is required if the file does not have a header; otherwise the
first row is used as the column names.

Generating Data for Testing
---------------------------

Generating the data on the fly with numpy is convenient.

.. code-block:: python

    >>> import numpy.random as ran
    >>> # For repeatability
    >>> ran.seed(7890234)
    >>> # 100x2 array of uniform [0, 1) floats
    >>> data = ran.rand(100, 2)
    >>> # 100x100x100 array of uniform [0, 1) floats
    >>> data = ran.rand(100, 100, 100)
    >>> # 100 standard normal floats
    >>> data = ran.randn(100)
    >>> # 3x14x15 array of binomial ints with n = 100, p = 0.1
    >>> data = ran.binomial(100, 0.1, (3, 14, 15))

Plotting Time Series
--------------------

Plot data of the form:

.. math::

    y = f(t)

Subplots
--------

Saving Plots
------------

So far I've just displayed plots with ``plt.show()``. You can save plots
manually from that window, but when scripting it's more convenient to do so
automatically:

.. code-block:: python

    >>> # Some plotting has previously occurred
    >>> plt.savefig('eggs.pdf', dpi=300, transparent=False)

The output format is inferred from the file extension (e.g. ``.pdf``,
``.png``, ``.svg``). The keyword arguments are optional; others exist, such as
``bbox_inches='tight'`` to trim surrounding whitespace. Call ``plt.savefig``
before ``plt.show()``: once the window is closed the figure is released, so
saving afterwards typically produces an empty image.

Error Bars
----------

Stacked Bar Graph
-----------------

Resources
---------

- NumPy User Guide: https://docs.scipy.org/doc/numpy/user/index.html
- NumPy Reference: https://docs.scipy.org/doc/numpy/reference/index.html#reference
- Matplotlib example gallery: https://matplotlib.org/gallery/index.html
- Pandas: It probably exists. Good luck.
- This presentation: https://git.friedersdorff.com/max/plotting_with_matplotlib.git
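
Loading Data from Disk
----------------------

Numpy Indexing Example
======================

A small demonstration of the ``numpy.genfromtxt`` indexing caveat, assuming two
hypothetical space-delimited files (``numbers.csv`` and ``mixed.csv``) that the
script writes itself. With ``dtype=None``, an all-float file should give an
ordinary 2-D array, while a file with a text column gives a 1-D structured
array whose columns are named fields (``f0``, ``f1``, ...). ``encoding='utf-8'``
is passed so text is read as ``str`` rather than ``bytes``.

.. code-block:: python

    import numpy

    # Hypothetical sample files, space-delimited like 'eggs.csv' above
    with open('numbers.csv', 'w') as f:
        f.write('1.0 2.5 3.0\n4.0 5.5 6.0\n')
    with open('mixed.csv', 'w') as f:
        f.write('1 spam 3\n4 eggs 6\n')

    numbers = numpy.genfromtxt('numbers.csv', delimiter=' ', dtype=None,
                               encoding='utf-8')
    mixed = numpy.genfromtxt('mixed.csv', delimiter=' ', dtype=None,
                             encoding='utf-8')

    print(numbers.shape)      # (2, 3): an ordinary 2-D float array
    print(numbers[0:2, 1:3])  # so 2-D slicing works

    print(mixed.shape)        # (2,): a 1-D structured array, one record per row
    print(mixed.dtype.names)  # ('f0', 'f1', 'f2'): columns became named fields
    print(mixed['f1'])        # select a column by field name instead

Trying ``mixed[0:2, 1:3]`` on the structured array raises an ``IndexError``,
which is the indexing problem in practice.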
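
Loading Data from Disk
----------------------

Pandas Example
==============

A sketch of reading a mixed-type file with ``pandas.read_csv``, assuming a
hypothetical headerless, space-delimited ``mixed.csv`` written by the script
itself. Pandas keeps one inferred dtype per column and still allows 2-D
positional slicing through ``.iloc``, which is why it *may* suit mixed data
better than a plain NumPy array.

.. code-block:: python

    import pandas

    # Hypothetical mixed-type file without a header row
    with open('mixed.csv', 'w') as f:
        f.write('1 spam 3\n4 eggs 6\n')

    # header=None: columns are labelled 0, 1, 2 instead of using row 1 as names
    spam = pandas.read_csv('mixed.csv', delimiter=' ', header=None)

    print(spam.dtypes)          # one inferred dtype per column
    print(spam.iloc[0:2, 1:3])  # 2-D positional slicing still works

    # Iterating a DataFrame directly yields column labels; itertuples yields rows
    for row in spam.itertuples(index=False):
        print(row)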
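
Plotting Time Series
--------------------

Example
=======

A minimal sketch of plotting a time series ``y = f(t)``. The signal
(``sin(t)`` plus Gaussian noise) and the output file name ``timeseries.png``
are assumptions for illustration, not real data. The non-interactive ``Agg``
backend is selected so the script runs without a display; use ``plt.show()``
instead of ``fig.savefig`` to view the plot interactively.

.. code-block:: python

    import matplotlib
    matplotlib.use('Agg')  # render to files only, no window needed
    import matplotlib.pyplot as plt
    import numpy

    # Synthetic signal (an assumption for illustration): sin(t) plus noise
    numpy.random.seed(7890234)
    t = numpy.linspace(0, 10, 200)  # 200 evenly spaced times from 0 to 10 s
    y = numpy.sin(t) + 0.1 * numpy.random.randn(t.size)

    fig, ax = plt.subplots()
    ax.plot(t, y, label='measured')
    ax.plot(t, numpy.sin(t), linestyle='--', label='sin(t)')
    ax.set_xlabel('t [s]')
    ax.set_ylabel('y')
    ax.legend()

    fig.savefig('timeseries.png', dpi=150)  # format inferred from extension
    plt.close(fig)  # free the figure once it has been saved

``fig.savefig`` is the object-oriented equivalent of ``plt.savefig``; it
saves that specific figure rather than whichever one pyplot considers current.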
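
Subplots
--------

Example
=======

A sketch of stacking several time series in one figure with
``plt.subplots``, assuming three made-up signals and the hypothetical output
file ``subplots.png``. ``sharex=True`` links the x axes, so setting limits or
zooming on one panel applies to all of them, and only the bottom panel needs
an x label.

.. code-block:: python

    import matplotlib
    matplotlib.use('Agg')  # render to files only, no window needed
    import matplotlib.pyplot as plt
    import numpy

    # Made-up signals for illustration
    numpy.random.seed(7890234)
    t = numpy.linspace(0, 10, 200)
    signals = {
        'sin(t)': numpy.sin(t),
        'cos(t)': numpy.cos(t),
        'noise': numpy.random.randn(t.size),
    }

    # One row of axes per signal, all sharing the same time axis
    fig, axes = plt.subplots(nrows=len(signals), ncols=1, sharex=True,
                             figsize=(6, 6))
    for ax, (name, y) in zip(axes, signals.items()):
        ax.plot(t, y)
        ax.set_ylabel(name)
    axes[-1].set_xlabel('t [s]')

    fig.tight_layout()  # stop labels of neighbouring panels overlapping
    fig.savefig('subplots.png')
    plt.close(fig)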
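
Error Bars
----------

Example
=======

A sketch of error bars with ``ax.errorbar``, assuming five hypothetical
repeated measurements at each of eight time points and using the standard
error of the mean as the error. Whether the bars should show the standard
deviation, the standard error or a confidence interval depends on your data,
so say which one in the caption.

.. code-block:: python

    import matplotlib
    matplotlib.use('Agg')  # render to files only, no window needed
    import matplotlib.pyplot as plt
    import numpy

    # Hypothetical data: 5 noisy repeats of sin(t) at each of 8 time points
    numpy.random.seed(7890234)
    t = numpy.arange(8)
    repeats = numpy.sin(t)[:, numpy.newaxis] + 0.2 * numpy.random.randn(8, 5)

    mean = repeats.mean(axis=1)
    # Standard error of the mean: sample std (ddof=1) divided by sqrt(n)
    sem = repeats.std(axis=1, ddof=1) / numpy.sqrt(repeats.shape[1])

    fig, ax = plt.subplots()
    ax.errorbar(t, mean, yerr=sem, fmt='o-', capsize=4)  # capsize: bar end caps
    ax.set_xlabel('t [s]')
    ax.set_ylabel('mean y')
    fig.savefig('errorbars.png')
    plt.close(fig)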
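
Stacked Bar Graph
-----------------

Example
=======

A sketch of a stacked bar graph, assuming made-up counts for three categories
in each of four groups. Each layer is drawn with ``ax.bar``, passing the
running total of the layers below it as ``bottom``; the output file
``stacked_bar.png`` is hypothetical.

.. code-block:: python

    import matplotlib
    matplotlib.use('Agg')  # render to files only, no window needed
    import matplotlib.pyplot as plt
    import numpy

    # Hypothetical counts: one array per stacked category, one entry per group
    groups = ['A', 'B', 'C', 'D']
    counts = {
        'spam': numpy.array([3, 5, 2, 4]),
        'eggs': numpy.array([1, 2, 4, 3]),
        'ham': numpy.array([2, 1, 1, 5]),
    }

    fig, ax = plt.subplots()
    bottom = numpy.zeros(len(groups))  # height already stacked in each group
    for label, values in counts.items():
        # Each layer starts where the previous layers ended
        ax.bar(groups, values, bottom=bottom, label=label)
        bottom += values
    ax.set_ylabel('count')
    ax.legend()
    fig.savefig('stacked_bar.png')
    plt.close(fig)

After the loop, ``bottom`` holds the total height of each stack, which is
handy for setting y limits or annotating totals.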