Plotting with Matplotlib
------------------------

Also creating a presentation with rst2pdf
=========================================

Data Structures
---------------

Favour simpler data structures if they do what you need. In order:

#. Built-in Lists

   - 2xN data or simpler
   - Can't install system dependencies

#. Numpy arrays

   - 2 (or higher) dimensional data
   - Lots of numerical calculations

#. Pandas series/dataframes

   - 'Data Wrangling': reshaping, merging, sorting, querying
   - Importing from complex formats

Shamelessly stolen from https://stackoverflow.com/a/45288000

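
For comparison, here is the same small two-column dataset in each of the three structures. The time and value numbers and the column names ``t`` and ``y`` are made up for illustration.

.. code-block:: python

   import numpy
   import pandas

   # Hypothetical data: sample times and measured values
   times = [0.0, 1.0, 2.0, 3.0]
   values = [1.2, 1.9, 3.1, 3.9]

   # 1. Built-in lists: no dependencies, fine for simple pairs
   pairs = list(zip(times, values))

   # 2. NumPy array: 4x2 floats, vectorised maths on whole columns
   array = numpy.array([times, values]).T
   doubled = array[:, 1] * 2

   # 3. pandas DataFrame: labelled columns, easy filtering
   frame = pandas.DataFrame({'t': times, 'y': values})
   late = frame[frame['t'] > 1.0]
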
Loading Data from Disk
----------------------

Natively
========

.. code-block:: python

   >>> import csv
   >>> with open('eggs.csv', newline='') as csvfile:
   ...     spam = csv.reader(csvfile,
   ...                       delimiter=' ',
   ...                       quotechar='|')
   ...     for row in spam:
   ...         # Do things
   ...         pass

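
``csv.reader`` yields each row as a list of strings, so numeric columns have to be converted by hand. The sketch below is self-contained: ``io.StringIO`` stands in for ``eggs.csv``, and it assumes a file of two space-separated numeric columns.

.. code-block:: python

   import csv
   import io

   # Stand-in for open('eggs.csv', newline='')
   csvfile = io.StringIO('0 1.5\n1 2.5\n2 4.0\n')

   xs, ys = [], []
   for row in csv.reader(csvfile, delimiter=' ', quotechar='|'):
       # Every field arrives as a str, so convert explicitly
       xs.append(int(row[0]))
       ys.append(float(row[1]))
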
Loading Data from Disk
----------------------

Numpy
=====

.. code-block:: python

   >>> import numpy
   >>> spam = numpy.genfromtxt('eggs.csv',
   ...                         delimiter=' ',
   ...                         dtype=None)  # No error handling!
   >>> for row in spam:
   ...     # Do things
   ...     pass

If ``dtype=None`` is set, ``numpy.genfromtxt`` tries to infer the datatype of
each column.

``numpy.loadtxt`` is generally faster at runtime if your data is well
formatted (no missing values, only numerical data or constant-length strings).

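
This sketch shows the difference between the two loaders, with ``io.StringIO`` in place of real files. ``numpy.loadtxt`` reads clean numeric data. ``numpy.genfromtxt`` also copes with a missing value, which it fills with ``nan`` for float columns by default. The gappy example uses a comma delimiter so that the empty field is unambiguous.

.. code-block:: python

   import io
   import numpy

   clean = io.StringIO('1 2\n3 4\n')
   gappy = io.StringIO('1,2\n3,\n')  # second row lacks its last value

   # loadtxt expects every value to be present
   a = numpy.loadtxt(clean, delimiter=' ')

   # genfromtxt marks the missing entry as nan (float columns)
   b = numpy.genfromtxt(gappy, delimiter=',')
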
Loading Data from Disk
----------------------

Numpy NB.
=========

**Remind me to look at some actual numpy usage at the end**

- A NumPy array holds a single dtype, so mixed input values are coerced to a
  common type when the array is created.
- When ``dtype=None`` infers different types for different columns,
  ``numpy.genfromtxt`` returns a one-dimensional structured array, which
  cannot be indexed like ``data[xstart:xend, ystart:yend]``.
- Data of unequal types are problematic! Pandas *may* be a better choice in
  that case.
- Specifying some value for ``dtype`` is probably necessary in most cases in
  practice: https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html

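
Here is a short sketch of the first two points, again with ``io.StringIO`` standing in for a file. ``f0`` and ``f1`` are NumPy's default field names for unnamed columns. In a structured array, columns are selected by field name rather than by a second index.

.. code-block:: python

   import io
   import numpy

   # Ints and floats together are coerced to float64
   mixed = numpy.array([1, 2.5, 3])

   # A text column plus a float column gives a 1-D structured array
   data = numpy.genfromtxt(io.StringIO('a 1.5\nb 2.5\n'),
                           delimiter=' ', dtype=None, encoding='utf-8')

   names = data['f0']    # first column, selected by field name
   values = data['f1']   # second column
   # data[0:2, 0:1] would raise IndexError: data is 1-D
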
Loading Data from Disk
----------------------

Pandas
======

.. code-block:: python

   >>> import pandas
   >>> # dtype=None is the default
   >>> spam = pandas.read_csv('eggs.csv',
   ...                        delimiter=' ',
   ...                        header=None)
   >>> for index, row in spam.iterrows():
   ...     # Do things
   ...     pass

``header=None`` is required if the file does not have a header; otherwise the
first row is used as the column names.

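
Iterating directly over a ``DataFrame`` yields its column labels rather than its rows. Row-wise loops therefore need ``iterrows`` or the generally faster ``itertuples``. The sketch below uses ``io.StringIO`` with invented contents in place of ``eggs.csv``.

.. code-block:: python

   import io
   import pandas

   csvfile = io.StringIO('0 1.5 a\n1 2.5 b\n')
   spam = pandas.read_csv(csvfile, delimiter=' ', header=None)

   # With header=None the columns are labelled 0, 1, 2
   labels = list(spam)

   # itertuples: one namedtuple per row, with the index first
   firsts = [row[1] for row in spam.itertuples()]
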
Generating Data for Testing
---------------------------

Generating the data on the fly with numpy is convenient.

.. code-block:: python

   >>> import numpy.random as ran
   >>> # For repeatability
   >>> ran.seed(7890234)
   >>> # 100x2 array of uniform [0, 1) floats
   >>> data = ran.rand(100, 2)
   >>> # 100x100x100 array of uniform [0, 1) floats
   >>> data = ran.rand(100, 100, 100)
   >>> # 100 std. normal floats
   >>> data = ran.randn(100)
   >>> # 3x14x15 array of binomial ints with n = 100, p = 0.1
   >>> data = ran.binomial(100, 0.1, (3, 14, 15))

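
NumPy 1.17 and later also provide a ``Generator`` interface, which the NumPy documentation recommends for new code. This is a sketch of roughly equivalent calls, assuming a recent NumPy. The seed is reused only for familiarity, and the numbers produced differ from those of the legacy functions.

.. code-block:: python

   import numpy

   # Seeded generator, for repeatability
   rng = numpy.random.default_rng(7890234)

   uniform = rng.random((100, 2))                # uniform [0, 1) floats
   normal = rng.standard_normal(100)             # std. normal floats
   counts = rng.binomial(100, 0.1, (3, 14, 15))  # binomial ints
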
Plotting Time Series
--------------------

Plot data of the form:

.. math:: y = f(t)

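
A minimal sketch with ``matplotlib.pyplot``, in which a made-up damped sine stands in for f(t). The ``Agg`` line selects a non-interactive backend so that the sketch runs without a display.

.. code-block:: python

   import matplotlib
   matplotlib.use('Agg')  # non-interactive; remove to use plt.show()
   import matplotlib.pyplot as plt
   import numpy

   # Hypothetical signal: a damped sine sampled at 200 points
   t = numpy.linspace(0, 10, 200)
   y = numpy.exp(-t / 5) * numpy.sin(2 * t)

   fig, ax = plt.subplots()
   ax.plot(t, y, label='y = f(t)')
   ax.set_xlabel('t')
   ax.set_ylabel('y')
   ax.legend()
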
Subplots
--------

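
A sketch of a two-row grid sharing one x axis, using ``plt.subplots``. The sine and cosine data are placeholders.

.. code-block:: python

   import matplotlib
   matplotlib.use('Agg')  # non-interactive; remove to use plt.show()
   import matplotlib.pyplot as plt
   import numpy

   t = numpy.linspace(0, 2 * numpy.pi, 100)

   # axes is a NumPy array of Axes objects, one per panel
   fig, axes = plt.subplots(2, 1, sharex=True)
   axes[0].plot(t, numpy.sin(t))
   axes[0].set_title('sin')
   axes[1].plot(t, numpy.cos(t))
   axes[1].set_title('cos')
   axes[1].set_xlabel('t')
   fig.tight_layout()
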
Saving Plots
------------

So far I've just displayed plots with ``plt.show()``. You can save them
manually from that window, but when scripting it's convenient to do so
automatically:

.. code-block:: python

   >>> import matplotlib.pyplot as plt
   >>> # Some plotting has previously occurred
   >>> plt.savefig('eggs.pdf', dpi=300, transparent=False)

The output format is inferred from the file extension. The keyword arguments
are optional here; other options include ``format`` and ``bbox_inches``.

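
A self-contained sketch of saving from a script. The Matplotlib documentation advises calling ``savefig`` before ``plt.show()``. A blocking ``show()`` closes the figure when it returns, so a later ``savefig`` writes a new, empty figure. In this sketch a temporary directory stands in for a real output location, and the ``eggs.*`` names are placeholders.

.. code-block:: python

   import os
   import tempfile

   import matplotlib
   matplotlib.use('Agg')  # non-interactive backend
   import matplotlib.pyplot as plt

   plt.plot([0, 1, 2], [0, 1, 4])

   outdir = tempfile.mkdtemp()
   for name in ('eggs.pdf', 'eggs.png', 'eggs.svg'):
       # Format comes from each extension; 'tight' trims whitespace
       plt.savefig(os.path.join(outdir, name), dpi=300,
                   bbox_inches='tight')

   saved = sorted(os.listdir(outdir))
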
Error Bars
----------

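
A sketch with ``Axes.errorbar``, assuming symmetric uncertainties. The measurements and errors are invented for illustration. ``yerr`` also accepts a 2xN array, with lower errors in the first row and upper errors in the second, for asymmetric errors.

.. code-block:: python

   import matplotlib
   matplotlib.use('Agg')  # non-interactive; remove to use plt.show()
   import matplotlib.pyplot as plt
   import numpy

   x = numpy.arange(1, 6)
   y = numpy.array([2.1, 3.9, 6.2, 7.8, 10.1])
   err = numpy.array([0.3, 0.4, 0.3, 0.5, 0.4])  # hypothetical std. devs

   fig, ax = plt.subplots()
   # fmt='o' draws markers without a connecting line; capsize adds caps
   container = ax.errorbar(x, y, yerr=err, fmt='o', capsize=3)
   ax.set_xlabel('x')
   ax.set_ylabel('y')
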
Stacked Bar Graph
-----------------

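
One way to stack bars is to pass the running total of the earlier series as ``bottom`` to each ``Axes.bar`` call. The categories and counts in this sketch are invented.

.. code-block:: python

   import matplotlib
   matplotlib.use('Agg')  # non-interactive; remove to use plt.show()
   import matplotlib.pyplot as plt
   import numpy

   labels = ['A', 'B', 'C']
   series = {
       'spam': numpy.array([3, 5, 2]),
       'eggs': numpy.array([4, 1, 6]),
       'ham': numpy.array([2, 2, 1]),
   }

   fig, ax = plt.subplots()
   bottom = numpy.zeros(len(labels))
   for name, heights in series.items():
       ax.bar(labels, heights, bottom=bottom, label=name)
       bottom = bottom + heights  # next series sits on top of this one
   ax.legend()
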
Resources
---------

- NumPy User Guide: https://docs.scipy.org/doc/numpy/user/index.html
- NumPy Reference: https://docs.scipy.org/doc/numpy/reference/index.html#reference
- Matplotlib example gallery: https://matplotlib.org/gallery/index.html
- Pandas: It probably exists. Good luck.
- This presentation: https://git.friedersdorff.com/max/plotting_with_matplotlib.git