
Plotting with Matplotlib
------------------------

Also creating a presentation with rst2pdf
=========================================

Data Structures
---------------
Favour simpler data structures if they do what you need.  In order of preference:

#. Built-in Lists

   - 2xN data or simpler
   - Can't install system dependencies

#. Numpy arrays

   - 2 (or higher) dimensional data
   - Lots of numerical calculations

#. Pandas series/dataframes

   - 'Data Wrangling', reshaping, merging, sorting, querying
   - Importing from complex formats

Shamelessly stolen from https://stackoverflow.com/a/45288000
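
To make the ranking concrete, here is a minimal sketch of one small 2xN dataset held in each structure. The names and values are illustrative, and numpy and pandas are assumed to be installed:

.. code-block:: python

   import numpy
   import pandas

   # An illustrative 2xN dataset: times t and values y
   t = [0, 1, 2, 3]
   y = [0.0, 0.5, 0.8, 0.9]

   # 1. Built-in lists: no dependencies needed
   as_lists = [t, y]

   # 2. Numpy array: one type throughout (the ints become floats)
   as_array = numpy.array([t, y])   # shape (2, 4)
   doubled = as_array * 2           # element-wise, no explicit loop

   # 3. Pandas DataFrame: labelled columns and easy querying
   as_frame = pandas.DataFrame({'t': t, 'y': y})
   later = as_frame[as_frame['t'] > 1]   # rows where t > 1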

Loading Data from Disk
----------------------
Natively
========

.. code-block:: python

   >>> import csv
   >>> with open('eggs.csv', newline='') as csvfile:
   ...     spam = csv.reader(csvfile,
   ...                       delimiter=' ',
   ...                       quotechar='|')
   ...     for row in spam:
   ...         # Do things: each row is a list of strings
   ...         pass
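
``csv.reader`` yields every field as a string, so numbers must be converted explicitly. The following is a self-contained sketch that uses ``io.StringIO`` as a stand-in for ``eggs.csv``. The file contents are made up:

.. code-block:: python

   import csv
   import io

   # Stand-in for open('eggs.csv', newline=''); '|' quotes the last field
   csvfile = io.StringIO('1 2.5\n2 3.0\n3 |4.5|\n')

   ts, ys = [], []
   for row in csv.reader(csvfile, delimiter=' ', quotechar='|'):
       # Every field arrives as a str, e.g. ['1', '2.5']
       ts.append(int(row[0]))
       ys.append(float(row[1]))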

Loading Data from Disk
----------------------
Numpy
=====

.. code-block:: python

   >>> import numpy
   >>> spam = numpy.genfromtxt('eggs.csv',
   ...                         delimiter=' ',
   ...                         dtype=None) # No error handling!
   >>> for row in spam:
   ...     # Do things
   ...     pass

``numpy.genfromtxt`` will try to infer the datatype of each column if
``dtype=None`` is set.

``numpy.loadtxt`` is generally faster at runtime if your data is well formatted
(no missing values, only numerical data or fixed-length strings).
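
Here is a sketch of the well-formatted case. It assumes purely numerical, space-delimited data, again faked with ``io.StringIO``. The result is a plain 2-D float array, so column slicing works directly:

.. code-block:: python

   import io

   import numpy

   # Well-formatted data: numbers only, no missing values
   text = io.StringIO('0 1.0\n1 2.0\n2 4.0\n')
   data = numpy.loadtxt(text, delimiter=' ')  # dtype defaults to float

   # Plain 2-D array: slice out columns by position
   t = data[:, 0]
   y = data[:, 1]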

Loading Data from Disk
----------------------
Numpy NB.
=========
**Remind me to look at some actual numpy usage at the end**

- Numpy arrays hold a single type, so mixed values are coerced to a common
  type when an array is created (e.g. ``numpy.array([1, 2.5])`` is all floats).
- With ``dtype=None`` and columns of different types, ``numpy.genfromtxt``
  returns a 1-D structured array, which cannot be indexed like
  ``data[xstart:xend, ystart:yend]``.
- Data of unequal types are problematic!  Pandas *may* be a better choice in
  that case.
- In practice, you will probably need to specify ``dtype`` explicitly:
  https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html
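
The following sketch illustrates these points on made-up data. It passes ``encoding='utf-8'`` so that string columns come back as ``str`` rather than bytes, and the field names in the explicit ``dtype`` are illustrative:

.. code-block:: python

   import io

   import numpy

   # Mixed values are coerced to one common type
   ints_and_floats = numpy.array([1, 2.5])  # all float64
   ints_and_text = numpy.array([1, 'a'])    # all strings

   # dtype=None with mixed column types gives a 1-D *structured* array
   # with fields f0, f1, ... instead of a 2-D array
   text = '1 spam\n2 eggs\n'
   mixed = numpy.genfromtxt(io.StringIO(text), dtype=None, encoding='utf-8')
   numbers = mixed['f0']  # select a column by field name, not mixed[:, 0]

   # An explicit dtype gives predictable field names and types
   typed = numpy.genfromtxt(io.StringIO(text), encoding='utf-8',
                            dtype=[('count', int), ('name', 'U10')])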

Loading Data from Disk
----------------------
Pandas
======

.. code-block:: python

   >>> import pandas
   >>> # dtype=None (the default) lets pandas infer column types
   >>> spam = pandas.read_csv('eggs.csv',
   ...                        delimiter=' ',
   ...                        header=None)
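   >>> # Iterating over a DataFrame yields column labels; iterrows() yields rows
   >>> for index, row in spam.iterrows():
   ...     # Do things with row (a pandas Series)
   ...     pass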

``header=None`` is required if the file has no header row; otherwise the first
row of data is used as the column names.
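
The following sketch shows what goes wrong without ``header=None``. It uses a headerless two-row file faked with ``io.StringIO``, and the data is made up:

.. code-block:: python

   import io

   import pandas

   text = '1 2.5\n2 3.0\n'  # headerless, space-delimited

   # Default: the first data row is consumed as column names
   wrong = pandas.read_csv(io.StringIO(text), delimiter=' ')
   # header=None: columns are labelled 0, 1, ... and no row is lost
   spam = pandas.read_csv(io.StringIO(text), delimiter=' ', header=None)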


Generating Data for Testing
---------------------------

Generating the data on the fly with numpy is convenient.

.. code-block:: python

   >>> import numpy.random as ran
   >>> # Fix the seed for repeatability
   >>> ran.seed(7890234)
   >>> # 100x2 array of uniform [0, 1) floats
   >>> data = ran.rand(100, 2)
   >>> # 100x100x100 array of uniform [0, 1) floats
   >>> data = ran.rand(100, 100, 100)
   >>> # 100 standard normal floats
   >>> data = ran.randn(100)
   >>> # 3x14x15 array of binomial ints with n = 100, p = 0.1
   >>> data = ran.binomial(100, 0.1, (3, 14, 15))
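
Newer numpy releases (1.17 and later) also offer a ``Generator`` object via ``numpy.random.default_rng``. If that API is available to you, the same kinds of draws look like this. The numbers differ from the legacy functions above even with the same seed:

.. code-block:: python

   import numpy

   # Seeded Generator object instead of module-level functions
   rng = numpy.random.default_rng(7890234)
   uniform = rng.random((100, 2))      # uniform [0, 1) floats
   normal = rng.standard_normal(100)   # standard normal floats
   counts = rng.binomial(100, 0.1, size=(3, 14, 15))

   # Re-seeding reproduces the data exactly
   again = numpy.random.default_rng(7890234).random((100, 2))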

Plotting Time Series
--------------------

Plot data of the form:

.. math:: y=f(t)
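
The following is a minimal sketch that uses the ``fig, ax`` interface. It assumes ``f = sin`` purely for illustration:

.. code-block:: python

   import matplotlib.pyplot as plt
   import numpy

   # Illustrative time series: y = f(t) with f = sin
   t = numpy.linspace(0, 10, 200)
   y = numpy.sin(t)

   fig, ax = plt.subplots()
   ax.plot(t, y)
   ax.set_xlabel('t')
   ax.set_ylabel('y = f(t)')
   # plt.show() would display the figure interactively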


Subplots
--------
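
The following is a minimal sketch with ``plt.subplots`` that stacks two axes sharing the ``t`` axis. The sine and cosine data are illustrative only:

.. code-block:: python

   import matplotlib.pyplot as plt
   import numpy

   t = numpy.linspace(0, 10, 200)

   # A 2x1 grid of axes; sharex links their t ranges
   fig, (top, bottom) = plt.subplots(2, 1, sharex=True)
   top.plot(t, numpy.sin(t))
   top.set_ylabel('sin(t)')
   bottom.plot(t, numpy.cos(t))
   bottom.set_ylabel('cos(t)')
   bottom.set_xlabel('t')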


Saving Plots
------------

So far I've just displayed plots with ``plt.show()``.  You can save plots
manually from the window it opens, but when scripting, it's convenient to do
so automatically:

.. code-block:: python

   >>> import matplotlib.pyplot as plt
   >>> # Some plotting has previously occurred
   >>> plt.savefig('eggs.pdf', dpi=300, transparent=False)

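The output format is inferred from the file extension.
The keyword arguments are optional here; other options exist, such as
``bbox_inches='tight'`` to trim surrounding whitespace.  In a script, call
``savefig`` before ``plt.show()``: once the window is closed the figure is
gone, and a blank figure is typically saved instead.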
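
The following scripted sketch writes the same figure in two formats. The temporary directory and file names are illustrative:

.. code-block:: python

   import os
   import tempfile

   import matplotlib.pyplot as plt
   import numpy

   fig, ax = plt.subplots()
   x = numpy.arange(10)
   ax.plot(x, x ** 2)

   outdir = tempfile.mkdtemp()
   for name in ('eggs.pdf', 'eggs.png'):
       # Format comes from the extension; bbox_inches='tight' trims margins
       fig.savefig(os.path.join(outdir, name), dpi=300, bbox_inches='tight')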

Error Bars
----------
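
The following is a minimal sketch using ``Axes.errorbar`` with symmetric vertical errors. The data and error sizes are made up:

.. code-block:: python

   import matplotlib.pyplot as plt
   import numpy

   x = numpy.arange(5)
   y = x ** 2
   err = numpy.sqrt(y + 1)  # illustrative uncertainties

   fig, ax = plt.subplots()
   # fmt='o' draws markers only; capsize adds caps to the error bars
   ax.errorbar(x, y, yerr=err, fmt='o', capsize=3)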


Stacked Bar Graph
-----------------
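
The following is a minimal sketch that stacks bars by passing the lower series as ``bottom`` to the second ``Axes.bar`` call. The categories and counts are made up:

.. code-block:: python

   import matplotlib.pyplot as plt
   import numpy

   labels = ['spam', 'eggs', 'ham']
   lower = numpy.array([3, 5, 2])
   upper = numpy.array([1, 2, 4])

   fig, ax = plt.subplots()
   ax.bar(labels, lower, label='lower')
   # Each upper bar starts where the corresponding lower bar ends
   top_bars = ax.bar(labels, upper, bottom=lower, label='upper')
   ax.legend()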


Resources
---------
NumPy User Guide: https://docs.scipy.org/doc/numpy/user/index.html

NumPy Reference: https://docs.scipy.org/doc/numpy/reference/index.html#reference

Matplotlib example gallery: https://matplotlib.org/gallery/index.html

Pandas: It probably exists.  Good luck.

This presentation: https://git.friedersdorff.com/max/plotting_with_matplotlib.git