Plotting with Matplotlib
------------------------

Also creating a presentation with rst2pdf
=========================================

Data Structures
---------------
Favour simpler data structures if they do what you need. In order of
preference:

#. Built-in lists

   - 2xN data or simpler
   - You can't install extra dependencies

#. Numpy arrays

   - 2 (or higher) dimensional data
   - Lots of numerical calculations

#. Pandas series/dataframes

   - 'Data wrangling': reshaping, merging, sorting, querying
   - Importing from complex formats

Shamelessly stolen from https://stackoverflow.com/a/45288000

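
As a rough illustration of that ordering, here is the same small table held in
each of the three structures. This is only a sketch: the numbers and the column
names ``t`` and ``y`` are made up for the example.

.. code-block:: python

    import numpy
    import pandas

    # Made-up 2xN data: plain lists are enough to hold it.
    t = [0.0, 1.0, 2.0, 3.0]
    y = [0.0, 1.0, 4.0, 9.0]

    # A numpy array makes element-wise arithmetic a one-liner.
    arr = numpy.array([t, y])
    doubled = arr[1] * 2

    # A pandas dataframe adds labelled columns, querying and sorting.
    frame = pandas.DataFrame({'t': t, 'y': y})
    large = frame[frame['y'] > 2].sort_values('y', ascending=False)

If the list version already does the job, the other two are probably not worth
the extra dependency.
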
Loading Data from Disk
----------------------
Natively
========

.. code-block:: python

    >>> import csv
    >>> with open('eggs.csv', newline='') as csvfile:
    ...     spam = csv.reader(csvfile,
    ...                       delimiter=' ',
    ...                       quotechar='|')
    ...     for row in spam:
    ...         # Do things
    ...         pass

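
``csv.reader`` hands back every field as a string, so numerical columns have to
be converted by hand. A minimal sketch of one way to do that, using an
in-memory stand-in for the file (the contents are made up for the example):

.. code-block:: python

    import csv
    import io

    # Stand-in for a file on disk; the contents are invented.
    csvfile = io.StringIO("1 2.5\n2 3.5\n3 4.5\n")

    rows = []
    for row in csv.reader(csvfile, delimiter=' '):
        # Every field arrives as a string, so convert explicitly.
        rows.append([float(field) for field in row])

    # Transpose the list of rows into one list per column.
    columns = [list(column) for column in zip(*rows)]

Having one list per column is usually the more convenient shape for plotting.
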
Loading Data from Disk
----------------------
Numpy
=====

.. code-block:: python

    >>> import numpy
    >>> spam = numpy.genfromtxt('eggs.csv',
    ...                         delimiter=' ',
    ...                         dtype=None)  # No error handling!
    >>> for row in spam:
    ...     # Do things
    ...     pass

``numpy.genfromtxt`` will try to infer the datatype of each column if
``dtype=None`` is set.

``numpy.loadtxt`` is generally faster at runtime if your data is well formatted
(no missing values, only numerical data or constant length strings).

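
To make the difference concrete, here is a small sketch that reads well-formed
data with ``numpy.loadtxt`` and data with a gap in it with
``numpy.genfromtxt``. Both inputs are made-up in-memory stand-ins for files,
and the second one is comma-delimited so that the missing field is unambiguous.

.. code-block:: python

    import io
    import numpy

    # Made-up, well-formed numerical data: loadtxt reads it directly.
    clean = io.StringIO("1 2.5\n2 3.5\n3 4.5\n")
    a = numpy.loadtxt(clean, delimiter=' ')

    # Made-up data with a missing field: genfromtxt fills the gap with nan.
    gappy = io.StringIO("1,2.5\n2,\n3,4.5\n")
    b = numpy.genfromtxt(gappy, delimiter=',')

Both calls return a 3x2 array of floats; ``b[1, 1]`` is ``nan``.
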
Loading Data from Disk
----------------------
Numpy NB.
=========
**Remind me to look at some actual numpy usage at the end**

- Numpy coerces mixed inputs to a single common type when creating arrays.
- When the columns have different types, ``numpy.genfromtxt`` returns a
  one-dimensional structured array, which can not be indexed like
  ``data[xstart:xend, ystart:yend]``.
- Data of unequal types are problematic! Pandas *may* be a better choice in
  that case.
- Specifying some value for ``dtype`` is probably necessary in most cases in
  practice: https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html

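
A short sketch of what those points look like in practice. The file contents
are made up, and ``encoding='utf-8'`` is passed on the assumption of a
reasonably recent numpy (1.14 or later), so that strings are read as text
rather than bytes.

.. code-block:: python

    import io
    import numpy

    # Mixed ints and floats are coerced to a single float type.
    coerced = numpy.array([1, 2.5])

    # Made-up file with an integer, a float and a string column.
    mixed = io.StringIO("1 2.5 spam\n2 3.5 eggs\n")
    data = numpy.genfromtxt(mixed, delimiter=' ', dtype=None,
                            encoding='utf-8')

    # The result is 1-D: one record per row, with fields f0, f1, f2.
    first_column = data['f0']
    first_row = data[0]

Columns are selected by field name and rows by position, so
``data[0:1, 0:2]`` raises an ``IndexError`` here.
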
Loading Data from Disk
----------------------
Pandas
======

.. code-block:: python

    >>> import pandas
    >>> # dtype=None is the default: column types are inferred
    >>> spam = pandas.read_csv('eggs.csv',
    ...                        delimiter=' ',
    ...                        header=None)
    >>> # itertuples() yields one row at a time; iterating the
    >>> # dataframe directly would yield column labels instead
    >>> for row in spam.itertuples(index=False):
    ...     # Do things
    ...     pass

``header=None`` is required if the file does not have a header.

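
For a quick feel of what comes back from ``pandas.read_csv``, here is a sketch
using a made-up in-memory file with columns of different types:

.. code-block:: python

    import io
    import pandas

    # Made-up file with an integer, a float and a string column.
    mixed = io.StringIO("1 2.5 spam\n2 3.5 eggs\n")
    spam = pandas.read_csv(mixed, delimiter=' ', header=None)

    # Without a header, the columns are labelled 0, 1, 2, ...
    column_labels = list(spam.columns)

    # Select columns by label and rows by position.
    floats = spam[1]
    first_row = spam.iloc[0]

Each column keeps its own type, which is what makes mixed data less painful
here than in a plain numpy array.
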
Generating Data for Testing
---------------------------

Generating the data on the fly with numpy is convenient.

.. code-block:: python

    >>> import numpy.random as ran
    >>> # For repeatability
    >>> ran.seed(7890234)
    >>> # 100x2 array of uniform [0, 1) floats
    >>> data = ran.rand(100, 2)
    >>> # 100x100x100 array of uniform [0, 1) floats
    >>> data = ran.rand(100, 100, 100)
    >>> # 100 std. normal floats
    >>> data = ran.randn(100)
    >>> # 3x14x15 array of binomial ints with n = 100, p = 0.1
    >>> data = ran.binomial(100, 0.1, (3, 14, 15))

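
Newer numpy releases (1.17 and later) also offer a ``Generator`` interface,
which keeps the random state in an object rather than in the module. It is not
what the slide above uses; the following is just a sketch of the same kind of
test data generated that way:

.. code-block:: python

    import numpy

    # A seeded generator gives repeatable results.
    rng = numpy.random.default_rng(7890234)

    # 100x2 array of uniform [0, 1) floats
    uniform = rng.random((100, 2))
    # 100 std. normal floats
    normal = rng.standard_normal(100)
    # 3x14x15 array of binomial ints with n = 100, p = 0.1
    counts = rng.binomial(100, 0.1, size=(3, 14, 15))

The numbers differ from those produced by ``ran.seed`` with the same seed,
since the two interfaces use different underlying generators.
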
Plotting Time Series
--------------------

Plot data of the form:

.. math:: y=f(t)

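
A minimal sketch of such a plot. The data is an invented decaying oscillation,
and the ``Agg`` backend is selected only so that the example runs without a
display; in an interactive session you would drop that line and call
``plt.show()`` at the end.

.. code-block:: python

    import matplotlib
    matplotlib.use('Agg')  # Non-interactive backend, no display needed
    import matplotlib.pyplot as plt
    import numpy

    # Made-up data: a decaying oscillation sampled at 200 points.
    t = numpy.linspace(0, 10, 200)
    y = numpy.exp(-t / 5) * numpy.sin(2 * numpy.pi * t)

    fig, ax = plt.subplots()
    line, = ax.plot(t, y, label='y = f(t)')
    ax.set_xlabel('t')
    ax.set_ylabel('y')
    ax.legend()
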
Subplots
--------

Saving Plots
------------

So far I've just displayed plots with ``plt.show()``. You can actually save
the plots from that interface manually, but when scripting, it's convenient
to do so automatically:

.. code-block:: python

    >>> # Some plotting has previously occurred
    >>> plt.savefig('eggs.pdf', dpi=300, transparent=False)

The output format is inferred from the file extension.
The keyword arguments are optional here. Other options, such as ``format``
and ``bbox_inches``, exist.

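
As a sketch of how this might look in a script, the following saves one figure
in two formats. The plotted data is invented, and the files go to a temporary
directory purely to keep the example self-contained.

.. code-block:: python

    import os
    import tempfile

    import matplotlib
    matplotlib.use('Agg')  # Non-interactive backend, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])

    # Save the same figure once per format; the extension picks the format.
    outdir = tempfile.mkdtemp()
    saved = []
    for extension in ('pdf', 'png'):
        path = os.path.join(outdir, 'eggs.' + extension)
        fig.savefig(path, dpi=300, transparent=False)
        saved.append(path)
    plt.close(fig)

``fig.savefig`` takes the same arguments as ``plt.savefig`` but acts on a
specific figure, which helps once a script creates more than one.
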
Error Bars
----------

Stacked Bar Graph
-----------------

Resources
---------
NumPy User Guide: https://docs.scipy.org/doc/numpy/user/index.html

NumPy Reference: https://docs.scipy.org/doc/numpy/reference/index.html#reference

Matplotlib example gallery: https://matplotlib.org/gallery/index.html

Pandas: It probably exists. Good luck.

This presentation: https://git.friedersdorff.com/max/plotting_with_matplotlib.git