Add content for slides

This commit is contained in:
Maximilian Friedersdorff 2019-05-30 13:43:56 +01:00
parent 9d72f9f066
commit 7d3423f860
4 changed files with 67 additions and 140 deletions

BIN
password_reuse_1.png Normal file

Binary file not shown.


Width:  |  Height:  |  Size: 80 KiB

BIN
password_reuse_2.png Normal file

Binary file not shown.


Width:  |  Height:  |  Size: 65 KiB

BIN
password_reuse_3.png Normal file

Binary file not shown.


Width:  |  Height:  |  Size: 36 KiB


@ -1,158 +1,85 @@
Plotting with Matplotlib
------------------------
Why is Password Reuse a Problem?
--------------------------------
.. image:: password_reuse_1.png
.. image:: password_reuse_2.png
.. image:: password_reuse_3.png
Also creating a presentation with rst2pdf
=========================================
About password strength
-----------------------
How is strength measured?
=========================
'Entropy' `s` depends on the size of the alphabet `a` and the length `n` of the
password:
Data Structures
---------------
Favour simpler data structures if they do what you need. In order:
.. math::
s = \log_2(a^n) = n \log_2 a
#. Built-in Lists
- 2xN data or simpler
- Can't install system dependencies
#. Numpy arrays
- 2 (or higher) dimensional data
- Lots of numerical calculations
#. Pandas series/dataframes
- 'Data Wrangling', reshaping, merging, sorting, querying
- Importing from complex formats
* 0889234877724602 -> 53 bits
* ZeZJieatdH -> 60 bits
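The two figures above follow from the formula if the first password is assumed to draw on the 10 digits and the second on the 62 upper case letters, lower case letters and digits. The slide does not state the alphabet sizes, so both are assumptions. A minimal sketch:

```python
import math


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a uniformly random password: log2(a^n) = n * log2(a)."""
    return length * math.log2(alphabet_size)


# Assumed alphabets: digits only (a = 10) and letters plus digits (a = 62).
digits_only = entropy_bits(10, len("0889234877724602"))
alphanumeric = entropy_bits(62, len("ZeZJieatdH"))

print(round(digits_only), round(alphanumeric))
```

The formula only holds when every character is chosen independently and uniformly at random. A password a human made up will usually have less entropy than this estimate suggests.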
Shamelessly stolen from https://stackoverflow.com/a/45288000
Loading Data from Disk
----------------------
Natively
========
.. code-block:: python
>>> import csv
>>> with open('eggs.csv', newline='') as csvfile:
... spam = csv.reader(csvfile,
... delimiter=' ',
... quotechar='|')
... for row in spam:
... # Do things
... pass
Loading Data from Disk
----------------------
Numpy
=====
.. code-block:: python
>>> import numpy
>>> spam = numpy.genfromtxt('eggs.csv',
... delimiter=' ',
... dtype=None) # No error handling!
>>> for row in spam:
... # Do things
... pass
``numpy.genfromtxt`` will try to infer the datatype of each column if
``dtype=None`` is set.
``numpy.loadtxt`` is generally faster at runtime if your data is well formatted
(no missing values, only numerical data or constant length strings).
Loading Data from Disk
----------------------
Numpy NB.
=========
**Remind me to look at some actual numpy usage at the end**
- I think numpy does some type coercion when creating arrays.
- Arrays created by ``numpy.genfromtxt`` can not in general be indexed like
``data[xstart:xend, ystart:yend]``.
- Data of unequal types are problematic! Pandas *may* be a better choice in
that case.
- Specifying some value for ``dtype`` is probably necessary in most cases in
practice: https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html
Loading Data from Disk
----------------------
Pandas
======
.. code-block:: python
>>> import pandas
>>> # dtype=None is the default
>>> spam = pandas.read_csv('eggs.csv',
... delimiter=' ',
... header=None)
>>> for index, row in spam.iterrows():
... # Do things
... pass
``header=None`` is required if the file does not have a header.
Why are weak passwords problematic?
===================================
Weak passwords are trivial to crack in many situations. A password with 53 bits
of entropy may be cracked by a criminal organisation in less than an hour.
What about strong passwords?
============================
They are difficult to remember, especially when you use a different strong
password for every service. That tempts you to write them down or to reuse
them.
Generating Data for Testing
---------------------------
It's surprisingly difficult for humans to generate good passwords!
Generating the data on the fly with numpy is convenient.
Password Managers to the Rescue!
--------------------------------
Password managers allow you to create a unique and strong password for every
service.
.. code-block:: python
Additional benefits:
>>> import numpy.random as ran
>>> # For repeatability
>>> ran.seed(7890234)
>>> # Uniform [0, 1) floats
>>> data = ran.rand(100, 2)
>>> # Uniform [0, 1) floats
>>> data = ran.rand(100, 100, 100)
>>> # Std. normal floats
>>> data = ran.randn(100)
>>> # 3x14x15 array of binomial ints with n = 100, p = 0.1
>>> data = ran.binomial(100, 0.1, (3, 14, 15))
* Remembers passwords for you
* Generates passwords for you
* Automagically fills in passwords on websites for you. This is important!
* Makes passwords available on all your configured devices
* Can store additional related data: usernames, answers to security questions,
PINs for debit/credit cards
Plotting Time Series
--------------------
All mainstream password managers are equivalent in the above respects.
Plot data of the form:
Can you trust password managers?
--------------------------------
Yes*
.. math:: y=f(t)
How do they keep passwords secure?
----------------------------------
1. User supplies a password
2. The password is used to derive an encryption key. This process is designed
to be slow, even on modern hardware
3. The derived encryption key is used to encrypt/decrypt your passwords
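Steps 1 and 2 can be sketched with the standard library. PBKDF2-HMAC-SHA256 is used here only as an example of a deliberately slow key derivation function. The slides do not name one, and a real password manager may use a different function and different parameters. The iteration count, salt size and the ``derive_key`` helper are assumptions for illustration:

```python
import hashlib
import os


def derive_key(password: str, salt: bytes, iterations: int = 200_000) -> bytes:
    """Derive a 32-byte key from a password with PBKDF2-HMAC-SHA256.

    The high iteration count is what makes each guess slow for an attacker.
    """
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )


# The salt is random, not secret, and stored next to the encrypted database.
salt = os.urandom(16)
key = derive_key("correct horse battery staple", salt)
print(len(key))
```

The same password and salt always give the same key, which is what allows the database to be decrypted later. Step 3, the encryption itself, is left out because it needs a cipher that the standard library does not provide.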
Note that the security of the encryption depends on the strength of your
password. With a poor password (50 bits), it would take the entire computing
power of the world less than a month to crack the database. With a decent-ish
password (60 bits), it would take on the order of 50 years on average. With a
better password (70 bits), it would take on the order of 50,000 years.
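Each additional bit of entropy doubles the expected search effort, so ten more bits cost roughly a thousand times longer. A minimal sketch of that scaling, taking the 60 bit figure of 50 years as the reference point (an assumption made for illustration, not a measured guess rate):

```python
REFERENCE_BITS = 60      # reference point taken from the text above
REFERENCE_YEARS = 50.0   # "on the order of 50 years" at 60 bits


def years_to_crack(bits: int) -> float:
    """Scale the reference estimate: every extra bit doubles the time."""
    return REFERENCE_YEARS * 2.0 ** (bits - REFERENCE_BITS)


for bits in (50, 60, 70):
    print(bits, years_to_crack(bits))
```

At 50 bits this gives about 0.05 years, or roughly 18 days, and at 70 bits about 51,200 years, in line with the orders of magnitude quoted above.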
Subplots
--------
Generating a Strong Password
----------------------------
Passphrases are better than passwords:
* Tr0ub4dor&3 -> 28 bits of entropy, hard to remember
* correct horse battery staple -> 44 bits of entropy, easy to remember
Saving Plots
------------
Use passphrases wherever you have to remember the password yourself.
So far I've just displayed plots with ``plt.show()``. You can actually save
the plots from that interface manually, but when scripting, it's convenient
to do so automatically:
.. code-block:: python
>>> # Some plotting has previously occurred
>>> plt.savefig('eggs.pdf', dpi=300, transparent=False)
The output format is interpreted from the file extension.
The keyword arguments are optional here. Other options exist.
Error Bars
----------
Stacked Bar Graph
-----------------
Resources
---------
NumPy User Guide: https://docs.scipy.org/doc/numpy/user/index.html
NumPy Reference: https://docs.scipy.org/doc/numpy/reference/index.html#reference
Matplotlib example gallery: https://matplotlib.org/gallery/index.html
Pandas: It probably exists. Good luck.
This presentation: https://git.friedersdorff.com/max/plotting_with_matplotlib.git
Generate passphrases with Diceware
==================================
1. Roll five six-sided *physical* dice
2. Read the numbers left to right
3. Find the word with that number on a list of 6^5 (7776) words
4. Repeat until the desired length is reached. For a password manager, use at
least 7 words.
5. Write down your passphrase on paper and keep it somewhere secure
6. If you are 100% confident that you will not forget the passphrase, destroy
the paper by burning
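The arithmetic behind the steps above can be sketched as follows. Five dice give 6^5 = 7776 equally likely outcomes, so each word contributes log2(7776), about 12.9 bits. The helper names are hypothetical, and the rolls are meant to be typed in from physical dice rather than generated by the computer:

```python
import math

WORDS_IN_LIST = 6 ** 5  # 7776 possible outcomes of five dice


def rolls_to_index(rolls):
    """Map five dice rolls (each 1-6, read left to right) to a 0-based
    position in a word list sorted by dice number."""
    if len(rolls) != 5 or any(r not in range(1, 7) for r in rolls):
        raise ValueError("expected five rolls, each between 1 and 6")
    index = 0
    for roll in rolls:
        index = index * 6 + (roll - 1)
    return index


def passphrase_entropy_bits(word_count):
    """Entropy of a passphrase made of independently rolled words."""
    return word_count * math.log2(WORDS_IN_LIST)


print(rolls_to_index([1, 1, 1, 1, 1]), rolls_to_index([6, 6, 6, 6, 6]))
print(round(passphrase_entropy_bits(7)))
```

Seven words come to about 90 bits, well beyond the 70 bit example discussed earlier. In practice the printed list labels every word with its five digits, so the number can be looked up directly and the index conversion is only illustrative.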