BIP39 Mnemonics Made Easy (Part 2 – The Tech of Bits to Backups)

Overview

In the last article, we discussed a high level overview of BIP39 mnemonics and their value as a simplified backup tool. Mnemonics make it much easier to take a single seed, back it up, and ensure access to an entire wallet of private keys, addresses, and transactions. But how do we go from a random set of bits to a list of words? Let’s discuss the technical side of BIP39.

Bits to Backups – The Steps for Generating a Mnemonic

First, Chaos

In order to generate a good seed, a fair amount of entropy or “randomness” is desirable. Good random number generators are hard to get, but modern OS’s like Linux do a pretty good job of sourcing entropy from the user and hard drives, and something like /dev/urandom on a daily driver machine should be sufficiently secure for generating the entropy we need.

Now how many random bits do we need? The BIP39 standard specifies 128-256 bits of entropy to be used for generating the seed. This will correspond to 12-24 words later on when we “map” the entropy to the words.

First, a warning: DO NOT USE any of the examples in this article to generate a wallet – your funds will be stolen!

With that out of the way, let’s look at an example. First, let’s generate 128 bits of entropy using os.urandom() in Python. Represented as binary, our entropy looks like this:

10111110011001010101110111001111010100011111011010110001110101111011110111000101101001100011110100010100011101000011011011100000

Next, a checksum

In order to better secure the seed, we’ll add a checksum to the end of the entropy. This makes it easier for wallet software to validate a backup seed.

To get the checksum, we’ll first take the SHA-256 hash of our entropy. Then, we take the first N/32 bits of the hash and append it to the entropy.

In our case, 128/32 bits gives us a 4 bit checksum size. In our example, the 4 bit checksum will be 0101. We’ll append that to the entropy to give us a 132 bit value:

101111100110010101011101110011110101000111110110101100011101011110111101110001011010011000111101000101000111010000110110111000000101

Dividing and our Dictionary

The final step of the process involves dividing our checksummed bits into “chunks” and mapping those chunks to the mnemonic words from the dictionary. The BIP39 standard specifies that the chunks will always be 11 bits long. So, we divide our 132 bit checksummed entropy into 12 chunks of 11 bits each:

  1. 10111110011
  2. 00101010111
  3. 01110011110

Now, each of these 11 bit chunks can be interpreted as an unsigned 11 bit integer value ranging from 0-2047. This “maps” to a word from the dictionary of 2048 words directly! These are standardized and listed in alphabetic order. So, we can take the 11 bit chunk as an index in the dictionary to extract the words we need:

  1. 10111110011 = 1523 -> salmon
  2. 00101010111 = 343 -> cliff
  3. 01110011110 = 926 -> inherit

The overall mnemonic we generate is this example turns out to be:

  1. salmon
  2. cliff
  3. inherit
  4. physical
  5. help
  6. type
  7. warfare
  8. regular
  9. dial
  10. photo
  11. asset
  12. scheme

Mnemonics – from Entropy to Dictionary Entries

The process of generating a mnemonic seed is both ingenious and straightforward. One can easily create a secure wallet seed of 12-24 words by generating some entropy, checksumming the data, and mapping to a standard dictionary.

I’ve written a project called MnemonicGen that generates mnemonics using these steps. Take a look at this project to see these steps implemented in Python. This code should be considered academic/experimental – use it to create wallets at your own risk. Other proven implementations such as Ian Coleman’s BIP39 are also available to study.

Happy generating!

BIP39 Mnemonics Made Easy (Part 1 – Backups, Simplified!)

Overview

A critical component of cryptocurrency security is the ability for users to easily and efficiently backup the private keys that control access to their funds on the blockchain. Without one’s private keys, any funds in a user’s addresses are irrecoverably lost.

However, the nature of early wallets made backing up one’s private keys a regularly-scheduled necessity, unwieldy and annoying for most users. BIP39 and associated Bitcoin Improvement Proposals have thankfully simplified private key backup by introducing HD wallets and mnemonics.

Newer wallets explained

HD? So like, High Definition? No, Hierarchical Deterministic!

In the early days of Bitcoin and other cryptocurrencies, address generation was done non-deterministically. For each new address needed, a private key would be randomly generated and stored in the wallet’s backup file. Most wallets would pre-generate some addresses in the initial wallet file, but every new private key/address introduced into the wallet meant a new backup would be needed.

For privacy reasons, it is recommended to use a new address for each transaction. And for every new address generated since the last backup, a user would need to create a new backup to avoid losing recent funds in the event of a wallet resortation. Even for “power users”, backups became an unwieldy and annoying task.

Enter BIPs (Bitcoin Improvement Proposals) 32 and 44. In summary, these proposals introduce HD (“Hierarchical Deterministics Wallets”). These wallets only require one seed to be randomly generated. And from that seed, all the private keys and addresses a wallet needs can be derived in a tree structure; all associated with the initial seed. Since the private keys and addresses can be regenerated from the seed, one only has to back up the seed to recover all of their private keys, addresses, and transactions for a wallet. Much better!

Introducing Mnemonics – Simplifying Seed Backups

The ability to generate an entire wallet from one seed drastically simplified wallet backups, and therefore has improved the ease by which users can keep their funds safe. However, a seed is still just a random binary value. Represented in hex or Base64 encoding, it is still fairly easy to misread/miswrite a character and accidentally create a useless wallet backup.

To truly simplify the task of backing up a wallet seed, some developers in the Bitcoin space proposed a system that allows the translation of the binary seed value into English words that can be more easily transcribed or even memorized to secure access to one’s funds. This proposal, given the designation BIP39, was written by Marek Palatinus, Pavol Rusnak, Aarone Voisine, and Sean Bowe.

What does a mnemonic look like?

Mnemonics don’t use just any set of words. These words are carefully chosen to avoid ambiguity and make transcription easy, so that a user doesn’t accidentally create an incorrect backup.

There are a total of 2048 words in the dictionary, and a wallet mnemonic contains 12-24 words. The last word contains a checksum validating the other words in the list, making it easy for wallets to validate a backup.

Here is a sample Bitcoin or Bitcoin Cash BIP39 mnemonic:

  1. army
  2. van
  3. defense
  4. carry
  5. jealous
  6. true
  7. garbage
  8. claim
  9. echo
  10. media
  11. make
  12. crunch

WARNING, DO NOT use this seed for a wallet. A seed must remain private, and your funds will be stolen! This mnemonic is excerpted from Andreas’ Antonopoulos’ Mastering Bitcoin to further discourage its use – millions of people have access to this wallet.

Now how do you get one of these fancy mnemonics for a wallet? Most modern wallet software will generate this for you when you create a wallet. Then, all you need to do is write down the phrase and store it in a secure location to backup access to your funds if your wallet device is lost or stolen.

Alternatively, a mnemonic can be generated by a separate tool and imported as a backup into the wallet software. I’ve written a generator called MnemonicGen that produces standard phrases that can be imported into any modern HD wallet that supports BIP39. Keep in mind that this particular project is meant to be academic/experimental and may not be sufficiently secure for your needs. But other mnemonic generators like Ian Coleman’s are widely used and well-vetted.

Mnemonics – Backups Made Better

With BIP39 mnemonics, Bitcoin newbies and power users alike can easily create and backup secure wallets without the need to keep a schedule or deal with unwieldy binary data encoding. This modern standard is implemented in widely-used and accessible wallets like the Bitcoin.com wallet, Electron Cash, Blockchain.info, and more. I would personally advise upgrading your cryptocurrency experience by using an HD wallet, simplifying your security practices and keeping your funds safe!

In the next article, we’ll discuss the technical workings of BIP39, showing how we can go from a random seed to a set of words in a few fairly straightforward steps.

Proof of Work, Explained (Part 2 – A Hash Bash for Techies)

Overview

In the last article, we looked at the overall idea of proof of work and its applications. That article covered the origins of this concept, how it works at a high level, and some of its applications. Now, let’s take a look at the technical inner workings of these algorithms.

In a nutshell, proof of work involves the use of hash functions. These one way functions form the basis for a difficult to solve, but easily verifiable computational puzzle as a way to prove that one did some amount of desired computing work.

Proof of Work, the Technical Perspective

Hash Functions

First, we need to understand a bit about hash functions and why they form the basis for proof of work. A hash function is a one-way function that takes some input of any size and outputs a consistently sized set of bits. The two most important characteristics of hash functions are that they:

  • Are one-way – you cannot take an output and find the input without brute force guessing
  • Have unique outputs for every possible input (if the hash function is a good one!)

These two properties are critical for proof of work. First, the one-way nature of these makes it so that brute-force is required to find some desired output. Second, the desired one-to-one input/output property makes it so we can easily verify the solution once we have one.

An Overview of the Algorithm

Hashing and Binary and Difficulty Targets, Oh My!

Proof of work builds on top of the properties of hash functions by realizing that as a stream of bits, hash outputs actually represent binary numbers. For example, an 8 bit hash 00001000 represents the decimal number “8”. Now remember that hash outputs can only be matched to a particular input by using brute force to guess.

Using these properties, proof of work takes a pretty ingenious approach to making a user do some amount of predetermined work – it makes them look for a hash that, interpreted as a number, is less than some target value!

This is where the idea of difficulty comes in. Let’s say you want the user to find some input where the hash value, when representing an 8 bit number, has two zeros in the front (00101010, for example). Now imagine you want the user to find an input that gives a hash with four zeros in front (00001011). Which one takes more guesses to compute? It turns out that the smaller the “difficulty target” value, the more guesses (and more computing time) it takes to find an input that gives the desired hash output. This is the fundamental basis for proof of work. It can be statistically predicted that a certain difficulty target will take roughly some amount of guesses (and therefore computing time) to find. So the smaller the difficulty target, the harder the puzzle.

The nonce value

Now since hash outputs map one-to-one with some input, how can we prevent the worker from just using a dictionary to find an input that meets the difficulty? Here is what makes proof of work truly proof – we always create a hash input unique to the problem we’re trying to solve, using an applicable message and a random guess we call a nonce.

See, for our proof of work to truly require the desired amount of work, we always start with a unique message for the problem. In the case of Bitcoin, our message is what is called the “block header” – a chunk of data containing information about the transactions included in the current block. In an anti-spam application, this message would be something like a forum post or the contents of an email message. Since this message is unique, a dictionary cannot be used to guess a hash that meets the difficulty target.

The worker then has to take the message plus a random number guess called the nonce, and run that combined string through the hashing function. If the hash output isn’t less than the target number, then the worker increments that random number guess concatenated to the message and tries again – and again and again until the right output is found.

Verifying the nonce

Once a nonce is found, another node or server can very easily verify that the solution (the nonce) is correct. All the verifying party has to do is take the original message plus the nonce value found by the worker and run that through the hash function. Since a hash input will always give the same output, it only takes one step to verify that the worker’s nonce is in fact a correct solution to the proof of work problem.

A Practical Example

Let’s take a look at an example anti-spam proof of work problem. Let’s say a user wants to contact a site owner with the message “Hello”, and the site owner wants the user to do some proof of work before sending that email. The site owner specifies a difficulty target of 2^240. This difficulty target can be any 8 bit number in this case, but a power of two is easy to work when building an application. This system uses the compute-intensive SHA-256 hashing algorithm for its proof of work. Here’s what the steps would look like:

Worker (Client)

  1. Uses “Hello” as the message
  2. Starts guessing a nonce with 0 – the hash input is the string “Hello0”
  3. The SHA-256 output of “Hello0” (in hexadecimal format) is 80878c5b013ba72c0d2b7e8f65868649cbdb1e7e7a8c8a07537d6b3619e4e32f
  4. Clearly, this output is greater than the difficulty target of 2^240, which would have three prepending 0’s in hexadecimal: 0001000000000000000000000000000000000000000000000000000000000000
  5. Increment nonce to 1, and try again. This continues until an appropriate nonce is found
  6. The client finally finds a nonce that works with the value 9172. The SHA-256 hash of “Hello9172” is 00001f2e9f8f74117b4178eb04b368c807f906ae2a07bece562266cbc9adff3c, which is less than the difficulty target of 0001000000000000000000000000000000000000000000000000000000000000 (2^240)
  7. Since the client has a nonce guess that meets the difficulty target for this unique message, it now has proof that it did all that computing work!

Verifying Party (Server)

  1. Take the message for this problem, “Hello”, plus the client’s found nonce, “9172” and pass “Hello9172” through the SHA-256 hash function
  2. Since hash functions produce the same output for any input, we get the same output the client found: 00001f2e9f8f74117b4178eb04b368c807f906ae2a07bece562266cbc9adff3c.
  3. Since the above output is indeed less than the difficulty target 2^240, the server has now verified that the client did the desired amount of computing work to find the nonce. The message can now be sent.

Proof of Work – Hashing for a Cause

These algorithms put the properties of hashing algorithms to new and innovative uses, particularly in the incredible space of cryptocurrencies. Proof of work takes the one-to-one input/output and irreversible properties of hash functions and uses them to create difficult to solve, easy to verify computing problems. This simple but interesting bit of math and computer science powers new approaches to interesting challenges. Proof of work can be used to help prevent spam in a new and unique way – by making large-volume spam uneconomical for its propagators. Arguably at its most revolutionary, proof of work powers the transaction verification and currency issuance components of cryptocurrencies like Bitcoin and Litecoin, allowing for an entirely new form of money free from centralized institutions.

Proof of Work, Explained (Part 1 – POW for Non-Techies)

Overview

Personally, I’m fascinated by both the technical and financial implications of cryptocurrencies like Bitcoin, Bitcoin Cash, and Litecoin (to name a few). The way these currencies work is a complex topic, with lots of moving parts to discuss. One of the core components of cryptocurrencies like Bitcoin is the mechanism by which an entirely decentralized system of money can securely verify transactions as well as issue new currency, all while preventing fraud and issuing at a predictable rate.

Most of these currencies solve this problem using a concept called “proof of work” by which nodes solve a computationally difficult but easily verifiable mathematical problem. This concept goes beyond cryptocurrencies as well, and actually originated as an anti-spam measure.

Proof of Work – The 10,000 foot view

What is Proof of Work?

Proof of work, fundamentally, is the solving of a computationally intensive mathematical problem. This problem has two very important properties – the solution to the problem is both:

  • Difficult (computationally intensive) to find
  • Easy to verify once found

The idea is this: for an application like cryptocurrency or anti-spam, a “node” or computer is challenged to find a solution to this puzzle. The solution can only be found by brute-force guessing. However, once the solution is found, all the other nodes on a network or a server can verify the solution in one step. Since the answer can only be found by brute-force computation but can easily be verified as correct, the solution to the problem serves as proof that a certain amount of computing work was done – hence the term “proof of work”.

Why is it Useful?

First, let’s look at the original application of proof of work: anti-spam. The original idea was implemented in a system called HashCash, invented by Adam Back. Back’s system works like so: Before performing an action like posting to a forum or sending an email, the user of a site is made to do a small proof of work problem. This problem only takes half a second or so of computing to solve, and of course is almost instantaneous for the system to verify. For a legitimate user of a forum or email system, the half second of computing is no obstacle to completing his or her task. However, for a spammer trying to send hundreds of thousands of spam messages, the task suddenly becomes very uneconomical since it would tie up their computer for minutes or even hours at a time!

Now how does this system apply to cryptocurrencies like Bitcoin? In this system, transaction verification and currency issuance is totally decentralized – no third party is trusted to create new value tokens or verify that transactions are legitimate. This of course presents a massive fraud-prevention challenge – how can the network ensure that malicious parties don’t create “counterfeit” currency or send through transactions that aren’t valid?

Proof of work helps to solve this problem. On the Bitcoin network, new transactions are broadcast to computers running what is called “mining” software and accumulated into “blocks” of transactions that will be validated at one time. Every time a new block is waiting to be verified, all the nodes on the network running this software essentially “race” to solve a proof of work problem first. The Bitcoin network adjusts the difficulty of this problem so that about once every ten minutes, one miner wins the race and finds a solution to this problem. Once one node finds the answer, it tells all the other nodes on the network that it’s found an answer, and the other nodes can instantly verify that the answer is correct.

The node that finds proof of work for this block is rewarded with brand new Bitcoin (issued at a predictable rate) as well as all the transaction fees in that block. This computationally expensive proof of work problem creates an excellent system of economic incentives- the reward of new Bitcoin drives miners to to verify transactions are correct, and also make fraud more expensive than legitimate mining. If a miner were to try and cheat, all the other nodes running the legitimate software would instantly reject the new block since it doesn’t meet the rules of the network, and all of the time and computing power of the malicious node would thus be wasted.

Proof of Work – Powering Cryptocurrency and Thwarting Spammers

The idea of proof of work has incredible value for multiple applications. This system allows computing to be used as a precious resource in a purely digital economy; a way to both secure monetary transactions and prevent the waste of resources like time and storage space. In an anti-spam setting, proof of work allows the operators of a curated space to reduce the impact of spam on their systems, reducing wasted time, clutter, and storage space. In a cryptocurrency application, proof of work allows the secure verification of transactions and the issuance of new currency without the need for a trusted third party, the often fatal flaw in fiat systems.

The applications of this technology are incredibly interesting. In the case of cryptocurrencies, I would say its application is part of a system that is revolutionary. Now, as a software engineer, I find the actual technical workings of proof of work to be even more interesting than the surface description. In the next article, I’ll walk through how these algorithms work from a more technical perspective.

Bitcoin as “Digital Gold” is Bad for Crypto Adoption

“Digital Gold” vs. “Digital Cash”

The core Bitcoin network has a scaling problem, and has had this problem for a while now. As more and more transactions try to fit in Bitcoin’s 1MB blocks (once every 10 minutes), network fees have skyrocketed to $5-10 dollars, and that’s even a little low if you want your transaction confirmed within an hour or so.

One response to this problem, especially from supporters of the Bitcoin Core roadmap, is to ignore the problem to some extent. As Bitcoin has seen more attention over the last few years, the price has risen dramatically. As a result, many are now claiming that Bitcoin is not meant to be a “means of exchange” or a form of “digital cash” for day to day transactions. Rather, their viewpoint is that Bitcoin should be seen as a “store of value” or “digital gold”.

To be clear about my biases in this space – I think cryptocurrencies are at their most interesting and valuable as a means of exchange; a way to do truly global, peer-to-peer, decentralized cash. I do not care at all for the risky speculative investing that goes on in the crypto space; I believe these currencies should be used or held in small amounts, allowing one to learn more about this fascinating technology and spread adoption.

With that said, I don’t have a problem with a digital currency being used as a long term store of value like a “digital gold”. There is plenty of room in the crypto space for currencies that solve different problems in different ways. I do, however, think there is a big problem with Bitcoin being the currency of choice for that use case.

The Bitcoin Brand, and the problem with Bitcoin as “Digital Gold”

Let’s be honest, fellow crypto nerds. How many people in your daily life have actually heard of Bitcoin? And how many of them actually understand it, at least at a high level? How many people actually own some and use it? If you’ve got the same variety of people in your life as I do, the percentage isn’t that high.

Now how many of that small subset of Bitcoin-aware people in your life know about Bitcoin Cash? Ethereum? Litecoin, Vertcoin, Monero, Dash? It’s an even smaller percentage, surely. Even if they’ve heard of them, do they understand how these alternatives to Bitcoin solve different problems? We’re down to a sliver of people that understand and adopt these different currencies beyond Bitcoin.

Herein lies the crux of the problem:

Bitcoin is the defacto cryptocurrency. It is the face of digital money, the storefront, the brand, however you want to refer to it.

Whether anyone likes it or not, Bitcoin is what most people hear about first when they hear about cryptocurrencies. And with a now large chunk of the Bitcoin community marketing this as “digital gold”, we have the potential to miss out on opportunities for widespread adoption in the coming years.

More and more businesses are going to become interested in adopting cryptocurrencies as a form of payment, they’re going to want to start with Bitcoin. It is the biggest after all, and the original value proposition we still see on bitcoin.org is “fast peer-to-peer transactions” and “low processing fees”. The reality is far from that, however. When the coffee shop owner realizes his or her customer will have to pay $10 to buy a $3 coffee, they’re not going to find Bitcoin usable for their business. How many of the already small percentages of crypto-curious entrepeneurs are going to take the time to understand Bitcoin Cash, Litecoin, or Dash? Many will probably say: screw this and return to business as usual with fiat.

Satoshi’s “Peer-to-peer electronic cash” – How Does Bitcoin Continue That Vision?

The problem is not a cryptocurrency as “digital gold”, the problem is Bitcoin as “digital gold”. The original promise of Bitcoin was indeed a global, peer-to-peer, low fee alternative to centralized payment processors like Visa, Mastercard, or PayPal. But as the Bitcoin community shifts away from that vision, they take the adoption of that vision with them.

With a functional implementation of the lighting network at least two years away, and the lack of press for already-scaled alternatives like Bitcoin Cash, Litecoin, etc., it concerns me that adoption of cryptocurrencies will stall. I don’t at all fear that they will go away or stop the ceaseless flow of innovation that we’ve seen since the advent of Satoshi’s brilliant whitepaper. But I do worry that Bitcoin’s current scaling problems and the community’s attitude toward it will lead to several years of stalled adoption in the mainstream. I do hope, however, the problem gets fixed soon and that I am wrong.

Configuring SSL with Apache and Let’s Encrypt (Part 2 – Manual Apache Configuration)

Overview

In the first part of “Configuring SSL with Apache and Let’s Encrypt”, we discussed why you should be offering HTTPS connections to your site, and how to get a free certificate from Let’s Encrypt. That tutorial shows commands that use CertBot to automatically configure your Apache web server with the new certificate.

However, if you’re like me you may want some more fine-tuned control over how your web server is configured. We can use CertBot to just retrieve a certificate for us and set up Apache virtual hosts by hand. You can even force connections to your site to use HTTPS all the time.

Configuring HTTPS with Apache By Hand

Finding your Let’s Encrypt certificates

In the case you want to hand-configure your server, you can ask CertBot to just fetch the certificates and leave Apache alone with:

sudo certbot --apache certonly

This will create a folder containing the files you’ll need for that domain. You’ll need these two files:

/etc/letsencrypt/live/mydomain/fullchain.pem
/etc/letsencrypt/live/mydomain/privkey.pem

fullchain.pem is the certificate itself, and privkey.pem is the certificate’s private key. Make sure you keep these safe, especially the private key!

You can then copy these two files to their respective directories in /etc/ssl. If you want, you can change the file names. I like to name mine with the site and use .crt/.key for the file extensions:

sudo cp fullchain.pem /etc/ssl/certs/mydomain.crt
sudo cp privkey.pem /etc/ssl/private/mydomain.key

Setting up Apache Virtual Hosts with SSL

The next step is to set up Apache virtual hosts to use our certificate. Virtual hosts are a great tool that allow you to manage multiple websites (with multiple domains) on one web server. They also allow you to configure things like which SSL certificates to use and which folders to serve for your site, so it is valuable to learn how they work.

Your Apache SSL virtual hosts files will most likely be in the /etc/apache2/sites-available directory. For my version of Apache, the SSL vhosts file is called default-ssl.conf.

Let’s take a look at what a basic SSL-configured virtual host definition looks like:


<IfModule mod_ssl.c>

 <VirtualHost *:443>

  ServerAdmin you@yoursite.net

  DocumentRoot /var/www/html/yoursite

  ServerName yoursite.net
  ServerAlias www.yoursite.net

  SSLEngine on
  SSLCertificateFile /etc/ssl/certs/yoursite.crt
  SSLCertificateKeyFile /etc/ssl/private/yoursite.key

  <FilesMatch "\.(cgi|shtml|phtml|php)$">
    SSLOptions +StdEnvVars
  </FilesMatch>
  <Directory /usr/lib/cgi-bin>
    SSLOptions +StdEnvVars
  </Directory>

  BrowserMatch "MSIE [2-6]" \
    nokeepalive ssl-unclean-shutdown \
    downgrade-1.0 force-response-1.0
  BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown

 </VirtualHost>

</IfModule>

Let’s break down what this all means, especially the SSL part.

  ServerAdmin you@yoursite.net

  DocumentRoot /var/www/html/yoursite

  ServerName yoursite.net
  ServerAlias www.yoursite.net

This first block of information is the same as a virtual host for plain HTTP. This information tells apache what to do if someone comes to your web server from the domain name yoursite.net. If a client makes a request, Apache will serve files starting at the document root /var/www/html/yoursite. All of the files your site will serve should be contained in that folder. The ServerAdmin bit tells someone who they can contact if something is wrong. According to this Stack Overflow question though, that feature is deprecated so you may want to omit it.


  SSLEngine on
  SSLCertificateFile /etc/ssl/certs/yoursite.crt
  SSLCertificateKeyFile /etc/ssl/private/yoursite.key

  <FilesMatch "\.(cgi|shtml|phtml|php)$">
    SSLOptions +StdEnvVars
  </FilesMatch>
  <Directory /usr/lib/cgi-bin>
    SSLOptions +StdEnvVars
  </Directory>

  BrowserMatch "MSIE [2-6]" \
    nokeepalive ssl-unclean-shutdown \
    downgrade-1.0 force-response-1.0
  BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown

This second block of code starts the configuration of SSL. It tells Apache that connections to this site via HTTPS can be accepted, and includes some information about what to do with certain browsers and CGI requests. You can safely leave the bottom portion alone; it is included by default and I’ve not found a reason in my experience to touch it.

The more important piece you’ll need to edit is the section containing SSLCertificateFile and SSLCertificateKeyFile. Make sure that those paths match up to the files you copied to /etc/ssl/certs and /etc/ssl/private. Those directives tell Apache what certificate file to use for incoming SSL connections, and the private key that will be used to decrypt data sent by the client.

Once you’ve got that file correctly edited, you can restart Apache using the command sudo service apache2 restart. If you get an error here, there is most likely a syntax problem in the default-ssl.conf file that you just edited.

Forcing HTTPS connections

Now that you’ve got a shiny, new, CA-signed SSL certificate configured for Apache, you may want to do away with HTTP connections altogether. Fortunately, Apache makes this fairly simple, again with the use of virtual hosts.

This time, you’ll be editing the HTTP vhosts file. In /etc/apache2/sites-available, my version of Apache has a file called 000-default.conf. Open this up in your favorite text editor, and add the following to each virtual host you want to force HTTPS with:

Redirect / https://mysite.net

This directive tells apache to redirect any plain text connection to the document root of that site to redirect to the document root of your site with HTTPS. One thing I’ve noticed is that this can cause issues with HTTP links to subdirectories like http://mysite.net/subdir, so you’ll want to update any links in your site to use HTTPS as well.

Other solutions include using the rewrite engine, but Apache discourages doing so for this use case.

Configuring Securely Encrypted Connections to Your Web Server

It’s now easy and free to get a CA-signed SSL certificate thanks to Let’s Encrypt and CertBot. Combined with the highly configurable Apache web server, it’s fairly straightforward to get your websites tuned up to use HTTPS with your desired site setup in mind.

Using secure connections is a great way to provide your users with increased security and privacy, preventing man-in-the-middle attackers from getting their sensitive information or interfering with the content you provide. Happy encrypting!

Configuring SSL with Apache and Let’s Encrypt (Part 1 – Let’s Encrypt and CertBot)

Overview

With the robustness of modern webpages, HTTPS is the way to go for almost any website. Most of the time an individual is online, they’re exchanging sensitive information like passwords and personal information. If you’re logging into a site with a password, a securely encrypted connection is a must.

However, even informational sites should use encryption! If you’re running a site for a small business, a topic that interests you, or even your own portfolio, your users can benefit from increased privacy and security when they connect using HTTPS rather than plain text HTTP.

Configuring SSL for a Website

Why SSL?

It’s important to roughly understand how HTTPS works, and why using it will improve the privacy and security of your users. From the 10,000 foot view, what SSL does is encrypt all the traffic between the client (the user’s browser) and the server (your web server). Everything beyond the domain name is encrypted, and therefore hidden from a man-in-the middle attacker.

From a security perspective, this prevents an adversary on the network from modifying content en-route to the user. Imagine you’re an application developer and you have software available for download. You even go so far as to supply a SHA-256 fingerprint of your application so that users can verify its contents. If you’re transmitting your application over HTTP, a malicious man-in-the-middle could modify the application and fingerprint in transit, giving your end user a nice dose of malware. But if you allows your users to download over a secure connection, the attacker could not see or change either component on its way to your (happy, malware free) consumer.

From a privacy perspective, HTTPS prevents a man-in-the-middle from snooping on what your user is viewing on your website. The only thing someone on the network could see is that they are connecting to your particular domain, no specific URLs or content. Imagine your user is doing research on a sensitive topic, or trying to download a piece of software that someone doesn’t want them to have. To some extent, HTTPS protects that person’s privacy by preventing a snooper from understanding what they see going over the wire.

Obtaining an SSL Certificate with Let’s Encrypt

If you’re sufficiently convinced that your website should use SSL, the good news is that you can set that up for free! The awesome folks over at Let’s Encrypt have developed a service that allows you to obtain your very own, CA signed SSL certificate from the command line.

To actually get the certificate for your system, you’ll want to use an ACME client called CertBot from the Electronic Frontier Foundation. CertBot uses the ACME protocol to automatically verify your ownership of the domain you want a certificate for and fetch that certificate from the Let’s Encrypt service.

The CertBot website has instructions for all kinds of web servers and server operating systems. You can visit that site if you need further help for a particular combination, but in this article I’ll show you how to get a certificate on Ubuntu Server for Apache (that’s what I use). To install the CertBot application, run:


sudo apt-get update
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:certbot/certbot
sudo apt-get update
sudo apt-get install python-certbot-apache

Once certbot is installed, you have two options for fetching a certificate. If you only have one website with a basic configuration, you can have CertBot automatically configure Apache to use the SSL certificate it will generate. This is done using:

sudo certbot --apache

That’s all you need! You can now visit your website and you should see a little green lock in your browser indicating that you’re connected via HTTPS with a valid certificate.

Using HTTPS to Keep Your Users Secure

It’s important to encrypt traffic between your server and the clients browsing your site so that sensitive information cannot be snooped or manipulated by someone on the network. This article discusses why you should be using HTTPS for your websites, and how to get a free certificate with EFF’s CertBot and Let’s Encrypt. The Apache configuration is done automatically using the commands shown here.

However, if you’re like me, you may prefer a little more fine-tuned control over your system configuration. The next article will go more in depth on hand-tuning Apache for your needs – we’ll cover certificate-only CertBot usage, making SSL virtual hosts, and forcing secure connections.

Avoiding “Mixed Content” Issues In Your Client-Side Scripts

Overview

Web applications are particularly vulnerable to security issues, given that the internet is one giant, unsecured network that anyone can access. One of these major vulnerabilities is man-in-the middle attacks, where a malicious user accesses content sent between the client (the machine receiving web content) and the server (the machine providing web content).

Fortunately, the widespread adoption of SSL/TLS has improved the security of web pages by encrypting content between client and server and making it difficult for man-in-the-middle attacks to occur. It is important, however, that web pages and web applications are properly marked up and coded to avoid having content sent unsecured.

Avoiding mixed content for greater security

What is mixed content?

Mixed content refers to unsecured (HTTP) content loaded within a secure page (HTTPS). For example, this content could be an image, video, or audio file requested by the page with an unsecure URL. This could also be a script loaded by the page over HTTP. Scripts loaded unsecurely can potentially be more dangerous than media files like images.

Why is mixed content a problem?

Privacy concerns

Mixed content can create several problems for an end user of your website. The first issue is that of privacy. When a user loads a page over SSL/TLS (HTTPS), the only thing someone snooping on the network (a “man-in-the-middle”) can see is the domain of the page being requested. For example, a if user visiting https://somesite.com/sensitive-article, the only thing a snooper would see is that the user is requesting content from somesite.com. They can’t see the particular page the user is requesting. However, what if the page sensitive-article requests an image over HTTP? Then the snooper could see the unencrypted image, possibly giving them clues about what the end user is viewing!

When browsing with HTTPS, the user has some expectation of privacy between his or her self and the server. Mixed content breaks that by exposing some parts of the page in plaintext as they travel across the wire, potentially giving some clues as to what the user is viewing.

Code security concerns

A second concern with mixed content is that a script sent in plain text could potentially be modified en-route to the client. If a script is sent over HTTP, a malicious user could potentially intercept that script and inject some malicious code of their own. This code could be used to collect sensitive information or trick the user into visiting an illegitimate website (very common with phishing scams). The end user expects a legitimate script sent securely from the server (they requested a page over HTTPS, after all), but in a mixed content scenario this code could be intercepted and modified by an attacker.

How to avoid mixed content

Use relative links as much as possible

One of the easiest ways to avoid mixed content in web pages is to use relative links. With a relative link, the browser will use the protocol the page was requested with by default, no hard-coding of a protocol required. For example, loading your images from a subfolder called photos can be done with a relative path like so: src="photos/myimage.jpg". If the user requests your page over HTTPS, the image will be loaded over HTTPS as well.

Use // to auto-detect the protocol for outside links, or always use HTTPS

Sometimes you may want to load content from outside your own domain and your own web server. Many developers like to load scripts like jQuery from Content Delivery Networks (CDN’s) rather than store those scripts on their own server. In this case, you can use // in front of the URL instead of a hard coded protocol like http://. This will match the protocol the page was requested from. For example, src="//cdn.jquery.com/some-jquery-version.js". Better yet, if you know the site provides content over HTTPS, just use that by default!

Avoiding mixed content is fairly straightforward!

While mixed content can pose several issues for user privacy and application security, it is thankfully trivial to mitigate. Avoiding mixed content problems is as easy as fixing the links in any pages your site provides. If you’re not sure your pages avoid this problem, the developer console in a modern browser like Firefox will tell you if you’re trying to load insecure content. Many browsers (including Firefox) even block that content from loading. Be sure to be mindful of how media and scripts are requested in your web applications, and your users will be more secure for it!

Starting to Secure MySQL (Part 2 – Principle of Least Privilege)

Overview

In the last article, we discussed a solid starting point for securing MySQL installations using mysql_secure_installation. Running this script is great way to eliminate some default security flaws with MySQL, but it is just a starting point.

In MySQL, multiple database users can be created for different hosts and privileges can be set for different databases. In order to further secure each database created for a particular application, it is important to look at how users are created and what privileges are granted on that database.

Creating users for the database

Why isolated users improve security

Say you’re creating a web application and need to supply a database user and password in your backend code so you can connect to the database and select some records to display. It might be tempting to just plug in the root user. After all, root has all the privileges you need and it’s already built in, so it should just work.

But what happens if say, PHP isn’t properly configured and a user requests your webpage from the server? Now they have the root username and password since the source code was returned to them instead of executed! A malicious attacker could gain access to your server through another vulnerability and wreak havoc on your databases.

This is the first part of using the “Principle of Least Privilege” to secure your database. It’s a really good idea to have a user account (or multiple user accounts) with access to only one database. Don’t configure this user to have privileges on any other database. In the event that user’s credentials are compromised, the damage can at least be isolated to only one database.

What about hosts?

Another important consideration for creating users is where they have privileges to access the database from. With a database engine like MySQL, it’s possible to connect directly to the database from a remote location. You could configure a user for your web application’s database to connect from any host, so you could always work on the database from a coffee shop or a friend’s house.

However, this is another important place to limit privileges to the least amount needed to be operational. In the above scenario, if an attacker gets the credentials for your database user, they could log in from their bedroom in their PJ’s and start messing with your database!

Instead of allowing a user to connect from any host, consider the only place a user should be able to connect from to make the database functional. In a web application scenario localhost is often a good choice because it limits the user to connecting from the web server the application is running on. Now if an attacker gets access to your user’s credentials, they would have to also gain shell access to the server. This presents an additional obstacle between them and your database that can be difficult to overcome with a properly configured server.

How to create a user for a specific host

In MySQL, creating a user for a specific host only requires one command that is fairly straightforward. For example:

CREATE USER "MyUser"@"localhost" IDENTIFIED BY "MyLongAndStrongPassphrase!";

The above example creates a user called MyUser that can only log in from localhost. The IDENTIFIED BY part tells the database engine what password the user will log in with. Be sure to create a unique, strong password for this user.

Configuring user privileges

What privileges should a user have?

Much like the question of “what hosts should I allow my user to connect to?”, the question of what privileges a user should be granted comes down to the least amount of privileges needed for the user to do its job.

Just like it’s easy to use the root user for everything, it’s easy to grant all privileges to a user for your database so everything just works. But again consider the scenario where an attacker gains access to that user’s credentials. Even if that user can only log into one database, they could drop tables or modify records maliciously. Imagine an attacker being able to modify the price of an expensive item in your web store to any amount they want!

Now in that scenario, the only access your web store needs to the database is to SELECTthe prices of available items. Why give the user access to update or delete records at all? This is the core of the “Principle of Least Privilege”. Only give users the minimum amount of privileges they need to do the job they do. In our web store example, that means they only need the ability to do a SELECT on the database and nothing more.

How to grant user privileges in MySQL

In MySQL, specifying user privileges is done using the GRANT statement. For example:

GRANT SELECT ON MyWebStore.PRICES to "MyUser"@"localhost";

The first part of this command (the GRANT SELECT) tells the engine that this user will get the SELECT privilege only. There are a bunch more, but some of the common ones include INSERT, UPDATE, DELETE, SELECT, DROP, CREATE.

The second part of this command (MyWebStore.PRICES)specifies which database and which table(s) the privileges will apply to. If we’re following the “Principle of Least Privilege”, this user should only be granted these privileges on one database. As for tables, you may want to specify one table or all tables in the database depending on your application. You can apply the privileges to all tables using the * wildcard operator, like MyWebStore.*.

The final part of this command ("MyUser"@"localhost") specifies which user (on which host) the privileges will apply to.

There is one final step, and that’s to immediately apply all these privileges. In MySQL the command FLUSH PRIVILEGES immediately commits the changes to the database.

Being mindful

Applying the “Principle of Least Privilege” is another important methodology for securing MySQL database installations against attackers. It’s important to be mindful of the needs of your application when creating users and granting privileges, so that each user has the minimal amount of access needed to do their job. By isolating users to one database, one or few hosts, with a minimal amount of privileges, we can isolate and minimize the potential done by an attacker that gains access to database user credentials.

Starting to Secure MySQL (Part 1 – mysql_secure_installation)

Overview

Databases should always secured, regardless of the environment. However, database security is particularly important for web-facing applications. MySQL is a very popular database especially for web development, as a part of the LAMP (Linux-Apache-MySQL-PHP) stack.

Database security can be a complex topic, but there are a few basic security measures everyone can and should use when building applications that use MySQL.

The mysql_secure_installation script

Fortunately, the developers of MySQL wanted to make security accessible. They’ve added a great security script to the package called mysql_secure_installation. This script allows the user to quickly attack some basic security flaws in the default MySQL installation.

What it does

This script addresses several issues with the default installation:

  • Sets a password for the root user
  • Disallows root login from remote hosts
  • Removes anonymous users
  • Removes the test database

Why these measures improve security

    • Sets a password for the root user

For any system with an “administrative” user, it is paramount that not just anyone can access that account and the privileges associated with it. Giving the root user a secure password prevents an unauthorized person from modifying databases, users, or user privileges

    • Disallows root login from remote hosts

Similar to setting a secure password for root, preventing root login from remote hosts is an extra measure that prevents an unauthorized user from having access to do anything they want to your database installation.

  • Removes anonymous users

By default, MySQL allows anyone without a user account to connect to the database engine for testing purposes. This is undesirable because even though anonymous users may not have the same privileges as root, it could allow an unauthorized user to snoop around your MySQL installation and see how databases are defined, what users are available, etc. Removing anonymous users means that only an authorized user with a password can connect to MySQL.

  • Removes the test database

Another default feature of MySQL is a test database that anyone can connect to. Since anyone can connect to the database, this gives an attacker a starting point for access to your database if there are any vulnerabilities. We don’t want any databases that allow unauthorized users to connect.

How to use this feature

Since this application comes bundled with MySQL, it is fairly straightforward to run. At a terminal window, execute mysql_secure_installtion. You’ll see several prompts for each security enhancement:

Change the root password? [Y/n]
Remove anonymous users? [Y/n]
Disallow root login remotely? [Y/n]
Remove test database and access to it? [Y/n]
Reload privilege tables now? [Y/n]

The last item ensures that the changes made by the script take effect immediately. It is best practice to answer Y (yes) to all of the prompts!

Next Steps

All of the features provided by mysql_secure_installation are a great starting point for securing your MySQL installations. In another blog, I’ll address how to set user privileges for your database following the “Principle of Least Privilege”.