Playing With Blocks: The Basics of Blockchain Databases (Part 1 – Blockchain for Everyone)

Overview

Blockchain is the latest and greatest buzzword in the information technology world. From open source, decentralized cryptocurrencies like Bitcoin to traditional financial institutions, it seems as though everyone is dying to create and release their own blockchain-based applications. But what is blockchain? Why is it such a popular concept, and what is it actually good for? Let’s discuss.

What and why: Blockchain simplified

What is blockchain?

So you’ve heard that blockchain is going to revolutionize everything, but what is it exactly? Let’s cut through the hype and discuss the technical foundations of Blockchain.

A blockchain is a distributed, cryptographically secured database that focuses on making historical data immutable.

In a traditional database, information is often stored on one or a few machines, controlled by a central authority. Access is controlled by this authority (think IT administrator) and the data is kept secure by granting credentials to modify that data to a select few trusted parties. By contrast, a blockchain database is governed by what is called distributed consensus, using mechanisms such as proof-of-work. For more information on proof-of-work, you can read my series of articles on it. The important thing to note is that, in general, no one central person or authority decides what data is “verified” in a blockchain; a community of network nodes and software does.

If anyone can modify the data in a blockchain rather than a trusted party, then how is this consensus on what is correct achieved? Again, the secret lies in the science of cryptography. Through a mechanism like proof-of-work, a cryptographic puzzle is solved by software with some incentive to do so. In Bitcoin, the node that solves this puzzle is granted new currency. The real magic, however, is the fact that any other node in the network can verify that this answer is correct in a split second, so anyone can independently verify that a block meets the cryptographic standards set by that blockchain’s protocol.

You may be wondering how the cryptography in each block keeps the overall blockchain secure. This is the question of immutability, or how easy it is to modify the history stored in the blockchain. Blockchains solve this by cryptographically “linking” each block to the previous block, thereby making each individual block a critical part of the history stored by that “chain”. Each block has a header full of useful metadata about that block – a timestamp, a “summary” of the included data or transactions, a difficulty target and nonce for mining (part of proof-of-work), and the hash of the previous block’s header. Each block header is run through a one-way, cryptographically secure function called a “hash function” that creates a unique digital fingerprint for the data.

Immutability is achieved by combining the proof-of-work consensus mechanism with this system of chaining blocks together. To create a block, a node must solve the proof-of-work puzzle, which produces a block header hash that meets the protocol’s requirements. It is computationally difficult to find this value, but very easy to verify that it is correct. Now, with enough computing power it’s not out of the question to re-solve that puzzle for a single block, so faking a block at the top of the chain is conceivable. But what about 10 blocks back? Well, since each block contains the hash of the previous block header, changing one block changes its hash and breaks every block after it, so you would now have to redo the work for ten whole blocks! The further back in the chain you go, the more work you have to redo, all while the rest of the network keeps adding new blocks on top. Unless you can truly do the work required to fake history in a blockchain, any independent network node can easily see that your history from that point forward is invalid. The immense difficulty of “faking” history in a blockchain gives it its most important property: immutability.
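To make the “linking” idea concrete, here is a minimal Python 3 sketch of a hash-linked chain. It is a toy under stated assumptions, not how Bitcoin serializes blocks: the header is just a dict with a prev_hash and a data field, it is serialized with JSON, and proof of work is left out entirely. The function names (header_hash, build_chain, first_broken_link) are made up for this illustration.

```python
import hashlib
import json


def header_hash(header):
    """Return the SHA-256 hex digest of a header dict (toy JSON serialization)."""
    encoded = json.dumps(header, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def build_chain(payloads):
    """Build a list of toy headers, each storing the hash of the previous header."""
    chain = []
    prev_hash = "0" * 64  # placeholder for the first block, which has no parent
    for payload in payloads:
        header = {"prev_hash": prev_hash, "data": payload}
        chain.append(header)
        prev_hash = header_hash(header)
    return chain


def first_broken_link(chain):
    """Return the index of the first block whose prev_hash does not match, or None."""
    for i in range(1, len(chain)):
        if chain[i]["prev_hash"] != header_hash(chain[i - 1]):
            return i
    return None


chain = build_chain(["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"])
print(first_broken_link(chain))  # None: every link matches

chain[0]["data"] = "alice pays bob 500"  # rewrite history in the first block
print(first_broken_link(chain))  # 1: the second block no longer points at block 0
```

To hide the edit, an attacker would have to rebuild every later header, and in a real blockchain each of those rebuilds also means redoing that block’s proof of work.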

Cool, so why is it useful then?

By far the most important aspect of blockchain, in my opinion, is its ability to decentralize applications. With a traditional database, a central authority has to be trusted, which can be a disadvantage in applications that are controversial or have high incentives for fraud. For example, previous attempts at digital money like DigiCash had central services for issuing currency and validating transactions. These were promptly shut down by governments that didn’t like independent currencies very much.

With blockchain, it is possible to have things like completely peer-to-peer money as with Bitcoin, Litecoin, and countless others because no central government or individual has to be trusted! The network is secured by math (cryptography) rather than trust thanks to the blockchain. You don’t have to trust anyone to not defraud you of your money, because the math cannot lie about who owns what.

The other critical function of blockchains beyond decentralization is the preservation of history. Because blockchains are immutable, they can be useful for keeping things like medical records, property transactions, court histories, and more safe from the kind of malicious tampering a traditional database is exposed to. This does rely on some degree of decentralization, but even within a single company a blockchain is far harder to tamper with than a traditional database.

Cool, now I want a blockchain!

Blockchains are a fascinating and novel way to handle problems with traditional databases in certain applications. Thanks to the decentralized and cryptographically secure nature of these databases, it’s possible to create peer-to-peer applications that don’t require trusting a third party – a key problem to solve for concepts like digital money. As well, their immutability makes them useful even beyond the first few money-centric applications that existed – they may be coming to a real-estate authority, doctor’s office, or justice system near you!

Round and Round – Using Generators in Python

Overview

Almost all modern programming languages support constructs for storing lists of data – think C++ arrays, Java ArrayLists, and Python lists. Any time we have a list of data, it’s often necessary to use loops to perform some operation on each item in that list. Again, most modern languages support ways of looping over these lists of data, most commonly “for” loops.

In Python, it’s trivial to iterate over a list using a for loop. However, Python lists are stored in memory. What happens if the data set you need to operate on is large and therefore memory inefficient? Python offers constructs called iterators that allow one to loop over data sets that are not stored in memory. Some iterators are built in for things like file reading, but you can easily create your own custom iterators using a concept called generators!

Iterators? Generators? Combobulators?

Traditional List Iteration

It’s easy to create a list of numbers in Python and loop over that list. Let’s say, for example, we want to create a list of multiples of 2. We’ll store 5 numbers in this list. For each number in this list, we’ll just print it out to the screen for now:

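def print_multiples():
    # Fetch the whole list, then print each item in turn.
    multiples = get_multiples()
    for m in multiples:
        print(m)

def get_multiples():
    # The entire list is built in memory before it is returned.
    return [2, 4, 6, 8, 10]

if __name__ == "__main__":
    print_multiples()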

Let’s step through this code a bit and explain how it works. When we enter the print_multiples function from main, the first call multiples = get_multiples() assigns the return value of get_multiples() to the multiples variable.

The multiples variable now stores an in-memory list with all of our multiples of 2 – 2, 4, 6, 8, and 10. We next go to the for loop for m in multiples:. These Python for loops are slick and fairly straightforward – on each trip around the loop, the next value in the list is stored directly in the variable m. The loop continues until every value in the list has been used.

The output looks like this:

python test.py
2
4
6
8
10

So what’s a generator look like?

Now let’s try this code again, using Python’s generator construct. Here’s what that looks like:

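def print_multiples():
    # get_multiples() returns a generator here, not a list.
    multiples = get_multiples()
    for m in multiples:
        print(m)

def get_multiples():
    # Produce one multiple of 2 per trip around the caller's loop.
    for i in range(1, 6):
        yield i * 2

if __name__ == "__main__":
    print_multiples()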

Notice our output is the same:

python test.py
2
4
6
8
10

This code requires some closer examination to understand how it works. When we assign multiples = get_multiples(), we don’t assign a list; we assign what’s known in Python as an iterator (more specifically, a generator object). An iterator object exposes a __next__ method (named next in Python 2) that allows for loops to retrieve each sequential item in a list or other iterable object.

When we enter our for loop this time, we don’t iterate over the list – instead our iterator uses the get_multiples() generator function to retrieve each item one by one. The first time we go around the for loop in print_multiples, we enter the get_multiples function and enter its for loop.

Now, you’ll notice a different keyword being used in get_multiples – instead of returning an entire list, the function only yields one item, the result of i * 2. The value is passed back through the iterator’s next method and printed to the screen. The next time around the for loop in print_multiples, the code resumes at the same spot the value was yielded from in get_multiples. The return keyword hands control back to the caller for good, whereas the yield keyword only temporarily yields control back to the caller until the next time the caller needs a value from the iterator. The get_multiples function’s for loop continues, and yields the next multiple of 2. The function will continue yielding multiples of 2 until its for loop ends, with 10 as the last value yielded.
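The pause-and-resume behavior is easier to see if we drive the generator by hand instead of letting a for loop do it. The sketch below assumes Python 3 and uses the built-in next() function, which is what a for loop calls behind the scenes.

```python
def get_multiples():
    """Yield the first five multiples of 2, one at a time."""
    for i in range(1, 6):
        yield i * 2


multiples = get_multiples()   # nothing in the function body has run yet
first = next(multiples)       # runs until the first yield
second = next(multiples)      # resumes after that yield and runs to the next one
print(first, second)          # 2 4

remaining = list(multiples)   # drain whatever is left
print(remaining)              # [6, 8, 10]

# Once exhausted, the generator raises StopIteration,
# which is how a for loop knows when to stop.
try:
    next(multiples)
except StopIteration:
    print("generator is exhausted")
```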

Cool, so why would I want to use generators instead of a list?

Our trivial example makes the mechanics of generators pretty clear, but it doesn’t explain why they’re actually useful. Why all the extra complexity to avoid storing a whole 5 numbers in memory? The use case of generators goes far beyond small lists of numbers.

First, what if we wanted to display the first one billion multiples of 2? In that case, it becomes much more expensive to store one billion integers in memory – it would eat up well over a gigabyte!
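We can get a rough feel for the difference with sys.getsizeof. This sketch assumes Python 3 and uses one million items rather than one billion so the list version is still comfortable to build; the helper name multiples_of_two is made up for the example. Exact byte counts vary by Python version and platform, so none are quoted here.

```python
import sys


def multiples_of_two(count):
    """Lazily yield the first `count` multiples of 2."""
    for i in range(1, count + 1):
        yield i * 2


count = 1000000  # one million, small enough to build as a list for comparison

as_list = [i * 2 for i in range(1, count + 1)]
as_generator = multiples_of_two(count)

# getsizeof reports only the container itself, not the integer objects a list
# points to, so the real footprint of the list is larger than what prints here.
print("list container bytes:", sys.getsizeof(as_list))
print("generator object bytes:", sys.getsizeof(as_generator))

# The generator can still be consumed without ever materializing the list.
print(sum(as_generator) == sum(as_list))  # True
```

The size of the generator object does not depend on count at all, which is the whole point: it only remembers where it is in the loop.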

In real-world software engineering applications, we often use generators to deal with large datasets even beyond simple calculations such as this. Generators can be used to operate on large database queries or data read from files on disk, where it would be inefficient or even impossible to read the information into memory.
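As a small illustration of the file case, here is a sketch (Python 3) that chains two generators so only one line of a file is in memory at a time. The function names and the sample file contents are invented for the example; a temporary file is created so the snippet runs on its own.

```python
import os
import tempfile


def read_records(path):
    """Yield one stripped line at a time instead of loading the whole file."""
    with open(path, "r") as handle:
        for line in handle:  # file objects are themselves lazy iterators
            yield line.rstrip("\n")


def long_records(path, min_length):
    """Filter lazily: only one line is held in memory at any moment."""
    for record in read_records(path):
        if len(record) >= min_length:
            yield record


# Build a small throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("short\na much longer line of text\nmid length\n")
    sample_path = tmp.name

result = list(long_records(sample_path, 10))
print(result)  # ['a much longer line of text', 'mid length']
os.remove(sample_path)
```

The same shape works for database cursors that hand back rows one at a time: wrap the cursor in a generator and let the caller loop over it.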

Generate your own generators

In this article, we’ve explained how to go beyond memory-stored lists of information and create our own generators. Instead of hogging memory for large data operations, we can make our own memory-efficient iterators. Now when you have large calculations, database queries, or file reads to worry about, you can keep your memory usage low and your code easy to understand thanks to Python!

BIP39 Mnemonics Made Easy (Part 2 – The Tech of Bits to Backups)

Overview

In the last article, we discussed a high level overview of BIP39 mnemonics and their value as a simplified backup tool. Mnemonics make it much easier to take a single seed, back it up, and ensure access to an entire wallet of private keys, addresses, and transactions. But how do we go from a random set of bits to a list of words? Let’s discuss the technical side of BIP39.

Bits to Backups – The Steps for Generating a Mnemonic

First, Chaos

In order to generate a good seed, a fair amount of entropy or “randomness” is desirable. Good random number generators are hard to get, but modern operating systems like Linux do a pretty good job of sourcing entropy from the user and hard drives, and something like /dev/urandom on a daily driver machine should be sufficiently secure for generating the entropy we need.

Now how many random bits do we need? The BIP39 standard specifies 128-256 bits of entropy to be used for generating the seed. This will correspond to 12-24 words later on when we “map” the entropy to the words.

First, a warning: DO NOT USE any of the examples in this article to generate a wallet – your funds will be stolen!

With that out of the way, let’s look at an example. First, let’s generate 128 bits of entropy using os.urandom() in Python. Represented as binary, our entropy looks like this:

10111110011001010101110111001111010100011111011010110001110101111011110111000101101001100011110100010100011101000011011011100000
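A minimal sketch of this step in Python 3 might look like the following. The function name generate_entropy_bits is made up for illustration, and the multiple-of-32 check comes from the BIP39 specification rather than from the text above. As with everything else in this article, the output is for study only.

```python
import os


def generate_entropy_bits(num_bits=128):
    """Return `num_bits` of OS-provided randomness as a string of 0s and 1s."""
    if num_bits % 32 != 0 or not 128 <= num_bits <= 256:
        raise ValueError("BIP39 entropy must be 128-256 bits in multiples of 32")
    raw = os.urandom(num_bits // 8)
    # Format each byte as 8 binary digits so leading zeros are preserved.
    return "".join(format(byte, "08b") for byte in raw)


bits = generate_entropy_bits(128)
print(len(bits))  # 128
print(bits)       # different on every run
```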

Next, a checksum

In order to better secure the seed, we’ll add a checksum to the end of the entropy. This makes it easier for wallet software to validate a backup seed.

To get the checksum, we’ll first take the SHA-256 hash of our entropy. Then, we take the first N/32 bits of the hash, where N is the length of the entropy in bits, and append them to the entropy.

In our case, 128/32 bits gives us a 4 bit checksum size. In our example, the 4 bit checksum will be 0101. We’ll append that to the entropy to give us a 132 bit value:

101111100110010101011101110011110101000111110110101100011101011110111101110001011010011000111101000101000111010000110110111000000101
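Here is a hedged sketch of the checksum step in Python 3. The helper name add_checksum is invented for the example. One detail that is easy to get wrong: the hash is taken over the raw entropy bytes, not over the text string of 0s and 1s. The demo uses all-zero entropy, which is one of the published BIP39 test vectors and obviously not a secret.

```python
import hashlib


def add_checksum(entropy_bits):
    """Append the first N/32 bits of SHA-256(entropy) to an N-bit entropy string."""
    num_bits = len(entropy_bits)
    entropy_bytes = int(entropy_bits, 2).to_bytes(num_bits // 8, "big")
    digest = hashlib.sha256(entropy_bytes).digest()
    digest_bits = "".join(format(byte, "08b") for byte in digest)
    checksum = digest_bits[: num_bits // 32]
    return entropy_bits + checksum


# All-zero entropy: a public test vector, never a real wallet seed.
checksummed = add_checksum("0" * 128)
print(len(checksummed))   # 132
print(checksummed[-4:])   # 0011
```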

Dividing and our Dictionary

The final step of the process involves dividing our checksummed bits into “chunks” and mapping those chunks to the mnemonic words from the dictionary. The BIP39 standard specifies that the chunks will always be 11 bits long. So, we divide our 132 bit checksummed entropy into 12 chunks of 11 bits each:

  1. 10111110011
  2. 00101010111
  3. 01110011110

Now, each of these 11 bit chunks can be interpreted as an unsigned 11 bit integer value ranging from 0 to 2047. This “maps” to a word from the dictionary of 2048 words directly! These are standardized and listed in alphabetical order. So, we can take the 11 bit chunk as an index in the dictionary to extract the words we need:

  1. 10111110011 = 1523 -> salmon
  2. 00101010111 = 343 -> cliff
  3. 01110011110 = 926 -> inherit
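The chunking itself is only a few lines of Python 3. The sketch below reuses the 132 bit value from this example and stops at the indices, because the word lookup needs the official 2048-word English list. The commented-out lookup assumes a local copy named english.txt, which is a hypothetical filename and not something bundled here.

```python
# The 132-bit checksummed value from the example, split only for readability.
CHECKSUMMED_BITS = (
    "1011111001100101010111011100111101010001111101101011000111010111"
    "1011110111000101101001100011110100010100011101000011011011100000"
    "0101"
)


def split_into_indices(bits, chunk_size=11):
    """Split a bit string into fixed-size chunks and read each as an unsigned integer."""
    if len(bits) % chunk_size != 0:
        raise ValueError("bit string length must be a multiple of the chunk size")
    return [int(bits[i:i + chunk_size], 2) for i in range(0, len(bits), chunk_size)]


indices = split_into_indices(CHECKSUMMED_BITS)
print(len(indices))   # 12
print(indices[:3])    # [1523, 343, 926]

# Mapping to words (assumes a local copy of the official list, one word per line):
# with open("english.txt") as handle:
#     wordlist = handle.read().split()
# mnemonic = [wordlist[i] for i in indices]
```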

The overall mnemonic we generate in this example turns out to be:

  1. salmon
  2. cliff
  3. inherit
  4. physical
  5. help
  6. type
  7. warfare
  8. regular
  9. dial
  10. photo
  11. asset
  12. scheme

Mnemonics – from Entropy to Dictionary Entries

The process of generating a mnemonic seed is both ingenious and straightforward. One can easily create a secure wallet seed of 12-24 words by generating some entropy, checksumming the data, and mapping to a standard dictionary.

I’ve written a project called MnemonicGen that generates mnemonics using these steps. Take a look at this project to see these steps implemented in Python. This code should be considered academic/experimental – use it to create wallets at your own risk. Other proven implementations such as Ian Coleman’s BIP39 are also available to study.

Happy generating!

BIP39 Mnemonics Made Easy (Part 1 – Backups, Simplified!)

Overview

A critical component of cryptocurrency security is the ability for users to easily and efficiently back up the private keys that control access to their funds on the blockchain. Without one’s private keys, any funds in a user’s addresses are irrecoverably lost.

However, the nature of early wallets made backing up one’s private keys a regularly-scheduled necessity, unwieldy and annoying for most users. BIP39 and associated Bitcoin Improvement Proposals have thankfully simplified private key backup by introducing HD wallets and mnemonics.

Newer wallets explained

HD? So like, High Definition? No, Hierarchical Deterministic!

In the early days of Bitcoin and other cryptocurrencies, address generation was done non-deterministically. For each new address needed, a private key would be randomly generated and stored in the wallet’s backup file. Most wallets would pre-generate some addresses in the initial wallet file, but every new private key/address introduced into the wallet meant a new backup would be needed.

For privacy reasons, it is recommended to use a new address for each transaction. And for every new address generated since the last backup, a user would need to create a new backup to avoid losing recent funds in the event of a wallet restoration. Even for “power users”, backups became an unwieldy and annoying task.

Enter BIPs (Bitcoin Improvement Proposals) 32 and 44. In summary, these proposals introduce HD (“Hierarchical Deterministic”) wallets. These wallets only require one seed to be randomly generated. From that seed, all the private keys and addresses a wallet needs can be derived in a tree structure, all associated with the initial seed. Since the private keys and addresses can be regenerated from the seed, one only has to back up the seed to recover all of their private keys, addresses, and transactions for a wallet. Much better!
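The key word is “deterministic”: the same seed always regenerates the same keys. The Python 3 sketch below is a toy stand-in to show that property and is not the BIP32 algorithm. Real BIP32 derivation also involves chain codes and elliptic-curve math, and the derive_toy_key function here is invented purely for illustration.

```python
import hashlib
import hmac


def derive_toy_key(seed, index):
    """Derive a 32-byte value from a seed and an index with HMAC-SHA512 (toy scheme)."""
    message = index.to_bytes(4, "big")
    digest = hmac.new(seed, message, hashlib.sha512).digest()
    return digest[:32]


seed = bytes(16)  # an all-zero placeholder seed, deliberately useless as a secret

first_run = [derive_toy_key(seed, i) for i in range(3)]
second_run = [derive_toy_key(seed, i) for i in range(3)]

print(first_run == second_run)   # True: same seed, same keys, every time
print(len(set(first_run)))       # 3: each index gives a different key
```

Because nothing random happens after the seed is chosen, backing up the seed is enough to rebuild every key the wallet will ever hand out.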

Introducing Mnemonics – Simplifying Seed Backups

The ability to generate an entire wallet from one seed drastically simplified wallet backups, and therefore has improved the ease with which users can keep their funds safe. However, a seed is still just a random binary value. Represented in hex or Base64 encoding, it is still fairly easy to misread/miswrite a character and accidentally create a useless wallet backup.

To truly simplify the task of backing up a wallet seed, some developers in the Bitcoin space proposed a system that allows the translation of the binary seed value into English words that can be more easily transcribed or even memorized to secure access to one’s funds. This proposal, given the designation BIP39, was written by Marek Palatinus, Pavol Rusnak, Aaron Voisine, and Sean Bowe.

What does a mnemonic look like?

Mnemonics don’t use just any set of words. These words are carefully chosen to avoid ambiguity and make transcription easy, so that a user doesn’t accidentally create an incorrect backup.

There are a total of 2048 words in the dictionary, and a wallet mnemonic contains 12-24 words. The last word contains a checksum validating the other words in the list, making it easy for wallets to validate a backup.

Here is a sample Bitcoin or Bitcoin Cash BIP39 mnemonic:

  1. army
  2. van
  3. defense
  4. carry
  5. jealous
  6. true
  7. garbage
  8. claim
  9. echo
  10. media
  11. make
  12. crunch

WARNING, DO NOT use this seed for a wallet. A seed must remain private, and your funds will be stolen! This mnemonic is excerpted from Andreas Antonopoulos’ Mastering Bitcoin to further discourage its use – millions of people have access to this wallet.

Now how do you get one of these fancy mnemonics for a wallet? Most modern wallet software will generate this for you when you create a wallet. Then, all you need to do is write down the phrase and store it in a secure location to back up access to your funds if your wallet device is lost or stolen.

Alternatively, a mnemonic can be generated by a separate tool and imported as a backup into the wallet software. I’ve written a generator called MnemonicGen that produces standard phrases that can be imported into any modern HD wallet that supports BIP39. Keep in mind that this particular project is meant to be academic/experimental and may not be sufficiently secure for your needs. But other mnemonic generators like Ian Coleman’s are widely used and well-vetted.

Mnemonics – Backups Made Better

With BIP39 mnemonics, Bitcoin newbies and power users alike can easily create and back up secure wallets without the need to keep a schedule or deal with unwieldy binary data encoding. This modern standard is implemented in widely-used and accessible wallets like the Bitcoin.com wallet, Electron Cash, Blockchain.info, and more. I would personally advise upgrading your cryptocurrency experience by using an HD wallet, simplifying your security practices and keeping your funds safe!

In the next article, we’ll discuss the technical workings of BIP39, showing how we can go from a random seed to a set of words in a few fairly straightforward steps.

Proof of Work, Explained (Part 2 – A Hash Bash for Techies)

Overview

In the last article, we looked at the overall idea of proof of work and its applications. That article covered the origins of this concept, how it works at a high level, and some of its applications. Now, let’s take a look at the technical inner workings of these algorithms.

In a nutshell, proof of work involves the use of hash functions. These one-way functions form the basis for a difficult-to-solve but easily verifiable computational puzzle as a way to prove that one did some amount of desired computing work.

Proof of Work, the Technical Perspective

Hash Functions

First, we need to understand a bit about hash functions and why they form the basis for proof of work. A hash function is a one-way function that takes some input of any size and outputs a consistently sized set of bits. The two most important characteristics of hash functions are that they:

  • Are one-way – you cannot take an output and find the input without brute force guessing
  • Have effectively unique outputs for every input – collisions must exist in principle, but with a good hash function they are infeasible to find

These two properties are critical for proof of work. First, the one-way nature of these functions makes it so that brute force is required to find some desired output. Second, because the same input always produces the same output, and no one can feasibly find a second input that produces it, we can easily verify a solution once we have one.
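A quick way to see these properties is to poke at SHA-256 with Python 3’s standard hashlib module. The helper name sha256_hex is made up for this sketch.

```python
import hashlib


def sha256_hex(text):
    """Return the SHA-256 digest of a string as 64 hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_digest = sha256_hex("a")
long_digest = sha256_hex("a" * 100000)

print(len(short_digest), len(long_digest))          # 64 64: output size is fixed
print(sha256_hex("Hello") == sha256_hex("Hello"))   # True: same input, same output
print(sha256_hex("Hello") == sha256_hex("hello"))   # False: a tiny change, a different digest
```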

An Overview of the Algorithm

Hashing and Binary and Difficulty Targets, Oh My!

Proof of work builds on top of the properties of hash functions by realizing that as a stream of bits, hash outputs actually represent binary numbers. For example, an 8 bit hash 00001000 represents the decimal number “8”. Now remember that hash outputs can only be matched to a particular input by using brute force to guess.

Using these properties, proof of work takes a pretty ingenious approach to making a user do some amount of predetermined work – it makes them look for a hash that, interpreted as a number, is less than some target value!

This is where the idea of difficulty comes in. Let’s say you want the user to find some input where the hash value, when representing an 8 bit number, has two zeros in the front (00101010, for example). Now imagine you want the user to find an input that gives a hash with four zeros in front (00001011). Which one takes more guesses to compute? It turns out that the smaller the “difficulty target” value, the more guesses (and more computing time) it takes to find an input that gives the desired hash output. This is the fundamental basis for proof of work. It can be statistically predicted that a certain difficulty target will take roughly some number of guesses (and therefore computing time) to find. So the smaller the difficulty target, the harder the puzzle.
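The statistics here are simple if we assume each hash output behaves like a uniformly random number, which is a modelling assumption rather than something proven. Under that assumption, the chance that one guess lands below a target T is T divided by the number of possible outputs, and the average number of guesses is the inverse of that. A small Python 3 sketch, with invented function names:

```python
from fractions import Fraction


def success_probability(target, hash_bits):
    """Chance that one uniformly random hash of `hash_bits` bits is below `target`."""
    return Fraction(target, 2 ** hash_bits)


def expected_guesses(target, hash_bits):
    """Average number of guesses before a hash lands below `target`."""
    return 1 / success_probability(target, hash_bits)


# 8-bit toy hash: "two leading zeros" means below 0b01000000,
# "four leading zeros" means below 0b00010000.
print(expected_guesses(0b01000000, 8))   # 4
print(expected_guesses(0b00010000, 8))   # 16
```

Every extra leading zero bit halves the target and doubles the expected work, which is why a target can be tuned to demand roughly as much computing as the puzzle setter wants.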

The nonce value

Now, since a given input always hashes to the same output, how can we prevent the worker from just using a precomputed dictionary of hashes to find an input that meets the difficulty? Here is what makes proof of work truly proof – we always create a hash input unique to the problem we’re trying to solve, using an applicable message and a random guess we call a nonce.

See, for our proof of work to truly require the desired amount of work, we always start with a unique message for the problem. In the case of Bitcoin, our message is what is called the “block header” – a chunk of data containing information about the transactions included in the current block. In an anti-spam application, this message would be something like a forum post or the contents of an email message. Since this message is unique, a dictionary cannot be used to guess a hash that meets the difficulty target.

The worker then has to take the message plus a random number guess called the nonce, and run that combined string through the hashing function. If the hash output isn’t less than the target number, then the worker increments that random number guess concatenated to the message and tries again – and again and again until the right output is found.

Verifying the nonce

Once a nonce is found, another node or server can very easily verify that the solution (the nonce) is correct. All the verifying party has to do is take the original message plus the nonce value found by the worker and run that through the hash function. Since a hash input will always give the same output, it only takes one step to verify that the worker’s nonce is in fact a correct solution to the proof of work problem.

A Practical Example

Let’s take a look at an example anti-spam proof of work problem. Let’s say a user wants to contact a site owner with the message “Hello”, and the site owner wants the user to do some proof of work before sending that email. The site owner specifies a difficulty target of 2^240. This difficulty target can be any 256-bit number in this case, but a power of two is easy to work with when building an application. This system uses the compute-intensive SHA-256 hashing algorithm for its proof of work. Here’s what the steps would look like:

Worker (Client)

  1. Uses “Hello” as the message
  2. Starts guessing a nonce with 0 – the hash input is the string “Hello0”
  3. The SHA-256 output of “Hello0” (in hexadecimal format) is 80878c5b013ba72c0d2b7e8f65868649cbdb1e7e7a8c8a07537d6b3619e4e32f
  4. Clearly, this output is greater than the difficulty target of 2^240, which has three leading zeros in hexadecimal: 0001000000000000000000000000000000000000000000000000000000000000
  5. Increment nonce to 1, and try again. This continues until an appropriate nonce is found
  6. The client finally finds a nonce that works with the value 9172. The SHA-256 hash of “Hello9172” is 00001f2e9f8f74117b4178eb04b368c807f906ae2a07bece562266cbc9adff3c, which is less than the difficulty target of 0001000000000000000000000000000000000000000000000000000000000000 (2^240)
  7. Since the client has a nonce guess that meets the difficulty target for this unique message, it now has proof that it did all that computing work!

Verifying Party (Server)

  1. Take the message for this problem, “Hello”, plus the client’s found nonce, “9172” and pass “Hello9172” through the SHA-256 hash function
  2. Since hash functions always produce the same output for the same input, we get the same output the client found: 00001f2e9f8f74117b4178eb04b368c807f906ae2a07bece562266cbc9adff3c.
  3. Since the above output is indeed less than the difficulty target 2^240, the server has now verified that the client did the desired amount of computing work to find the nonce. The message can now be sent.
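Both sides of this exchange fit in a short Python 3 script. This is a sketch under a few assumptions: the nonce is appended to the message as decimal text (as in “Hello0”), the search starts at 0 and counts up, and the function names are invented for the example. I have deliberately not hard-coded the nonce or hashes quoted above; the script prints whatever it finds so the two can be compared.

```python
import hashlib

TARGET = 2 ** 240  # the difficulty target from the example above


def hash_as_int(message, nonce):
    """SHA-256 of the message with the nonce appended as text, read as an integer."""
    digest = hashlib.sha256((message + str(nonce)).encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def find_nonce(message, target, max_nonce=10000000):
    """Worker side: try nonces 0, 1, 2, ... until the hash falls below the target."""
    for nonce in range(max_nonce):
        if hash_as_int(message, nonce) < target:
            return nonce
    raise RuntimeError("no nonce found below max_nonce")


def verify_nonce(message, nonce, target):
    """Server side: a single hash is enough to check the worker's claim."""
    return hash_as_int(message, nonce) < target


nonce = find_nonce("Hello", TARGET)
print("nonce found:", nonce)
print("hash:", format(hash_as_int("Hello", nonce), "064x"))
print("verified:", verify_nonce("Hello", nonce, TARGET))
```

Notice the asymmetry: find_nonce may loop tens of thousands of times, while verify_nonce always does exactly one hash.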

Proof of Work – Hashing for a Cause

These algorithms put the properties of hashing algorithms to new and innovative uses, particularly in the incredible space of cryptocurrencies. Proof of work takes the deterministic and irreversible properties of hash functions and uses them to create difficult-to-solve, easy-to-verify computing problems. This simple but interesting bit of math and computer science powers new approaches to interesting challenges. Proof of work can be used to help prevent spam in a new and unique way – by making large-volume spam uneconomical for its propagators. Arguably at its most revolutionary, proof of work powers the transaction verification and currency issuance components of cryptocurrencies like Bitcoin and Litecoin, allowing for an entirely new form of money free from centralized institutions.

Proof of Work, Explained (Part 1 – POW for Non-Techies)

Overview

Personally, I’m fascinated by both the technical and financial implications of cryptocurrencies like Bitcoin, Bitcoin Cash, and Litecoin (to name a few). The way these currencies work is a complex topic, with lots of moving parts to discuss. One of the core components of cryptocurrencies like Bitcoin is the mechanism by which an entirely decentralized system of money can securely verify transactions as well as issue new currency, all while preventing fraud and issuing at a predictable rate.

Most of these currencies solve this problem using a concept called “proof of work” by which nodes solve a computationally difficult but easily verifiable mathematical problem. This concept goes beyond cryptocurrencies as well, and actually originated as an anti-spam measure.

Proof of Work – The 10,000 foot view

What is Proof of Work?

Proof of work, fundamentally, is the solving of a computationally intensive mathematical problem. This problem has two very important properties – the solution to the problem is both:

  • Difficult (computationally intensive) to find
  • Easy to verify once found

The idea is this: for an application like cryptocurrency or anti-spam, a “node” or computer is challenged to find a solution to this puzzle. The solution can only be found by brute-force guessing. However, once the solution is found, all the other nodes on a network or a server can verify the solution in one step. Since the answer can only be found by brute-force computation but can easily be verified as correct, the solution to the problem serves as proof that a certain amount of computing work was done – hence the term “proof of work”.

Why is it Useful?

First, let’s look at the original application of proof of work: anti-spam. The original idea was implemented in a system called HashCash, invented by Adam Back. Back’s system works like so: Before performing an action like posting to a forum or sending an email, the user of a site is made to do a small proof of work problem. This problem only takes half a second or so of computing to solve, and of course is almost instantaneous for the system to verify. For a legitimate user of a forum or email system, the half second of computing is no obstacle to completing his or her task. However, for a spammer trying to send hundreds of thousands of spam messages, the task suddenly becomes very uneconomical since it would tie up their computer for minutes or even hours at a time!

Now how does this system apply to cryptocurrencies like Bitcoin? In this system, transaction verification and currency issuance are totally decentralized – no third party is trusted to create new value tokens or verify that transactions are legitimate. This of course presents a massive fraud-prevention challenge – how can the network ensure that malicious parties don’t create “counterfeit” currency or send through transactions that aren’t valid?

Proof of work helps to solve this problem. On the Bitcoin network, new transactions are broadcast to computers running what is called “mining” software and accumulated into “blocks” of transactions that will be validated at one time. Every time a new block is waiting to be verified, all the nodes on the network running this software essentially “race” to solve a proof of work problem first. The Bitcoin network adjusts the difficulty of this problem so that about once every ten minutes, one miner wins the race and finds a solution to this problem. Once one node finds the answer, it tells all the other nodes on the network that it’s found an answer, and the other nodes can instantly verify that the answer is correct.

The node that finds proof of work for this block is rewarded with brand new Bitcoin (issued at a predictable rate) as well as all the transaction fees in that block. This computationally expensive proof of work problem creates an excellent system of economic incentives: the reward of new Bitcoin drives miners to verify that transactions are correct, and also makes fraud more expensive than legitimate mining. If a miner were to try and cheat, all the other nodes running the legitimate software would instantly reject the new block since it doesn’t meet the rules of the network, and all of the time and computing power of the malicious node would thus be wasted.

Proof of Work – Powering Cryptocurrency and Thwarting Spammers

The idea of proof of work has incredible value for multiple applications. This system allows computing to be used as a precious resource in a purely digital economy: a way to both secure monetary transactions and prevent the waste of resources like time and storage space. In an anti-spam setting, proof of work allows the operators of a curated space to reduce the impact of spam on their systems, reducing wasted time, clutter, and storage space. In a cryptocurrency application, proof of work allows the secure verification of transactions and the issuance of new currency without the need for a trusted third party, the often fatal flaw in fiat systems.

The applications of this technology are incredibly interesting. In the case of cryptocurrencies, I would say its application is part of a system that is revolutionary. Now, as a software engineer, I find the actual technical workings of proof of work to be even more interesting than the surface description. In the next article, I’ll walk through how these algorithms work from a more technical perspective.

Bitcoin as “Digital Gold” is Bad for Crypto Adoption

“Digital Gold” vs. “Digital Cash”

The core Bitcoin network has a scaling problem, and has had this problem for a while now. As more and more transactions try to fit in Bitcoin’s 1MB blocks (once every 10 minutes), network fees have skyrocketed to $5-10, and even that is a little low if you want your transaction confirmed within an hour or so.
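
Some rough arithmetic shows why the blocks fill up. In the sketch below, only the 1MB block size and the ten minute interval come from the paragraph above; the average transaction size of 250 bytes is an assumption I picked for illustration, not a measured figure.

```python
BLOCK_SIZE_BYTES = 1_000_000      # 1MB block, from the article
BLOCK_INTERVAL_SECONDS = 10 * 60  # one block about every ten minutes
AVG_TX_SIZE_BYTES = 250           # assumption for illustration only


def transactions_per_block(block_size: int, tx_size: int) -> int:
    """How many transactions of a given size fit in one block."""
    return block_size // tx_size


def transactions_per_second(block_size: int, tx_size: int, interval: int) -> float:
    """Sustained throughput if every block is completely full."""
    return transactions_per_block(block_size, tx_size) / interval


if __name__ == "__main__":
    per_block = transactions_per_block(BLOCK_SIZE_BYTES, AVG_TX_SIZE_BYTES)
    per_second = transactions_per_second(
        BLOCK_SIZE_BYTES, AVG_TX_SIZE_BYTES, BLOCK_INTERVAL_SECONDS
    )
    print(f"{per_block} transactions per block")        # 4000
    print(f"{per_second:.1f} transactions per second")  # 6.7
```

Under that assumption the whole network tops out at a handful of transactions per second, so once demand exceeds that, users end up bidding against each other on fees to get into the next block.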

One response to this problem, especially from supporters of the Bitcoin Core roadmap, is to ignore the problem to some extent. As Bitcoin has seen more attention over the last few years, the price has risen dramatically. As a result, many are now claiming that Bitcoin is not meant to be a “means of exchange” or a form of “digital cash” for day to day transactions. Rather, their viewpoint is that Bitcoin should be seen as a “store of value” or “digital gold”.

To be clear about my biases in this space – I think cryptocurrencies are at their most interesting and valuable as a means of exchange; a way to do truly global, peer-to-peer, decentralized cash. I do not care at all for the risky speculative investing that goes on in the crypto space; I believe these currencies should be used or held in small amounts, allowing one to learn more about this fascinating technology and spread adoption.

With that said, I don’t have a problem with a digital currency being used as a long term store of value like a “digital gold”. There is plenty of room in the crypto space for currencies that solve different problems in different ways. I do, however, think there is a big problem with Bitcoin being the currency of choice for that use case.

The Bitcoin Brand, and the problem with Bitcoin as “Digital Gold”

Let’s be honest, fellow crypto nerds. How many people in your daily life have actually heard of Bitcoin? And how many of them actually understand it, at least at a high level? How many people actually own some and use it? If you’ve got the same variety of people in your life as I do, the percentage isn’t that high.

Now how many of that small subset of Bitcoin-aware people in your life know about Bitcoin Cash? Ethereum? Litecoin, Vertcoin, Monero, Dash? It’s an even smaller percentage, surely. Even if they’ve heard of them, do they understand how these alternatives to Bitcoin solve different problems? We’re down to a sliver of people that understand and adopt these different currencies beyond Bitcoin.

Herein lies the crux of the problem:

Bitcoin is the de facto cryptocurrency. It is the face of digital money, the storefront, the brand, however you want to refer to it.

Whether anyone likes it or not, Bitcoin is what most people hear about first when they hear about cryptocurrencies. And with a now large chunk of the Bitcoin community marketing this as “digital gold”, we have the potential to miss out on opportunities for widespread adoption in the coming years.

As more and more businesses become interested in adopting cryptocurrencies as a form of payment, they’re going to want to start with Bitcoin. It is the biggest after all, and the original value proposition we still see on bitcoin.org is “fast peer-to-peer transactions” and “low processing fees”. The reality is far from that, however. When the coffee shop owner realizes his or her customer will have to pay $10 to buy a $3 coffee, they’re not going to find Bitcoin usable for their business. How many of the already small percentage of crypto-curious entrepreneurs are going to take the time to understand Bitcoin Cash, Litecoin, or Dash? Many will probably say “screw this” and return to business as usual with fiat.

Satoshi’s “Peer-to-peer electronic cash” – How Does Bitcoin Continue That Vision?

The problem is not a cryptocurrency as “digital gold”, the problem is Bitcoin as “digital gold”. The original promise of Bitcoin was indeed a global, peer-to-peer, low fee alternative to centralized payment processors like Visa, Mastercard, or PayPal. But as the Bitcoin community shifts away from that vision, they take the adoption of that vision with them.

With a functional implementation of the Lightning Network at least two years away, and the lack of press for already-scaled alternatives like Bitcoin Cash, Litecoin, etc., it concerns me that adoption of cryptocurrencies will stall. I don’t at all fear that they will go away or stop the ceaseless flow of innovation that we’ve seen since the advent of Satoshi’s brilliant whitepaper. But I do worry that Bitcoin’s current scaling problems and the community’s attitude toward them will lead to several years of stalled adoption in the mainstream. I do hope, however, that the problem gets fixed soon and that I am wrong.

Configuring SSL with Apache and Let’s Encrypt (Part 2 – Manual Apache Configuration)

Overview

In the first part of “Configuring SSL with Apache and Let’s Encrypt”, we discussed why you should be offering HTTPS connections to your site, and how to get a free certificate from Let’s Encrypt. That tutorial shows commands that use CertBot to automatically configure your Apache web server with the new certificate.

However, if you’re like me you may want some more fine-tuned control over how your web server is configured. We can use CertBot to just retrieve a certificate for us and set up Apache virtual hosts by hand. You can even force connections to your site to use HTTPS all the time.

Configuring HTTPS with Apache By Hand

Finding your Let’s Encrypt certificates

In the case you want to hand-configure your server, you can ask CertBot to just fetch the certificates and leave Apache alone with:

sudo certbot --apache certonly

This will create a folder containing the files you’ll need for that domain. You’ll need these two files:

/etc/letsencrypt/live/mydomain/fullchain.pem
/etc/letsencrypt/live/mydomain/privkey.pem

fullchain.pem contains your certificate along with the intermediate certificates that chain it back to the CA, and privkey.pem is the certificate’s private key. Make sure you keep these safe, especially the private key!

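From inside that folder, you can then copy these two files to their respective directories in /etc/ssl. If you want, you can change the file names. I like to name mine with the site and use .crt/.key for the file extensions. Keep in mind that these are copies, so you’ll need to copy the files again whenever the certificate is renewed: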

sudo cp fullchain.pem /etc/ssl/certs/mydomain.crt
sudo cp privkey.pem /etc/ssl/private/mydomain.key

Setting up Apache Virtual Hosts with SSL

The next step is to set up Apache virtual hosts to use our certificate. Virtual hosts are a great tool that allow you to manage multiple websites (with multiple domains) on one web server. They also allow you to configure things like which SSL certificates to use and which folders to serve for your site, so it is valuable to learn how they work.

Your Apache SSL virtual hosts files will most likely be in the /etc/apache2/sites-available directory. For my version of Apache, the SSL vhosts file is called default-ssl.conf.

Let’s take a look at what a basic SSL-configured virtual host definition looks like:


<IfModule mod_ssl.c>

 <VirtualHost *:443>

  ServerAdmin you@yoursite.net

  DocumentRoot /var/www/html/yoursite

  ServerName yoursite.net
  ServerAlias www.yoursite.net

  SSLEngine on
  SSLCertificateFile /etc/ssl/certs/yoursite.crt
  SSLCertificateKeyFile /etc/ssl/private/yoursite.key

  <FilesMatch "\.(cgi|shtml|phtml|php)$">
    SSLOptions +StdEnvVars
  </FilesMatch>
  <Directory /usr/lib/cgi-bin>
    SSLOptions +StdEnvVars
  </Directory>

  BrowserMatch "MSIE [2-6]" \
    nokeepalive ssl-unclean-shutdown \
    downgrade-1.0 force-response-1.0
  BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown

 </VirtualHost>

</IfModule>

Let’s break down what this all means, especially the SSL part.

  ServerAdmin you@yoursite.net

  DocumentRoot /var/www/html/yoursite

  ServerName yoursite.net
  ServerAlias www.yoursite.net

This first block of information is the same as a virtual host for plain HTTP. It tells Apache what to do when a request arrives for the domain name yoursite.net (or its alias, www.yoursite.net). When a client makes a request, Apache will serve files starting at the document root /var/www/html/yoursite, so all of the files your site serves should be contained in that folder. The ServerAdmin bit tells someone who they can contact if something is wrong. According to this Stack Overflow question though, that feature is deprecated so you may want to omit it.


  SSLEngine on
  SSLCertificateFile /etc/ssl/certs/yoursite.crt
  SSLCertificateKeyFile /etc/ssl/private/yoursite.key

  <FilesMatch "\.(cgi|shtml|phtml|php)$">
    SSLOptions +StdEnvVars
  </FilesMatch>
  <Directory /usr/lib/cgi-bin>
    SSLOptions +StdEnvVars
  </Directory>

  BrowserMatch "MSIE [2-6]" \
    nokeepalive ssl-unclean-shutdown \
    downgrade-1.0 force-response-1.0
  BrowserMatch "MSIE [17-9]" ssl-unclean-shutdown

This second block of code starts the configuration of SSL. It tells Apache that connections to this site via HTTPS can be accepted, and includes some information about what to do with certain browsers and CGI requests. You can safely leave the bottom portion alone; it is included by default and I’ve not found a reason in my experience to touch it.

The more important piece you’ll need to edit is the section containing SSLCertificateFile and SSLCertificateKeyFile. Make sure that those paths match up to the files you copied to /etc/ssl/certs and /etc/ssl/private. Those directives tell Apache which certificate to present for incoming SSL connections, and which private key goes with it; the server uses that key during the handshake to prove it owns the certificate and to set up the encrypted session.
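
Since a mistyped path here is an easy mistake to make, a small check can help before restarting Apache. The following is only a sketch: the helper names are my own, and the parsing is deliberately naive (it assumes one directive per line, matches the directive names case-sensitively, and skips lines starting with #).

```python
import os
import re

DIRECTIVES = ("SSLCertificateFile", "SSLCertificateKeyFile")


def find_ssl_paths(config_text: str) -> dict:
    """Return {directive: path} for the SSL file directives in a vhost config."""
    paths = {}
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip commented-out directives
        match = re.match(r"(\S+)\s+(\S+)", stripped)
        if match and match.group(1) in DIRECTIVES:
            paths[match.group(1)] = match.group(2)
    return paths


def missing_files(paths: dict) -> list:
    """List configured paths that do not exist (or cannot be seen by this user)."""
    return [path for path in paths.values() if not os.path.isfile(path)]


if __name__ == "__main__":
    sample = """
      SSLEngine on
      SSLCertificateFile /etc/ssl/certs/yoursite.crt
      SSLCertificateKeyFile /etc/ssl/private/yoursite.key
    """
    found = find_ssl_paths(sample)
    print(found)
    print("missing:", missing_files(found))
```

To use it on a real server you would read the text of your own vhosts file instead of the sample string. The private key directory is usually readable only by root, so run the check with sudo or the key will show up as missing.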

Once you’ve got that file correctly edited, you can restart Apache using the command sudo service apache2 restart. If you get an error here, there is most likely a syntax problem in the default-ssl.conf file that you just edited.

Forcing HTTPS connections

Now that you’ve got a shiny, new, CA-signed SSL certificate configured for Apache, you may want to do away with HTTP connections altogether. Fortunately, Apache makes this fairly simple, again with the use of virtual hosts.

This time, you’ll be editing the HTTP vhosts file. In /etc/apache2/sites-available, my version of Apache has a file called 000-default.conf. Open this up in your favorite text editor, and add the following to each virtual host you want to force HTTPS with:

Redirect / https://mysite.net/

This directive tells Apache to redirect any plain HTTP request for that site to the same path on your site over HTTPS. Note the trailing slash on the target: Redirect appends whatever follows the matched path to the target URL, so without it a link like http://mysite.net/subdir would be sent to https://mysite.netsubdir. It is still a good idea to update any links in your site to use HTTPS directly, which saves your visitors the extra redirect.
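
If links to subdirectories misbehave after adding a Redirect, the cause is usually a missing trailing slash on the target. Here is a simplified Python model of what Redirect does with the rest of the path. It is my own approximation for illustration, not Apache's code: it only models plain prefix matching, where the matched prefix is dropped and the remainder of the request path is appended to the target.

```python
from typing import Optional


def redirect_target(prefix: str, target: str, request_path: str) -> Optional[str]:
    """Simplified model of Redirect: swap the matched prefix for the target URL."""
    if not request_path.startswith(prefix):
        return None  # the directive does not apply to this request
    return target + request_path[len(prefix):]


if __name__ == "__main__":
    # Target without a trailing slash: the host and the path run together.
    print(redirect_target("/", "https://mysite.net", "/subdir"))
    # Target with a trailing slash: the path is preserved.
    print(redirect_target("/", "https://mysite.net/", "/subdir"))
```

With the matched path being /, the remainder of /subdir is just subdir, which is why the target needs to end in a slash for the result to be a valid URL.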

Other solutions include using the rewrite engine, but Apache discourages doing so for this use case.

Configuring Securely Encrypted Connections to Your Web Server

It’s now easy and free to get a CA-signed SSL certificate thanks to Let’s Encrypt and CertBot. Combined with the highly configurable Apache web server, it’s fairly straightforward to get your websites tuned up to use HTTPS with your desired site setup in mind.

Using secure connections is a great way to provide your users with increased security and privacy, preventing man-in-the-middle attackers from getting their sensitive information or interfering with the content you provide. Happy encrypting!

Configuring SSL with Apache and Let’s Encrypt (Part 1 – Let’s Encrypt and CertBot)

Overview

With the robustness of modern webpages, HTTPS is the way to go for almost any website. Most of the time an individual is online, they’re exchanging sensitive information like passwords and personal information. If you’re logging into a site with a password, a securely encrypted connection is a must.

However, even informational sites should use encryption! If you’re running a site for a small business, a topic that interests you, or even your own portfolio, your users can benefit from increased privacy and security when they connect using HTTPS rather than plain text HTTP.

Configuring SSL for a Website

Why SSL?

It’s important to roughly understand how HTTPS works, and why using it will improve the privacy and security of your users. From the 10,000 foot view, what SSL does is encrypt all the traffic between the client (the user’s browser) and the server (your web server). Everything beyond the domain name is encrypted, and therefore hidden from a man-in-the-middle attacker.

From a security perspective, this prevents an adversary on the network from modifying content en route to the user. Imagine you’re an application developer and you have software available for download. You even go so far as to supply a SHA-256 fingerprint of your application so that users can verify its contents. If you’re transmitting your application over HTTP, a malicious man-in-the-middle could modify both the application and the fingerprint in transit, giving your end user a nice dose of malware. But if you allow your users to download over a secure connection, the attacker cannot see or change either component on its way to your (happy, malware free) consumer.
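
For reference, this is roughly what the user-side fingerprint check looks like. It is a sketch with made-up helper names; the only thing it relies on is Python's standard hashlib and hmac modules. Note that it can only be as trustworthy as the channel the published fingerprint arrived over, which is exactly the point made above.

```python
import hashlib
import hmac


def sha256_of_chunks(chunks) -> str:
    """Feed an iterable of byte chunks into SHA-256 and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    with open(path, "rb") as handle:
        return sha256_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def matches_published(actual: str, published: str) -> bool:
    """Compare against the published fingerprint, ignoring case and whitespace."""
    return hmac.compare_digest(actual, published.strip().lower())


if __name__ == "__main__":
    fingerprint = sha256_of_chunks([b"pretend this is ", b"an installer"])
    print(fingerprint)
    print(matches_published(fingerprint, fingerprint.upper() + "\n"))
```

In practice you would call sha256_of_file on the downloaded file and pass in the fingerprint copied from the developer's HTTPS page.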

From a privacy perspective, HTTPS prevents a man-in-the-middle from snooping on what your user is viewing on your website. The only thing someone on the network could see is that they are connecting to your particular domain, no specific URLs or content. Imagine your user is doing research on a sensitive topic, or trying to download a piece of software that someone doesn’t want them to have. To some extent, HTTPS protects that person’s privacy by preventing a snooper from understanding what they see going over the wire.

Obtaining an SSL Certificate with Let’s Encrypt

If you’re sufficiently convinced that your website should use SSL, the good news is that you can set that up for free! The awesome folks over at Let’s Encrypt have developed a service that allows you to obtain your very own CA-signed SSL certificate from the command line.

To actually get the certificate for your system, you’ll want to use an ACME client called CertBot from the Electronic Frontier Foundation. CertBot uses the ACME protocol to automatically verify your ownership of the domain you want a certificate for and fetch that certificate from the Let’s Encrypt service.

The CertBot website has instructions for all kinds of web servers and server operating systems. You can visit that site if you need further help for a particular combination, but in this article I’ll show you how to get a certificate on Ubuntu Server for Apache (that’s what I use). To install the CertBot application, run:


sudo apt-get update
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:certbot/certbot
sudo apt-get update
sudo apt-get install python-certbot-apache

Once certbot is installed, you have two options for fetching a certificate. If you only have one website with a basic configuration, you can have CertBot automatically configure Apache to use the SSL certificate it will generate. This is done using:

sudo certbot --apache

That’s all you need! You can now visit your website and you should see a little green lock in your browser indicating that you’re connected via HTTPS with a valid certificate.
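
If you would rather check from a script than from the browser, the sketch below connects to a site with Python's standard ssl module and reports how many days remain on the certificate. The function names are my own, and yoursite.net in the comment is a placeholder. The connection uses the default verification settings, so it will raise an error if the certificate is not trusted or does not match the hostname.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(hostname: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect over TLS with normal verification and return the notAfter field."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()["notAfter"]


def days_remaining(not_after: str, now: Optional[float] = None) -> int:
    """Whole days until expiry; negative once the certificate has expired."""
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return int((expires - current) // 86400)


# Example usage (needs network access; replace the placeholder domain):
#   print(days_remaining(fetch_not_after("yoursite.net")))
```

Let’s Encrypt certificates are short-lived and need renewing, so a check like this can double as a reminder that renewal is working.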

Using HTTPS to Keep Your Users Secure

It’s important to encrypt traffic between your server and the clients browsing your site so that sensitive information cannot be snooped or manipulated by someone on the network. This article discusses why you should be using HTTPS for your websites, and how to get a free certificate with EFF’s CertBot and Let’s Encrypt. The Apache configuration is done automatically using the commands shown here.

However, if you’re like me, you may prefer a little more fine-tuned control over your system configuration. The next article will go more in depth on hand-tuning Apache for your needs – we’ll cover certificate-only CertBot usage, making SSL virtual hosts, and forcing secure connections.

Avoiding “Mixed Content” Issues In Your Client-Side Scripts

Overview

Web applications are particularly vulnerable to security issues, given that the internet is one giant, unsecured network that anyone can access. One of these major vulnerabilities is the man-in-the-middle attack, where a malicious user accesses content sent between the client (the machine receiving web content) and the server (the machine providing web content).

Fortunately, the widespread adoption of SSL/TLS has improved the security of web pages by encrypting content between client and server and making it difficult for man-in-the-middle attacks to occur. It is important, however, that web pages and web applications are properly marked up and coded to avoid having content sent unsecured.

Avoiding mixed content for greater security

What is mixed content?

Mixed content refers to unsecured (HTTP) content loaded within a secure page (HTTPS). For example, this content could be an image, video, or audio file requested by the page with an insecure URL. This could also be a script loaded by the page over HTTP. Scripts loaded insecurely can potentially be more dangerous than media files like images.

Why is mixed content a problem?

Privacy concerns

Mixed content can create several problems for an end user of your website. The first issue is that of privacy. When a user loads a page over SSL/TLS (HTTPS), the only thing someone snooping on the network (a “man-in-the-middle”) can see is the domain of the page being requested. For example, if a user is visiting https://somesite.com/sensitive-article, the only thing a snooper would see is that the user is requesting content from somesite.com. They can’t see the particular page the user is requesting. However, what if the page sensitive-article requests an image over HTTP? Then the snooper could see the unencrypted image, possibly giving them clues about what the end user is viewing!

When browsing with HTTPS, the user has some expectation of privacy between themselves and the server. Mixed content breaks that by exposing some parts of the page in plaintext as they travel across the wire, potentially giving some clues as to what the user is viewing.

Code security concerns

A second concern with mixed content is that a script sent in plain text could potentially be modified en route to the client. If a script is sent over HTTP, a malicious user could potentially intercept that script and inject some malicious code of their own. This code could be used to collect sensitive information or trick the user into visiting an illegitimate website (very common with phishing scams). The end user expects a legitimate script sent securely from the server (they requested a page over HTTPS, after all), but in a mixed content scenario this code could be intercepted and modified by an attacker.

How to avoid mixed content

Use relative links as much as possible

One of the easiest ways to avoid mixed content in web pages is to use relative links. With a relative link, the browser will use the protocol the page was requested with by default, no hard-coding of a protocol required. For example, loading your images from a subfolder called photos can be done with a relative path like so: src="photos/myimage.jpg". If the user requests your page over HTTPS, the image will be loaded over HTTPS as well.

Use // to auto-detect the protocol for outside links, or always use HTTPS

Sometimes you may want to load content from outside your own domain and your own web server. Many developers like to load scripts like jQuery from Content Delivery Networks (CDNs) rather than store those scripts on their own server. In this case, you can use // in front of the URL instead of a hard-coded protocol like http://. This will match the protocol the page was requested with. For example, src="//cdn.jquery.com/some-jquery-version.js". Better yet, if you know the site provides content over HTTPS, just use that by default!
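
To see how relative and protocol-relative links inherit the page's protocol, you can resolve them the way a browser would. The sketch below uses urljoin from Python's standard library, which follows the same general URL resolution rules; the resolve wrapper is just a name I chose, and the URLs are the example ones from this article.

```python
from urllib.parse import urljoin


def resolve(page_url: str, reference: str) -> str:
    """Resolve a src/href value against the URL of the page that contains it."""
    return urljoin(page_url, reference)


if __name__ == "__main__":
    secure_page = "https://somesite.com/sensitive-article"
    references = (
        "photos/myimage.jpg",                            # relative link
        "//cdn.jquery.com/some-jquery-version.js",       # protocol-relative link
        "http://cdn.jquery.com/some-jquery-version.js",  # hard-coded HTTP
    )
    for reference in references:
        print(reference, "->", resolve(secure_page, reference))
```

The first two references come out as https:// URLs because they borrow the scheme from the page, while the hard-coded one stays on http:// and would be mixed content.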

Avoiding mixed content is fairly straightforward!

While mixed content can pose several issues for user privacy and application security, it is thankfully trivial to mitigate. Avoiding mixed content problems is as easy as fixing the links in any pages your site provides. If you’re not sure your pages avoid this problem, the developer console in a modern browser like Firefox will tell you if you’re trying to load insecure content. Many browsers (including Firefox) even block that content from loading. Be sure to be mindful of how media and scripts are requested in your web applications, and your users will be more secure for it!
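
If you want to check a page outside the browser, a rough scan is easy to write. The following sketch uses Python's built-in HTML parser to list subresources with a hard-coded http:// URL. The class and function names are my own, and it is far from complete: it only looks at src attributes and at href on link tags, so it will miss things like URLs inside CSS or requests made from scripts.

```python
from html.parser import HTMLParser


class MixedContentFinder(HTMLParser):
    """Collect (tag, url) pairs for subresources requested over plain HTTP."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # src loads a subresource on any tag; href only does so on <link>.
            is_subresource = name == "src" or (tag == "link" and name == "href")
            if is_subresource and value and value.lower().startswith("http://"):
                self.insecure.append((tag, value))


def find_mixed_content(html: str) -> list:
    finder = MixedContentFinder()
    finder.feed(html)
    finder.close()
    return finder.insecure


if __name__ == "__main__":
    page = (
        '<img src="http://example.com/photo.jpg">'
        '<script src="//cdn.example.com/lib.js"></script>'
        '<a href="http://example.com/">an ordinary link</a>'
    )
    for tag, url in find_mixed_content(page):
        print(f"insecure <{tag}>: {url}")
```

Ordinary anchor links are ignored on purpose, since following a link to an HTTP page is navigation rather than content loaded into the secure page.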