(Bitcoin) Script Kiddies – Understanding Basic Transaction Scripts

Overview

In the Bitcoin world, money is not just digital – money is programmable! When transactions between users on the network are created and broadcast, miners and nodes independently verify that these transactions are valid. But this verification is not just checking some basic data points – it involves the execution of special scripts specified in the transaction parameters.

Script Basics

The Bitcoin Scripting Language

Before we can understand how basic Bitcoin scripts operate, we need to know a little bit about the scripting language itself. Unlike common scripting languages such as Python and Bash, the Bitcoin scripting language is quite limited and fairly simple in its execution. “Script” is stack based, meaning data is stored on an execution stack and script operators “push” and “pop” data from this stack. As well, Script is not Turing complete. There are no functions for looping or jumping around in the order of script execution. Operations are completely linear from the beginning of execution to the end. This keeps scripts secure, as it is not possible to tie up machines executing the scripts with an infinite loop.

Some operators are general, but most are specific to the cryptography of Bitcoin. Operators such as OP_ADD, OP_SUB, OP_DUP are pretty self explanatory – they make it possible to add, subtract, or duplicate data on the stack. Operators such as OP_HASH160 are more specific to the way Bitcoin operates – this operator takes the top item on the stack, hashes it using SHA-256 and then RIPEMD160, and finally pushes the result back on to the stack.

Common Bitcoin Script Mechanics

Pay to Public Key Hash (P2PKH) Script Basics

Finally, we need to understand a bit about how scripts are formed by discussing the basics of transactions. When a user “receives” Bitcoin in a transaction, they don’t just have a “bank account” balance on the blockchain. Rather, the blockchain stores what are called unspent transaction outputs associated with a user’s address. These outputs specify a locking condition that must be satisfied in a script when the user tries to spend that output in a future transaction. When the user creates a new transaction with that UTXO, they specify an unlocking script that satisfies that locking script

The most common form of script on the Bitcoin network is called Pay to Public Key Hash. With this type of script, the locking script requires that the user provide their public key and a digital signature formed with transaction data and their private key. This public key and digital signature will “satisfy” the locking script. When this unlocking script is combined with the UTXO locking script and executed, the final result on the Script stack should be true, meaning that the user can spend the Bitcoin.

P2PKH Formation

For a P2PKH script, the locking script specified by an unspent transaction output looks like this:

OP_DUP OP_HASH160 <Public Key Hash> OP_EQUALVERIFY OP_CHECKSIG

The unlocking script provides the user’s signature and public key in order

<Signature> <Public Key>

In order to verify that the user owns the Bitcoin they wish to spend, a node verifying this transaction will append the locking script to the unlocking script and then execute it:

<Signature> <Public Key> >OP_DUP OP_HASH160 <Public Key Hash> OP_EQUALVERIFY OP_CHECKSIG

Script Execution

Now let’s walk through how this P2PKH script executes.

<Signature> <Public Key> OP_DUP OP_HASH160 <Public Key Hash> OP_EQUALVERIFY OP_CHECKSIG

First, the signature and public key specified by the unlocking script are pushed on to the stack:

STACK: <Signature> <Public Key>

Next, OP_DUP pushes a copy of the top item on to the stack:

STACK: <Signature> <Public Key> <Public Key>

OP_HASH160 will pop the top stack item and hash it using SHA-256, and then RIPEMD160. Once the hashing operations are complete, the result is pushed on to the stack:

STACK: <Signature> <Public Key> <Public Key Hash>

The user’s public key hash (a data item) specified by the locking script is pushed on to the stack:

STACK: <Signature> <Public Key> <Public Key Hash> <Public Key Hash>

OP_EQUALVERIFY now pops the top two stack items and checks that they are equal. If they are equal, execution continues. If the comparison fails, script execution exits with a failure.

STACK: <Signature> <Public Key>

OP_CHECKSIG now verifies that the signature is valid against the public key specified. An elliptic curve digital signature is created using a private key and a specific message, and any user with that message and public key can verify that the signature is valid without knowing the private key! Note that the message is not a part of the script, but is garnered from the overall transaction data. If the signature is valid, OP_CHECKSIG pushes true on to the stack.

STACK: true

Any Bitcoin script that ends with just true on the stack indicates a valid transaction. The user that created this transaction to spend some currency is in fact the rightful owner of the unspent output they want to use.

Bitcoin Scripts – Simple But Powerful

For someone with programming experience and some computer science background, Bitcoin scripts are generally straightforward to understand since the language is limited and Turing incomplete. Understanding P2PKH scripts requires just a working knowledge of stack data structures and commonly used cryptographic algorithms, but no higher level programming constructs. Now you know what goes on when you send your friend some money from your Bitcoin wallet!

However, the beauty of programmable money is the power to create transactions beyond the normal flow of “Alice sends Bob some cash”. Script opens up the possibility of things like multi-signature transactions, time locked spending, and more!

EZ-Pay – Full Node vs. SPV Wallets

Overview

When discussing digital currencies, the question is often asked “where is the ‘money’ actually stored?” In the world of fiat currency (US dollars, Euros, etc.), cash stored in your physical wallet is the money. You give a $20 bill to a cashier, and they now have $20. With cryptocurrencies like Bitcoin, the actual currency is stored on a completely public, open ledger called the blockchain. The blockchain stores a complete record of every transfer between individuals in the history of Bitcoin’s existence, so a Bitcoin wallet can easily verify that you own some amount of currency and can send it to another person.

However, there’s a bit of a problem with this. The Bitcoin blockchain contains a record of every single transaction ever recorded, now over ten years of history. The blockchain is HUGE in terms of storage space – nearly 200 gigabytes these days. What if we don’t have that kind of space on our computer? What if we want to have a digital currency wallet on our phone or another capacity-limited device? Fortunately, there’s a type of wallet called an SPV wallet that fixes this problem. Let’s discuss the difference between full node and SPV technology.

Full Node vs. SPV

The Full (Node) Experience

The most full wallet experience is using what is called a full node. The strategy use by a full node wallet is very simple: the entire blockchain containing all transactions is downloaded to the machine running the wallet software.

Because the full blockchain is available to the wallet, verifying ownership of the user’s funds is simple. The wallet software looks at the blockchain ledger and traces the ownership of the currency back to the very beginning of Bitcoin. The blockchain is well secured by cryptography and proof of work, making it near impossible to forge any of these transfers. So, by storing the full blockchain, the wallet software can verify that all the previous transfers of Bitcoin leading up to the transfer to the current owner are valid and considered indisputable history. If your wallet can independently verify the blockchains transactions, it knows for sure that your Bitcoin is truly yours.

Wallets on a diet – Simplified Payment Verification or SPV

As we discussed, however, it can be problematic to download and store the entire blockchain for a wallet in many cases. What if a user with a small laptop, a mobile phone, or other limited-capacity device wants to participate on the Bitcoin network? A user may have limited storage capacity, or may also have limited bandwith for downloading the very large blockchain. But if we can’t download the blockchain, how can we independently verify that the Bitcoin in our wallet is actually ours?

Satoshi Nakamoto, the inventor of Bitcoin, brilliantly solved this problem by developing a technology called Simplified Payment Verification, or SPV. These wallets use some neat cryptographic tricks to avoid downloading the whole blockchain, at the expense of a minimal amount of trust required to verify currency ownership.

So how do SPV wallets work? When an SPV node needs to verify ownership of a user’s funds (in order to create a new transaction where they send money to someone else), the node makes special requests to full nodes it can find on the network. Instead of asking for the whole blockchain, it only asks for specific bits of information it needs to cryptographically verify that the wallet user owns their money.

SPV wallet only downloads what are called the block headers. These headers store important metadata about the transactions included in that block, including a sort of cryptographic summary of transactions called a Merkle tree. Next, the SPV wallet will ask other nodes on the network for transaction data that is relevant specifically to the user’s wallet, like previous transactions that send money to the user’s Bitcoin addresses.

By getting the basic transaction data from other nodes and the block headers, the SPV wallet can use cryptography that verifies that the transaction does indeed belong in a particular block (by verifying it belongs in the Merkle tree in the block header). The SPV node can then verify that the blockchain is valid by checking that all the block headers are valid and have sufficient “proof of work”. It turns out that if the transaction the wallet needs to verify is several blocks “deep” (that is, behind the latest block proved and added to the chain), the wallet can generally trust that the funds do indeed belong to the user without having to verify the whole blockchain!

It is important to note that in order to prevent being scammed by one rogue node on the network, SPV wallets connect to many full nodes to request transaction data. It is far less likely that all the peers an SPV node connects to a trying to scam that node with falsified transaction data, so it is generally considered secure to use SPV nodes for everyday transfers. If a user wants the most secure wallet experience, a full node is a bit better since it verifies the whole blockchain and doesn’t have to trust other parties on the network.

SPV – Wallets, simplified!

Thanks to the interesting cryptography of Merkle trees, proof of work, and block chaining, SPV wallets do not need to download the entire blockchain to securely check if a user owns their Bitcoin. By asking for specific transaction data, an SPV wallet can check that transactions sent to a user’s address belong to a block using a Merkle tree. And by verifying block chaining and proof of work, the node can trust that said transaction has been accepted as part of the Bitcoin history and is therefore owned by the user. Since SPV nodes communicate with multiple full nodes, it is generally true that SPV wallets are secure despite the fact that they do not download and validate the entire blockchain. So never fear – if you’re using a mobile phone wallet or a wallet on your netbook, you can participate in Bitcoin in a way that is secure!

What’s in Your Wallet? Understanding Private Key Control

Overview

Just like your cash, cards, and ID, your cryptocurrency assets live in something called a “wallet”. Most all forms of digital money implement this concept in some form, and understanding wallets is critical to safely storing and using your favorite digital currency.

Much like your physical wallet, your Bitcoin, Monero, or Ethereum wallet gives you direct access to the funds inside. A crypto-wallet isn’t like a credit card – if a stranger gets a hold of it, you can’t cancel it. Much like cash, the money stolen would be theirs!

But how does this work? How can a digital asset act like cash when all other forms of digital monetary transactions (credit cards, bank transfers, PayPal) can be “cancelled” if stolen? We must first understand a bit about what a “private key” is, and why who controls it is so important to the security of your cryptocurrency funds.

A word on private keys

Without getting to far into the technical details, let’s discuss a bit about what a “private key” is and why it is so important. Remember how I said that your crypto-wallet is like digital cash, and your wallet “stores” your Bitcoin or other currency? Well, that’s not quite how that works…

In reality, the amount of Bitcoin that you own is stored on a worldwide, completely public ledger/database called the “blockchain”. This ledger stores a public record of all of the Bitcoin transfers ever conducted, so anyone can see exactly who owns what. Sound scary and insecure? How can you control your digital cash if everyone has access to this open blockchain??

This is where private keys come in. Bitcoin (and other crypto currencies) use a form of cryptography called “elliptic curve cryptography” to generate the Bitcoin “addresses” people can use to send you money. The address is completely public; you can give it to anyone and they can send you funds. However, behind this address is a special “private key” used to access those funds on the blockchain. Your address is generated from this randomly generated private key by using this form of cryptography.

The cryptography used in address generation makes it so that you can’t figure out the private key by going backwards from the public key, or Bitcoin address. However, the private key is used to prove that you own the address without ever revealing it, thanks to the magic of elliptic curve cryptography. It is critical that the private key is always kept secret, because anyone with the private key can access the Bitcoin at the associated address.

Levels of Private Key control in Wallets

Now that we understand the basics of private keys and their importance, we can talk a bit about how different wallets keep these keys safe from the prying eyes of crypto-thieves. All wallets must work in some way that keeps the private keys, well, private so that control of the funds lies with their rightful owner. There are three general approaches to private key control in wallets: a full control model, a hybrid model, and a custodial model.

Full control wallets

Full control wallets offer the obvious – complete and total control of the private keys. With a full control model, the private keys are generated and stored on the user’s device, be it a desktop computer, mobile phone, or even a hardware wallet like the Trezor. With this model, the private keys never leave the user’s device in any shape or form.

The advantage to this model should be fairly obvious – it is by far the most secure model. There is no trust involved with a third party; the funds are completely controlled by you. Users should still exercise care around other security parameters (ensuring a virus-free machine, for example), but generally these wallets offer the most hardened approach to keeping private keys safe.

The disadvantage here is the lack of convenience and ease of use. This wallets require the most technical savvy of these three models, although most “power users” will have no problem understanding and securing these wallets. It is extremely important that the users of these wallets understand how to back up their private keys. If the device is fried or lost and there is no accessible backup, all funds will be lost! Fortunately again, BIP39 mnemonic backups make this tasks easier than it was with the first few Bitcoin wallets.

Hybrid wallets

Some major players in the crypto space have created an interesting hybrid model for private key storage. Web wallets like those at blockchain.info or btc.com implement this model. With hybrid wallets, private keys are generated and then encrypted on the user’s machine (usually in the web browser) before being stored on the company’s server. With these wallets, private keys are only known and accessible to the user, while the company keeps an encrypted backup safe on their servers.

With this model, security is still pretty strong. Because strong encryption is done on the user’s machine, no one with access to the company’s servers have access to the actual private keys without the decryption passphrase, which lies safely with the user. This model requires that the user trust that the company’s code (which is preferably open source) is soundly implemented and doesn’t contain secret backdoors. However, if the encryption is done right no one but the user can actually access the keys. This model is slightly less secure than a full control model,

Although there is a small amount of security tradeoff here, this model comes with increased convenience to the user. Most web wallets have a more traditional username and password login interface, so the user only needs to create and remember a secure passphrase to access their funds, with the site taking care of backups for them. This may be easier for a beginner crypto-enthusiast, and any good site will still offer mnemonic backups and private key exports for the savvy user.

Custodial wallets

The final model we’ll discuss here is the custodial wallet. Many exchanges like Coinbase offer custodial wallets. Like a hybrid wallet, all you need to do is create a username and password and log into a website to access funds. However, the critical difference is that with a custodial wallet, the user doesn’t know their private keys at all!. With custodial wallets, the website takes care of generating and storing all the private keys without revealing them to the user. No backups to manage, and no need to understand how to do much more than log in to a website to use this kind of wallet.

The security pitfalls of this model are pretty serious, in my opinion. With these wallets, the user has no way to back up their private keys. What’s more, the user must completely trust the company or individual implementing this kind of wallet. These companies must have significant security measures in place to avoid attacks on their servers, and they must be trusted to mitigate the access rogue employees could have to user’s money.

In fact, these types of wallets completely break a fundamental security principle of Bitcoin – the user controls their keys, therefore the user controls their money. Custodial wallets far more closely resemble the centralized model of traditional banks.

Don’t worry though – these wallets aren’t all scary! The cost of security comes with a large benefit – ease of use! These wallets have a much smaller learning curve for complete beginners. Just sign up for a wallet account just like you would a forum, email address, or social media account. For someone with little understanding of the world of cryptocurrencies, this type of wallet offers a gentle introduction

My only advice would be that given the security pitfalls of this model, only use custodial wallets to store small amounts or for buying and selling. Most custodial wallets live in currency exchanges, so use them to buy your crypto of choice and send the funds to a more secure wallet.

Know Your Keys

The most important takeaway from this discussion of private key control models is that it is important for a wallet user to know where their private keys are. Again, in Bitcoin and other cryptos, control over your private keys is control of your money. Anyone with access to the keys has access to your money and can spend it freely, so either keep them to yourself or make sure the holder is a wallet maker you trust.

By understanding these different models, users have more control over how they choose to secure their wallets and keep their funds safe. Understanding the pros and cons of full control, hybrid, and custodial wallets allows a cryptocurrency user to choose the best wallet for their needs. Ultimately, an understanding of these models allows for better security and comfort with digital money, because a person knows who truly owns their keys and is responsible for keeping them safe.

Playing With Blocks: The Basics of Blockchain Databases (Part 2 – Blockchain for Techies)

Overview

In our non-technical overview of blockchain, we discussed what a blockchain database is – a distributed, cryptographically secured, and immutable data structure used in applications like digital currencies. We discussed how blocks are cryptographically linked together to ensure that old records can not be changed, and why these types of databases are useful for applications where the immutability of data is key.

But how does this work technically? Let’s take a deeper look at how blockchains are secured by the application of hashing and proof-of-work.

It’s all in your the block’s head

The block header

The key to understanding blockchain lies in a data structure included in every block of every blockchain. This data structure is called the block header, and it contains several critical bits of information needed to secure the chain as it grows.

Let’s look at the Bitcoin block header as an example. Each Bitcoin block contains an 80 byte header with the following information:

  • Version – the software version of the Bitcoin protocol
  • Timestamp – expressed in seconds since the Unix Epoch
  • Merkle Root – for our purposes here, we’ll say this is a “fingerprint” of all the transactions in this block
  • Difficulty target – A 256 bit integer used in calculating proof of work
  • Nonce – The value added to the block header to demonstrate proof of work
  • Previous block hash – The SHA-256 hash of the previous block header

All of this data is important and serves a purpose in the Bitcoin protocol. However, I’ve highlighted the previous block hash because it is particularly important when discussing how the blockchain is secured!

A quick review of hashes

Before we discuss the critical role that the previous block hash section of the block header plays in securing the blockchain, let’s step back and recall what a hash function does. When “hashing” data, a special algorithm called a “hash function” takes the data and outputs a unique “fingerprint” of the input data. These functions (at least if they are implemented properly) have two very important properties.

First, a particular chunk of data always produces the same hash (or “fingerprint”) every time it is run through the function. If you run Hello through the SHA-256 hashing algorithm, the result will always be 185f8db32271fe25f561a6fc938b2e264306ec304eda518007d1764826381969.

Second, good hash functions avoid collisions, where different data results in the same hash output. Using a proven algorithm such as SHA-256 means that for every different input, a different hash is produced. Even if a single bit of input changes, the hash is radically different. “Hello” will produce a very different output than “hello”, even though only a few bits of the input are changed.

Hashes inside hashes inside hashes

Understanding these properties of hashes, we can better understand the interesting approach that blockchains take to securing the integrity of data in previous blocks when combined with proof of work.

Each time a block is generated, proof of work is generated in the form of the nonce included in the block header. This is computationally intensive and essentially proves that a miner spent a good bit of CPU power to find the answer to this cryptographic puzzle.

Now, when the nonce is included in the block header with the other data including the previous block hash, we can hash the entire block header to generate the unique fingerprint. This “block hash” is unique, and changing any data whatsoever in the block header would create a radically different block hash.

Okay, so each block has an associated hash. What’s the big deal? How does that help secure the blockchain? The magic of blockchain lies in that critical piece of data known as the previous block hash. Recall that changing any bit of data in the input radically changes the output of a hash function. So what would happen if we tried to change a transaction 2 blocks back in our blockchain?

If a node tries to broadcast a fake blockchain to the network with a fake transaction 2 blocks back, the block hashes for each subsequent block would be radically different.

Let’s look at an example. Let’s say we have a really simple blockchain with some transaction data like so (Note – these hashes are made up for demonstration purposes):

Block 2:
Time - 3000
Difficulty - 1000000000000000000000000000000000000000000000000000000000
Nonce - 5345245
Prev block hash - 33a0b89fcce723e9f41f5d756ab1c20584afbe6dfa9ea18838ff3caf0915b5f5
Transaction: Bob pays Alice 6 units

Block 1:
Time - 2000
Difficulty - 1000000000000000000000000000000000000000000000000000000000
Nonce - 2356343
Prev block hash - f4ebb8b56f590188f5824276af552cd51a48ba774e3ad1350c2800b116d8f6f5
Transaction: Alice pays Bob 5 units

Block 0:
Time - 1000
Difficulty - 1000000000000000000000000000000000000000000000000000000000
Nonce - 1232341234
Prev block hash - 0000000000000000000000000000000000000000000000000000000000000000
Transaction: Alice pays Bob 1 unit

Now let’s say Bob gets greedy and tries to say that Alice paid him 10 units in the first transaction:

Block 2:
Time - 3000
Difficulty - 1000000000000000000000000000000000000000000000000000000000
Nonce - 9987983
Prev block hash - 9ae343e333cbb96427eb333bb8c443359e3cf926c9de9845ceb583577b945afb
Transaction: Bob pays Alice 6 units

Block 1:
Time - 2000
Difficulty - 1000000000000000000000000000000000000000000000000000000000
Nonce - 390970
Prev block hash - 3f82f4cfe059b5a69a0fd5b4d34774af5ecdc672d988320d5fd186998969a645
Transaction: Alice pays Bob 5 units

Block 0:
Time - 1000
Difficulty - 1000000000000000000000000000000000000000000000000000000000
Nonce - 235235
Prev block hash - 0000000000000000000000000000000000000000000000000000000000000000
Transaction: Alice pays Bob 10 units

Notice how different the hashes are for blocks 1 and 2 in Bob’s fake blockchain. If these hashes are to be considered valid by a node in this currency’s network, then each block must also demonstrate proof of work. Since the data has changed in a block, a new nonce must be found to show that work was done.

Here is the most critical part – since the hash for block 0 has changed and is included in block 1, proof of work has to be re-done for block 1. And since block 1’s hash is included in block 2, proof of work has to be re-done for block 2. In other words, to try and fake a transaction 2 blocks back, Bob has to re-do proof of work for 3 whole blocks!! In the meantime, legitimate nodes only have to try and find a solution for the current block. It is clearly impractical, if not impossible, to “fake” a blockchain more than one or two block old, because proof of work has to be redone for many blocks in the time the legitimate network only has to prove one.

Faking a blockchain take too much work!

Due to the interesting combination of hashing and proof-of-work algorithms, it is incredibly difficult if not impossible to fake history in a blockchain database. Because each block contains the hash of the previous block, changing history blocks back means that every bit of the chain on forward must be forged. While legitimate nodes only have to prove work for one block in that span of time, an attacker would have to calculate for many. Unless a malicious party has some amount of computing power the rest of us don’t know about, it’s nearly impossible to do so.

For the extra curious, Satoshi covers the math behind this concept extensively in section 11 of the Bitcoin whitepaper. While I don’t claim to understand this math very well myself, the paper does a good job of explaining its conclusions that forging blockchain history is a fool’s errand.

Playing With Blocks: The Basics of Blockchain Databases (Part 1 – Blockchain for Everyone)

Overview

Blockchain is the latest and greatest buzzword in the information technology world. From open source, decentralized cryptocurrencies like Bitcoin to traditional financial institutions, it seems as though everyone is dying to create and release their own blockchain based applications. But what is blockchain? Why is it such a popular concept, and what is it actually good for? Let’s discuss.

What and why: Blockchain simplified

What is blockchain?

So you’ve heard that blockchain is going to revolutionize everything, but what is it exactly? Let’s cut through the hype and discuss the technical foundations of Blockchain.

A blockchain is a distributed, cryptographically secured database that focuses on making historical data immutable.

In a traditional database, information is often stored on one or a few machines, controlled by a central authority. Access is controlled by this authority (think IT administrator) and the data is kept secure by granting credentials to modify that data to a select few trusted parties. By contrast, a blockchain database is governed by what is called distributed consensus, using mechanisms such as proof-of-work. For more information on proof-of-work, you can read my series of articles on it.. The important thing to note is that (in general), no one central person or authority decides what data is “verified” in a blockchain, a community of network nodes and software does.

If anyone can modify the data in a blockchain rather than a trusted party, then how is this consensus on what is correct achieved? Again, the secret lies in the science of cryptography. Through a mechanism like proof-of-work, a cryptographic puzzle is solved by software with some incentive to do so. In Bitcoin, the node that solves this puzzle is granted new currency. The real magic, however, is the fact that any other node in the network can verify that this answer is correct in a split second, so anyone can independently verify that a block meets the cryptographic standards set by that blockchain’s protocol.

You may be wondering how the cryptography in each block keeps the overall blockchain secure. This is the question of immutability, or how easy it is to modify the history stored in the blockchain. Blockchains solve this by cryptographically “linking” each block to the previous block, thereby making each individual block a critical part of the history stored by that “chain”. Each block has a header full of useful metadata about that block – a timestamp, a “summary” of the included data or transactions, a difficulty target and nonce for mining (part of proof-of-work), and the hash of the previous block’s header. Each block header is run through a one-way, cryptographically secure function called a “hash function” that creates a unique digital fingerprint for the data.

Immutability is achieved when combining the proof-of-work consensus mechanism with this system of chaining each block together. In order to create each block, the cryptographic puzzle solved by the proof-of-work algorithm allows a unique block header hash to be generated. It is computationally difficult to get this value, but very easy to verify it is correct. Now, it’s not that hard to re-solve that hard problem in a matter of minutes…it would be easy to create a fake block at the top of the chain. But what about 10 blocks back? Well, since each block contains a hash of the previous block header that is generated by solving this hard problem, you would have to now fake history for ten whole blocks! It is exponentially more difficult to do so the further back in the chain that you go. Unless you can truly do the work required to fake history in a blockchain, any independent network node could easily see that the rest of your history on forward is invalid. The immense difficulty of “faking” history in a blockchain gives it the most important property it has, its immutability.

Cool, so why is it useful then?

By far the most important aspect of blockchain, in my opinion, is its ability to decentralize applications. With a traditional database, a central authority has to be trusted, which can be a disadvantage in applications that are controversial or have high incentives for fraud. For example, previous attempts at digital money like DigiCash had central services for issuing currency and validating transactions. These were promptly shut down by governments that didn’t like independent currencies very much.

With blockchain, it is possible to have things like completely peer-to-peer money as with Bitcoin, Litecoin, and countless others because no central government or individual has to be trusted! The network is secured by math (cryptography) rather than trust thanks to the blockchain. You don’t have to trust anyone to not defraud you of your money, because the math cannot lie about who owns what.

The other critical function of blockchains beyond decentralization are the preservation of history. Because blockchains are immutable, they can be useful for keeping things like medical records, property transactions, court histories, and more secure from malicious tampering like a traditional database. This does rely on some degree of decentralization, but even within a single company a blockchain is far harder to tamper with than a traditional database.

Cool, now I want a blockchain!

Blockchains are a fascinating and novel way to handle problems with traditional databases in certain applications. Thanks to the decentralized and cryptographically secure nature of these databases, it’s possible to create peer-to-peer applications that don’t require trusting a third party – a key problem to solve for concepts like digital money. As well, their immutability makes them useful even beyond the first few money-centric applications that existed – they may be coming do a real-estate authority, doctor’s office, or justice system near you!

Round and Round – Using Generators in Python

Overview

Most all modern programming languages support constructs for storing lists of data – think C++ arrays, Java ArrayLists, and Python lists. Any time we have a list of data, it’s often necessary to use loops to perform some operation on each item in that list. Again, most modern languages support ways of looping over these lists of data, most commonly “for” loops.

In Python, it’s trivial to iterate over a list using a for loop. However, Python lists are stored in memory. What happens if the data set you need to operate on is large and therefore memory inefficient? Python offers constructs called iterators that allow one to loop over data sets that are not stored in memory. Some iterators are built in for things like file reading, but you can easily create your own custom iterators using a concept called generators!

Iterators? Generators? Combobulators?

Traditional List Iteration

It’s easy to create a list of numbers in Python an loop over that list. Let’s say for example, we want to create a list of multiples of 2. We’ll store 5 numbers in this list. For each number in this list, we’ll just print it out to the screen for now:

def print_multiples():

    multiples = get_multiples()
    for m in multiples:
        print m

def get_multiples():

    return [ 2, 4, 6, 8, 10 ]

if __name__ == "__main__":

    print_multiples()

Let’s step through this code a bit and explain how it works. When we enter the print_multiples function from main, the first call multiples = get_multiples() assigns the return value of get_multiples() to the multiples variable.

The multiples variable now stores an in-memory list with all of our multiple of 2 values – 2, 4, 6, 8, and 10. We next go to the for loop for m in multiples:. These Python for loops are slick and fairly straightforward – for each go around the loop, the next value in the list is stored directly in the variable m. The loops continues until each value in the list is exhausted.

The output looks like this:

python test.py
2
4
6
8
10

So what’s a generator look like?

Now let’s try this code again, using Python’s generator construct. Here’s what that looks like:

def print_multiples():

    multiples = get_multiples()
    for m in multiples:
        print m

def get_multiples():

    for i in range(1, 6):
        yield i * 2

if __name__ == "__main__":
    print_multiples()

Notice our output is the same:

python test.py
2
4
6
8
10

This code requires some closer examination to understand how it works. When we assign multiples = get_multiples(), we don’t a assign a list, we assign what’s known in Python as an iterator. An iterator object exposes a next method that allows for loops to retrieve each sequential item in a list or other iterable object.

When we enter our for loop this time, we don’t iterate over the list – instead our iterator uses the get_multiples() generator function to retrieve each item one by one. The first time we go around the for loop in print_multiples, we enter the get_multiples function and enter its for loop.

Now, you’ll notice a different keyword being using in get_multiples – instead of returning an entire list, the function only yields one item, the result of i * 2. The value is passed through the iterator’s next function and printed to the screen. The next time around the for loop in print_multiples, the code goes to the same spot the value was yielded from in get_multiples. The return keyword returns code control to the caller, whereas the yield keyword only temporarily yields control back to the caller until the next time the caller needs a value from the iterator. The get_multiples function’s for loop continues, and yields the next multiple of 2. The function will continue yielding values of 2 until it’s for loop ends, yielding 10.

Cool, so why would I want to use generators instead of a list?

Our trivial example makes the use of generators pretty clear, but it doesn’t explain why they’re actually useful. Why all the extra complexity to avoid storing a whole 5 numbers in memory?. The use case of generators goes far beyond small lists of numbers.

First, what if we wanted to display the first one billion multiples of 2? In that case, it becomes much more expensive to store one billion integers in memory – it would eat up over a gigagbyte!.

In real-world software engineering applications, we often use generators to deal with large datasets even beyond simple calculations such as this. Generators can be used to operate on large database queries or data read from files on disk, where it would be inefficient or even impossible to read the information into memory.

Generate your own generators

In this article, we’ve explained how to go beyond memory-stored lists of information and create our own generators. Instead of hogging memory for large data operations, we can make our own memory-efficient iterators. Now when you have large calculations, database queries, or file reads to worry about, you can keep your memory usage low and your code easy to understand thanks to Python!

BIP39 Mnemonics Made Easy (Part 2 – The Tech of Bits to Backups)

Overview

In the last article, we discussed a high level overview of BIP39 mnemonics and their value as a simplified backup tool. Mnemonics make it much easier to take a single seed, back it up, and ensure access to an entire wallet of private keys, addresses, and transactions. But how do we go from a random set of bits to a list of words? Let’s discuss the technical side of BIP39.

Bits to Backups – The Steps for Generating a Mnemonic

First, Chaos

In order to generate a good seed, a fair amount of entropy or “randomness” is desirable. Good random number generators are hard to get, but modern OS’s like Linux do a pretty good job of sourcing entropy from the user and hard drives, and something like /dev/urandom on a daily driver machine should be sufficiently secure for generating the entropy we need.

Now how many random bits do we need? The BIP39 standard specifies 128-256 bits of entropy to be used for generating the seed. This will correspond to 12-24 words later on when we “map” the entropy to the words.

First, a warning: DO NOT USE any of the examples in this article to generate a wallet – your funds will be stolen!

With that out of the way, let’s look at an example. First, let’s generate 128 bits of entropy using os.urandom() in Python. Represented as binary, our entropy looks like this:

10111110011001010101110111001111010100011111011010110001110101111011110111000101101001100011110100010100011101000011011011100000

Next, a checksum

In order to better secure the seed, we’ll add a checksum to the end of the entropy. This makes it easier for wallet software to validate a backup seed.

To get the checksum, we’ll first take the SHA-256 hash of our entropy. Then, we take the first N/32 bits of the hash and append it to the entropy.

In our case, 128/32 bits gives us a 4 bit checksum size. In our example, the 4 bit checksum will be 0101. We’ll append that to the entropy to give us a 132 bit value:

101111100110010101011101110011110101000111110110101100011101011110111101110001011010011000111101000101000111010000110110111000000101

Dividing and our Dictionary

The final step of the process involves dividing our checksummed bits into “chunks” and mapping those chunks to the mnemonic words from the dictionary. The BIP39 standard specifies that the chunks will always be 11 bits long. So, we divide our 132 bit checksummed entropy into 12 chunks of 11 bits each:

  1. 10111110011
  2. 00101010111
  3. 01110011110

Now, each of these 11 bit chunks can be interpreted as an unsigned 11 bit integer value ranging from 0-2047. This “maps” to a word from the dictionary of 2048 words directly! These are standardized and listed in alphabetic order. So, we can take the 11 bit chunk as an index in the dictionary to extract the words we need:

  1. 10111110011 = 1523 -> salmon
  2. 00101010111 = 343 -> cliff
  3. 01110011110 = 926 -> inherit

The overall mnemonic we generate is this example turns out to be:

  1. salmon
  2. cliff
  3. inherit
  4. physical
  5. help
  6. type
  7. warfare
  8. regular
  9. dial
  10. photo
  11. asset
  12. scheme

Mnemonics – from Entropy to Dictionary Entries

The process of generating a mnemonic seed is both ingenious and straightforward. One can easily create a secure wallet seed of 12-24 words by generating some entropy, checksumming the data, and mapping to a standard dictionary.

I’ve written a project called MnemonicGen that generates mnemonics using these steps. Take a look at this project to see these steps implemented in Python. This code should be considered academic/experimental – use it to create wallets at your own risk. Other proven implementations such as Ian Coleman’s BIP39 are also available to study.

Happy generating!

BIP39 Mnemonics Made Easy (Part 1 – Backups, Simplified!)

Overview

A critical component of cryptocurrency security is the ability for users to easily and efficiently backup the private keys that control access to their funds on the blockchain. Without one’s private keys, any funds in a user’s addresses are irrecoverably lost.

However, the nature of early wallets made backing up one’s private keys a regularly-scheduled necessity, unwieldy and annoying for most users. BIP39 and associated Bitcoin Improvement Proposals have thankfully simplified private key backup by introducing HD wallets and mnemonics.

Newer wallets explained

HD? So like, High Definition? No, Hierarchical Deterministic!

In the early days of Bitcoin and other cryptocurrencies, address generation was done non-deterministically. For each new address needed, a private key would be randomly generated and stored in the wallet’s backup file. Most wallets would pre-generate some addresses in the initial wallet file, but every new private key/address introduced into the wallet meant a new backup would be needed.

For privacy reasons, it is recommended to use a new address for each transaction. And for every new address generated since the last backup, a user would need to create a new backup to avoid losing recent funds in the event of a wallet resortation. Even for “power users”, backups became an unwieldy and annoying task.

Enter BIPs (Bitcoin Improvement Proposals) 32 and 44. In summary, these proposals introduce HD (“Hierarchical Deterministics Wallets”). These wallets only require one seed to be randomly generated. And from that seed, all the private keys and addresses a wallet needs can be derived in a tree structure; all associated with the initial seed. Since the private keys and addresses can be regenerated from the seed, one only has to back up the seed to recover all of their private keys, addresses, and transactions for a wallet. Much better!

Introducing Mnemonics – Simplifying Seed Backups

The ability to generate an entire wallet from one seed drastically simplified wallet backups, and therefore has improved the ease by which users can keep their funds safe. However, a seed is still just a random binary value. Represented in hex or Base64 encoding, it is still fairly easy to misread/miswrite a character and accidentally create a useless wallet backup.

To truly simplify the task of backing up a wallet seed, some developers in the Bitcoin space proposed a system that allows the translation of the binary seed value into English words that can be more easily transcribed or even memorized to secure access to one’s funds. This proposal, given the designation BIP39, was written by Marek Palatinus, Pavol Rusnak, Aarone Voisine, and Sean Bowe.

What does a mnemonic look like?

Mnemonics don’t use just any set of words. These words are carefully chosen to avoid ambiguity and make transcription easy, so that a user doesn’t accidentally create an incorrect backup.

There are a total of 2048 words in the dictionary, and a wallet mnemonic contains 12-24 words. The last word contains a checksum validating the other words in the list, making it easy for wallets to validate a backup.

Here is a sample Bitcoin or Bitcoin Cash BIP39 mnemonic:

  1. army
  2. van
  3. defense
  4. carry
  5. jealous
  6. true
  7. garbage
  8. claim
  9. echo
  10. media
  11. make
  12. crunch

WARNING, DO NOT use this seed for a wallet. A seed must remain private, and your funds will be stolen! This mnemonic is excerpted from Andreas’ Antonopoulos’ Mastering Bitcoin to further discourage its use – millions of people have access to this wallet.

Now how do you get one of these fancy mnemonics for a wallet? Most modern wallet software will generate this for you when you create a wallet. Then, all you need to do is write down the phrase and store it in a secure location to backup access to your funds if your wallet device is lost or stolen.

Alternatively, a mnemonic can be generated by a separate tool and imported as a backup into the wallet software. I’ve written a generator called MnemonicGen that produces standard phrases that can be imported into any modern HD wallet that supports BIP39. Keep in mind that this particular project is meant to be academic/experimental and may not be sufficiently secure for your needs. But other mnemonic generators like Ian Coleman’s are widely used and well-vetted.

Mnemonics – Backups Made Better

With BIP39 mnemonics, Bitcoin newbies and power users alike can easily create and backup secure wallets without the need to keep a schedule or deal with unwieldy binary data encoding. This modern standard is implemented in widely-used and accessible wallets like the Bitcoin.com wallet, Electron Cash, Blockchain.info, and more. I would personally advise upgrading your cryptocurrency experience by using an HD wallet, simplifying your security practices and keeping your funds safe!

In the next article, we’ll discuss the technical workings of BIP39, showing how we can go from a random seed to a set of words in a few fairly straightforward steps.

Proof of Work, Explained (Part 2 – A Hash Bash for Techies)

Overview

In the last article, we looked at the overall idea of proof of work and its applications. That article covered the origins of this concept, how it works at a high level, and some of its applications. Now, let’s take a look at the technical inner workings of these algorithms.

In a nutshell, proof of work involves the use of hash functions. These one way functions form the basis for a difficult to solve, but easily verifiable computational puzzle as a way to prove that one did some amount of desired computing work.

Proof of Work, the Technical Perspective

Hash Functions

First, we need to understand a bit about hash functions and why they form the basis for proof of work. A hash function is a one-way function that takes some input of any size and outputs a consistently sized set of bits. The two most important characteristics of hash functions are that they:

  • Are one-way – you cannot take an output and find the input without brute force guessing
  • Have unique outputs for every possible input (if the hash function is a good one!)

These two properties are critical for proof of work. First, the one-way nature of these makes it so that brute-force is required to find some desired output. Second, the desired one-to-one input/output property makes it so we can easily verify the solution once we have one.

An Overview of the Algorithm

Hashing and Binary and Difficulty Targets, Oh My!

Proof of work builds on top of the properties of hash functions by realizing that as a stream of bits, hash outputs actually represent binary numbers. For example, an 8 bit hash 00001000 represents the decimal number “8”. Now remember that hash outputs can only be matched to a particular input by using brute force to guess.

Using these properties, proof of work takes a pretty ingenious approach to making a user do some amount of predetermined work – it makes them look for a hash that, interpreted as a number, is less than some target value!

This is where the idea of difficulty comes in. Let’s say you want the user to find some input where the hash value, when representing an 8 bit number, has two zeros in the front (00101010, for example). Now imagine you want the user to find an input that gives a hash with four zeros in front (00001011). Which one takes more guesses to compute? It turns out that the smaller the “difficulty target” value, the more guesses (and more computing time) it takes to find an input that gives the desired hash output. This is the fundamental basis for proof of work. It can be statistically predicted that a certain difficulty target will take roughly some amount of guesses (and therefore computing time) to find. So the smaller the difficulty target, the harder the puzzle.

The nonce value

Now since hash outputs map one-to-one with some input, how can we prevent the worker from just using a dictionary to find an input that meets the difficulty? Here is what makes proof of work truly proof – we always create a hash input unique to the problem we’re trying to solve, using an applicable message and a random guess we call a nonce.

See, for our proof of work to truly require the desired amount of work, we always start with a unique message for the problem. In the case of Bitcoin, our message is what is called the “block header” – a chunk of data containing information about the transactions included in the current block. In an anti-spam application, this message would be something like a forum post or the contents of an email message. Since this message is unique, a dictionary cannot be used to guess a hash that meets the difficulty target.

The worker then has to take the message plus a random number guess called the nonce, and run that combined string through the hashing function. If the hash output isn’t less than the target number, then the worker increments that random number guess concatenated to the message and tries again – and again and again until the right output is found.

Verifying the nonce

Once a nonce is found, another node or server can very easily verify that the solution (the nonce) is correct. All the verifying party has to do is take the original message plus the nonce value found by the worker and run that through the hash function. Since a hash input will always give the same output, it only takes one step to verify that the worker’s nonce is in fact a correct solution to the proof of work problem.

A Practical Example

Let’s take a look at an example anti-spam proof of work problem. Let’s say a user wants to contact a site owner with the message “Hello”, and the site owner wants the user to do some proof of work before sending that email. The site owner specifies a difficulty target of 2^240. This difficulty target can be any 8 bit number in this case, but a power of two is easy to work when building an application. This system uses the compute-intensive SHA-256 hashing algorithm for its proof of work. Here’s what the steps would look like:

Worker (Client)

  1. Uses “Hello” as the message
  2. Starts guessing a nonce with 0 – the hash input is the string “Hello0”
  3. The SHA-256 output of “Hello0” (in hexadecimal format) is 80878c5b013ba72c0d2b7e8f65868649cbdb1e7e7a8c8a07537d6b3619e4e32f
  4. Clearly, this output is greater than the difficulty target of 2^240, which would have three prepending 0’s in hexadecimal: 0001000000000000000000000000000000000000000000000000000000000000
  5. Increment nonce to 1, and try again. This continues until an appropriate nonce is found
  6. The client finally finds a nonce that works with the value 9172. The SHA-256 hash of “Hello9172” is 00001f2e9f8f74117b4178eb04b368c807f906ae2a07bece562266cbc9adff3c, which is less than the difficulty target of 0001000000000000000000000000000000000000000000000000000000000000 (2^240)
  7. Since the client has a nonce guess that meets the difficulty target for this unique message, it now has proof that it did all that computing work!

Verifying Party (Server)

  1. Take the message for this problem, “Hello”, plus the client’s found nonce, “9172” and pass “Hello9172” through the SHA-256 hash function
  2. Since hash functions produce the same output for any input, we get the same output the client found: 00001f2e9f8f74117b4178eb04b368c807f906ae2a07bece562266cbc9adff3c.
  3. Since the above output is indeed less than the difficulty target 2^240, the server has now verified that the client did the desired amount of computing work to find the nonce. The message can now be sent.

Proof of Work – Hashing for a Cause

These algorithms put the properties of hashing algorithms to new and innovative uses, particularly in the incredible space of cryptocurrencies. Proof of work takes the one-to-one input/output and irreversible properties of hash functions and uses them to create difficult to solve, easy to verify computing problems. This simple but interesting bit of math and computer science powers new approaches to interesting challenges. Proof of work can be used to help prevent spam in a new and unique way – by making large-volume spam uneconomical for its propagators. Arguably at its most revolutionary, proof of work powers the transaction verification and currency issuance components of cryptocurrencies like Bitcoin and Litecoin, allowing for an entirely new form of money free from centralized institutions.