In Part 1, we'll cover how to store the blockchain data and generate an initial block, how a node can sync up with the local blockchain data, how to display the blockchain (which will be used in the future to sync with other nodes), and then how to go through and mine and create valid new blocks. For this first post, there are no other nodes. There are no wallets, no peers, no important data. Information on those will come later.
A big thing to mention here is that there are differences between a basic blockchain like the one described here and a ‘professional’ blockchain. This chain will not create a cryptocurrency. Blockchains do not require producing coins that can be traded and exchanged for physical money. Blockchains are used to store and verify information. Coins help incentivize nodes to participate in validation but don’t need to exist.
Quick Summary of Blockchain
Here's a quick summary of how blockchains work. If you already know how they work, feel free to skip this section. If you'd like to learn about the blockchain in more detail, refer to: The Ultimate Guide to the Blockchain.
At a super high level, a blockchain is a database where everyone participating in the blockchain can store, view, and confirm the data, but never delete it.
On a somewhat lower level, the data in these blocks can be anything as long as that specific blockchain allows it. For example, the data in the Bitcoin blockchain is only transactions of Bitcoins between accounts. The Ethereum blockchain allows similar transactions of Ether, but also transactions that are used to run code.
One level further down, before a block is created and linked into the blockchain, it is validated by a majority of the participants working on the blockchain, referred to as nodes. The true blockchain is the chain containing the greatest number of blocks that is correctly verified by the majority of the nodes. That means if a node attempts to change the data in a previous block, the newer blocks will no longer be valid, and nodes will not trust the data from the incorrect block.
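To make the tamper-resistance idea concrete, here is a minimal sketch. It is not the code from this project: the `toy_hash`, `build_chain`, and `is_chain_valid` helpers and the dict-based block layout are assumptions invented for this illustration.

```python
import hashlib

def toy_hash(index, prev_hash, data):
    # Hypothetical helper: hash a block's contents together with the previous hash.
    payload = '%s|%s|%s' % (index, prev_hash, data)
    return hashlib.sha256(payload.encode('utf-8')).hexdigest()

def build_chain(items):
    # Build a list of block dicts, each storing the hash of the block before it.
    chain = []
    prev_hash = ''
    for index, data in enumerate(items):
        block = {'index': index, 'prev_hash': prev_hash, 'data': data}
        block['hash'] = toy_hash(index, prev_hash, data)
        chain.append(block)
        prev_hash = block['hash']
    return chain

def is_chain_valid(chain):
    # Recompute every hash and check each link back to the previous block.
    prev_hash = ''
    for block in chain:
        if block['prev_hash'] != prev_hash:
            return False
        if block['hash'] != toy_hash(block['index'], block['prev_hash'], block['data']):
            return False
        prev_hash = block['hash']
    return True

chain = build_chain(['first', 'second', 'third'])
print(is_chain_valid(chain))   # untouched chain

chain[1]['data'] = 'tampered'  # rewrite history in the middle block
print(is_chain_valid(chain))   # the stored hash no longer matches the data
```

Even if the tampering node also recomputes the hash of the block it changed, the next block still points at the old hash, so the link check fails from that point on.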
If you want to look at the code, check out the part 1 branch on GitHub.
Step 1 — Classes and Files
Step 1 for me is to write a class that handles the blocks when a node is running. I’ll call this class Block. Frankly, there isn’t much to do with this class. In the __init__ function, we’re going to trust that all the required information is provided in a dictionary. If we were writing a production blockchain, this wouldn’t be smart, but it’s fine for the example where we're the only one writing all the code. We also want to write a method that spits out the important block information into a dict, and then have a nicer way to show block information if we print a block to the terminal.
class Block(object):
    def __init__(self, dictionary):
        '''
        We're looking for index, timestamp, data, prev_hash, nonce
        '''
        for k, v in dictionary.items():
            setattr(self, k, v)
        if not hasattr(self, 'hash'):
            # only needed when creating the first block, to be removed in future
            # (create_self_hash() is not shown here)
            self.hash = self.create_self_hash()

    def __dict__(self):
        # note: this method name shadows Python's built-in instance __dict__
        info = {}
        info['index'] = str(self.index)
        info['timestamp'] = str(self.timestamp)
        info['prev_hash'] = str(self.prev_hash)
        info['hash'] = str(self.hash)
        info['data'] = str(self.data)
        return info

    def __str__(self):
        return "Block<prev_hash: %s, hash: %s>" % (self.prev_hash, self.hash)
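As noted, trusting the incoming dictionary would not be wise in production. A minimal sketch of a stricter check might look like the following. The names `REQUIRED_FIELDS` and `missing_fields` are hypothetical and not part of this project, and the list of required keys is an assumption based on the four fields the first block sets.

```python
# Assumed set of keys every block dictionary must provide.
REQUIRED_FIELDS = ('index', 'timestamp', 'data', 'prev_hash')

def missing_fields(dictionary):
    # Return the required keys that the dictionary does not provide.
    return [field for field in REQUIRED_FIELDS if field not in dictionary]

# This dictionary deliberately leaves out the timestamp.
block_data = {'index': 0, 'data': 'First block data', 'prev_hash': None}
problems = missing_fields(block_data)
if problems:
    print('Refusing to build a block, missing: %s' % ', '.join(problems))
```

A check like this could run at the top of `__init__` and raise an error instead of printing.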
When we’re looking to create a first block, we can run this simple code.
import datetime as date

def create_first_block():
    # index zero and arbitrary previous hash
    block_data = {}
    block_data['index'] = 0
    block_data['timestamp'] = date.datetime.now()
    block_data['data'] = 'First block data'
    block_data['prev_hash'] = None
    block = Block(block_data)
    return block
Nice. The final question of this section is where to store the data in the file system. We want this so we don’t lose our local block data if we turn off the node.
In an attempt to somewhat copy the Ethereum Mist folder scheme, we're going to name the folder with the data ‘chaindata’. For now, each block gets its own file, named after its index. We need to make sure that the filename begins with plenty of leading zeros so the blocks are in numerical order.
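Here is a small sketch of why the leading zeros matter. The `block_filename` helper and the width of 6 digits are assumptions chosen for illustration.

```python
def block_filename(index, width=6):
    # Hypothetical helper: pad the index with zeros so names sort numerically.
    return '%s.json' % str(index).zfill(width)

unpadded = sorted('%s.json' % i for i in (1, 2, 10))
padded = sorted(block_filename(i) for i in (1, 2, 10))

print(unpadded)  # ['1.json', '10.json', '2.json'], block 10 lands before block 2
print(padded)    # ['000001.json', '000002.json', '000010.json']
```

Plain string sorting compares character by character, so without padding block 10 would be read before block 2.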
With the code above, this is what we need to create the first block.
import os

# check if chaindata folder exists
chaindata_dir = 'chaindata'
if not os.path.exists(chaindata_dir):
    # make chaindata dir
    os.mkdir(chaindata_dir)

# check if dir is empty from just creation, or empty before
if not os.listdir(chaindata_dir):
    # create first block
    first_block = create_first_block()
    first_block.self_save()  # a Block method that is not shown here
Step 2 — Syncing the blockchain, locally
When you start a node, before you’re able to start mining, interpreting the data, or sending and creating new data for the chain, you need to sync the node. Since there are no other nodes, we're only talking about reading the blocks from the local files. In the future, syncing will still include reading from files, but it will also mean talking to peers to gather the blocks that were generated while you weren’t running your own node.
import json
import os

def sync():
    node_blocks = []
    # We're assuming that the folder and at least the initial block exist
    chaindata_dir = 'chaindata'
    if os.path.exists(chaindata_dir):
        # sort the names so blocks load in index order; os.listdir order is arbitrary
        for filename in sorted(os.listdir(chaindata_dir)):
            if filename.endswith('.json'):  # .DS_Store sometimes screws things up
                filepath = '%s/%s' % (chaindata_dir, filename)
                with open(filepath, 'r') as block_file:
                    block_info = json.load(block_file)
                    # we can init a Block object with just a dict
                    block_object = Block(block_info)
                    node_blocks.append(block_object)
    return node_blocks
Nice and simple, for now. Reading strings from a folder and loading them into data structures doesn’t require super complicated code. But in future posts, when we allow different nodes to communicate, this sync function is going to get a lot more complicated.
Step 3 — Displaying the blockchain
Now that we have the blockchain in memory, we want to start being able to show the chain in a browser. Two reasons for doing this now. First is to validate in a browser that things have changed. Second, we want to use the browser in the future to view and act on the blockchain. Like sending transactions or managing wallets.
We use Flask here since it’s super easy to start.
Here’s the code to show the blockchain json. We'll ignore the import requirements to save space here.
node = Flask(__name__)

node_blocks = sync.sync()  # initial blocks that are synced

@node.route('/blockchain.json', methods=['GET'])
def blockchain():
    '''
    Shoots back the blockchain, which in our case, is a json list of hashes
    with the block information which is:
    index
    timestamp
    data
    hash
    prev_hash
    '''
    node_blocks = sync.sync()  # regrab the nodes if they've changed
    # Convert our blocks into dictionaries
    # so we can send them as json objects later
    python_blocks = []
    for block in node_blocks:
        python_blocks.append(block.__dict__())
    json_blocks = json.dumps(python_blocks)
    return json_blocks

if __name__ == '__main__':
    node.run()
Run this code, visit localhost:5000/blockchain.json (5000 is Flask's default port), and you’ll see the current blocks spit out.
Step 4 — “Mining”, also known as block creation
We only have that one genesis block, and if we have more data we want to store and distribute, we need a way to include that into a new block. The question is how to create a new block while linking back to a previous one.
In the Bitcoin whitepaper, Satoshi describes it as the following. Note that ‘timestamp server’ is referred to as a ‘node’:
The solution we propose begins with a timestamp server. A timestamp server works by taking a hash of a block of items to be timestamped and widely publishing the hash... The timestamp proves that the data must have existed at the time, obviously, in order to get into the hash. Each timestamp includes the previous timestamp in its hash, forming a chain, with each additional timestamp reinforcing the ones before it.
Here’s a screenshot of the picture below the description.
A quick summary of the above: in order to link the blocks together, we create a hash of the information of a new block that includes the time of block creation, the hash of the previous block, and the information in the block. We’ll refer to this group of information as the block’s ‘header’. In this way, we’re able to verify a block’s truthfulness by running through all the hashes before a block and validating the sequence.
In our case, we create the header by concatenating the string values into one giant string. The data we're including is:
- Index, meaning which number of block this will be
- Previous block’s hash
- The data, which in this case is just random strings. Bitcoin's header carries the Merkle root here instead, a hash that summarizes the block's transactions
- The timestamp of when we’re mining the block
def generate_header(index, prev_hash, data, timestamp):
    return str(index) + prev_hash + data + str(timestamp)
To avoid confusion: concatenating the strings of information isn’t the only way to create a header. The requirement is that everyone knows how to generate a block’s header, and that the header contains the previous block’s hash. This is so everyone can confirm the correct hash for the new block, and validate the link between the two blocks.
The Bitcoin header is much more complex than combining strings. It uses hashes of data, times, and deals with how the bytes are stored in computer memory. But for now, adding strings suffices.
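For the curious, here is a rough sketch of what that byte-level header looks like. This is background only and is not used anywhere in this project. The function name `bitcoin_header_hash` is made up for this example. The layout follows Bitcoin's 80-byte block header (version, previous hash, Merkle root, time, difficulty bits, nonce), and the inputs are the field values of Bitcoin's genesis block.

```python
import hashlib
import struct

def bitcoin_header_hash(version, prev_hash_hex, merkle_root_hex, timestamp, bits, nonce):
    # Pack the six header fields into 80 bytes. Integers are little-endian,
    # and the two 32-byte hashes are byte-reversed from their usual hex display.
    header = (
        struct.pack('<I', version)
        + bytes.fromhex(prev_hash_hex)[::-1]
        + bytes.fromhex(merkle_root_hex)[::-1]
        + struct.pack('<III', timestamp, bits, nonce)
    )
    # Bitcoin applies SHA-256 twice, then displays the digest byte-reversed.
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()

# Field values of Bitcoin's genesis block.
genesis_hash = bitcoin_header_hash(
    version=1,
    prev_hash_hex='00' * 32,
    merkle_root_hex='4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b',
    timestamp=1231006505,
    bits=0x1d00ffff,
    nonce=2083236893,
)
print(genesis_hash)
```

Compared with this, joining a few strings is far easier to read, which is why the string version is used here.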
Once we have the header, we calculate its hash. Our hash calculation differs slightly from Bitcoin’s method, but we're still running the block header through the sha256 function.
import hashlib

def calculate_hash(index, prev_hash, data, timestamp, *nonce):
    # *nonce takes zero or one extra value; Step 5 adds a nonce to the header
    header_string = generate_header(index, prev_hash, data, timestamp, *nonce)
    sha = hashlib.sha256()
    sha.update(header_string.encode('utf-8'))  # sha256 needs bytes, not str
    return sha.hexdigest()
Finally, to mine the block we use the functions above to get a hash for the new block, store the hash in the new block, and then save that block to the chaindata directory.
node_blocks = sync.sync()

def mine(last_block):
    index = int(last_block.index) + 1
    timestamp = date.datetime.now()
    data = "I block #%s" % index  # random string for now, not transactions
    prev_hash = last_block.hash
    block_hash = calculate_hash(index, prev_hash, data, timestamp)

    # store exactly the values that were hashed, so the hash can be re-checked
    block_data = {}
    block_data['index'] = index
    block_data['timestamp'] = timestamp
    block_data['data'] = data
    block_data['prev_hash'] = prev_hash
    block_data['hash'] = block_hash
    return Block(block_data)

def save_block(block):
    chaindata_dir = 'chaindata'
    # zero-pad the index so the filenames stay in numerical order
    filename = '%s/%s.json' % (chaindata_dir, str(block.index).zfill(6))
    with open(filename, 'w') as block_file:
        print(block.__dict__())
        json.dump(block.__dict__(), block_file)

if __name__ == '__main__':
    last_block = node_blocks[-1]
    new_block = mine(last_block)
    save_block(new_block)
Tada! Though with this type of block creation, whoever has the fastest CPU is able to create the longest chain, which other nodes would accept as true. We need some way to slow down block creation and let nodes confirm each block before moving on to the next one.
Step 5 — Proof-of-Work
To slow things down, we're throwing in Proof-of-Work, as Bitcoin does. Proof of Stake is another mechanism blockchains use to reach consensus, but for this we'll go with PoW.
The way to do this is to require that a block’s hash has certain properties. Like Bitcoin, we're going to make sure that the hash begins with a certain number of zeros before you can move on to the next block. To make that possible, we throw one more piece of information into the header: a nonce.
def generate_header(index, prev_hash, data, timestamp, nonce):
    return str(index) + prev_hash + data + str(timestamp) + str(nonce)
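To get a feel for how much the zeros slow things down, here is a minimal sketch. The helper names `expected_attempts` and `count_attempts` and the sample header string are assumptions for illustration. Since each hex digit of a sha256 hash behaves like a random value out of 16, roughly one hash in 16 to the power of the number of zeros qualifies.

```python
import hashlib

def expected_attempts(num_zeros):
    # Each hex digit has 16 possible values, so on average
    # one hash in 16**num_zeros starts with that many zeros.
    return 16 ** num_zeros

def count_attempts(header, num_zeros):
    # Try nonces 0, 1, 2, ... until the hash of header + nonce starts with enough zeros.
    target = '0' * num_zeros
    nonce = 0
    while True:
        digest = hashlib.sha256((header + str(nonce)).encode('utf-8')).hexdigest()
        if digest.startswith(target):
            return nonce + 1, digest
        nonce += 1

for num_zeros in (1, 2, 3, 4):
    print(num_zeros, expected_attempts(num_zeros))

attempts, digest = count_attempts('example header', 2)
print(attempts, digest)
```

Every extra required zero multiplies the average work by 16, which is the knob that controls how long mining takes.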
Now the mining function is adjusted to create the hash, but if the block’s hash doesn’t lead with enough zeros, we increment the nonce value, create the new header, calculate the new hash and check to see if that leads with enough zeros.
NUM_ZEROS = 4

def mine(last_block):
    index = int(last_block.index) + 1
    timestamp = date.datetime.now()
    data = "I block #%s" % index  # random string for now, not transactions
    prev_hash = last_block.hash
    nonce = 0

    block_hash = calculate_hash(index, prev_hash, data, timestamp, nonce)
    while not block_hash.startswith('0' * NUM_ZEROS):
        nonce += 1
        block_hash = calculate_hash(index, prev_hash, data, timestamp, nonce)

    # store exactly the values that were hashed, so other nodes can re-check the hash
    block_data = {}
    block_data['index'] = index
    block_data['timestamp'] = timestamp
    block_data['data'] = data
    block_data['prev_hash'] = prev_hash
    block_data['hash'] = block_hash
    block_data['nonce'] = nonce
    return Block(block_data)
Excellent. This new block contains the valid nonce value so other nodes can validate the hash. We can generate, save, and distribute this new block to the rest of the nodes.
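As a sketch of what that validation by another node could look like: the function `is_valid_block`, the dict-based block, and the sample values below are assumptions for illustration and not part of this project's code. The header recipe mirrors the string concatenation used in this post.

```python
import hashlib

NUM_ZEROS = 4  # same difficulty as the mining code in this post

def header_hash(block):
    # Same recipe as the header in this post: concatenate the fields, then sha256.
    header = (str(block['index']) + block['prev_hash'] + block['data']
              + str(block['timestamp']) + str(block['nonce']))
    return hashlib.sha256(header.encode('utf-8')).hexdigest()

def is_valid_block(block, last_hash):
    # Hypothetical check another node could run on a block it receives.
    recomputed = header_hash(block)
    return (block['prev_hash'] == last_hash
            and block['hash'] == recomputed
            and recomputed.startswith('0' * NUM_ZEROS))

# Mine a throwaway block so there is something to validate.
block = {'index': 1, 'prev_hash': 'abc123', 'data': 'I block #1',
         'timestamp': '2017-01-01 00:00:00', 'nonce': 0}
block['hash'] = header_hash(block)
while not block['hash'].startswith('0' * NUM_ZEROS):
    block['nonce'] += 1
    block['hash'] = header_hash(block)

print(is_valid_block(block, 'abc123'))   # freshly mined block

block['data'] = 'changed after mining'
print(is_valid_block(block, 'abc123'))   # stored hash no longer matches the contents
```

Checking takes a single hash, while finding the nonce took thousands of attempts. That asymmetry is what makes Proof-of-Work useful.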
Summary
And that’s it! For now. There are tons of questions and features for this blockchain that we haven’t included.
For example, how do other nodes become involved? How would nodes transfer data that they want included in a block? How do we store the information in the block other than just a giant string? Is there a better type of header that doesn’t include that giant data string?