The Web3 Video Stack

Charting the infrastructure for a decentralized mediaverse.

Felipe
Token Economy

--

Written with Doug Petkanics and Jelle Gerbrandy.

There are dozens of projects around the world working to enable decentralized video applications. Why? Broadly, for two reasons.

First are the short-to-mid-term cost-efficiency gains that can arise from “distributing work”. Peer-to-peer video streaming has been around for some time, and can increase the cost-efficiency of content delivery in certain cases. It has even spawned a few novel businesses. Skype bootstrapped itself off p2p video-conferencing. Peer5 saves publishers’ costs by turning every video player into a peer in an ad-hoc CDN. It turns out decentralisation can improve cost-efficiency in other layers of the video stack, too. For example, research from the Livepeer team anticipates GPU “video miners” can earn a significant income boost by offering transcoding services, while still charging an order of magnitude less than centralised providers.

Second are the longer-term, societal improvements that the vision of Web3 brings along. Web3 is not about speed, performance, or convenience. It is about open, common infrastructure for applications that can deliver usability while not compromising on transparency, equality of rights and trustlessness.

From capture to delivery, streaming video requires a complex thread of systems. As more foundational protocols go live throughout 2018, the build-out phase of infrastructure for video dApps approaches a maturing point. Autonomous media applications are likely years away from their “iPhone moment”. But the path to get there is becoming clearer.

The protocol stack

The stack model is useful for developers to chart dependencies for a given application, avoiding the implementation of duplicate functionality. If a protocol provides a single feature reliably, the one above can leverage it, and focus on optimising something else.

It’s still early days for code that carries money around, though. Decentralized architectures are being devised by teams working simultaneously on different layers of an emergent stack.

There has been some effort, lately, to map the current state of this stack. Most approach the issue from a generic perspective. We thought it’d be useful to offer a depiction of the web3 stack as needed to support video applications, specifically. (For other diagrams of the generic “web3 stack”, see the appendix¹). Without further ado:

A somewhat minimalist depiction of “the web3 video stack”.

1. NETWORKING + TRANSPORT

Web3 is being built on the same networking/transport stack we all know (TCP/IP). But it does represent a radical rethinking of the client-server model. In that model, a client application communicates with a small number of previously known services (a database server, perhaps a CDN, or some third party service that tracks your cookies and serves ads) over a set of known protocols chosen by the developers. In an open p2p network, an application typically communicates with peers that are not known in advance (i.e. need to be “discovered”), have varying quality (in terms of speed and reliability), and may use a variety of communication protocols. In web3, new libraries and APIs are needed to navigate such complexities.

Libp2p is a networking suite that bundles building blocks (e.g. peer discovery, stream multiplexing, encryption, PubSub) into modules with common interfaces, while also allowing one to choose among multiple transports (TCP, WebRTC, WebSockets, UDP or others).
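The pluggable-transport idea can be illustrated with a toy sketch. This is not the libp2p API — all class and method names below are made up — it only shows why a common interface lets node logic stay independent of how bytes actually move.

```python
# Toy illustration of the pluggable-transport idea (NOT the real
# libp2p API; all names here are hypothetical).

class Transport:
    name = "abstract"
    def dial(self, peer_id: str) -> str:
        raise NotImplementedError

class TCPTransport(Transport):
    name = "tcp"
    def dial(self, peer_id: str) -> str:
        return f"tcp connection to {peer_id}"

class WebSocketTransport(Transport):
    name = "ws"
    def dial(self, peer_id: str) -> str:
        return f"websocket connection to {peer_id}"

class Node:
    """Picks the first transport that both peers support."""
    def __init__(self, transports):
        self.transports = {t.name: t for t in transports}

    def connect(self, peer_id: str, peer_transports: list[str]) -> str:
        for name in peer_transports:  # transports the peer advertises
            if name in self.transports:
                return self.transports[name].dial(peer_id)
        raise ConnectionError("no shared transport")

node = Node([TCPTransport(), WebSocketTransport()])
print(node.connect("peer-1", ["ws", "quic"]))  # → websocket connection to peer-1
```

A browser node would instantiate only WebSocket/WebRTC transports, a server node TCP as well; the higher layers (discovery, PubSub) stay unchanged.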

Devp2p, Ethereum’s native networking layer, eschews multi-transport support and modularity in favor of a simple implementation that meets the needs of server-run Ethereum nodes. The tradeoff is that you can’t get an Ethereum node running in a web browser talking to server-based Ethereum nodes over devp2p, but this is backed up by the assumption that browsers make poor Ethereum full nodes anyway, since they can’t sync the chain.


WebRTC is worth a mention. Its transport protocol and libraries give web browsers and mobile apps the powers of peer-to-peer real-time communication, allowing for video streaming directly from browser to browser. With WebRTC, not only dedicated apps but also in-browser applications (e.g. web sites) can be first-class citizens in a p2p network. It is already used by companies such as Peer5 to accelerate existing CDN solutions.

2. STATE MACHINE + CONSENSUS

Decentralized applications need a common ledger to read/write relevant state changes, together with a mechanism to ensure consensus on the order and validity of transactions. This is especially the case when it comes to services provided by networks of competing agents, where fraud proofs, dispute resolution, accounting or settlement are ultimately done on chain.

There are notable tradeoffs at this level. With only a few thousand daily users of dApps across all smart contract platforms combined, it’s still very early to point to any “winning” platform.

Worth noting, the blockchains cited above don’t currently scale well. Layer 2 scaling solutions are hereby treated as an extension to this layer, and explored more thoroughly in the appendix².

3. ORIGIN SERVICES

In industry there’s a layer referred to as video “origin services” — ingesting, processing, and seeding the content to a CDN. Oftentimes a piece of software called a media server provides all of these. A media server has all of the video-specific application knowledge that makes streaming video different from delivering other types of static content over the internet:

  • How to ingest live video as it is being streamed into the server over potentially different protocols;
  • How to encode the video into the required outputs (more below in “Video Processing”);
  • How to serve the video out to different device types or content delivery networks that request it via different delivery formats;
  • Any additional video specific features like closed captioning support, DVR services, analytics, ad insertion, image detection, etc.
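The responsibilities above can be pictured as stages of a pipeline. Below is a toy sketch — every function name and data shape is hypothetical; a real media server (e.g. LPMS) implements each stage with actual A/V tooling.

```python
# Toy sketch of a media server's pipeline stages (all names hypothetical).

def ingest(stream_url: str) -> dict:
    """Accept an incoming stream (RTMP, WebRTC, ...) and normalize it."""
    protocol = stream_url.split("://")[0]
    return {"source": stream_url, "protocol": protocol}

def transcode(stream: dict, renditions: list[str]) -> dict:
    """Produce one output per target rendition (see 'Video Processing')."""
    stream["renditions"] = {r: f"{stream['source']}#{r}" for r in renditions}
    return stream

def package(stream: dict, fmt: str) -> list[str]:
    """Serve the outputs in whatever delivery format a device or CDN asks for."""
    return [f"{url} as {fmt}" for url in stream["renditions"].values()]

stream = ingest("rtmp://example.com/live/key")
stream = transcode(stream, ["1080p", "720p", "360p"])
playlist = package(stream, "hls")
```

The point of the decomposition is that each stage can, in principle, be run by a different incentivized node on the network.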
Ingesting footage — an artistic representation (via Giphy)

The basis for this functionality is an open source media server that can be distributed to any node on a network. For such a network to be truly robust at origin services, location matters, as proximity to a broadcaster can make a tremendous difference in stream quality and reduce buffering. Ideally, a protocol can incentivize the running of these nodes across different regions, so broadcasters and streamers all over the world enjoy high performance and low latency.

  • Red5 Media Server (open source media server not focused on decentralized use cases);
  • LPMS (Livepeer media server, can be used standalone or in a decentralized node).

4. VIDEO PROCESSING

“Processing” is a generic term for a set of jobs specific to the realm of A/V streaming, grouping transcoding together with transrating and transmuxing. Transcoding is the computationally heavy step in a video pipeline. In human terms, its purpose is to take the input video stream from a phone, camera, or laptop, and convert it into all the formats and bitrates necessary to reach every device on the planet, at every connection speed.

Most open transcoding solutions, like ffmpeg, are very low level, requiring significant engineering to scale into production systems. Most commercial transcoding services are closed source, proprietary, and expensive.
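To make the “significant engineering” concrete, here is a sketch of what driving ffmpeg for multiple renditions looks like. The flags (`-i`, `-c:v`, `-s`, `-b:v`, `-c:a`) are standard ffmpeg options; the rendition table itself is illustrative, and the code only builds the command line rather than executing it.

```python
# Sketch: assembling an ffmpeg command line that produces several
# renditions of one input. The rendition ladder below is illustrative.

RENDITIONS = [  # (name, frame size, video bitrate)
    ("1080p", "1920x1080", "4500k"),
    ("720p",  "1280x720",  "2500k"),
    ("360p",  "640x360",   "800k"),
]

def ffmpeg_args(source: str) -> list[str]:
    args = ["ffmpeg", "-i", source]
    for name, size, bitrate in RENDITIONS:
        # one H.264/AAC output file per rendition
        args += ["-c:v", "libx264", "-s", size, "-b:v", bitrate,
                 "-c:a", "aac", f"out_{name}.mp4"]
    return args

# A production system would hand this to a process runner, e.g.:
#   subprocess.run(ffmpeg_args("input.mp4"), check=True)
print(" ".join(ffmpeg_args("input.mp4")))
```

The real engineering burden is everything around this call: job scheduling, retries, segmenting for live streams, and verifying that the outputs are actually correct — which is exactly what an incentivized transcoding network has to encode into its protocol.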

By leveraging decentralization of compute resources, and cryptoeconomically incentivized protocols, it’s possible to create a network with the convenience and scalability of cloud transcoding services, but the cost profile of running on bare metal hardware. This would allow anyone to send video into the network to be transcoded, and for its nodes to compete in processing the video into all the required outputs, for barely more than the cost of electricity.

A key feature to enable, here, is Adaptive Bitrate Streaming (ABS)³. By converting one stream into a number of different bitrates, proper transcoding allows smart video players to switch between different versions of the video depending upon the available bandwidth and connection speed. When you notice your Netflix or YouTube video quality increase or decrease automatically as you’re watching, this is often an example of ABS in action. Without ABS, you leave viewership on the table, as users with slower connections simply cannot access the original stream at high quality.
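The player-side half of ABS boils down to a simple decision rule. A toy sketch, with a made-up rendition ladder: pick the highest bitrate the measured bandwidth can sustain, with some headroom so playback doesn’t stall.

```python
# Toy sketch of a player's ABS rendition switch. The ladder and the
# 0.8 headroom factor are illustrative, not from any real player.

RENDITIONS_KBPS = {"1080p": 4500, "720p": 2500, "480p": 1200, "360p": 800}

def pick_rendition(measured_kbps: float, headroom: float = 0.8) -> str:
    """Return the best rendition whose bitrate fits within the budget."""
    budget = measured_kbps * headroom
    viable = {n: b for n, b in RENDITIONS_KBPS.items() if b <= budget}
    if not viable:
        # nothing fits: fall back to the lowest rung of the ladder
        return min(RENDITIONS_KBPS, key=RENDITIONS_KBPS.get)
    return max(viable, key=viable.get)

print(pick_rendition(6000))  # fast connection → 1080p
print(pick_rendition(1300))  # slow connection → 360p
```

A real player re-runs this decision every few segments, which is why quality visibly steps up or down mid-stream.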

  • Livepeer (protocol for incentivized transcoding for live streams and video on demand built on Ethereum);
  • VideoCoin (blockchain that supports encoding transaction types).

5. STORAGE

At this point, the differences between live streaming and video-on-demand become pronounced. Live streaming, in principle, only needs to care about the source of the stream being broadcast. VoD apps need somewhere (or some unique identifier) to fetch files from — the “hard drives” of web3.

Public perception of peer-to-peer storage has swung violently in the recent past, going from the “let’s recycle all the unused disk space in the world” frenzy of a couple of years ago to the realisation that “AWS & co. are just really, really cheap and hard to compete against”, even for the average “under-construction” dApp. The advantages of p2p storage come into play when one needs true uncensorability, and doesn’t need the ability to erase files or to handle granular access permissions.

Today, developers can store videos on a range of p2p file systems. The differences between the networks below come down to how privacy is handled, how prices are determined, the way proofs are structured, and so on. All of these are working, yet none has its native incentives fully functional.

  • IPFS (uniquely addressed chunks, unlike torrent files, making for a more “granular market”; block-level deduplication; ability to set up clusters; and…);
  • Filecoin (… a native incentive system, which is not live yet. Interestingly enough, both are decoupled, and the first can exist without the second);
  • Swarm (Ethereum’s original native storage system);
  • Storj (emphasis on end-to-end encryption and security);
  • Sia (similar to the above, has recently put forth a tutorial on streaming).
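What these networks share is content addressing: a file’s identifier is derived from its bytes, not assigned by a naming authority. A simplified illustration — this is NOT the actual IPFS CID format (real CIDs encode a multihash plus codec metadata), but the core property is the same.

```python
# Simplified illustration of content addressing, the idea shared by the
# storage networks above. NOT a real CID implementation.

import hashlib

def content_address(data: bytes, chunk_size: int = 4) -> str:
    """Chunk the data, hash each chunk, then hash the list of chunk
    hashes. Identical content always yields the identical address."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    chunk_hashes = [hashlib.sha256(c).hexdigest() for c in chunks]
    return hashlib.sha256("".join(chunk_hashes).encode()).hexdigest()

a = content_address(b"some video bytes")
b = content_address(b"some video bytes")
c = content_address(b"other video bytes")
assert a == b  # same content, same address, wherever it is stored
assert a != c  # different content, different address
```

Chunk-level hashing is also what enables block-level deduplication: two videos sharing a chunk store it once.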

Decentralised Storage Networks are complemented/extended by technologies that provide access control, and allow applications to enforce permissions (e.g. DRM or pay-per-view schemes). Unlock is an example of a project working on this matter.

6. DELIVERY

Content delivery is handled by CDNs like Akamai, CloudFront, and Fastly. They’ve built up networks of servers strategically placed at edge locations around the world, and infrastructure to distribute videos to these during times of demand. These companies charge broadcasters by the gigabyte of data delivered, so as more viewers watch content, the delivery charges go up.

Recently, peer to peer technology has been disruptive here. Companies like Peer5 and Streamroot can offset 50–85% of the traffic (and cost) during peak events like the World Cup. While this is a big improvement, it still requires a traditional CDN to handle the remaining 15–50% of the traffic.

Both these CDNs and the “p2p accelerators” operate as centralized companies that charge for bandwidth. Protocols like BitTorrent, and its web-based implementation WebTorrent, replace pay-for-bandwidth with tit-for-tat bandwidth accounting, where viewers provide bandwidth in exchange for content. This works reasonably well for popular on-demand content, but suffers for low-latency cases (like live streaming) and long-tail content.
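The tit-for-tat idea can be sketched in a few lines: a peer “unchokes” (uploads to) the peers that have recently uploaded the most to it, reciprocating bandwidth with bandwidth instead of money. A toy version of the ranking step:

```python
# Toy sketch of BitTorrent-style tit-for-tat unchoking: upload to the
# peers that upload the most to us. Numbers and names are illustrative.

def pick_unchoked(upload_rates_kbps: dict, slots: int = 4) -> list:
    """Return the peers we will upload to: the top contributors,
    ranked by their recent upload rate to us."""
    ranked = sorted(upload_rates_kbps, key=upload_rates_kbps.get, reverse=True)
    return ranked[:slots]

peers = {"ann": 420.0, "bob": 15.0, "cyd": 310.0, "dee": 0.0, "eve": 95.0}
print(pick_unchoked(peers))  # → ['ann', 'cyd', 'eve', 'bob']
```

Real clients also reserve one “optimistic unchoke” slot for a randomly chosen peer, so newcomers with nothing to trade can bootstrap — and it’s precisely this reciprocity requirement that breaks down for live streams, where everyone wants the same few seconds of data at once.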

Projects attempting to decentralize CDNs for both general-purpose file delivery and video streaming (live + on demand) face the uphill battle of incentivizing guaranteed file availability, performance, and delivery on a global p2p network. While Swarm and IPFS/Filecoin have made great progress on the on-demand use case, in which an entire file is requested for download before playback, for streaming and live use cases there’s still a missing layer in the stack: an origin point for serving live content alongside decentralized p2p content accelerators.

  • BitTorrent (the original, permissionless content-addressed delivery network);
  • JoyStream (paid seeding for Bittorrent);
  • Filecoin (could theoretically serve as an origin point for VOD content, but unclear about live content);
  • Swarm (Ethereum’s web3 infrastructure project with a working group track for streaming);
  • Theta Labs (video blockchain with PoW certificates that are used to pay for bandwidth).

7. INDEXING

Web3 needs solutions for indexing and query optimisation — equivalent to the built-in indexing functionality of database systems like PostgreSQL or MongoDB, or standalone solutions such as Solr/Lucene. EVM implementations already come with a primitive indexing system based on events, but it’s certainly not enough for, say, full-text search in large data sets on IPFS or Swarm.

In the context of dApps, it’s been common practice for teams to build their own indexing servers based on classic technology. Blockchain explorers such as etherscan.io or etherchain.org are typical examples: they pull data from Ethereum, store it in a database, and expose it over an API.

These solutions effectively reintroduce the defects of centralized technology that web3 is meant to overcome: in particular, the need to trust a third party not to censor or misrepresent your data. We need decentralized indexing solutions that provide results that are provably correct.

Proving the correctness of query results consists of two parts: the results must all actually be in the original data set (soundness), and no results may be omitted (completeness). Proving soundness is relatively easy: for example, if the data are stored in a Merkle tree, each individual result comes with a Merkle proof of its existence in the tree. Proving completeness is another story.
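The “easy half” — a Merkle inclusion proof — fits in a short sketch. Each result carries the sibling hashes on its path to the root; anyone holding just the root can verify membership without the full data set.

```python
# Minimal sketch of a Merkle tree with inclusion proofs. For N leaves,
# a proof is only O(log N) hashes.

import hashlib

def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def build_tree(leaves):
    """Return all levels, from hashed leaves up to the single root."""
    levels = [[h(leaf) for leaf in leaves]]
    while len(levels[-1]) > 1:
        lvl = levels[-1]
        if len(lvl) % 2:                 # duplicate last node if odd
            lvl = lvl + [lvl[-1]]
        levels.append([h(lvl[i] + lvl[i + 1]) for i in range(0, len(lvl), 2)])
    return levels

def prove(levels, index):
    """Collect (sibling_hash, sibling_is_right) pairs up to the root."""
    proof = []
    for lvl in levels[:-1]:
        if len(lvl) % 2:
            lvl = lvl + [lvl[-1]]
        sib = index ^ 1                  # sibling of node `index`
        proof.append((lvl[sib], sib > index))
        index //= 2
    return proof

def verify(root, leaf, proof):
    node = h(leaf)
    for sibling, is_right in proof:
        node = h(node + sibling) if is_right else h(sibling + node)
    return node == root

leaves = [b"result-a", b"result-b", b"result-c", b"result-d"]
levels = build_tree(leaves)
root = levels[-1][0]
proof = prove(levels, 2)                     # prove b"result-c" is included
assert verify(root, b"result-c", proof)      # genuine result: passes
assert not verify(root, b"result-x", proof)  # forged result: fails
```

Note what this does not give you: a dishonest indexer can return only sound results while silently dropping others, and no per-result proof will reveal the omission — which is why completeness is the hard, open problem.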

  • The Graph (OS indexing engine for Ethereum and IPFS, evolving into a distributed query protocol);
  • Fluence (database network focused on querying);
  • TxHash.com (contract tracking / indexing as a service);
  • Supermax (contract tracking / indexing as a service);
  • Cleargraph (GraphQL ethereum index, also offers indexing as a service);
  • BigchainDB (“decentralized database”).

8. RELEVANCE / REPUTATION

On today’s web, relevance engines (assessing the “value” of content in a given context) and reputation systems (assessing the “value” of users) are generally closed source and under constant change. For reference, see the weird phenomena arising from YouTube’s recommendation algorithms.

Excessive moderation suffocates public online environments; loose moderation allows for toxicity to spread. Video platforms of all kinds strive to find a balance in between.

There is a wide range of “curation tasks” applications need covered: maintaining generic white- or black-lists; organizing these into feeds; filtering comments; providing context-specific recommendations.

Scaling content moderation.

Scaling these activities involves hard tradeoffs, even on centralized systems. Facebook outsources most of its curation to underpaid workers in the Philippines; YouTube has reportedly just hired some thousands of moderators after controversies around its live streaming feature.

There’s no go-to solution for dApp developers to filter the content they stream. It’s unclear what kind of (crypto-economic?) mechanisms are suitable for the tasks at hand. It seems natural, though, that incentivizing an unbounded network of curators and offering easy pluggability to upper-level applications is the way to go. Distributed curation design patterns are frequently criticised for not taking reputation and Sybil-resistance into account — their intertwining is our reason for grouping both here.

  • Steem (has a curation protocol built into its blockchain);
  • TCRs (a promising design pattern that deserves some testing before being taken as an actual primitive);
  • Relevant (still focused on building distributed curation work for its flagship application, in the realm of social news);
  • Paratii (idem, for videos and playlists).

9. APPLICATION

Apparently, there’s no shortage of attempts to build “the decentralized YouTube”.

Applications tie together access to the relevant protocols below them, and present an experience to end-users in the form of interface plus logic.

The immaturity of the space makes it tempting for app developers to take on every challenge themselves, building functionality that could be outsourced to another protocol. Go-to-market strategizing also means lower-level protocols cannot launch in isolation — most deploy at least an initial implementation to prove technical relevance and attract developers. Examples are Livepeer, Theta, and Videocoin, launching with Livepeer.tv, SLIVER.tv, and Live Planet, respectively.

A list and mapping of video-oriented applications in the space can be found here (dated Q4 2017):

Application developers owe a lot to libraries and frameworks that make their processes easier. These can significantly speed up the journey of getting a dApp to production, facilitating the management of keys, connection to nodes, importing/exporting wallets, etc.

📝 Final Notes

Web3 isn’t a fixed future so much as a chance to reform pieces of the internet. There’s no definitive depiction of its stack. We tried to offer a useful mental model, while deliberately keeping it simple. It may help developers starting to build video dApps, product teams looking to “outsource part of their video pipeline”, or analysts familiarising themselves with the space.

Worth remembering, decentralisation is not an end in itself. Nor is it binary. A little decentralisation can go a long way. Compare how much better the online experience became when we “suddenly” had four major browsers instead of just one.

The goal of web3 shouldn’t be to replace specific incumbents, but rather to ensure that developers have access to tooling & infrastructure for applications that don’t depend on monopolistic agents to operate. This exercise was meant not to speculate on which layers are more likely to accrue value, but to shine light on those where there are still big gaps to fill. The ecosystem will thrive as fast as its capacity to coordinate efforts allows.

The diagrams and descriptions above are non-exhaustive, and certainly deserve critique. Please shout back your feedback.

🎬 Appendix

¹ - Previous depictions of the web3 stack

Some great thinkers in the space have previously tinkered with depictions of the web3 stack, mostly from a generic (as opposed to industry-specific) perspective. A few recommended reads are Stephan Tual’s “Web 3.0 Revisited”, BigChainDB’s “Blockchain Infrastructure Framing: A First Principles Framing”, and Kyle Samani’s “The Web3 Stack”.

The Web3 Foundation, whose mission is to steward technologies and applications in the fields of decentralized web software protocols, has a comprehensive wiki where one can dive into individual pieces of a generalised stack:

² - Layer 2

Blockchains are referred to as the “base layer” for this novel breed of applications, even though, technically, there’s much more underneath. Layer 2 is a term for scaling solutions that relieve the root chain by using it only as a “last resort” for settling economic interactions.

Layer 2 scaling comes in the form of design patterns, standard implementations and even service providers, extending blockchain protocols without touching their core features. Value capture at this level is still very much an open question: are all “core components” of a blockchain ultimately meant to be free and standardised? Do inherent tradeoffs mean we are heading towards scalability-as-a-service? Are relayer-like entities going to play a pivotal role here? Note that most efforts below are not production-ready.

³ - ABS

Adaptive Bitrate Streaming can be seen as an extension of the processing layer. In practice, ABS is a necessity for most end-user, real-world applications. It consists of constantly optimising content delivery throughout a stream, to get a device the best possible bitrate for its connection at every given moment. Of course, this also implies proper work has been done at the encoding level. That’s just the surface of what eventually became a very complicated science.

The challenge grows when there’s no central place to get all these bitrates from, but rather a heterogeneous mesh where some of your peers may have the same file in very different formats. The ideal bitrate does not depend solely on you (e.g. your connection speed), but also on the speeds and availabilities of peers nearby.
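A toy sketch of what that peer-aware decision could look like — all field names and numbers are hypothetical. A rendition is only viable if your own connection can play it and nearby peers collectively have enough spare upload capacity to serve it:

```python
# Toy sketch of bitrate selection in a p2p mesh (all names illustrative):
# unlike classic ABS, the choice also depends on what nearby peers hold
# and how much upload capacity they can spare.

def pick_p2p_rendition(peers, my_kbps):
    """peers: [{'rendition': '720p', 'kbps': 2500, 'spare_kbps': 900}, ...]
    Returns the highest-bitrate rendition the mesh can sustain, or None."""
    supply = {}   # total spare upload bandwidth per rendition
    kbps_of = {}  # bitrate of each rendition
    for p in peers:
        supply[p["rendition"]] = supply.get(p["rendition"], 0) + p["spare_kbps"]
        kbps_of[p["rendition"]] = p["kbps"]
    viable = [r for r in kbps_of
              if kbps_of[r] <= my_kbps and supply[r] >= kbps_of[r]]
    if not viable:
        return None  # fall back to an origin server
    return max(viable, key=kbps_of.get)

peers = [
    {"rendition": "1080p", "kbps": 4500, "spare_kbps": 1000},
    {"rendition": "1080p", "kbps": 4500, "spare_kbps": 2000},
    {"rendition": "360p",  "kbps": 800,  "spare_kbps": 600},
    {"rendition": "360p",  "kbps": 800,  "spare_kbps": 400},
]
print(pick_p2p_rendition(peers, 5000))  # mesh can't sustain 1080p → 360p
```

Here the fast connection still ends up on 360p, because the peers holding 1080p only have 3000 kbps of spare capacity between them — exactly the kind of coupling between delivery and processing that makes the p2p case harder than the centralized one.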
