URL Scheme

URLs in DAPP browsers

The URL scheme should cover all URLs allowed in browsers, and all http(s) URLs that resolve in a usual browser must resolve the same way.

All URLs that do not conform to the existing URL scheme must still resemble it:

<protocol>://<source>/<path>

Irrespective of the main protocol, <source> should be resolved with our version of DNS: NameReg (the name registration contract on Ethereum) and/or the Swarm signed version stream.

In the special case of the bzz protocol, <source> must resolve to a Swarm hash of the content (in other words, the root key of the content). This content is assumed to be of mime type application/bzz-sitemap+json, the only mime type directly handled by Swarm.
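The split into <protocol>, <source> and <path> can be sketched with Python's standard URL parser. The function name split_dapp_url is hypothetical, and the sketch only separates the parts; it does not resolve <source> through NameReg or Swarm.

```python
from urllib.parse import urlsplit


def split_dapp_url(url):
    """Split <protocol>://<source>/<path> into its three parts."""
    parts = urlsplit(url)
    # netloc carries <source>; the leading "/" of the path is dropped so
    # that <path> is relative to the resolved manifest.
    return parts.scheme, parts.netloc, parts.path.lstrip("/")
```

For instance, split_dapp_url("bzz://my-website/img/logo.gif") yields the protocol bzz, the source my-website and the path img/logo.gif.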

Swarm manifests

A Swarm manifest is a json formatted description of url routing. The swarm manifest allows swarm documents to act as file systems or webservers. Its mime type is application/bzz-sitemap+json. A manifest has the following attributes:

  • entries: an array of route configurations
  • host: eth host name registered (or to be registered) with NameReg
  • number: position index (increasing integers) of the manifest within the channel
  • auth: devp2p cryptohandshake public key(s), signed number
  • first: root key of initial state of the stream
  • previous: previous state of stream

A route descriptor manifest entry json object has the following attributes:

  • path: a path relative to the url that resolved to the manifest (optional, with empty default)
  • hash: key of the content to be looked up by swarm (optional)
  • link: relative path or external link (optional)
  • contentType: mime type of the content (optional, application/bzz-server by default)
  • status: http status code to pass back to the server (optional, 200 by default)
  • cache: cache entry, etag? and other header options (optional)
  • www: alternative old web address that the route replicates: e.g., http://eth:bzz@google.com (optional)

If path is an empty string or is missing, the path matches the document-root of the DAPP. If contentType is empty or missing, a manifest is assumed by default.
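The defaults of a route entry can be sketched as a small data structure. The class name RouteEntry and the from_json helper are hypothetical. The default content type follows the sentence above (a manifest is assumed); this is an interpretation, since the attribute list names a different default. The cache and www attributes are left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional

# Mime type of a manifest, as stated in the text.
MANIFEST_TYPE = "application/bzz-sitemap+json"


@dataclass
class RouteEntry:
    path: str = ""                      # empty path matches the document root
    hash: Optional[str] = None          # key of the content in swarm
    link: Optional[str] = None          # relative path or external link
    content_type: str = MANIFEST_TYPE   # assumption: manifest when missing
    status: int = 200

    @classmethod
    def from_json(cls, obj):
        """Build an entry from a parsed json object, filling in defaults."""
        return cls(
            path=obj.get("path") or "",
            hash=obj.get("hash"),
            link=obj.get("link"),
            content_type=obj.get("contentType") or MANIFEST_TYPE,
            status=obj.get("status", 200),
        )
```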

(NOTE: Unclear. When no path matches and there is no fallback path e.g. a root / path with hash specified, it should return a simple 404 status code)

Url resolution

Given

 bzz://<source>/<path>

in the browser, the following steps need to happen:

  • the browser sees that the protocol is bzz, so <source>/<path> is passed to the bzz protocol handler,
  • the handler checks if <source> is a hash. If not it resolves to a hash via NameReg and signed version table, see below
  • the bzz protocol handler first retrieves the content for the hash (with integrity check) which it interprets as a manifest file (application/bzz-sitemap+json),
  • this manifest file is then parsed and the entry of the json array whose path is the longest prefix of <path> is looked up. I.e., p' is the longest entry path such that <path> == p'/p'', where p'' is the remainder. (If the longest prefix has length 0, the entry with path == "" (or with path left out) is chosen.)
  • as a special case, trailing forward slashes are ignored so all variants will match the directory,
  • the protocol then looks up content for p' and serves it to the browser together with the status code and content type.
  • if content is of type manifest, bzz retrieves it and repeats the steps using p'' to match the manifest’s <path> values against,
  • the url relative path is set to p''
  • if the url looked up is an old-world http site, then a standard http client call is sufficient.
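The steps above can be sketched as follows, assuming <source> has already been resolved to a hash. The names match_entry and resolve are hypothetical, and store is a plain dictionary standing in for swarm retrieval (hash to parsed manifest or raw bytes). The integrity check is omitted, prefixes are matched on whole path segments, and max_depth is an added guard against manifests that reference each other in a loop.

```python
MANIFEST_TYPE = "application/bzz-sitemap+json"


def match_entry(entries, path):
    """Return (entry, rest): the entry with the longest matching prefix p'
    and the remainder p''. Trailing slashes are ignored."""
    path = path.strip("/")
    best, best_prefix = None, None
    for entry in entries:
        prefix = entry.get("path", "").strip("/")
        if prefix == "":
            matches = True  # empty path matches the document root
        else:
            matches = path == prefix or path.startswith(prefix + "/")
        if matches and (best is None or len(prefix) > len(best_prefix)):
            best, best_prefix = entry, prefix
    if best is None:
        return None, path
    return best, path[len(best_prefix):].strip("/")


def resolve(store, root_hash, path, max_depth=8):
    """Follow manifests from root_hash until non-manifest content is found.
    Returns (status, content_type, content)."""
    manifest = store[root_hash]
    for _ in range(max_depth):
        entry, rest = match_entry(manifest["entries"], path)
        if entry is None or "hash" not in entry:
            return 404, None, None
        content = store[entry["hash"]]
        ctype = entry.get("contentType") or MANIFEST_TYPE
        if ctype != MANIFEST_TYPE:
            return entry.get("status", 200), ctype, content
        # Content is itself a manifest: repeat with the remainder p''.
        manifest, path = content, rest
    return 404, None, None
```

With a root manifest that routes docs to a nested manifest containing cv.pdf, resolve(store, root, "docs/cv.pdf") first matches docs, then matches cv.pdf in the nested manifest.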

Example 1

{
   "entries": [
     {
        "path": "cv.pdf",
        "contentType": "application/pdf",
        "hash": "sdfhsd76ftsd86ft76sdgf78h7tg"
     }
   ]
}

where the hash is the hash of the actual file cv.pdf.

If this manifest hashes to dafghjfgsdgfjfgsdjfgsd, then bzz://dafghjfgsdgfjfgsdjfgsd/cv.pdf will serve cv.pdf

Now you can register the manifest hash with NameReg under the name my-website, so that the file resolves as follows:

   http://my-website/cv.pdf 

serves cv.pdf

Example 2

Imagine you have a DAPP called chat, and the local directory <dir> you host it from looks like this:

  index.html
  img/logo.gif
  img/avatars/fefe.jpg
  img/avatars/index.html

the webserver has the following routing rules:

  -> <dir>/index.html
  <unknown> -> <dir>/index.html # where <unknown> != index.html
  img/logo.gif -> <dir>/img/logo.gif
  img/avatars -> <dir>/img/avatars/index.html
  img/avatars/fefe.jpg -> <dir>/img/avatars/fefe.jpg
  img/avatars/<unknown>.jpg -> <dir>/img/avatars/index.html # where <unknown> != fefe.jpg

Now you can alternatively host your app in Swarm by creating the following manifest:

{ 
  "entries": [
  { "hash": HASH(<dir>/index.html) },
  { "path": "index.html", "hash": HASH(<dir>/index.html) },
  { "path": "img/logo.gif", "hash": HASH(<dir>/img/logo.gif) },
  { "path": "img/avatars/", "hash": HASH(<dir>/img/avatars/index.html) },
  { "path": "img/avatars/fefe.jpg", "hash": HASH(<dir>/img/avatars/fefe.jpg) }
  ]
}

Swarm webservers

Swarm webservers are simply bzz site manifest files routing relative paths to static assets. Manifest route entries specify metadata: http header values, etag, redirects, links, etc.

In a typical scenario, the developer has a website within a working copy directory on their dev environment and they want to create a decentralised version of their site.

They then register the host domain with ethereum NameReg or swarm signed version stream, upload all desired static assets to swarm, and produce a site manifest.

In order to facilitate the creation of the manifest file for existing web projects, a native API and a command line utility are provided to automatically generate manifest files from a directory.

ArcHive API

A native API and a command line utility are provided to automatically swarmify document collections.

constructor parameters:

  • template: manifest template: the entries found in the directory scan are merged into this template to yield the resulting site-map. Note that this template can be considered a config file to the archiver.

The archiver can be called multiple times scanning multiple directories.

runtime parameters:

  • path: path to the directory; relative routes in the template are matched against directory paths under path (optional, ‘.’ by default)
  • not-found: fallback to be used when an asset is not found, i.e., for 404 (optional, index.html by default)
  • register-names: use eth NameReg to register the public key, and this version is pushed to the swarm mutable store (optional, false by default)
  • without-scan: only consider paths given in the template (optional, false by default: the directory is scanned and all readable content is added/merged into the manifest)
  • without-upload: files are not uploaded, only hashes are calculated and the manifest is created (optional, false by default: every asset is uploaded to swarm)

If both without-scan and without-upload are omitted then path is used to associate files, extend the manifest entries, and upload content.
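The directory scan and template merge can be sketched as below. This is not the archiver's actual API: build_manifest is a hypothetical name, SHA-256 is only a stand-in for the Swarm hash, and nothing is uploaded, so the sketch corresponds roughly to running with without-upload set.

```python
import hashlib
import mimetypes
import os


def build_manifest(path=".", template=None):
    """Scan `path` and merge one route entry per file into the template."""
    manifest = dict(template or {})
    manifest["entries"] = list(manifest.get("entries", []))
    known = {e.get("path", "") for e in manifest["entries"]}
    for root, dirs, files in os.walk(path):
        dirs.sort()  # deterministic traversal order
        for name in sorted(files):
            full = os.path.join(root, name)
            route = os.path.relpath(full, path).replace(os.sep, "/")
            if route in known:
                continue  # entries given in the template take precedence
            with open(full, "rb") as fh:
                # Assumption: SHA-256 stands in for the real Swarm hash.
                digest = hashlib.sha256(fh.read()).hexdigest()
            ctype = mimetypes.guess_type(name)[0] or "application/octet-stream"
            manifest["entries"].append(
                {"path": route, "hash": digest, "contentType": ctype}
            )
    return manifest
```

Calling build_manifest several times with the previous result as the template mirrors the statement that the archiver can scan multiple directories.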

if register-names is set all named nodes.

Examples

{
   "entries": [
      { 
         "path": "chat",
         "hash": "sdfhsd76ftsd86ft76sdgf78h7tg",
         "status": 200,
         "contentType": "application/pdf"
      },
      ...
   ]
}

Without swarm, the zip fallback

namereg resolution:

contentOf('eth/wallet') -> 324234kj23h4kj2h3kj423kj4h23

The NameReg also has a urlOf entry that tells where the file can be found (e.g., a raw pastebin).

It then downloads the file, extracts it and resolves all relative/absolute paths, based on the manifest it finds in it.

For the developer, the upload mechanism in mix will be the same: they choose a folder and can provide a serverconfig.json (or manifest).

The only difference is the lookup and where it gets the files from.

swarm -> content hashes
before swarm -> zip file content

Both are resolved through the same manifest scheme.
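A minimal sketch of the zip fallback, under several assumptions: the archive carries its manifest as serverconfig.json, entries point at archive members through a file attribute (falling back to path), and only exact path matches are served. The names load_bundle and serve_from_bundle are hypothetical, and the download via urlOf is left out.

```python
import io
import json
import zipfile


def load_bundle(zip_bytes, manifest_name="serverconfig.json"):
    """Read the manifest and all file members from a zip archive in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        manifest = json.loads(archive.read(manifest_name))
        files = {
            name: archive.read(name)
            for name in archive.namelist()
            if name != manifest_name and not name.endswith("/")
        }
    return manifest, files


def serve_from_bundle(manifest, files, path):
    """Return (status, content_type, content) for an exact path match."""
    wanted = path.strip("/")
    for entry in manifest.get("entries", []):
        if entry.get("path", "").strip("/") != wanted:
            continue
        # Content comes from the archive member instead of a swarm hash.
        member = entry.get("file", entry.get("path", "")).strip("/")
        if member in files:
            return entry.get("status", 200), entry.get("contentType"), files[member]
    return 404, None, None
```

The only difference from the swarm case is the content source: the entry is matched the same way, but the bytes are read from the extracted archive.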

Server config examples:

URL: bzz://dsf32f3cdsfsd/somefolder/other
Same as: eth://myname.reggae/somefolder/other

We should also map folders with and without a trailing “/”, so that the lookup for path: “/something/myfolder” is the same as for “/something/myfolder/”.

{
  previous: 'jgjgj67576576576567ytjy',
  first: 'ds564rh5656hhfghfg',
  entries:[{
    // Custom error page
    path: '/i18n/',
    file: '/errorpages/404.html',
    // parses "file" when processing the folder and add: hash: '7685trgdrreewr34f34', contentType: 'text/html'
    status: 404

  },{
    // custom fallback file for this folder: "/images/sdffsdfds/"
    path: '/images/sdffsdfds/',
    file: '/index.html',
    // parses "file" when processing the folder and add: hash: '345678678678678678tryrty', contentType: 'text/html'

  },{
    // custom fallback file with custom header.
    path: '/',
    file: '/index.html',
    // parses "file" when processing the folder and add: hash: '434534534f34k234234hrkj34hkjrh34', contentType: 'text/html'
    status: 500

  },{
    // redirect (changing url after?)
    path: '/somefolder/',
    redirect: 'http://google.com'

  },{
    // linking?
    path: '/somefolder/other/',
    link: 'bzz://43greg45gerg5t45gerge/chat/' // hash to another manifest

  },{
    // downloading a file by pointing to a folder
    path: '/somefolder/other/',
    file: '/mybook.pdf',
    // parses "file" when processing the folder and adds: hash: '645325ytrhfgdge4tgre43f34', BUT no contentType, as it's already present
    contentType: 'application/octet-stream' // triggers a download in the browser for this link

  },{
    // downloading
    path: '/test.html',
    file: '/test.html',
    // parses "file" when processing the folder and adds: hash: '645325ytrhfgdge4tgre43f34', BUT no contentType, as it's already present
    contentType: 'application/octet-stream' // triggers a download in the browser for this link

  // automatic generated files
  },{
    path: '/i18n/app.en.json',
    hash: '456yrtgfds43534t45',
    contentType: 'text/json',
  },{
    path: '/somefolder/other/image.png',
    hash: '434534534f34khrkj34hkjrh34',
    contentType: 'image/png',
  },{
    path: '/somefolder/other/343242.png',
    hash: '434534534f34k234234hrkj34hkjrh34',
    contentType: 'image/png',
  },{
    path: '/somefold/frau.png',
    hash: 'sdfsdfsdfsdfsdfsdfsd',
    contentType: 'image/png',
  }]
}