Aragon Network IPFS Pinning

#1

While it is evident that all Aragon apps require IPFS pinning for app assets, the following Aragon apps will require an IPFS pinning solution for user-generated content:

  • Pando
  • Projects
  • Home
  • Espresso
  • P2P models Wiki
  • Discussions
  • Profiles

…and the list will grow. Right now the go-to pinning solution for some of these projects is Infura, as it has an open API, but this isn’t a sustainable solution. While Aragon One has a self-managed IPFS node, there is no public API for additional Flock, Nest, or third-party teams to utilize.

A Potential Solution

Requirements

  • Locally pinned files stay pinned when a user exits their session
  • Files are accessible, preferably with a fast load time
  • Files are verifiable, so that a user can verify the data retrieved is the data asked for
  • The architecture can handle outages - if an IPFS node goes down, we should be able to continue giving access to our files
  • Limit security vulnerabilities and expenses - only approved clients should be able to pin data on our node
  • Data adheres to open standards

An initial architecture could look like:

Client applications send requests to our backend server, where information gets pinned across multiple providers. Our backend validates the data adheres to specific open standards (schema.org to begin with) when applicable and then pins the files/data.
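As a hedged sketch of this flow (the provider objects, `validateSchema` logic, and schema type list are all illustrative, not a real API):

```javascript
// Hypothetical sketch of the backend pinning flow described above.
// The providers and the schema check are stand-ins, not real integrations.
const providers = [
  { name: 'self-hosted', pin: async (blob) => ({ ok: true }) },
  { name: 'infura',      pin: async (blob) => ({ ok: true }) },
];

// Minimal schema.org-style check: the payload must declare an approved @type.
function validateSchema(blob) {
  const knownTypes = ['ImageObject', 'Comment', 'Person'];
  return typeof blob === 'object' && blob !== null && knownTypes.includes(blob['@type']);
}

async function handlePinRequest(blob) {
  if (!validateSchema(blob)) {
    return { status: 400, error: 'payload does not match any approved schema' };
  }
  // Pin across every provider so a single outage doesn't lose the file.
  const results = await Promise.all(providers.map((p) => p.pin(blob)));
  return { status: 200, pinnedBy: results.length };
}
```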

Fallback Procedures If Our Node Goes Down

It’s conceivable we’ll have some bumps along the road running our own node. We need to be able to handle those situations:

  1. Active monitoring services like http://pm2.keymetrics.io/ to ensure our node is up and running
  2. A queueing system to track unpinned files while nodes are down, and re-pin files once the node is back online
  3. Modular infrastructure - the IPFS node will be hosted on its own (isolated from the REST API), so that the node going down will not cause the REST API to go down
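A hedged sketch of step 2’s queue (the class and method names are invented for illustration):

```javascript
// Illustrative re-pin queue: CIDs that fail to pin while the node is down
// are held here and retried once monitoring reports the node healthy again.
class RepinQueue {
  constructor() { this.pending = []; }

  // Called when a pin attempt fails because the node is down.
  enqueue(cid) {
    if (!this.pending.includes(cid)) this.pending.push(cid);
  }

  // Called once the node is back online; pinFn is the real pin call.
  async flush(pinFn) {
    const failed = [];
    for (const cid of this.pending) {
      try { await pinFn(cid); } catch (e) { failed.push(cid); }
    }
    this.pending = failed; // anything still failing stays queued
    return this.pending.length;
  }
}
```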

Authentication

Initially, we can’t protect against every attack, so we’ll attempt to deter them as much as possible:

  1. Require JWT authentication with an ethereum signature, so we can block any single ethereum address from pinning too many files at once
  2. A developer must obtain a developer token from us to pin data on our node (this idea needs to be fleshed out more)
  3. Possibly whitelisting domains and/or IP addresses (although this would make it hard to handle DAOs running locally in the browser or in a desktop app).
  4. Strict schema validation - we will only pin data that adheres to verified open standards when applicable, and we will reject data that fails schema validation. (There may be some caveats here, so people can use our pinning service but just not our node.) The community can submit PRs to add new MongoDB schema types, which will automatically be checked for validation and added to the set of pinnable types. (For example, an Aragon Network team can submit a new schema for a type of data they want pinned. Once approved, the backend will accept this schema and pin matching data accordingly.)
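For point 1, a rate limiter keyed by Ethereum address could look like this minimal sketch (the limits and function names are made up; real JWT and signature verification are omitted):

```javascript
// Illustrative per-address rate limiter: block an address that pins too
// many files within a sliding window. Window and limit values are invented.
function makeRateLimiter({ maxPins = 5, windowMs = 60_000 } = {}) {
  const history = new Map(); // address -> timestamps of recent pins

  return function allowPin(address, now = Date.now()) {
    const recent = (history.get(address) || []).filter((t) => now - t < windowMs);
    if (recent.length >= maxPins) return false; // too many recent pins
    recent.push(now);
    history.set(address, recent);
    return true;
  };
}
```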

Querying / Verifiability

It’s conceivable that applications will want to query specific terms or vocabularies within metadata and files. Our backend system can offer that reliably by caching files in MongoDB without jeopardizing the validity of the information.

Internally, MongoDB documents will use CID as their ID schema, so that content requests made by clients can be based on specific hashes. The clients can therefore verify the content by either re-computing these hashes, or fetching the same hash from the IPFS network and comparing results.

IPFS-Cluster

It seems like IPFS Cluster could be used in two productive ways:

  1. Open Work Labs/Autark runs a node cluster instead of a single node

Benefits:

  • Allows us to run more nodes and add extra pinning space (~7000 entries in pinset?)
  • Provides fault tolerance - if we replicate files across more than one node in the cluster, pinned files can stay available if/when a single node in the cluster goes down
  • Easier to scale as the network grows

Questions:

  • What additional expenses get incurred (both economically and computationally) when running a cluster versus a single node?
  • Can you turn a previously running node into a new cluster administrator without it going down?
  • Are there any additional devops complications associated with running a cluster? More things to break? Harder to monitor? Link
  • How hard is it to get up and running? Running cluster on Kubernetes, Local cluster setup and pinning example
  • What happens if the cluster’s leader goes down? (presumably the rest of the cluster stays alive perfectly fine)
  • How difficult would it be to scale from running our own cluster (and each individual node in the cluster) into option number 2 (below)…
  • How could we remove pinned data from the cluster that was pinned by a node whose access was revoked?
  2. Aragon community uses IPFS Cluster as a pinning ring to allow Nest and Flock teams to collaboratively pin files together

Benefits:

  • The community has assurance that important files are pinned - without waiting for filecoin, paying for external services, or relying too heavily on centralized services
  • More teams participating and strengthening the IPFS network!

Questions:

  • (All the q’s above apply here too)
  • Who would be the cluster administrator? How hard would this responsibility be?
  • What additional security concerns need to be thought out?
  • How would we share secret keys with new partners?
  • How would we develop trust models among peers in the cluster? How would we revoke trust?
  • How large can we scale a cluster today? (this is currently not stress tested much)
  • How hard would it be to build services on top of the cluster (for example, database querying with mongo. More on this below)

Future

Allowing other external node providers to get involved in our cluster would also introduce some complications when it comes to managing MongoDB - we would have to listen for changes to the cluster pinset that weren’t emitted by our backend, and then update our db. Conversely, if a GET request is received for a file that’s not in our MongoDB, we could resolve the content through the network and then store it in Mongo.

Textile Cafes also provide a lot of great offline services for pinning nodes. The problems they’ve solved in the mobile world have similarities to those of DAOs - local DAOs going on and offline // pinning files locally and backing them up on an “always on” pinning node. If each DAO ran a Textile node, and we hosted a cluster of Textile cafes, it’s likely that we would have a very robust and distributed architecture. This would take a significant amount of dev work, so it’s something to consider for the future. In the meantime we’ll see if we can use cafes on their own as a better pinning node.

Here’s what an architecture with Textile could look like:

Filecoin

We are very interested in building a Filecoin integration in the future. A backend server to manage Filecoin seems like a great place to start because it abstracts a lot of the UX and technical complications away from Aragon clients - no key pair management, no buy/ask spreads, etc. It’s conceivable that a user-friendly initial solution is to build a meta-transaction layer on top of Filecoin - where DAOs can pay for Filecoin storage and retrieval through our backend layer, with most of the complexities handled by us.

A DAO app that manages Filecoin payments in a fully distributed manner seems like an interesting future idea too. It makes sense to look into integrations between Ethereum smart contracts and the Filecoin blockchain, to see if any additional requirements would be placed on the user (like a Filecoin MetaMask requirement, for example). Something definitely worth exploring.

Appendix with more information and research

Reference links









#2

Very interesting suggestion Jon. Can you elaborate further on the role of MongoDB in the suggested architecture?

Introducing another stateful service such as MongoDB in addition to IPFS increases operational complexity and along with that the risk. Additionally MongoDB is by design a centralised solution. If it ever becomes a source of truth, we’ve defeated the goal of decentralisation by design. I haven’t worked with IPFS Cluster, but it appears to solve some of the requirements mentioned.

I have a couple of high level questions:

  1. Both caching and pinning are mentioned. Could you clarify a bit what you mean? For example, if an IPFS node is guaranteed to pin a file, would you ever add another “caching” layer on top? If so, what type of caching would that be? An in-memory cache to save a trip to the file system? That’s something that MongoDB won’t solve.
  2. What would the API allow querying? Does it include the actual contents of the pinned files?

Perhaps we could decouple the two goals/requirements at play, namely:

  • Ensuring high availability and safe replication of user content in a way that:
    • Distributes the storage/computational burden
    • Allows for network growth, e.g. Aragon Black can easily join
  • Provide an aggregation/caching/querying layer to IPFS content. This seems along the lines of The Graph.

What additional expenses get incurred (both economically and computationally) when running a cluster versus a single node?

I think the most salient variable is the amount of data/content replicated. Running multiple IPFS nodes is “cheap”, barring the need for separate nodes in order to ensure high availability (protecting from node down time).

How hard is it to get up and running?

Any self operated stateful distributed system requires specialised knowledge to run and operate. These days with Kubernetes it’s much easier. This applies to running a highly available Mongo replica set too.

A pinning ring seems very close to our collective needs barring the second requirement (aggregation/caching).

Who would be the cluster administrator? How hard would this responsibility be?

It’d be interesting if we had multiple rings for each team to manage, while allowing any IPFS node to join any number of rings. Not sure if that’s currently possible technically, but it’s an interesting idea. By analogy: every team publishes a torrent feed and whoever has mutual interests replicates/seeds those torrents.


#3

Hey !

Happy to see the IPFS pinning issue discussed :slight_smile: Here are a couple of remarks / questions.

EDIT. I didn’t see @Daniel’s post before I wrote this one. There is some redundancy, but I leave my post here as it is, as a +1 to Daniel’s comments :slight_smile:

IPFS Cluster

I think the use of an IPFS Cluster is a must here. This is something we have already researched a little on our side - mostly for Pando - and we were on our way to deploy an IPFS Cluster node and propose that A1 / Autark / AA join forces :slight_smile: We can coordinate on this!

Authentication

This is something which has already been discussed during the All Devs Call, so I lay it out there publicly for discussion purposes. It would be great to have the ‘subscription’ to this service handled at the DAO level. This would allow DAOs to pay fees to have their files pinned. I’m not sure such a system is that easy to design, though. It would require a standard DAO-membership system to check whether an Ethereum key is a member of a DAO …

Server API

My main concern in the current state of the proposal is the use of a custom API. I feel like the Aragon IPFS service should be 100% compatible with the IPFS API. This way switching the service on / off would be as simple as modifying the gateway URL in the Aragon Client.

Also: I don’t see the rationale of using MongoDB here. I feel like it introduces a very centralized additional layer … Moreover, I remember there are plans in the Aragon roadmap to provide Graph Protocol as a caching / querying layer. Thus, I think the query layer of the IPFS service could somehow rely on Graph Protocol to avoid relying on tons of different APIs.

Client API

For now, each Aragon app has to deal with its own IPFS settings. This leads to a lot of redundancy, both in terms of local storage and of library imports: it’s an unnecessary overhead to re-import js-ipfs-api tens of times :slight_smile: Maybe the IPFS API could be exposed directly at the wrapper / app level. This way, users would just have to configure their IPFS settings once - at the client level, as it is now - and all apps would be up-to-date.


#4

Hey @daniel and @osarrouy - thank you for the feedback!

Before diving into any specifics, I just want to point out a few high level points.

First, this spec is framed as a near-term solution (achievable within the next 1-3 months) and is intentionally a partially distributed (aka “web2.5”) solution. The spec would probably look much different if a pinning solution was the only project our team was working on, and/or time was less of a factor :).

Second, @daniel your multiple pinning ring idea and @osarrouy your pay to pin files idea are both awesome!! I see these as longer term strategies because they are a bit more involved. Along the way I don’t feel like it’s bad to add some centralized components and remove them over time as new technologies like Filecoin and Textile mature. This allows us to ship apps quicker, provide greater user / developer experiences, and aim for increased adoption.

Onto the specifics:

IPFS cluster

I’m hazy on how IPFS cluster authentication could integrate with the Aragon architecture. There isn’t too much written on IPFS cluster auth, and I’m not sure of the requirements on the Aragon side of things.

Some q’s:

  1. Are pinning solutions determined at the DAO or Aragon application level? For example - should the developers of the discussions application be responsible for figuring out a pinning solution for their application? Or should the owner of a DAO that uses the discussion app be responsible for figuring out the pinning solution for it? Or a combination of both? From today’s call, it seemed desirable to give users the option to choose their own provider.
  2. What additional challenges stem from users running a DAO locally? Should they too be able to pin data to the cluster?
  3. What steps are required in order to permit the next user, application, or DAO to pin data to the cluster? If this involves initializing a new node, is that ideal for each entity? Is there a simpler solution?

Relevant links:




The REST API is easier to build (in just a day or two I got most of the file and dag pinning stuff working) and provides more flexible and familiar (to me) authentication schemes. I feel much more confident in getting this work done within our grant timeline. Most importantly however, it preserves some super important benefits of the distributed web:

  • All data is content addressable - clients can easily verify the data they pinned and fetched hasn’t been tampered with by rehashing content or requesting it from another gateway.
  • Multiple node providers can be used to ensure high availability and safe replication of user content - we can even use nodes like Infura that aren’t part of a cluster.

MongoDB

Sorry for any confusion - I could’ve been more clear. The idea behind using mongo as a caching or (more specifically) querying layer is more relevant to my work with Autark on the discussions app.

Users should be able to query pinned content by parameters other than its hash, which requires database functionality. I assumed that this type of metadata querying would be useful for other apps as well.

For specifics on how we would use it, when we receive a request to pin JSON data (the dag/put endpoint), we:

  • Make sure the metadata is formed properly
  • Compute the hash without pinning
  • Find or create an entry in the database (where the appropriate fields are indexed and the content hash is the mongoDB identifier used to access this JSON blob)
  • Pin the blob
  • Send back the cid to the client

Still hacking on this so pardon any messy code or patterns :slight_smile:

endpoint for dag/put
handler for image metadata

As an example, a client would use this endpoint for pinning their 3box profile picture metadata or a photo in a discussion post. There’s no redis caching or anything performance related (apart from being able to send back pinned data if our node goes down).

If [mongo] ever becomes a source of truth, we’ve defeated the goal of decentralisation by design

Agreed. Although by removing any arbitrary ID schema in mongo and relying on content identifiers, I think the “global source of truth” is the content itself. Every piece of content that you receive, regardless of where it came from, can be hashed and/or requested directly from the IPFS network. Since the data will be pinned, any IPFS node (including local nodes on your computer) can retrieve it (although it could take some time), meaning it never has to come from our backend.

The Graph seems most helpful for querying on-chain data which requires content hashes (and even content itself that you wish to query by) to be stored on chain. Specifically talking about Discussions, it doesn’t seem like this is the right solution because each discussion post would have to be an ethereum event. Of course, there are other solutions that would require some more complex ethereum scaling logic, but that seems like overkill to get a discussion app out the door quickly.


#5

Hey @Schwartz10,

Thanks for the answer! Here are my reactions.

I can’t tell for Textile, but when it comes to Filecoin I think the ability to pay for storage from / through an Ethereum contract is very far down the road. So I wouldn’t expect too much on that side in the mid-term :slight_smile:

My personal take is that I find it tricky to start with a centralized solution and expect to decentralize it later [because then you have to update / re-factor every component of your stack relying on it, and it ends up being a mess :slight_smile: ]. I would personally prefer starting from scratch with a decentralized [and possibly less powerful] solution that we could improve over time: it’s always easier to add features than to remove / refactor features :slight_smile:

But I don’t know what the rest of the community thinks.

AFAIK there is no authentication enforced at the IPFS Cluster level. There is authentication between the cluster nodes through the use of a CLUSTER_SECRET, but I don’t think there’s a way to authorize only whitelisted users to pin to the cluster [except on an IP address basis].

An IPFS Cluster endpoint is a daemon mimicking the IPFS API but enforcing a replication strategy before it forwards requests to an IPFS node:

User -> IPFS Cluster [replication strategy] -> IPFS Node

The way I see it, the Aragon IPFS Service nodes should enforce authentication through an additional front-facing daemon forwarding the request [if authentication succeeds] to the IPFS Cluster [itself forwarding it to the IPFS node].

User -> Aragon IPFS Node [authentication strategy] -> IPFS Cluster [replication strategy] -> IPFS Node

If the Aragon IPFS Node exposes exactly the same interface as regular IPFS nodes [with additional optional parameters, like the ability to send an ETH-signed message handled by the Aragon Client in the HTTP header], then the process can be absolutely transparent for both users and developers.

I think this should be handled at the DAO level: you pay or you don’t. Then it’s up to the end user to pick the right IPFS gateway in their settings tab. If he chooses to use the Aragon IPFS service gateway, and the DAO has paid, he’s fine. If he wants to use a personal gateway and make sure everything is pinned by his own means, that’s his problem :slight_smile:

If it’s just a matter of what IPFS gateway you use [localhost or Aragon IPFS Service] then I guess it should work for Aragon Desktop too.

Once again, it could just be a matter for the end user to pick the right IPFS gateway [and make sure the DAO has paid]

The Graph has the ability to fetch - and index - IPFS content. I’m not sure there is a need for this content to be referenced on-chain - maybe there is for now, but I don’t think this is a technical limitation, so they could enable it pretty quickly given how open they are to providing features requested by their users.

I think we could pretty easily use a PubSub channel to push new data to be indexed. The only problem here would be the ordering of data, given that PubSub itself does not provide CRDTs [OrbitDB, built on top of pubsub, does though].


#6

I don’t think 2-3 Aragon Network companies running purely IPFS pinning nodes is any different from 2-3 Aragon Network companies running IPFS pinning nodes with an additional mongo (or db-of-choice) layer. The fact that data can be lost if the organizations that run pinning nodes go offline means that both are centralized solutions.

Additionally, I do not think updating a couple lines of javascript to point to a different endpoint is something one would call refactoring (even if that is a requirement). I guess it really matters how the endpoints are initially developed. (I think…?) there can be ways to implement such solutions where it isn’t hard-coded into the system, and it can be something that the DAO turns on or off. For example, if your gateway is “https://ultrapinning.pin.me/ipfs/”, the gateway knows you have requested the content to be pinned, as it is a centralized domain. Hence it can pin it to IPFS and do whatever else it wants with the data.

I don’t think Aragon app frontends need to necessarily serve content from such systems, but I do see the importance of having such systems especially for cases like issue management, discussion, etc until the tech that is in R&D takes off.

I actually think this will be interesting to A/B test – release products that have an on/off switch for the mongo (or SQL or whatever service) in addition to IPFS, see what people choose, and see how much of a difference there is in performance. My hypothesis is that people will choose the solution that provides them with the more optimal user experience. I don’t think there is any near-term threat that data stored both in IPFS and in another storage system is at risk of being seized (or that this is necessarily more centralized than purely-IPFS systems).

In reading more about the Graph, it seems they achieve that by Graph nodes needing to run postgres. Additionally, if you don’t have nodes hosting your subgraph, well I don’t think it really works. At the end of the day, you have to eventually pay (or rely) on some people or organizations to host / index / query your data.

This press release is also interesting:

“The company plans to offer both centralized and decentralized services in 2019.”

"Today, The Graph is publishing the spec for the hybrid decentralized protocol. This version allows anyone to run a node to provide indexing and query processing services to the network in exchange for payment,”

Fully decentralized capital allocation is so much more important in what we are building in Aragon. In the end: I think accelerating decentralization of non-financial systems is a goal to strive towards (and not an easy reality today), and being more open minded to hybrid solutions will get us there faster (whether it’s on an app-by-app basis, or an org-by-org basis).


#7

Yes. But if the system relies on an IPFS-compatible API, users can always switch to another IPFS node / pinning service. If the system relies on a MongoDB-specific API - as the proposal stands, if I understood well - then users can’t do much if Aragon’s nodes go down, because their app won’t even be compatible with the IPFS API.

This is only a couple of lines of javascript if the MongoDB-specific API is compatible with the IPFS API. Otherwise when we want to switch back from the MongoDB REST API to a more decentralized IPFS API we have to rewrite all the fetching / querying logic …

I agree that there are two use cases here:

  1. Pinning IPFS files for app frontends or even DAOs manifestos, etc whose CID is referenced elsewhere [on-chain].
  2. Providing a query layer for data whose CID is not indexed on-chain [a database system].

The first problem can be solved easily by providing a default pinning IPFS Cluster to Aragon’s users. The second one is trickier because there exists no perfect off-the-shelf solution for a solid decentralized database system. My point is that there are two paths regarding this indexation problem.

  1. We use an [imperfect] IPFS-based solution like OrbitDB and try to improve it over time, as Colony is doing.
  2. We decide to give up on that for now because it’s too complicated / time consuming [it may be the case] and rely on a centralized database system.

The current proposal seems to go in the second direction. It can be a perfectly acceptable choice, but then I don’t see the point of building the database on top of IPFS. The reason is that the only way for users to access content will be to query the database anyhow, because there will be no other way for them to know the CID of the content they want to browse.

Let’s take an example. If, as a user, I want to read the content of discussion #34 in my app: how will I know which CID the discussion refers to? I can’t, except by querying the MongoDB layer. So if the MongoDB layer goes down, I lose everything. Maybe the actual content of this discussion will still be pinned somewhere, but I will never know what its CID is and how to access it …

The rationale is that the MongoDB layer will send back the CIDs of contents so that users can check the integrity of the data through their hashes. But still, as I have no way to actually know the CID of my data outside of a MongoDB query, what am I supposed to compare the returned CID to?

I’m asking all these questions to try to clarify the path we would take there:

If it’s just to pin files users already know the CID of, because they are referenced on-chain, then I don’t think we need a database layer [the pinning layer will be enough]. If it’s to provide indexation / search features for non-referenced data, then in the end I feel like the proposed architecture is not very different [from a security / centralization perspective] from just providing a regular MongoDB database [which can indeed be a temporary solution].

For now you can either run your own node or rely on their hosted services to do so. The good news is that they are working on a consensus system to run these nodes in a decentralized way at some point. I feel like this could be an acceptable solution because it would allow us to build on top of an already-existing API whose underlying logic is supposed to improve [from a decentralization perspective] over time. So it would be like delegating the tricky decentralization R&D to The Graph :slight_smile: Of course, there would always be a risk that they do not deliver on their promise and we get stuck …


#8

Hey @osarrouy! Thank you for the well thought out responses. Yalda told me to take care of responding.

Lots to talk about here - I feel like it could be productive if we jumped on a call to talk through some of these questions and make an action plan. We could record and post it here for others to see.

I like how you divided things into two use cases:

  1. Pinning IPFS files for app frontends or even DAOs manifestos, etc whose CID is referenced elsewhere [on-chain].
  2. Providing a query layer for data whose CID is not indexed on-chain [a database system].

We’re mostly in-sync on the requirements and strategies behind #1 - providing a default pinning IPFS Cluster to Aragon’s users sounds great. My biggest questions are around immediately usable auth strategies.

The way I see it, the Aragon IPFS Service nodes should enforce authentication through an additional front facing daemon

Could you elaborate on this?

AFAIK there is no authentication enforced at the IPFS Cluster level. There is authentication between the cluster nodes through the use of a CLUSTER_SECRET but I don’t think there’s a way to authorize whitelisted users only to pin to the cluster [excepts on an IP address basis].

When configuring the service.json file of a cluster node, there’s a restapi key that takes a basic_auth_credentials object, which takes a username and password to authenticate requests to the REST API endpoints. https://github.com/ipfs/ipfs-cluster/issues/621
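For illustration, the relevant fragment of a cluster node’s `service.json` might look like this (the field names follow the layout discussed in that issue; the exact shape can vary across ipfs-cluster versions, and the credentials here are placeholders):

```json
{
  "restapi": {
    "http_listen_multiaddress": "/ip4/0.0.0.0/tcp/9094",
    "basic_auth_credentials": {
      "aragon-backend": "replace-with-a-long-random-password"
    }
  }
}
```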

There’s a great description of running a pinning ring in one of ipfs-cluster’s PRs. I think the biggest challenge is handling each local instance of a DAO. I’m having trouble finding the link, but if I remember correctly, I believe Protocol Labs scaled the cluster up to like 10-12 peers and pinned ~7000 files. Is this a good immediate solution that we can easily scale when needed? How many users would want to connect their own peers?

I also remember reading they are looking for a partner / case study on something like this, which is super interesting.

If it’s just a matter of what IPFS gateway you use [localhost or Aragon IPFS Service] then I guess it should work for Aragon Desktop too. … Once again, it could just be a matter for the end user to pick the right IPFS gateway [and make sure the DAO has paid]

How do you distinguish between a DAO member and a hacker, both running on localhost with the same ports? The browser and desktop seem to pose different challenges - the desktop can run an ipfs-cluster node which could be manually added to the cluster, but how would we handle that in browser through the REST API with users who aren’t running nodes on their machine?

In terms of timing, I see advantages to both starting decentralized from scratch and improving, and building web2.5 solutions that scale to a fully distributed model over time. Might it be beneficial to work in parallel on storage solutions for Pando and Discussions rather than a single solution from the start? My top priority is committing to something I’m confident we can finish by the end of July. It seems like we should take a temperature check of what the community thinks, should we:

(a) use a more centralized layer on top of distributed storage for faster/smarter features (like querying) and reliability (things like Orbit and 3box aren’t production ready)

or

(b) start completely decentralized and build slower/more expensive (computation, gas fees) features

I tend to lean towards (a) because I think iterating quickly on the application level with user feedback will help inform the decisions we make at the protocol (storage) level. We may not know all the requirements yet. However, I’m also happy to go with (b) if that’s what the community prefers.

This is only a couple of lines of javascript if the MongoDB-specific API is compatible with the IPFS API. Otherwise when we want to switch back from the MongoDB REST API to a more decentralized IPFS API we have to rewrite all the fetching / querying logic …

Agreed. Although there will likely be required refactoring regardless of what database solution we choose. For example, if you choose to use OrbitDB instead, you run the risk of OrbitDB breaking their APIs (which they’ve warned about doing), or dealing with new bugs. Some of these newer technologies take extra time to become compatible with Aragon apps too due to the environment. Today, Orbit is not compatible with Aragon Apps because of the iframe sandboxing restrictions.

The current proposal seem to go in the second direction. It can be a perfectly acceptable choice but then I don’t see the point of building the database on top of IPFS. The reason is that the only way for users to access content will be to query the database anyhow …… I feel like the proposed architecture is not very different [from a security / centralization perspective] than just providing a regular MongoDB database [which can indeed be a temporary solution].

It’s beneficial and more decentralized to build the database in tandem with IPFS for a few reasons:

  1. We’re still using content hashes as identifiers to store and resolve content. This makes any later refactor much easier, because the client’s identification scheme is also based on content addresses rather than arbitrary IDs.
  2. We could easily extend this architecture to log database snapshots as one big DAG in an ethereum event. For example, every hour or so we could gather all entries in specific tables, hash them together into an IPLD node, and send that data as a transaction. This provides alternative fetching mechanisms and integrates well with The Graph. Tools like Textile Threads will also help us construct and keep track of these DAGs in the near future.
  3. From a security perspective, using OrbitDB and/or 3Box is arguably the most dangerous option. Access control in OrbitDB is not very developer friendly, and without it anyone can post to anyone else’s OrbitDB store. It’s also easy to lose private keys with OrbitDB once you clear your application cache, and there was no account recovery mechanism as of a month or two ago.

The rationale is that the MongoDB layer will send back the CID of contents so that users can check the integrity of the data through its hash. But still, as I have no way to actually know what the CID of my data was outside of a MongoDB query, what am I supposed to compare the returned CID to?

This is true within the very short term, but if we implemented a service like I mentioned in point #2 above, users could compare what they receive from the database with the latest dag node stored on-chain.

An additional layer of verification can be provided by users signing the CIDs of their discussion posts (which could encompass the time of posting). That way people can verify the messages and their relative order. On a front-end client, we could implement a friendly UX that shows users which discussion posts are awaiting confirmation on the blockchain.

So if the MongoDB layer goes down, I lose everything.

True, in the very short term. But again, if we logged database snapshots in ethereum events, we could defend against this.

The Graph has the ability to fetch - and index - IPFS content. I’m not sure there is a need for this content to be referenced on-chain - maybe there is now but I don’t think this is a technical limitation so they could enable it pretty quickly given how open they are to provide features requested by their users.

The Graph sounds super interesting, and I need to explore it in more depth. It wasn’t immediately obvious to me how to use the IPFS store to store off-chain data in a database-like way. Are we able to do that with their current technology, or are you familiar with any dapps in production that are using The Graph for use cases like ours?

Will follow up in PMs to set up a call!

#9

Hey @Schwartz10

Thanks a lot for taking the time to provide such a comprehensive answer! That’s super interesting!

I’m in the middle of an Apiary rush these days, but I will definitely take the time to answer more deeply in the coming days. In the meantime, I’m super open to a call whenever you can.

Talk to you soon [here or in a chat!].

#10

I hope that DAOs will be able to easily host and back up their data, otherwise they would have to trust a 3rd party! :open_mouth:

Here are my thoughts on this issue, please let me know if my assumptions are correct:

  • As far as app developers are concerned, we can and must abstract away the data storage component (for security reasons, so apps cannot read each other’s data). We should expose a Storage API from the wrapper side so that apps don’t have to worry about it.
  • We are forced to use the blockchain to reference these IPFS hashes; otherwise you cannot prove the date of a discussion, who has permission to read/write which files, etc. (Apps built entirely on IPFS will always have to assume a high-trust environment.)
  • There should be an app where users can easily inspect this data
  • Apps should be able to read each other’s data, if permitted

I think we already have a solution to this problem in Aragon Drive and the Aragon Datastore, although I’m not sure (they might need some more iterations).

I kind of see it as:

  • Aragon Drive -> File explorer/Finder/Nautilus (application layer)
  • Aragon Datastore -> the base for the OS layer, which the Wrapper can use to create a Filesystem API
  • IPFS -> Hardware layer

Kinda off-topic, but I made some diagrams in the process:

  • Low-trust environment, using the Aragon Network

  • High-trust environment

A high-trust environment could be a Family DAO or a “Personal DAO”. In this scenario I would store sensitive information that I would not want backed up anywhere else. The nodes could even be disconnected from the internet and only sync when they are in the same location.

A low-trust environment would be a Business DAO, where money is at stake and I need a 3rd-party provider like the Aragon Network to bail me out in case I get “cheated” on.
In this scenario the Aragon Network would need to be a validator of the “private” eth node and also a node in the IPFS cluster, because without that the jurors cannot know for sure what was said and when.

Since the Network will be “forced” to host some of its customers’ data, I guess we need a way to measure storage, bandwidth, etc. It would also make sense to provide a general hosting service, since the Network needs the architecture for it anyway (plus it’s an extra revenue source).

#11

Hey! Thanks for joining the conversation.

We’re going to be speaking about this on Wednesday the 24th at 10:30 AM ET. Would you be able to join?

#12

My 2c (as you’re asking): (b) please.

#13

Yes!
I will try to reach out to the Espresso team as well.

#14

Hey @Schwartz10,
I’ve been skimming through the thread and it is a very interesting conversation. While I don’t have the expertise to add to the debate, I’m curious to know more and would be happy to join the call as well if possible. Also, if we can find a way to record it and post it in this thread afterwards, that would be awesome :relaxed:

#15

Hey all! IPFS Cluster dev here.

From a quick read of the thread above, it seems worth mentioning that IPFS Cluster is soon going to launch what we call “collaborative clusters”. This is essentially a way to run a cluster where some peers are “trusted peers” (they can control what’s on the pinset) and the others are “followers” (they pin the things but cannot tell others to pin/unpin anything). This also comes with the flexibility of having peers/followers join or depart at any time without affecting the cluster or requiring any intervention (as currently happens with Raft), and the potential to scale to hundreds of peers. This is actually the final crystallization of the pinning rings use case linked before.

I don’t have a lot of time to dig into Aragon’s architecture right now but I’ll hang around here to answer any questions about Cluster and to note down any feedback as to how it can be useful. Cheers!
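
For readers who want to map this onto configuration: in the CRDT-based setup described above, the trusted/follower split surfaces in IPFS Cluster’s `service.json`. A fragment might look roughly like the following — key names may differ between versions, and the cluster name and peer ID are placeholders, so check the Cluster docs for the shipped format:

```json
{
  "consensus": {
    "crdt": {
      "cluster_name": "aragon-pinning",
      "trusted_peers": ["QmTrustedPeerID1"]
    }
  }
}
```

Only peers listed in `trusted_peers` can modify the pinset; every other peer simply follows it.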
