Aragon Network IPFS Pinning

While it is evident that all Aragon apps require IPFS pinning for app assets, the following Aragon apps will require an IPFS pinning solution for user-generated content:

  • Pando
  • Projects
  • Home
  • Espresso
  • P2P models Wiki
  • Discussions
  • Profiles

…and the list will grow. Right now the go-to pinning solution for some of these projects is Infura, as it has an open API, but this isn't a sustainable solution. While Aragon One has a self-managed IPFS node, there is no public API for additional Flock, Nest, or third-party teams to utilize.

A Potential Solution

Requirements

  • Locally pinned files stay pinned when a user exits their session
  • Files are accessible, preferably with a fast load time
  • Files are verifiable, so that a user can verify the data retrieved is the data asked for
  • The architecture can handle outages - if an IPFS node goes down, we should be able to continue giving access to our files
  • Limit security vulnerabilities and expenses - only approved clients should be able to pin data on our node
  • Data adheres to open standards

An initial architecture could look like:

Client applications send requests to our backend server, where information gets pinned across multiple providers. Our backend validates that the data adheres to specific open standards (schema.org to begin with) when applicable and then pins the files/data.
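As a rough sketch of that flow (the endpoint path, provider list, and schema check are placeholder assumptions, not a finalized design):

```js
// Hypothetical backend pin endpoint: validate against an approved schema, then
// pin the payload across every configured IPFS provider.
const express = require('express')
const ipfsClient = require('ipfs-http-client')

// Our own node plus a fallback provider (hosts are illustrative)
const providers = [
  ipfsClient({ host: 'localhost', port: 5001, protocol: 'http' }),
  ipfsClient({ host: 'ipfs.infura.io', port: 5001, protocol: 'https' })
]

const app = express()
app.use(express.json())

app.post('/api/v0/pin', async (req, res) => {
  const { data, schemaType } = req.body

  // Validate against an approved open standard (e.g. a schema.org type) when applicable
  if (schemaType && !validateSchema(schemaType, data)) {
    return res.status(400).send({ error: 'data does not match an approved schema' })
  }

  // Pin across all providers; every provider returns the same content hash
  const buf = Buffer.from(JSON.stringify(data))
  const results = await Promise.all(providers.map(ipfs => ipfs.add(buf)))
  res.send({ cid: results[0][0].hash })
})

function validateSchema (type, data) {
  // Placeholder: a real version would check the payload against a stored JSON schema
  return typeof data === 'object' && data !== null
}

app.listen(3000)
```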

Fallback Procedures If Our Node Goes Down

It's conceivable we'll have some bumps along the road running our own node. We need to be able to handle those situations:

  1. Active monitoring services like http://pm2.keymetrics.io/ to ensure our node is up and running
  2. A queueing system to track unpinned files while nodes are down, and re-pin files once the node is back online (a sketch follows this list)
  3. Modular infrastructure - the IPFS node will be hosted on its own (isolated from the REST API), so that the node going down will not cause the REST API to go down
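A minimal sketch of the queueing idea in point 2 (in-memory only, to show the shape; a real version would persist the queue):

```js
// Re-pin queue: remember CIDs that failed to pin while the node was down and
// retry them on an interval once the node responds again.
const ipfsClient = require('ipfs-http-client')
const ipfs = ipfsClient({ host: 'localhost', port: 5001, protocol: 'http' })

const pendingPins = new Set()

async function pinOrQueue (cid) {
  try {
    await ipfs.pin.add(cid)
  } catch (err) {
    // Node unreachable or pin failed - queue the CID and retry later
    pendingPins.add(cid)
  }
}

setInterval(async () => {
  for (const cid of pendingPins) {
    try {
      await ipfs.pin.add(cid)
      pendingPins.delete(cid)
    } catch (err) {
      break // node still down; try again on the next tick
    }
  }
}, 30 * 1000)

module.exports = { pinOrQueue }
```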

Authentication

Initially, we can't protect against every attack, so we'll attempt to deter them as much as possible:

  1. Require JWT authentication with an ethereum signature, so we can block any single ethereum address from pinning too many files at once (a rough sketch follows this list)
  2. A developer must obtain a developer token from us to pin data on our node (this idea needs to be fleshed out more)
  3. Possibly whitelisting domains and/or IP addresses (although this would make it hard to handle DAOs running locally in the browser or on a desktop app).
  4. Strict schema validation - we will only pin data that adheres to verified open standards where applicable, and we won't pin data that fails schema validation (maybe there are some caveats here, so people can use our pinning service but just not our node). The community can submit PRs to add new MongoDB schema types, which will automatically be checked for validation and added to the set of pinnable types (e.g. an Aragon Network team can submit a new schema for a type of data they want pinned; once approved, the backend will accept that schema and pin data that fits it).
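A hedged sketch of the JWT-plus-ethereum-signature flow in point 1 (library choices, message format, and the quota check are assumptions):

```js
// The client signs a challenge with their ethereum key; we recover the address,
// issue a short-lived JWT tied to it, and can then rate-limit pins per address.
const jwt = require('jsonwebtoken')
const ethUtil = require('ethereumjs-util')

const JWT_SECRET = process.env.JWT_SECRET // server-side secret (assumed config)

function recoverAddress (challenge, signature) {
  const msgHash = ethUtil.hashPersonalMessage(Buffer.from(challenge))
  const { v, r, s } = ethUtil.fromRpcSig(signature)
  const pubKey = ethUtil.ecrecover(msgHash, v, r, s)
  return ethUtil.bufferToHex(ethUtil.pubToAddress(pubKey))
}

function issueToken (challenge, signature) {
  const address = recoverAddress(challenge, signature)
  // A real version would check per-address pin quotas here before issuing
  return jwt.sign({ address }, JWT_SECRET, { expiresIn: '1h' })
}

// Express-style middleware guarding the pinning endpoints
function authenticate (req, res, next) {
  try {
    const token = (req.headers.authorization || '').replace('Bearer ', '')
    req.user = jwt.verify(token, JWT_SECRET)
    next()
  } catch (err) {
    res.status(401).send({ error: 'invalid or missing token' })
  }
}

module.exports = { issueToken, authenticate }
```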

Querying / Verifiability

It's conceivable that applications will want to query specific terms or vocabularies within metadata and files. Our backend system can offer that reliably by caching files in MongoDB without jeopardizing the validity of the information.

Internally, MongoDB documents will use the CID as their ID, so that content requests made by clients can be based on specific hashes. Clients can therefore verify the content either by re-computing these hashes or by fetching the same hash from the IPFS network and comparing results.
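For illustration, a client-side check could look roughly like this (the backend URL is a placeholder, and the hashing options have to match whatever settings the backend pins with):

```js
// Fetch content by CID from the backend, re-compute the hash locally, and
// reject the response if it does not match the CID we asked for.
const fetch = require('node-fetch')
const ipfsClient = require('ipfs-http-client')
const ipfs = ipfsClient({ host: 'localhost', port: 5001, protocol: 'http' })

async function fetchAndVerify (cid) {
  // Placeholder backend URL
  const res = await fetch(`https://pinning.example.org/api/v0/content/${cid}`)
  const body = Buffer.from(await res.arrayBuffer())

  // onlyHash computes the CID without actually adding the data to the node
  const [computed] = await ipfs.add(body, { onlyHash: true })
  if (computed.hash !== cid) {
    throw new Error('returned content does not match the requested CID')
  }
  return body
}

module.exports = { fetchAndVerify }
```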

IPFS-Cluster

It seems like IPFS Cluster could be used in two productive ways:

  1. Open Work Labs/Autark runs a node cluster instead of a single node

Benefits:

  • Allows us to run more nodes and add extra pinning space (~7000 entries in pinset?)
  • Provides fault tolerance - if we replicate files across more than one node in the cluster, pinned files can stay available if/when a single node in the cluster goes down
  • Easier to scale as the network grows

Questions:

  • What additional expenses get incurred (both economically and computationally) when running a cluster versus a single node?
  • Can you turn a previously running node into a new cluster administrator without it going down?
  • Are there any additional devops complications associated with running a cluster? More things to break? Harder to monitor? Link
  • How hard is it to get up and running? Running cluster on Kubernetes, Local cluster setup and pinning example
  • What happens if the cluster's leader goes down? (presumably the rest of the cluster stays alive perfectly fine)
  • How difficult would it be to scale from running our own cluster (and each individual node in the cluster) into option number 2 (below)…
  • How could we remove pinned data from the cluster that was pinned by a node whose access was revoked?
  2. Aragon community uses IPFS cluster as a pinning ring to allow Nest and Flock teams to collaboratively pin files together

Benefits:

  • The community has assurance that important files are pinned - without waiting for filecoin, paying for external services, or relying too heavily on centralized services
  • More teams participating and strengthening the IPFS network!

Questions:

  • (All the q's above apply here too)
  • Who would be the cluster administrator? How hard would this responsibility be?
  • What additional security concerns need to be thought out?
  • How would we share secret keys with new partners?
  • How would we develop trust models among peers in the cluster? How would we revoke trust?
  • How large can we scale a cluster today? (this is currently not stress tested much)
  • How hard would it be to build services on top of the cluster (for example, database querying with Mongo; more on this below)?

Future

Allowing other external node providers to get involved in our cluster would also introduce some complications when it comes to managing MongoDB - we would have to listen for changes to the cluster pinset that weren't emitted by our backend, and then update our db. Conversely, if a GET request is received for a file that's not in our MongoDB, we could resolve the content through the network and then store it in Mongo.
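A rough sketch of that fallback path (the Mongoose model and the dag lookup are illustrative assumptions):

```js
// Serve cached content from MongoDB when we have it; otherwise resolve the CID
// through the IPFS network and cache the result for next time.
const mongoose = require('mongoose')
const ipfsClient = require('ipfs-http-client')

const ipfs = ipfsClient({ host: 'localhost', port: 5001, protocol: 'http' })
const Content = mongoose.model('Content', new mongoose.Schema({
  _id: String, // the CID is the document ID
  data: Object
}))

async function getContent (cid) {
  let doc = await Content.findById(cid)
  if (!doc) {
    // Not pinned by our backend - resolve through the network, then cache it
    const node = await ipfs.dag.get(cid)
    doc = await Content.create({ _id: cid, data: node.value })
  }
  return doc.data
}

module.exports = { getContent }
```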

Textile Cafes also provide a lot of great offline services for pinning nodes. The problems they've solved in the mobile world are similar to those of DAOs - local DAOs going on and offline, pinning files locally and backing them up on an "always on" pinning node. If each DAO ran a Textile node, and we hosted a cluster of Textile cafes, it's likely that we would have a very robust and distributed architecture. This would take a significant amount of dev work, so it's something to consider for the future. In the meantime we'll see if we can use cafes on their own as a better pinning node.

Here's what an architecture with Textile could look like:

Filecoin

We are very interested in building a Filecoin integration in the future. A backend server to manage Filecoin seems like a great place to start because it abstracts a lot of the UX and technical complications away from Aragon clients - no key pair management, no buy/ask spreads, etc. It's conceivable that a user-friendly initial solution is to build a meta-transaction layer on top of Filecoin - where DAOs can pay for Filecoin storage and retrieval through our backend layer, with most of the complexities handled by us.

A DAO app that manages Filecoin payments in a fully distributed manner seems like an interesting future idea too. It makes sense to look into integrations between ethereum smart contracts and the Filecoin blockchain, to see whether any additional requirements would be imposed on the user (like a Filecoin MetaMask requirement, for example). Something definitely worth exploring.

Appendix with more information and research

Reference links

3 Likes

Very interesting suggestion Jon. Can you elaborate further on the role of MongoDB in the suggested architecture?

Introducing another stateful service such as MongoDB in addition to IPFS increases operational complexity and, with it, risk. Additionally, MongoDB is by design a centralised solution. If it ever becomes a source of truth, we've defeated the goal of decentralisation by design. I haven't worked with IPFS Cluster, but it appears to solve some of the requirements mentioned.

I have a couple of high level questions:

  1. Both caching and pinning are mentioned. Could you clarify a bit what you mean? For example, if an IPFS node is guaranteed to pin a file, would you ever add another "caching" layer on top? If so, what type of caching would that be? An in-memory cache to save a trip to the file system? That's something that MongoDB won't solve.
  2. What would the API allow querying? Does it include the actual contents of the pinned files?

Perhaps we could decouple the two goals/requirements at play, namely:

  • Ensuring high availability and safe replication of user content in a way that:
    • Distributes the storage/computational burden
    • Allows for network growth, e.g. Aragon Black can easily join
  • Providing an aggregation/caching/querying layer to IPFS content. This seems along the lines of The Graph.

What additional expenses get incurred (both economically and computationally) when running a cluster versus a single node?

I think the most salient variable is the amount of data/content replicated. Running multiple IPFS nodes is "cheap", barring the need for separate nodes in order to ensure high availability (protecting from node downtime).

How hard is it to get up and running?

Any self-operated stateful distributed system requires specialised knowledge to run and operate. These days with Kubernetes it's much easier. This applies to running a highly available Mongo replica set too.

A pinning ring seems very close to our collective needs barring the second requirement (aggregation/caching).

Who would be the cluster administrator? How hard would this responsibility be?

It'd be interesting if we had multiple rings for each team to manage, while allowing any IPFS node to join as many rings as it wants. Not sure if that's currently possible technically, but it's an interesting idea. By analogy: every team publishes a torrent feed and whoever has mutual interests replicates/seeds those torrents.

4 Likes

Hey!

Happy to see the IPFS pinning issue discussed :slight_smile: Here are a couple of remarks / questions.

EDIT: I didn't see @Daniel's post before I wrote this one. There is some redundancy, but I'll leave my post here as it is, as a +1 to Daniel's comments :slight_smile:

IPFS Cluster

I think the use of an IPFS Cluster is a must-go here. This is something we have already researched a little on our side - mostly for Pando - and we were on our way to deploying an IPFS Cluster node and proposing that A1 / Autark / AA join forces :slight_smile: We can coordinate on this!

Authentication

This is something which has already been discussed during the All Devs Call, so I lay it out here publicly for discussion purposes. It would be great to have the 'subscription' to this service handled at the DAO level. This would allow DAOs to pay fees to have their files pinned. I'm not sure such a system is that easy to design, though. It would require a standard DAO-membership system to check whether an Ethereum key is a member of a DAO…

Server API

My main concern in the current state of the proposal is the use of a custom API. I feel like the Aragon IPFS service should be 100% compatible with the IPFS API. This way switching the service on / off would be as simple as modifying the gateway URL in the Aragon Client.

Also: I don't see the rationale for using MongoDB here. I feel like it introduces a very centralized additional layer… Moreover, I remember there are plans in the Aragon roadmap to provide Graph Protocol as a caching / querying layer. Thus, I think the query layer of the IPFS service could somehow rely on Graph Protocol to avoid relying on tons of different APIs.

Client API

For now, each Aragon app has to deal with its own IPFS settings. This leads to a lot of redundancy, both in terms of local storage and of library imports: it's unnecessary overhead to re-import js-ipfs-api tens of times :slight_smile: Maybe the IPFS API could be exposed directly at the wrapper / app level. This way, users would just have to configure their IPFS settings once - at the client level, as it is now - and all apps would be up-to-date.

5 Likes

Hey @daniel and @osarrouy - thank you for the feedback!

Before diving into any specifics, I just want to point out a few high level points.

First, this spec is framed as a near-term solution (achievable within the next 1-3 months) and is intentionally a partially distributed (aka "web2.5") solution. The spec would probably look much different if a pinning solution was the only project our team was working on, and/or time was less of a factor :).

Second, @daniel your multiple pinning ring idea and @osarrouy your pay-to-pin idea are both awesome!! I see these as longer-term strategies because they are a bit more involved. Along the way, I don't feel like it's bad to add some centralized components and remove them over time as new technologies like Filecoin and Textile mature. This allows us to ship apps more quickly, provide better user / developer experiences, and aim for increased adoption.

Onto the specifics:

IPFS cluster

I'm hazy on how IPFS Cluster authentication could integrate with the Aragon architecture. There isn't too much written on IPFS Cluster auth, and I'm not sure of the requirements on the Aragon side of things.

Some q's:

  1. Are pinning solutions determined at the DAO or Aragon application level? For example - should the developers of the discussions application be responsible for figuring out a pinning solution for their application? Or should the owner of a DAO that uses the discussion app be responsible for figuring out the pinning solution for it? Or a combination of both? From today's call, it seemed desirable to give users the option to choose their own provider.
  2. What additional challenges stem from users running a DAO locally? Should they too be able to pin data to the cluster?
  3. What steps are required in order to permit the next user, application, or DAO to pin data to the cluster? If this involves initializing a new node, is that ideal for each entity? Is there a simpler solution?

Relevant links:

https://cluster.ipfs.io/documentation/configuration/#the-api-section

The REST API is easier to build (in just a day or two I got most of the file and dag pinning stuff working) and provides more flexible and familiar (to me) authentication schemes. I feel much more confident about getting this work done within our grant timeline. Most importantly, however, it preserves some super important benefits of the distributed web:

  • All data is content addressable - clients can easily verify the data they pinned and fetched hasn't been tampered with by rehashing content or requesting it from another gateway.
  • Multiple node providers can be used to ensure high availability and safe replication of user content - we can even use nodes like Infura that aren't part of a cluster.

MongoDB

Sorry for any confusion - I could've been clearer. The idea behind using Mongo as a caching or (more specifically) querying layer is more relevant to my work with Autark on the discussions app.

Users should be able to query pinned content by parameters other than its hash, which requires database functionality. I assumed that this type of metadata querying would be useful for other apps as well.

For specifics on how we would use it, when we receive a request to pin JSON data (the dag/put endpoint), we:

  • Make sure the metadata is formed properly
  • Compute the hash without pinning
  • Find or create an entry in the database (where the appropriate fields are indexed and the content hash is the MongoDB identifier used to access this JSON blob)
  • Pin the blob
  • Send the CID back to the client

Still hacking on this so pardon any messy code or patterns :slight_smile:

endpoint for dag/put
handler for image metadata

As an example, a client would use this endpoint for pinning their 3Box profile picture metadata or a photo in a discussion post. There's no Redis caching or anything performance-related (apart from being able to send back pinned data if our node goes down).
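To make the dag/put flow described above concrete, a simplified handler might look like the following - this is a sketch for illustration only, not the code linked above, and the model and field names are assumptions:

```js
// Simplified dag/put handler following the steps above: validate, hash without
// pinning, upsert a MongoDB document keyed by the CID, pin, return the CID.
const mongoose = require('mongoose')
const ipfsClient = require('ipfs-http-client')

const ipfs = ipfsClient({ host: 'localhost', port: 5001, protocol: 'http' })
const Content = mongoose.model('Content', new mongoose.Schema({
  _id: String, // content hash as the MongoDB identifier
  data: Object
}))

async function handleDagPut (req, res) {
  const metadata = req.body

  // 1. Make sure the metadata is formed properly (placeholder check)
  if (!metadata || typeof metadata !== 'object') {
    return res.status(400).send({ error: 'malformed metadata' })
  }

  // 2. Compute the hash (dag.put stores the block on our node but does not pin it)
  const cid = (await ipfs.dag.put(metadata)).toBaseEncodedString()

  // 3. Find or create the entry, using the CID as the MongoDB _id
  await Content.findOneAndUpdate({ _id: cid }, { data: metadata }, { upsert: true })

  // 4. Pin the blob
  await ipfs.pin.add(cid)

  // 5. Send the CID back to the client
  res.send({ cid })
}

module.exports = { handleDagPut }
```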

If [mongo] ever becomes a source of truth, we've defeated the goal of decentralisation by design

Agreed. Although by removing any arbitrary ID schema in mongo and relying on content identifiers, I think the "global source of truth" is the content itself. Every piece of content that you receive, regardless of where it came from, can be hashed and/or requested directly from the IPFS network. Since the data will be pinned, any IPFS node (including local nodes on your computer) can retrieve it (although it could take some time), meaning it never has to come from our backend.

The Graph seems most helpful for querying on-chain data, which requires content hashes (and even content itself that you wish to query by) to be stored on chain. Specifically talking about Discussions, it doesn't seem like this is the right solution because each discussion post would have to be an ethereum event. Of course, there are other solutions that would require some more complex ethereum scaling logic, but that seems like overkill to get a discussion app out the door quickly.

2 Likes

Hey @Schwartz10,

Thanks for the answer! Here are my reactions.

I can't speak for Textile, but when it comes to Filecoin I think the ability to pay for storage from / through an Ethereum contract is very far down the road. So I wouldn't expect too much on that side in the mid-term :slight_smile:

My personal take is that I find it tricky to start with a centralized solution and expect to decentralize it later [because then you have to update / refactor every component of your stack relying on it and it ends up being a mess :slight_smile: ]. I would personally prefer starting from scratch with a decentralized [and possibly less powerful] solution that we could improve over time: it's always easier to add features than to remove / refactor them :slight_smile:

But I don't know what the rest of the community thinks.

AFAIK there is no authentication enforced at the IPFS Cluster level. There is authentication between the cluster nodes through the use of a CLUSTER_SECRET, but I don't think there's a way to authorize only whitelisted users to pin to the cluster [except on an IP address basis].

An IPFS Cluster endpoint is a daemon mimicking the IPFS API but enforcing a replication strategy before it forwards requests to an IPFS node:

User → IPFS Cluster [replication strategy] → IPFS Node

The way I see it, the Aragon IPFS Service nodes should enforce authentication through an additional front-facing daemon forwarding the request [if authentication succeeds] to the IPFS Cluster [itself forwarding it to the IPFS node].

User → Aragon IPFS Node [authentication strategy] → IPFS Cluster [replication strategy] → IPFS Node

If the Aragon IPFS Node exposes exactly the same interface as regular IPFS nodes [with additional optional parameters, like the ability to send an ETH-signed message handled by the Aragon Client in the HTTP header], then the process can be absolutely transparent for both users and developers.
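For illustration, such a front-facing daemon could be sketched roughly like this (the header names, ports, and membership check are placeholders):

```js
// Front-facing auth daemon: verify an Ethereum-signed message taken from request
// headers, then forward the untouched IPFS API call to the cluster's IPFS proxy.
const express = require('express')
const httpProxy = require('http-proxy')
const ethUtil = require('ethereumjs-util')

// 9095 is assumed to be the cluster's IPFS proxy endpoint
const proxy = httpProxy.createProxyServer({ target: 'http://localhost:9095' })
const app = express()

app.use((req, res, next) => {
  const signature = req.headers['x-aragon-signature'] // placeholder header names
  const message = req.headers['x-aragon-message']
  if (!signature || !message) return res.status(401).end()

  const msgHash = ethUtil.hashPersonalMessage(Buffer.from(message))
  const { v, r, s } = ethUtil.fromRpcSig(signature)
  const address = ethUtil.bufferToHex(
    ethUtil.pubToAddress(ethUtil.ecrecover(msgHash, v, r, s))
  )

  if (!isDaoMember(address)) return res.status(403).end()
  next()
})

app.use((req, res) => proxy.web(req, res))

function isDaoMember (address) {
  return true // placeholder: a real version would check DAO membership
}

app.listen(5001)
```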

I think this should be handled at the DAO level: you pay or you don't. Then it's up to the end user to pick the right IPFS gateway in their settings tab. If they choose to use the Aragon IPFS service gateway, and the DAO has paid, they're fine. If they want to use a personal gateway and make sure everything is pinned by their own means, that's their problem :slight_smile:

If it's just a matter of what IPFS gateway you use [localhost or Aragon IPFS Service], then I guess it should work for Aragon Desktop too.

Once again, it could just be a matter for the end user to pick the right IPFS gateway [and make sure the DAO has paid].

The Graph has the ability to fetch - and index - IPFS content. I'm not sure there is a need for this content to be referenced on-chain - maybe there is for now, but I don't think this is a technical limitation, so they could enable it pretty quickly given how open they are to providing features requested by their users.

I think we could pretty easily use a PubSub channel to push new data to be indexed. The only problem here would be the ordering of data, given that PubSub itself does not provide a CRDT [OrbitDB, built on top of pubsub, does though].

4 Likes

I don't think 2-3 Aragon Network companies running purely IPFS pinning nodes is any different from 2-3 Aragon Network companies running IPFS pinning nodes with an additional Mongo (or db-of-choice) layer. The fact that data can be lost if the organizations running the pinning nodes go offline means that both are centralized solutions.

Additionally, I do not think updating a couple of lines of javascript to point to a different endpoint is something one would call refactoring (even if that is a requirement). I guess it really matters how the endpoints are initially developed. (I think…?) there can be ways to implement such solutions where they aren't hard-coded into the system, and it can be something that the DAO turns on or off. For example, if your gateway is "https://ultrapinning.pin.me/ipfs/", the gateway knows you have requested the content to be pinned, as it is a centralized domain. Hence it can pin it to IPFS and do whatever else it wants with the data.

I don't think Aragon app frontends necessarily need to serve content from such systems, but I do see the importance of having such systems, especially for cases like issue management, discussions, etc., until the tech that is in R&D takes off.

I actually think this will be interesting to A/B test - release products that have an on/off switch for the Mongo (or SQL or whatever) service in addition to IPFS, see what people choose, and see how much of a difference there is in performance. My hypothesis is that people will choose the solution that provides them with the more optimal user experience. I don't think there is any near-term threat that data stored both in IPFS and in another storage system is at risk of being seized (or see why that is necessarily considered more centralized than purely IPFS-based systems).

In reading more about The Graph, it seems they achieve that by having Graph nodes run Postgres. Additionally, if you don't have nodes hosting your subgraph, I don't think it really works. At the end of the day, you eventually have to pay (or rely on) some people or organizations to host / index / query your data.

This press release is also interesting:

"The company plans to offer both centralized and decentralized services in 2019."

"Today, The Graph is publishing the spec for the hybrid decentralized protocol. This version allows anyone to run a node to provide indexing and query processing services to the network in exchange for payment,"

Fully decentralized capital allocation is so much more important to what we are building in Aragon. In the end, I think accelerating the decentralization of non-financial systems is a goal to strive towards (and not an easy reality today), and being more open-minded to hybrid solutions will get us there faster (whether it's on an app-by-app or an org-by-org basis).

2 Likes

Yes. But if the system relies on an IPFS-compatible API, users can always switch to another IPFS node / pinning service. If the system relies on a MongoDB-specific API - as the proposal stands, if I understood well - then users can't do much if Aragon's nodes go down, because their app won't even be compatible with the IPFS API.

This is only a couple of lines of javascript if the MongoDB-specific API is compatible with the IPFS API. Otherwise, when we want to switch back from the MongoDB REST API to a more decentralized IPFS API, we have to rewrite all the fetching / querying logic…

I agree that there are two use cases here:

  1. Pinning IPFS files for app frontends or even DAOs manifestos, etc whose CID is referenced elsewhere [on-chain].
  2. Providing a query layer for data whose CID is not indexed on-chain [a database system].

The first problem can be solved easily by providing a default pinning IPFS Cluster to Aragon's users. The second one is trickier, because there is no perfect off-the-shelf solution for a solid decentralized database system. My point is that there are two paths regarding this indexation problem.

  1. We use an [imperfect] IPFS-based solution like OrbitDB and try to improve it over time, as Colony is doing.
  2. We decide to give up on that for now because it's too complicated / time-consuming [it may be the case] and rely on a centralized database system.

The current proposal seems to go in the second direction. That can be a perfectly acceptable choice, but then I don't see the point of building the database on top of IPFS. The reason is that the only way for users to access content will be to query the database anyhow, because there will be no other way for them to know what the CID of the content they want to browse is.

Let's take an example. If, as a user, I want to read the content of discussion #34 in my app: how will I know which CID the discussion refers to? I can't, except by querying the MongoDB layer. So if the MongoDB layer goes down, I lose everything. Maybe the actual content of this discussion will still be pinned somewhere, but I will never know what its CID is or how to access it…

The rationale is that the MongoDB layer will send back the CIDs of content so that users can check the integrity of the data through their hashes. But still, as I have no way to actually know what the CID of my data was outside of a MongoDB query, what am I supposed to compare the returned CID to?

I'm asking all these questions to try to clarify the path we would take here:

If it's just to pin files users already know the CID of because they are referenced on-chain, then I don't think we need a database layer [the pinning layer will be enough]. If it's to provide indexation / search features for non-referenced data, then in the end I feel like the proposed architecture is not very different [from a security / centralization perspective] from just providing a regular MongoDB database [which can indeed be a temporary solution].

For now you can either run your own node or rely on their hosted services to do so. The good news is that they are working on a consensus system to run these nodes in a decentralized way at some point. I feel like this could be an acceptable solution, because it would allow us to build on top of an already existing API whose underlying logic is supposed to improve [from a decentralization perspective] over time. So it would be like delegating the tricky decentralization R&D to The Graph :slight_smile: Of course there would always be a risk that they do not deliver on their promise and we get stuck…

3 Likes

Hey @osarrouy! Thank you for the well-thought-out responses. Yalda told me to take care of responding.

Lots to talk about here - I feel like it could be productive if we jumped on a call to talk through some of these questions and make an action plan. We could record and post it here for others to see.

I like how you divided things into two use cases:

  1. Pinning IPFS files for app frontends or even DAOs manifestos, etc whose CID is referenced elsewhere [on-chain].
  2. Providing a query layer for data whose CID is not indexed on-chain [a database system].

We're mostly in sync on the requirements and strategies behind #1 - providing a default pinning IPFS Cluster to Aragon's users sounds great. My biggest questions are around immediately usable auth strategies.

The way I see it, the Aragon IPFS Service nodes should enforce authentication through an additional front facing daemon

Could you elaborate on this?

AFAIK there is no authentication enforced at the IPFS Cluster level. There is authentication between the cluster nodes through the use of a CLUSTER_SECRET, but I don't think there's a way to authorize only whitelisted users to pin to the cluster [except on an IP address basis].

When configuring the service.json file of a cluster node, there's a restapi key that takes a basic_auth_credentials object, which takes a username and password to authenticate requests to the REST API endpoints. REST API: fine-grained authorization · Issue #621 · ipfs-cluster/ipfs-cluster · GitHub
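For reference, the relevant piece of service.json looks roughly like this (trimmed and from memory, so check the linked docs for the exact fields):

```json
{
  "api": {
    "restapi": {
      "http_listen_multiaddress": "/ip4/0.0.0.0/tcp/9094",
      "basic_auth_credentials": {
        "aragon-backend": "some-long-password"
      }
    }
  }
}
```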

There's a great description of running a pinning ring in one of ipfs-cluster's PRs. I think the biggest challenge is handling each local instance of a DAO. I'm having trouble finding the link, but if I remember correctly, I believe Protocol Labs scaled the cluster up to like 10-12 peers and pinned ~7000 files. Is this a good immediate solution that we can easily scale when needed? How many users would want to connect their own peers?

I also remember reading they are looking for a partner / case study on something like this, which is super interesting.

If it's just a matter of what IPFS gateway you use [localhost or Aragon IPFS Service], then I guess it should work for Aragon Desktop too. … Once again, it could just be a matter for the end user to pick the right IPFS gateway [and make sure the DAO has paid]

How do you distinguish between a DAO member and a hacker, both running on localhost with the same ports? The browser and desktop seem to pose different challenges - the desktop can run an ipfs-cluster node, which could be manually added to the cluster, but how would we handle that in the browser through the REST API, with users who aren't running nodes on their machine?

In terms of timing, I see advantages both to starting decentralized from scratch and improving, and to building web2.5 solutions that scale to a fully distributed model over time. Might it be beneficial to work in parallel on storage solutions for Pando and Discussions rather than on a single solution from the start? My top priority is committing to something I'm confident we can finish by the end of July. It seems like we should take a temperature check of what the community thinks. Should we:

(a) use a more centralized layer on top of distributed storage for faster/smarter features (like querying) and reliability (things like Orbit and 3box aren't production ready)

or

(b) start completely decentralized and build slower/more expensive (computation, gas fees) features

I tend to lean towards (a) because I think iterating quickly on the application level with user feedback will help inform the decisions we make at the protocol (storage) level. We may not know all the requirements yet. However, I'm also happy to go with (b) if that's what the community prefers.

This is only a couple of lines of javascript if the MongoDB-specific API is compatible with the IPFS API. Otherwise, when we want to switch back from the MongoDB REST API to a more decentralized IPFS API, we have to rewrite all the fetching / querying logic…

Agreed. Although some refactoring will likely be required regardless of which database solution we choose. For example, if you choose to use OrbitDB instead, you run the risk of OrbitDB breaking their APIs (which they've warned about doing), or of dealing with new bugs. Some of these newer technologies also take extra time to become compatible with Aragon apps due to the environment. Today, Orbit is not compatible with Aragon apps because of the iframe sandboxing restrictions.

The current proposal seems to go in the second direction. That can be a perfectly acceptable choice, but then I don't see the point of building the database on top of IPFS. The reason is that the only way for users to access content will be to query the database anyhow … I feel like the proposed architecture is not very different [from a security / centralization perspective] from just providing a regular MongoDB database [which can indeed be a temporary solution].

Itā€™s beneficial and more decentralized to build the database in tandem with IPFS for a few reasons:

  1. We're still using content hashes as identifiers to store and resolve content. This makes the refactor process way easier, because the client identification schema is also based on content addresses rather than arbitrary IDs.
  2. We could easily extend this architecture to log database snapshots as one big dag in an ethereum event. For example, every hour or so, we could gather all entries in specific tables, hash them together into an IPLD node, and send that data as a transaction. This provides alternative fetching mechanisms and integrates well with The Graph. Tools like Textile Threads will also help us construct and keep track of these dags in the near future. (A rough sketch follows this list.)
  3. From a security perspective, using OrbitDB and/or 3Box seems the most dangerous. Access control in OrbitDB is not very developer-friendly, and without it anyone can post to anyone else's OrbitDB store. It's also easy to lose private keys with OrbitDB once you clear your application cache. There was no account recovery mechanism as of a month or two ago.
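As a sketch of what the snapshot idea in point 2 could look like (the SnapshotLog contract, its logSnapshot function, and the config constants are hypothetical):

```js
// Hourly snapshot job: hash the latest table entries into a single IPLD node
// and record its CID on-chain through a hypothetical SnapshotLog contract.
const Web3 = require('web3')
const mongoose = require('mongoose')
const ipfsClient = require('ipfs-http-client')

const ipfs = ipfsClient({ host: 'localhost', port: 5001, protocol: 'http' })
const web3 = new Web3('http://localhost:8545')

const SNAPSHOT_LOG_ABI = [/* hypothetical ABI containing logSnapshot(string) */]
const SNAPSHOT_LOG_ADDRESS = '0x...' // hypothetical deployed contract
const BACKEND_ACCOUNT = '0x...'      // our backend's sending account
const snapshotLog = new web3.eth.Contract(SNAPSHOT_LOG_ABI, SNAPSHOT_LOG_ADDRESS)

const Content = mongoose.model('Content', new mongoose.Schema({ _id: String, data: Object }))

async function logSnapshot () {
  // Gather all entries from the table(s) we want to snapshot
  const entries = await Content.find({}).lean()

  // Hash them together into one IPLD node
  const cid = await ipfs.dag.put({ timestamp: Date.now(), entries })

  // Emit the snapshot CID as an ethereum event via the hypothetical contract
  await snapshotLog.methods
    .logSnapshot(cid.toBaseEncodedString())
    .send({ from: BACKEND_ACCOUNT })
}

setInterval(logSnapshot, 60 * 60 * 1000) // roughly every hour
```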

The rationale is that the MongoDB layer will send back the CIDs of content so that users can check the integrity of the data through their hashes. But still, as I have no way to actually know what the CID of my data was outside of a MongoDB query, what am I supposed to compare the returned CID to?

This is true within the very short term, but if we implemented a service like I mentioned in point #2 above, users could compare what they receive from the database with the latest dag node stored on-chain.

An additional layer of verification can be provided by users signing the CIDs of their discussion posts (which could encompass the time of posting). That way people can verify the messages and their relative order. On a front-end client, we could implement a friendly UX that shows users which discussion posts are awaiting confirmation on the blockchain.
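For instance (using ethers purely as an example library; the message shape is an assumption):

```js
// Sign a discussion post's CID (plus its posting time) with an Ethereum key,
// and let anyone verify that the post came from the expected author.
const { Wallet, utils } = require('ethers')

async function signPost (wallet, cid, postedAt) {
  const message = JSON.stringify({ cid, postedAt })
  const signature = await wallet.signMessage(message)
  return { cid, postedAt, signature }
}

function verifyPost ({ cid, postedAt, signature }, expectedAuthor) {
  const message = JSON.stringify({ cid, postedAt })
  const recovered = utils.verifyMessage(message, signature)
  return recovered.toLowerCase() === expectedAuthor.toLowerCase()
}

// Example: const wallet = new Wallet(privateKey); await signPost(wallet, cid, Date.now())
module.exports = { signPost, verifyPost }
```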

So if the MongoDB layer goes down, I lose everything.

True, in the very short term. But again, if we logged database snapshots in ethereum events we could defend against this.

The Graph has the ability to fetch - and index - IPFS content. I'm not sure there is a need for this content to be referenced on-chain - maybe there is for now, but I don't think this is a technical limitation, so they could enable it pretty quickly given how open they are to providing features requested by their users.

The Graph sounds super interesting, and I need to explore it in more depth. It wasn't immediately obvious to me how to use the IPFS store to store off-chain data in a database-like way. Are we able to do that with their current technology, or are you familiar with any dapps in production that are using The Graph for use cases like ours?

Will follow up in PMs to set up a call!

3 Likes

Hey @Schwartz10

Thanks a lot for taking the time to provide such a comprehensive answer! That's super interesting!

I'm in the middle of an Apiary rush these days, but I will definitely take the time to answer more deeply in the coming days. In the meanwhile I'm super open for a call whenever you can.

Talk to you soon [here or in a chat!].

3 Likes

I hope that DAOs will easily host and back up their data, otherwise they would have to trust a 3rd party! :open_mouth:

Here are my thoughts on this issue, please let me know if my assumptions are correct:

  • As far as app developers are concerned, we can and we must (for security reasons, so apps cannot read each other's data) abstract away the data storage component. We should expose a Storage API from the wrapper side so that apps don't worry about it.
  • We are forced to use the blockchain to reference these IPFS hashes, otherwise you cannot prove the date of a discussion, who has permission to read/write which files, etc. (Apps built entirely on IPFS will always have to assume a high-trust environment)
  • There should be an app where users can easily inspect this data
  • Apps should be able to read each other's data, if permitted

I think we already have a solution to this problem: Aragon Drive and Aragon Datastore, although I'm not sure (it might need some more iterations).

I kind of see it as:

  • Aragon Drive → File explorer/Finder/Nautilus (application layer)
  • Aragon Datastore → the base for the OS layer, which the Wrapper can use to create a Filesystem API
  • IPFS → Hardware layer

Kinda off-topic, but I made some diagrams in the process:

  • Low-trust environment, using the Aragon Network

  • High-trust environment

A high-trust environment could be a Family DAO or a "Personal DAO". In this scenario I would store sensitive information that I would not want to be backed up anywhere else. The nodes could even be disconnected from the internet and only sync when they are in the same location.

A low-trust environment would be a Business DAO where I need a 3rd party provider like the Aragon Network to bail me out in case I get "cheated" on, where money is at stake.
In this scenario the Aragon Network would need to be a validator of the "private" eth node and also a node in the IPFS cluster, because without them the jurors cannot know for sure what has been said and when.

Since the Network will be "forced" to host some customer data, I guess we need a way to measure storage, bandwidth, etc. I think it would also make sense to provide a general hosting service, since the architecture is needed anyway (plus there's an extra revenue source).

3 Likes

Hey! Thanks for joining the conversation.

We're going to be speaking about this on Wednesday the 24th at 10:30 AM ET. Would you be able to join?

2 Likes

My 2c (as you're asking):
'b' please.

2 Likes

Yes!
I will try to reach out to the espresso team as well.

2 Likes

Hey @Schwartz10,
I've been skimming through the thread and it is a very interesting conversation. While I don't have the expertise to add to the debate, I'm curious to know more and would be happy to join the call as well if possible. Also, if we can find a way to record it and post it in this thread afterwards, it would be awesome :relaxed:

3 Likes

Hey all! IPFS Cluster dev here.

Reading quickly through the above, it seems worth telling you that IPFS Cluster is soon going to launch what we call "collaborative clusters". This is essentially a way to run a cluster where some peers are "trusted peers" (they can control what's on the pinset) and the others are "followers" (they pin things but cannot tell others to pin/unpin anything). This also comes with the flexibility of having peers/followers join or depart at any time without affecting the cluster or requiring any action (as now happens with Raft), and the potential to scale to hundreds of peers. This is actually the final crystallization of the pinning rings use case linked before.

I don't have a lot of time to dig into Aragon's architecture right now but I'll hang around here to answer any questions about Cluster and to note down any feedback as to how it can be useful. Cheers!

10 Likes

Happy Monday everyone! Looking forward to our call this Wednesday at 10:30AM ET - anyone is welcome! We can use this link: https://meet.jit.si/Autark, and I will record the call so others can watch. I set up a preliminary agenda here, but we can obviously adjust and add things if people want. It would be ideal to leave this meeting with concrete next steps to take, so we can keep things moving.

@danielconstantin - Aragon Drive and Aragon Datastore look awesome!! These seem very usable for storing documents about a DAO, like the manifesto, code of conduct, etc., but I still want to dig a little deeper. A couple of points we could discuss in the call:

we must (for security reasons, so apps cannot read each other's data) abstract away the data storage component.

Do you feel this way about inherently public information? Does all information need to be private?

We are forced to use the blockchain to reference these IPFS hashes, otherwise you cannot prove the date of a discussion, who has permission to read/write which files, etc

I'm wondering if there are other ways to do this - because in an extreme example (like a chat app) users shouldn't have to continuously pay for transactions to post. Have you looked into solutions like Textile Threads and/or 3Box's p2p communication protocol? Where do these solutions fall short on their own? How could we utilize a blockchain minimally and still achieve the same desired security (for example, every half hour we could log the HEAD of a thread or OrbitDB database in an ethereum event)? I think there are opportunities to get creative, achieve a favorable level of security, and provide great user experiences.

Please reach out to the espresso team as well to join the call!

@Julian - thank you for weighing in. I think it would be a great idea to take a temperature check or "informal vote" on the upcoming call to see where people stand on this. If the community prefers (b), I would feel really good about using 3Box's p2p communication protocol to start. Michael Sena, one of their team members, told me he feels 3Box makes up for a lot of OrbitDB's deficiencies and thinks the combination of the two is stable and secure. They seem eager to help out on this initiative too.

@hector, appreciate you getting involved! Creating a collaborative cluster sounds like exactly what we need. Do you have a timeline on this feature launch? I would vote for us to get started on this asap, and it's (as of now) the first topic of our call on Wednesday. You are more than welcome to join, but if not, we will hopefully produce some questions for you and can start a new forum thread to discuss. Super excited to see this in action!

2 Likes

I'm tempted to say 2 weeks, but I'll say realistically 4 weeks until this is part of a tagged release. We will consider this experimental at the beginning and will have to figure out some UX, but the bigger parts of it are merged already.

I'll try to come to your meeting!

1 Like

Sounds good - 3box certainly seems to be a popular solution to many current conundrums :grinning:
I'd take the number of likes for possible solutions posted here as a solid straw poll.

Here, take one for the above :yum:

1 Like

Nice - Textile currently has nicely fleshed-out IPFS nodes plus added app utilities for collaborative pinning, developer tokens, encryption, REST-based decrypting gateways, and more goodies that could make MVP dev really fast. A basic overview is here: https://docs.textile.io/concepts/cafes/

I'd be happy to join the call if you think there will be any questions about how Textile cafes work or what they solve.

1 Like