Hey @osarrouy! Thank you for the well thought out responses. Yalda told me to take care of responding.
Lots to talk about here - I feel like it could be productive if we jumped on a call to talk through some of these questions and make an action plan. We could record and post it here for others to see.
I like how you divided things into two use cases:
- Pinning IPFS files for app frontends or even DAO manifestos, etc., whose CID is referenced elsewhere [on-chain].
- Providing a query layer for data whose CID is not indexed on-chain [a database system].
We’re mostly in sync on the requirements and strategies behind #1 - providing a default pinning IPFS Cluster to Aragon’s users sounds great. My biggest questions are around immediately usable auth strategies.
> The way I see it, the Aragon IPFS Service nodes should enforce authentication through an additional front facing daemon
Could you elaborate on this?
> AFAIK there is no authentication enforced at the IPFS Cluster level. There is authentication between the cluster nodes through the use of a CLUSTER_SECRET but I don’t think there’s a way to authorize whitelisted users only to pin to the cluster [except on an IP address basis].
When configuring the `service.json` file of a cluster node, there’s a `restapi` key that takes a `basic_auth_credentials` object, which takes a password to authenticate requests to the REST API endpoints. https://github.com/ipfs/ipfs-cluster/issues/621
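For reference, the relevant fragment of `service.json` looks roughly like this (the listen address, username, and password are placeholders, and the exact nesting may vary between ipfs-cluster versions):

```json
{
  "api": {
    "restapi": {
      "http_listen_multiaddress": "/ip4/0.0.0.0/tcp/9094",
      "basic_auth_credentials": {
        "aragon-user": "some-strong-password"
      }
    }
  }
}
```

With credentials set, the REST API rejects requests that don’t carry the matching HTTP Basic Auth header, which gives us a coarse whitelist without touching the cluster-peer layer.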
There’s a great description of running a pinning ring in one of ipfs-cluster’s PRs. I think the biggest challenge is handling each local instance of a DAO. I’m having trouble finding the link, but if I remember correctly, Protocol Labs scaled the cluster up to around 10-12 peers and pinned ~7000 files. Is this a good immediate solution that we can easily scale when needed? How many users would want to connect their own peers?
I also remember reading they are looking for a partner / case study on something like this, which is super interesting.
> If it’s just a matter of what IPFS gateway you use [localhost or Aragon IPFS Service] then I guess it should work for Aragon Desktop too. … Once again, it could just be a matter for the end user to pick the right IPFS gateway [and make sure the DAO has paid]
How do you distinguish between a DAO member and a hacker, both running on localhost with the same ports? The browser and desktop seem to pose different challenges - the desktop can run an ipfs-cluster node which could be manually added to the cluster, but how would we handle that in browser through the REST API with users who aren’t running nodes on their machine?
In terms of timing, I see advantages both to starting decentralized from scratch and improving, and to building web2.5 solutions that scale to a fully distributed model over time. Might it be beneficial to work in parallel on storage solutions for Pando and Discussions rather than a single solution from the start? My top priority is committing to something I’m confident we can finish by the end of July. It seems like we should take a temperature check of what the community thinks. Should we:
(a) use a more centralized layer on top of distributed storage for faster/smarter features (like querying) and reliability (things like Orbit and 3box aren’t production ready)
(b) start completely decentralized and build slower/more expensive (computation, gas fees) features
I tend to lean towards (a) because I think iterating quickly on the application level with user feedback will help inform the decisions we make at the protocol (storage) level. We may not know all the requirements yet. However, I’m also happy to go with (b) if that’s what the community prefers.
Agreed, although there will likely be some required refactoring regardless of which database solution we choose. For example, if we used OrbitDB instead, we would run the risk of OrbitDB breaking their APIs (which they’ve warned about doing), or of dealing with new bugs. Some of these newer technologies also take extra time to become compatible with Aragon apps because of the environment: today, Orbit is not compatible with Aragon apps because of the iframe sandboxing restrictions.
> The current proposal seems to go in the second direction. It can be a perfectly acceptable choice but then I don’t see the point of building the database on top of IPFS. The reason is that the only way for users to access content will be to query the database anyhow … I feel like the proposed architecture is not very different [from a security / centralization perspective] than just providing a regular MongoDB database [which can indeed be a temporary solution].
It’s beneficial and more decentralized to build the database in tandem with IPFS for a few reasons:
- We’re still using content hashes as identifiers to store and resolve content. This makes the refactoring process much easier because the client’s identification schema is also based on content addresses rather than arbitrary IDs.
- We could easily extend this architecture to log database snapshots as one big DAG in an Ethereum event. For example, every hour or so, we could gather all the entries in specific tables, hash them together into an IPLD node, and send that data as a transaction. This provides alternative fetching mechanisms and integrates well with The Graph. Tools like Textile Threads will also help us construct and keep track of these DAGs in the near future.
- From a security perspective, using OrbitDB and/or 3Box seems the most dangerous. Access control in OrbitDB is not very developer-friendly, and without it anyone can post to anyone else’s OrbitDB store. It’s also easy to lose private keys with OrbitDB once you clear your application cache, and as of a month or two ago there was no account recovery mechanism.
> The rationale is that the MongoDB layer will send back the CIDs of contents so that users can check the integrity of the data through their hash. But still, as I have no way to actually know what the CID of my data was outside of a MongoDB query, what am I supposed to compare the returned CID to?
This is true within the very short term, but if we implemented a service like I mentioned in point #2 above, users could compare what they receive from the database with the latest dag node stored on-chain.
An additional layer of verification can be provided by users signing the CIDs of their discussion posts (which could encompass the time of posting). That way people can verify the messages and their relative order. On a front-end client, we could implement a friendly UX that shows users which discussion posts are awaiting confirmation on the blockchain.
> So if the MongoDB layer goes down, I lose everything.
True, in the very short term. But again, if we logged database snapshots in Ethereum events we could defend against this.
> The Graph has the ability to fetch - and index - IPFS content. I’m not sure there is a need for this content to be referenced on-chain - maybe there is now but I don’t think this is a technical limitation so they could enable it pretty quickly given how open they are to provide features requested by their users.
The Graph sounds super interesting, and I need to explore it in more depth. It wasn’t immediately obvious to me how to use the IPFS store to store off-chain data in a database-like way. Are we able to do that with their current technology, or are you familiar with any dapps in production that are using The Graph for use cases like ours?
Will follow up in PMs to set up a call!