Proposal - Monitoring Tool to Help Identify and Mitigate Recirculating GIVbacks

The Proposal

TrueBlocks proposes assisting Giveth in its ongoing mission of building “a culture of giving” by helping establish a system for monitoring the platform against misbehavior. Specifically, we envision a short-term project whose primary goal is to automate the identification of participants who may be recirculating GIVbacks.

TrueBlocks is a local-first blockchain indexer, data scraper, and account explorer providing “user-centered” access to blockchain data. Our tools allow users to filter, transform, and customize on-chain data, enabling monitoring and accounting for individual addresses as well as whole ecosystems.

As an example of our work, we recently built an unaffiliated data pouch (our term) that facilitates the analysis of on-chain Gitcoin data in support of its FDD (Fraud Detection and Defense) working group. While we understand that Giveth’s requirements are different, we believe the work we’ve done for Gitcoin can be leveraged to address Giveth’s needs as well. To better understand the Gitcoin work, please see this website: http://tokenomics.io/gitcoin.

Proposed Project

We propose a short-term (8-10 week long) project to assist Giveth in its efforts to detect and deter fraudulent or inappropriate behavior. The primary goal of the project is to experiment with automating the monitoring and tracking of donated funds in order to assure that only “first touch” donations receive GIVbacks.

We propose that the project consists of three steps:

  1. Define the “Recirculation” Problem, Identify Potential Methods of Detecting It, and Establish Expectations:
  • At project inception, work with appropriate Giveth team members to define the problem more clearly and to identify ways to programmatically address the issue.
  • We envision working with the team to complete the attached document, which we’ve begun but which is still in very rough form.
  • This first step:
    • serves to determine what, if any, modifications need to be made to TrueBlocks;
    • helps establish appropriate expectations (given the fact that a perfectly generalized solution to the “fraud detection” problem is likely impossible).
  • We anticipate this process taking 3-5 hours, including a few calls as well as some preparatory work prior to each call.
  • There would be no charge for completing this first part.
  2. Complete the Port of TrueBlocks to the Gnosis Chain:
  • TrueBlocks currently works with Ethereum Mainnet. In the past month, we’ve begun the process of adding support for other chains (Gnosis chain in particular). To ensure a complete set of data from the Giveth ecosystem (which runs on both Mainnet and Gnosis), we would need this work to be completed and fully tested.
  • In the spirit of full disclosure, we plan on adding the multi-chain feature regardless of this proposal. However, if this proposal is accepted, we will prioritize this work.
  • We anticipate this work to take a few weeks to complete.
  3. Provide Solutions to the Recirculation Problem Identified in Step 1 using Mainnet and Gnosis Chain Data from Step 2:
  • Collect relevant verified Giveth addresses.
  • Modify the existing Gitcoin data pouch (mentioned above) to accommodate Giveth’s two sources of chain data, resulting in a site similar to the linked-to site for Gitcoin. This “monitoring system” would reside temporarily at https://tokenomics.io/giveth.
  • Extend the data pouch scraper to export data to Giveth’s database/data pipeline. This might include something we call a Dynamic Traverser (explained here) which would allow for customizations as decided in step 1.
  • Help define methods for Giveth to bring the above function in-house or find a dedicated node endpoint for ongoing monitoring.
  • Iterate with the Giveth team as necessary to ensure all requirements are met.
  • We anticipate this work to take an additional 3 to 4 weeks.

Proposed Fee

Assuming we complete the steps as outlined above, we would respectfully request $30,000 US dollars worth of xDAI or equivalent GIV tokens. (That’s $3,750 per week over eight weeks.) If either or both parties decide not to proceed after Step 1, there will be no charge.

Team

The team members dedicated to this project are:

Dawid Szlachta is TrueBlocks’ lead developer. Before joining TrueBlocks, Szlachta spent eight years sharpening his skills on various large-scale web applications. He holds a bachelor’s degree in Philology from the University of Warsaw and lives with his wife in Kraków, Poland.

Thomas Jay Rush is the lead software architect at TrueBlocks and CEO whose favorite thing to do, apparently, is falling down rabbit holes. Variously an oil-well roustabout, a computer science researcher at IBM’s Thomas Watson Research 3-D Computer Graphics Lab, an early Internet Entrepreneur, a poet, a furniture designer, and a crypto enthusiast, Jay holds a Master of Fine Arts, Poetry from Rosemont College and a Master of Science in Computer Science from the University of Pennsylvania. He lives with his wife and children in Philadelphia, PA, USA.

For more information and links to our social media presence, please see this website: https://trueblocks.io.

Ethereum Address

TrueBlocks Ethereum Address:

0xf503017d7baf7fbc0fff7492b751025c6a78179b

trueblocks.eth

Conclusion

Thank you for considering our proposal. We look forward to answering any questions you may have. We’ve been longtime supporters of the Giveth constellation of projects, so we’re very excited and interested to participate in any way possible.

One final word – we’ve always been greatly inspired by the Giveth & Commons Stack vision. While this proposal describes a short-term project for Giveth, we look forward to a continued relationship and would love to discuss how TrueBlocks might fit into a broader picture. We think TrueBlocks is uniquely positioned to help accomplish one of Elinor Ostrom’s key tenets: that of building effective community-based monitoring run by the community for a given commons. Help us build that.


I’m thrilled to see this proposal finally hit the forum! How will the final product take form for the Giveth team to utilize? Will it be able to be managed by non-developers? How customizable will it be? For example the list of verified projects in our database is ever-changing - Will Giveth be able to modify this themselves or will we require TrueBlocks to manage the tool?


Happy to see this proposal and to support it! Thanks @tjayrush!

In addition to Mitch’s questions, I’d like to know whether multichain support integration on Giveth would be charged separately. I understand that multichain support in TrueBlocks is something you will be adding regardless, but I wanted to clarify whether any integration with future chains on “our side” would be treated as separate work.

Also, would we be able to give that final product to other projects to use in their environment? Maybe that’s not even possible so pardon my ignorance.

Thanks again!


My preference would be for your team and ours to work together to stand up a system that works independently of us. In other words, something that can be “installed and just works.”

The first deliverable would be a ‘data scraper’ that extracts all the available on-chain data (transactions, logs, neighbors, balances, etc.) about whatever set of addresses you provide (if you have an API listing the addresses, we can read directly from there).

Other than making sure it’s continually running (which is a DevOps issue), my hope is it would require little-to-no management.

If the list of addresses you’re interested in comes from an API, then this would be a non-issue. If the list is not an API, then a simple .csv text file would suffice.
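To make the .csv option concrete, here is a minimal Python sketch of reading such a file. The column name `address` and the helper `load_addresses` are assumptions made for illustration; they are not part of TrueBlocks, and the real scraper may expect a different layout.

```python
import csv
import io
import re

# Hypothetical helper, not part of TrueBlocks: a 0x-prefixed, 40-hex-digit address.
ADDRESS_RE = re.compile(r"^0x[0-9a-f]{40}$")


def load_addresses(csv_text, column="address"):
    """Return unique, lower-cased addresses from CSV text, in first-seen order."""
    seen = set()
    result = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = (row.get(column) or "").strip().lower()
        # Skip malformed rows and duplicates instead of failing the whole load.
        if ADDRESS_RE.match(value) and value not in seen:
            seen.add(value)
            result.append(value)
    return result


sample = """address,name
0xf503017d7baf7fbc0fff7492b751025c6a78179b,TrueBlocks
0xF503017D7BAF7FBC0FFF7492B751025C6A78179B,duplicate in upper case
not-an-address,ignored
"""
addresses = load_addresses(sample)
```

Lower-casing before de-duplicating means the same address written with different capitalization is only monitored once.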

As far as customizing the output of the scrape, that is also possible. This would be a ‘configuration’ issue. Changing that configuration–currently–would involve a developer editing a config file.

Summary:

My goal is to make something that can be “installed and just works,” so you don’t have to do anything (other than making sure it stays running). It’s important to note that we’re proposing a solution that runs on your hardware. We do not provide this as a service. For this reason, and in answer to Marko’s question below, what we’re building is something that any other project with similar needs (such as a DAO, other donation sites, etc.) can use to stand up a “monitoring” system without relying on us.

The first deliverable would ‘scrape’ the data and ‘put it somewhere.’ ‘Somewhere’ being wherever you want it to go: a database endpoint, flat .csv files on a server, under the carpet…literally anywhere. You can help us understand where you would want the ‘scraped’ data to go.

The second deliverable of the project (after consultation with y’all and some data designing) would be to figure out how best to analyze the data to meet your needs. There’s a link in the proposal where I started discussing this issue.

Thanks for your questions. If you have any more, please ask. Happy to clarify anything.


Hi Marko,

Multi-chain support is included. We have some work to do to finalize support for multiple chains, but we added rudimentary support recently. That work needs hardening/testing to make it production-ready. This hardening/testing is part of the proposal.

Concerning your second question about more broad usefulness for other projects–the answer is “yes.” Please see my answer above to Mitch’s question where I detail this a bit more.


Thank you, it’s perfectly clear now. No further questions for now.


I have seen what this tool can do… it’s amazing! Thank you so much for your collaborative efforts thus far and for all of the work that your team has already done on this front.

Super happy to see this proposal up in the forum and I am looking forward to a time that recirculation review using human tracking and spreadsheets is a thing of the past.


Hi. So I’m wondering what the next steps are.

I found this: Governance Process.

This is now true:

Proposals must remain on the forum, open
for Advice Process, for a minimum of 5 days.

It also says:

proposals can move on into either the GIVgarden
or the rDAO DApps for voting

Is this something I do or someone else?

I’m ready to get started in earnest.

Thanks for your help.

Yep, it will be useful to state somewhere the amount of GIV tokens you’re requesting (the GIVgarden only disburses GIV). If we take the $30,000 USD (30,000 DAI) in your proposal at the current price of GIV, we arrive at 67419.138948269 GIV requested.
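For anyone checking the conversion, the arithmetic can be sketched in Python as below. The GIV price is not stated in this thread, so the sketch only derives the price implied by the two figures quoted above; the function name `giv_for_usd` is made up for illustration.

```python
from decimal import Decimal


def giv_for_usd(usd_amount, giv_price_usd):
    """Number of GIV tokens equal in value to usd_amount at the given price."""
    return usd_amount / giv_price_usd


usd = Decimal("30000")
quoted_giv = Decimal("67419.138948269")

# Price per GIV implied by the two quoted figures (roughly $0.445).
implied_price = usd / quoted_giv

# Converting back at that price reproduces the quoted GIV amount.
round_trip = giv_for_usd(usd, implied_price)
```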

You can proceed to the next step by creating a proposal on the GIVgarden

Feel free to DM on discord if you need further assistance!


Okay. Proposal submitted to the Giv Garden. Gardens.

Really smooth experience. Kudos to that.


You deleted it cause it was set to use wxDai… when will you repost?

I’m trying to repost now. The 5,000 deposit from the previous post seems to have come back into my account (I can see it at the top of the GivGarden), but when I go to create a new proposal, it says I don’t have enough Giv tokens.

At the top of the page:

image

When I go into create a new proposal:

Thanks for everyone’s patience. I had to remove the previous proposal and make another one. Here’s the new proposal: Gardens

I’d really love your support if at all possible.


Not sure of the protocol here (sorry if I’m violating it), but I wanted to bump my proposal which appears to have one more day of voting. Here’s the proposal: Gardens. We’d love your support if you’re willing to give it.

Hello all,

I would like to formally thank the Giveth community for awarding the TrueBlocks team a grant [above]. We look forward to working with you to develop a system to identify recirculating GIVbacks.

While we’re just getting started, I wanted to update you on our progress thus far and discuss the next steps.

As a reminder, our proposal puts forward three primary steps:

  1. Work with the Giveth team to better understand the recirculation problem in order to identify ways to programmatically address the issue.

    Update as of April 1, 2022:

  • During the proposal process, we joined in on a few calls with team members and participated in two Friday Fraud Review calls. At this point, we feel we have a better understanding of the problem and how it’s currently being addressed through a manual process. This will inform how we can automate this process.
  • Our understanding of the problem will grow over time, but at first blush, the problem seems relatively straightforward, so we can proceed to the next step.
  2. The second part of our proposal was to complete a “port” of TrueBlocks to the Gnosis Chain.

    Update as of April 1, 2022:

  • We’ve made good progress on this task even while waiting for this proposal to pass. We now have a feature called “Multi-Chain Support” (including Gnosis chain). We have a currently-working version of TrueBlocks running against the Gnosis Chain on our local systems. This work is not complete, however.
  • The portions of the work that are not complete are:
    • Testing – our Gnosis Chain port is working, but not well tested. We’re actively building more robust testing.
    • Documentation – if, as is our hope, Giveth (and other projects) run their own TrueBlocks instance (self-hosted on a dAppNode, for example), there needs to be better documentation. See here for current progress: Multi chain - TrueBlocks.
    • Publication of the Unchained Index – A very important part of TrueBlocks is what we call the Unchained Index. This is a method by which we not only create an index of “every appearance of every address anywhere on a chain,” but wherein we publish access to that index in a way that makes it impossible for us to ‘withhold’ or ‘censor’ it. This feature (the Unchained Index) already works on Mainnet Ethereum, and there is some work involved in bringing this feature to Gnosis.
  3. The third part of our proposal was to (a) stand up a monitoring/data pipeline process for Giveth against both the Gnosis Chain and Mainnet Ethereum Chain, including helping the Giveth team stand up their own pipeline in the future so that they can independently maintain their own monitoring; (b) identify and collect relevant grant recipient addresses to do the analysis; (c) write the code needed to automate the identification of potential recirculating grant funds; and (d) analyze the results and iterate.

    Update as of April 1, 2022:

  • During the proposal period, we expanded on existing work we had done on the GitCoin ecosystem (https://tokenomics.io/gitcoin). We will be standing up a similar system for Giveth on our servers next week. We also hope to have further discussions with the Giveth DevOps team to ensure an anticipated integration with existing systems. Those conversations can begin now.
  • Once this monitoring system is up on our servers, we can start analyzing the data to develop the solution. We’ve mentioned in the past that we’ve worked with a great data scientist at Gitcoin (his name is Richard_). He would like to participate in this Giveth work and potentially become a contributor to the DAO. I’m in touch with hanners717 in your group regarding this.
  • We anticipate iterating with your team on the specifics of the data analysis, the output data generated, and how the ongoing monitoring system might work in the future.

Summary: We’re ready to start working on this project in earnest next week. Scraping and monitoring against both Gnosis Chain and MainNet Ethereum are “mostly” working but need refinement (an ongoing process). The “data pouch” (https://tokenomics.io/) will provide similar data to the GitCoin site and should be in its first version in a week or two. The process of finding a good data scientist to help us analyze the data has begun.

Thanks for reading.

If you have any questions or would like to have a call to discuss, please let me know. I hope you find this helpful. If you have any suggestions on how to improve our progress reporting, please let us know. Our plan is to provide a formal update like this every other week.

Post Script: My co-worker, Dawid Szlachta, who’s been working with us for about a year, will be helping with this work. He’s an excellent resource and is available on our Discord.


Hello Giveth Community,

Since our last update, we’ve been very busy. Please see the original proposal for more information.

The original proposal had three parts:

Part I – Discussions Pertaining to the Recirculation Issue

This part of the project, as previously reported, is complete.

Part II – Support for Gnosis Chain and Creation of Tokenomics Data Dump

The second part of our original proposal had two subtasks. The first subtask was to extend TrueBlocks indexing to the Gnosis chain. The second subtask was to use that new capability to create a Tokenomics Data Dump of the entire Giveth ecosystem.

Part II-Subtask 1. Support for Gnosis Chain

This subtask is complete, and, as we expected, the work was easy to extend to other EVM-based chains. In the future, if Giveth were to support other chains (such as Polygon), our indexer will easily work with those other chains.

We’ve been running the TrueBlocks indexer on both the Ethereum mainnet and Gnosis since early April. Henceforth, anyone with access to the UnchainedIndex smart contracts (which is everyone with an RPC) can retrieve the chain’s entire index without permission (from IPFS).

We’ve also released a pre-alpha version of a dAppNode package that uses the UnchainedIndex to build a local copy of the index. In the future, we would like to see Giveth (and other projects) run this dAppNode package. This will provide Giveth with unprecedented decentralized–and super fast–access to its own data.

An official version of the dAppNode package should be available shortly (Note this work will be completed on our own time, not Giveth’s).

Part II-Subtask 2. – Enable a Tokenomics.io Data Dump Website

The second subtask of part II of our proposal required the first subtask. With multi-chain support, we can now build a Tokenomics Data Dump website for Giveth (Data Pouch - Version III).

This subtask is “complete enough” (that is, it’s far enough along to move on to the primary project of identifying recirculating donations).

We spent much of our time this month on this subtask. This demonstration website now includes a full data extraction of every one of the nearly 1,260 addresses on Giveth’s purple list (although only about 120 addresses have activity).

Please note the unique nature of this dataset. Most datasets one sees from an Ethereum project are either (a) generated by an ad-hoc process (many of which rely on pay-wall gated APIs), or (b) built with The Graph (which is free for now, but won’t be forever). Neither of those two processes produces a dataset similar to the one presented here. The unique aspect of the TrueBlocks dataset is that it includes full transactional histories of every involved address. (We call this Ecosystem accounting).

This dataset includes not only transactions related to Giveth, but all transactions of all the addresses that have ever received donations on Giveth. This depth of detail is required for the third and final part of the project.

How Does TrueBlocks Work

A short diversion to help explain what happens under the covers.

TrueBlocks creates “monitors” which can be run as often as you wish (in Giveth’s case, we run them every five minutes):

These monitors read a simple list of addresses (in Giveth’s case, a list of addresses we’ve scraped from your purple list).

Given this list of addresses, TrueBlocks watches for new transactions.
Whenever an address transacts on-chain, TrueBlocks pulls seven different types of data from that transaction. (This list of data types to pull is also customizable per project.) In Giveth’s case, we pull…

…appearances, transactions, logs, balance histories, neighbors (more below), etc. and then we combine and compress the data. (Data types are defined here: tokenomics.io/giveth/exports at master · TrueBlocks/tokenomics.io · GitHub)
This produces separate datasets for each address for each data type. The data structure, which is per-chain, looks similar to this:

image
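As a rough illustration of “separate datasets for each address for each data type” per chain, here is a minimal Python sketch. It is not TrueBlocks code and does not reproduce the layout shown in the image above; the record fields (`chain`, `address`, `type`, `data`) and the function `group_by_dataset` are assumptions made for the example.

```python
from collections import defaultdict


def group_by_dataset(records, watched):
    """Group extracted records into one dataset per (chain, address, data type).

    Records for addresses that are not on the watch list are ignored.
    """
    watched = {a.lower() for a in watched}
    datasets = defaultdict(list)
    for rec in records:
        address = rec["address"].lower()
        if address in watched:
            datasets[(rec["chain"], address, rec["type"])].append(rec["data"])
    return dict(datasets)


# Made-up records standing in for what a monitor run might extract.
records = [
    {"chain": "mainnet", "address": "0xAAA", "type": "transactions", "data": "tx-1"},
    {"chain": "mainnet", "address": "0xaaa", "type": "logs", "data": "log-1"},
    {"chain": "gnosis", "address": "0xaaa", "type": "transactions", "data": "tx-2"},
    {"chain": "gnosis", "address": "0xbbb", "type": "transactions", "data": "tx-3"},
]
datasets = group_by_dataset(records, watched=["0xaaa"])
```

Keying on the chain as well as the address keeps Mainnet and Gnosis histories for the same address in separate datasets.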

This process of monitoring and extracting data from a collection of addresses is currently running on our server, and we are happy to continue to provide this service free of charge to the Giveth community. However, our ultimate goal would be for you to stand this capability up for yourselves on a dAppNode or similar system.

Summary for Part II: We’ve completed two significant portions of this work with the help of Giveth’s funding: (a) support for multi-chain, and (b) multi-chain data extraction for a large (1,260) collection of recipient addresses.

Part III – Identification of Recirculated Donations

What remains?

All of the above is preamble to the actual work, which we are hoping to complete in the next few weeks.

We now have the ability to scrape multiple chains. We also have the ability to extract full transactional histories for a large (1,260) collection of addresses. And, importantly to us, we have the ability to do this in a fully decentralized way on a dAppNode.

We’re now ready to begin the task of identifying recirculated transactions.

We’ve written a few preliminary ideas here:
Recirculation on Giveth Platform - Google Slides. This thinking is out of date. One of our tasks is to extend this document and fill in missing details.

One of our colleagues, @Richard, has done some preliminary proof of concept work using Dune Analytics. Dune.

This Dune work will serve as a double check on our results. Note that while this work is fairly clear, it suffers from two shortcomings: (1) it is not automated (the addresses need to be copied in), and (2) it doesn’t scale into the full transactional history of each address. It digs only two levels deep into each address’s history; for our purposes, we will need to dig much deeper. It does help us understand two things, though: (1) there do appear to be recirculations, and (2) they happen relatively shallowly in the transaction history of some addresses. Interesting.

We’ve also spent time working on a Dynamic Traverser. This technique is documented here: https://tjayrush.medium.com/dynamic-traversers-in-trueblocks-7e2215cb1af9.

Dynamic Traversers are extremely powerful. They allow us to efficiently produce a list of ‘neighbors’ to a given address. Not only that, we can recursively traverse the list of neighbors. It is this ability that will allow us to solve this problem.

One note: searching the transaction history of all previous senders is probably an intractable problem. The tree of historical transactions grows unboundedly (at least as far back as the first block). We have a number of ideas in mind to short-circuit this search, and will document them, to make this intractable problem tractable.
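One simple way to keep such a search bounded is a depth limit plus a “visited” set, sketched below in Python. This is only an illustration of the general idea, not the Dynamic Traverser itself. The `senders` mapping (address to the addresses that funded it), the function names, and the rule “flag a donor if a known recipient appears upstream” are all assumptions made for the example, not Giveth’s definition of recirculation.

```python
from collections import deque


def reachable_within(neighbors, start, max_depth):
    """Breadth-first walk from start, at most max_depth hops.

    Returns {address: hops from start}. The dict doubles as the visited
    set, so cycles in the graph cannot cause repeated work.
    """
    depths = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if depths[current] == max_depth:
            continue  # depth cut-off: do not expand any further
        for nxt in neighbors.get(current, ()):
            if nxt not in depths:
                depths[nxt] = depths[current] + 1
                queue.append(nxt)
    return depths


def upstream_recipients(senders, donor, recipients, max_depth=3):
    """Known recipient addresses found within max_depth hops upstream of donor."""
    upstream = reachable_within(senders, donor, max_depth)
    return sorted(a for a in upstream if a in recipients and a != donor)


# Made-up graph: each address maps to the addresses that sent it funds.
senders = {
    "donor": ["wallet_a"],
    "wallet_a": ["project_x", "exchange"],
    "exchange": ["donor"],  # a cycle back to the donor
    "project_x": ["far"],
    "far": ["project_y"],
}
recipients = {"project_x", "project_y"}

shallow = upstream_recipients(senders, "donor", recipients, max_depth=2)
deep = upstream_recipients(senders, "donor", recipients, max_depth=4)
```

Raising `max_depth` finds more distant links at the cost of visiting more of the history, which is the trade-off described above.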

This will be our focus for the remainder of the project.

We hope to have the first working version of this ready in the next few weeks and will report back here when we have further results.

Cheers.


Amazing progress! can’t wait to see the final product!


Hello Giveth Community,

Executive Summary: It’s been a while – sorry for that, but starting next week, we’re full-time Giveth until we’re done.

Work Since Last Update

One of the biggest outstanding issues in the original proposal is the issue of “How does TrueBlocks deliver a solution without becoming a web 2.0 ‘solutions provider.’” If you’re aware of our work at all, you’re aware that we are “maxis” when it comes to the issue of decentralization.

At the end of our most recent update, our “heroes” were describing a ‘Data Pouch’ (Data Pouch - Version III) that we built. If you look closely at that site, you’ll see that the data has not been updated since May 5. In other words, our heroes’ existing solution was not robust. Our goal with the “data pouch” was that it could be installed and run unattended, continually freshening its data. That did not work as planned. Furthermore, you may notice that providing a website that delivers data contradicts the previous point I made that TrueBlocks does not want to become a “service provider.”

How do we reconcile those two seemingly opposing views?

Answer: dAppNode

Instead of solving the robustness issue with the current data pouch, we turned our focus to completing a dAppNode version of TrueBlocks based on Docker. This has the happy consequence that the same solution we provide on dAppNode (i.e. Docker) can be used on Tokenomics for the data pouch. Killing two birds with one stone, as it were.

Since we last reported, we’ve been very focused on:

  1. Building a Docker version of the trueblocks-core, and
  2. Producing a dAppNode package based on Docker.

This will allow us, in the coming weeks, to stand up a more robust version of the data pouch, and moving forward, to provide a solution to Giveth (and anyone else who’s interested) to stand up the same tool on a dAppNode – thereby finally realizing our desire to produce decentralized data access directly from the EVM client software.

Setbacks

There was another issue that set us back a few weeks during the last month. There was a fairly serious bug in Erigon. Because of the unique nature of our software (historical indexing), few others encountered this issue. While it would be better practice for us to use a more stable branch of Erigon, we’ve chosen to use their “bleeding edge.” About six weeks ago, they had a bug that our code didn’t notice until about three weeks ago. This had the unfortunate effect of injecting invalid data into our index. We’re still working on a solution to that issue, and this has taken precedence over the rest of our work. We hope to complete this “recovery” work soon.

Future Work

In the next few weeks, our focus will be:

  1. Finish the Docker version.
  2. Use the Docker version for the existing tokenomics/giveth data pouch.
  • This will automate the production of Giveth’s data.
  • This will make the existing tokenomics systems more robust.
  3. At the same time, write the code for the custom traverser that identifies “recirculating” donations.
  4. As part of this work, add a new column to the display showing different labels (tags) for each address. For example, “new” donors vs. “existing” donors, “grants” vs. “donors”, etc.

    Of particular interest will be the “recirculated” tag, which will identify addresses that have recirculation behavior in their history.

    Notes:

  • The user will be able to click on a tag to filter the data. For example, clicking on …recirc… will show only the addresses identified as having suspicious activity.
  • The tags are “predicates” (either true or false), so it will be easy to add any number of additional tags that may be of interest to Giveth.
  5. Produce the exact same display (and all the necessary back-end integrations) on the dAppNode.
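To illustrate the “tags are predicates” idea, here is a minimal Python sketch. The per-address fields (`donations_made`, `is_recipient`, `recirculated`), the tag names, and both function names are hypothetical; the actual display and its data model may differ.

```python
def tag_addresses(records, predicates):
    """Apply every predicate to every address; return {address: sorted tag names}."""
    return {
        address: sorted(name for name, pred in predicates.items() if pred(info))
        for address, info in records.items()
    }


def filter_by_tag(tags, tag):
    """Addresses carrying the given tag (what clicking a tag would show)."""
    return sorted(address for address, names in tags.items() if tag in names)


# Made-up per-address summaries.
records = {
    "0xaaa": {"donations_made": 1, "is_recipient": False, "recirculated": False},
    "0xbbb": {"donations_made": 7, "is_recipient": True, "recirculated": True},
    "0xccc": {"donations_made": 3, "is_recipient": False, "recirculated": True},
}

# Each tag is a true/false test; adding a tag means adding one entry here.
predicates = {
    "new": lambda r: r["donations_made"] == 1,
    "existing": lambda r: r["donations_made"] > 1,
    "grant": lambda r: r["is_recipient"],
    "recirc": lambda r: r["recirculated"],
}

tags = tag_addresses(records, predicates)
suspicious = filter_by_tag(tags, "recirc")
```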

Once step 5 is complete, we will consider the project complete. However, we recognize that Giveth will have an ongoing issue maintaining this system. As you’ll see in the original proposal, we discussed the need to “Help define methods for Giveth to bring the above function in-house or find a dedicated node endpoint for ongoing monitoring”. We will remain engaged until this is resolved. There are three possible paths:

  1. Giveth can run the Tokenomics monitoring code (and a version of Erigon) in-house,
  2. Enter into a for-pay relationship with TrueBlocks to maintain the Tokenomics website,
  3. Install and run TrueBlocks and Erigon on a dAppNode and run the data in a decentralized manner.

This third option is of most interest to us, as this is the primary path for our development efforts. It would also allow us to exercise the muscles needed to provide such a piece of software to end users, something we think we will be fully ready to do once this project is completed.

Conclusion

In short, we’ve come a long way, but there is more work to be done. We hope to have substantial completion of the project by mid to late July.

Thanks for reading and we welcome any questions/comments.


Wow! i have been wanting a dappnode package for years. This is great news!


This is a very happy solution, love seeing how projects in the Giveth Galaxy rise into ever greater collaboration for improving the ecosystem as a whole!

Looking deeper into how the monitoring tool gives us data for identifying recirculation, I think about the mitigation portion of this proposal.

As someone who makes a lot of donations AND helps a lot of project owners raise/disburse funds - if there is any way I can support the process including identifying best practices that are revealed through data analysis, particularly around communicating them to donors and/or project owners please let me know.
