Key Insights on Knowledge Graphs and Human Transcendence

I’ve been thinking a lot about knowledge graphs, largely motivated by my own communication deficits and challenges finding conceptual alignment with people.

Feeling misunderstood, struggling to convey an idea in the time allotted, knowing that insights exist but are inaccessible, and watching the profound impacts of technology on our society: this topic affects all of us at civilization scale. It’s nothing new. In 1945, Vannevar Bush clearly articulated the problem of expanding human knowledge and our diminishing capacity to individually comprehend it, and imagined the Memex.

We’ve come a long way with technology, but in many ways this has only made the problem worse. With fake news, reduced attention spans, and practically unbridled communication bandwidth, we hear more and know less than ever before. Vannevar Bush’s Memex is in some sense closer than ever, and also much, much further away than he had imagined. We have had civilization-scale reckonings, to some degree, with agriculturalization, industrialization, and nuclearization. We’ve not yet reckoned on a civilization scale with information technology. It’s coming.

From time to time there are opportunities to shift the course of humanity, and I believe we must seize upon them and make them real. One of these, in my view, is space travel. The other is transcendence of our individuality to recognize and create a more perfect civilization (with some really important caveats about privacy and agency, of course). This all hinges on knowledge. How do we develop it, how do we transfer it, and how do we conjure insights from it? Barring a very unlikely, drastic and rapid diminution of communication bandwidth in the near term (and a commensurate return to an agrarian society), our existing systems will prove increasingly feeble and destructive to humanity without some significant and thoughtful revisions.

In my view, the evolutionary pressure of such unbridled communication channels exposes subtle yet profound flaws, both in the specific implementations of many modern communication channels and in the constructs of linear human-language narrative. “Ludicrous!” you might say. “We have a huge body of expressive and marvelous vocabulary, a treasure trove of literature, we have poetry, and rap, and so on!”

Ultimately, as storied and artful as these renderings may be, human language evolved for discussion with proximate individuals, for monologue to others, and, only very occasionally by comparison, for high-latency collaboration over long distances.

Even in the best of scenarios, our communication channels and our linear linguistic renderings within them pose uniquely new problems in the face of the drastically increased human connectivity of the information age:

Modern communication contexts are not conducive to productive collaborations in real-time, or at internet scale.

For instance: identify a New York Times article on a given topic within your domain of expertise.

You may notice that, while it is a good article overall, it contains a couple of factual errors. At this point you have a hard choice to make:

Will you be brave enough, thoughtful and quixotic enough, to write a response in the comments section, hoping that some brave soul has grokked the whole article and your thoughtful point-by-point critique, then linked them together perfectly in their head? That seems really unlikely to work.

You could say “screw it” and just not care, resigning yourself to letting the subtle falsehoods live on unchallenged, but that’s not very good.

Alternatively, one could provide ordinary users with a way to annotate individual passages of the NYT article with thoughtful refutations and constructive criticisms. However, this seems every bit as likely to devolve into name calling, ad hominem, and degenerate discourse as your average internet comment section. It could be worse still, because the annotations could interrupt and derail the reader in the first place! Thousands of people would comment on random passages according to their own agendas and perceptions, and it would either be completely unreadable, or you would have to carefully curate, assess quality and factuality, and then suppress the vast majority of them, leaving only the best or officially sanctioned comments. That’s got all kinds of problems.

What if, instead, we created a way for users to manually (and, admittedly, quite painstakingly at first) “decorate” passages with abstract knowledge graph assertions?

Much prior art exists for the creation of such knowledge graphs, which build granular assertions and relationships, typically in something like “subject, predicate, object” notation, e.g. “cat” isa “mammal”. The knowledge graphs I’ve seen, however, have some crucially problematic limitations:

1. They aren’t collaborative in-band with the context into which one would like to link them.

2. Many previous knowledge graph attempts focused on representing “facts” as knowledge, when in actuality they were recording someone’s personal opinion, and often a reductive one at that.

(more on this in a later post)
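
To make the “subject, predicate, object” notation concrete, here is a minimal Python sketch. The `Assertion` type, its `asserted_by` field, and the `group_by_claim` helper are hypothetical names chosen for illustration, not part of any existing knowledge graph system. Attaching an author to each triple is just one possible way to record that an assertion is somebody’s stated view rather than a bare “fact”.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One 'subject, predicate, object' statement, attributed to its author."""
    subject: str
    predicate: str
    obj: str
    asserted_by: str  # hypothetical field: who holds this view


def group_by_claim(assertions):
    """Map each (subject, predicate, object) claim to the set of people asserting it."""
    holders = defaultdict(set)
    for a in assertions:
        holders[(a.subject, a.predicate, a.obj)].add(a.asserted_by)
    return dict(holders)


# Illustrative data only: three people, two distinct claims about cats.
assertions = [
    Assertion("cat", "isa", "mammal", asserted_by="alice"),
    Assertion("cat", "isa", "mammal", asserted_by="bob"),
    Assertion("cat", "isa", "pest", asserted_by="carol"),
]

claims = group_by_claim(assertions)
for claim, people in claims.items():
    print(claim, "asserted by", sorted(people))
```

Under this sketch, the same triple entered by two people stays one claim with two holders, so agreement and disagreement are both visible instead of being flattened into a single “fact”.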

So, the core of the idea: let’s not simply curate insightful textual annotations and comments. We’ve tried this. It’s very hard to do in a trustworthy and “unbiased” fashion, and I daresay practically impossible to do at web scale.

Instead, let’s allow users to annotate passages of a text through the creation of fine-grained knowledge graph nodes. These nodes would finely and precisely describe (and perhaps eventually subsume) the meaning of the original passage itself, PLUS the assertions being made by the annotators, referencing concepts or assertions defined elsewhere, and by others.
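
As a rough illustration of what such a passage-anchored node might look like, here is a small Python sketch. Everything in it is an assumption made for the example: the `PassageAnchor` and `AnnotationNode` types, the character-offset anchoring scheme, the `example-article` identifier, and the `common:Canis` reference id are all hypothetical, and a real system would likely need a more robust way to pin annotations to text that changes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PassageAnchor:
    """A span of the source text, by character offsets (end is exclusive)."""
    document_id: str
    start: int
    end: int


@dataclass
class AnnotationNode:
    """An assertion tied to a passage, optionally referencing nodes defined elsewhere."""
    node_id: str
    author: str
    subject: str
    predicate: str
    obj: str
    anchor: PassageAnchor
    references: list = field(default_factory=list)  # ids of other nodes or concepts


def nodes_for_span(nodes, document_id, start, end):
    """Return nodes whose anchor overlaps the half-open span [start, end)."""
    return [
        n for n in nodes
        if n.anchor.document_id == document_id
        and n.anchor.start < end
        and start < n.anchor.end
    ]


# Made-up article text containing a deliberate error in its first sentence.
article = "Dogs belong to the genus Felis. They are popular pets."
first_end = article.index(".") + 1
first = PassageAnchor("example-article", 0, first_end)
second = PassageAnchor("example-article", first_end + 1, len(article))

nodes = [
    # n1 restates what the passage itself claims.
    AnnotationNode("n1", "annotator_a", "dog", "in_genus", "Felis", first),
    # n2 is a competing assertion that points at n1 and at a shared concept.
    AnnotationNode("n2", "annotator_b", "dog", "in_genus", "Canis", first,
                   references=["n1", "common:Canis"]),
    AnnotationNode("n3", "annotator_a", "dog", "isa", "popular pet", second),
]

hits = nodes_for_span(nodes, "example-article", 0, 10)
print([n.node_id for n in hits])
```

The point of the sketch is only that a passage can carry both a description of what it says and the assertions others make about it, each linked to concepts defined elsewhere.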

Of course, this will be a great mess, and it will be completely impossible to display this massive tangle of graph-structured annotations intelligibly, right? Not so!

A brief digression: I would posit that humans, even the ones with whom we vehemently disagree, innately have much more in common with each other in terms of core belief systems than they have differences. I’m not talking about politics, I’m talking about daily reality. The sky is blue, dogs are in the genus Canis, aspirin contains acetylsalicylic acid, condensation happens when the air gets colder, etc. Just about all logical argumentation is built by reference, however distant, to some reasonably well-understood, well-agreed-upon basis. For most people, challenges with comprehension and credibility come when they fail to see, or are insufficiently patient to follow, the relevant chains of references back to their core beliefs. In many cases this comes down, subtly, to the context in which they are viewing the information. Is it a walled garden, for instance a web page containing a one-sided narrative? Are these walls inadvertent, or deliberate? When I shout at my TV, no one hears me. When I comment on that NYT article, maybe someone hears me, but mostly I’m shouting into the void. Communication context and its collaborative mechanics, in whatever form, matter very much.

Anyhow, under an open-access knowledge-graph system, there will be a great body of common concepts such as these predefined, which can be leveraged to build an immensely articulate and, perhaps most importantly, convergent (!) cluster of knowledge assertions from your fellow annotators.

This way, the system can then programmatically generate a human language rendering of the most connected clusters of that graph. This should in theory be biased not toward populist sentiment, but rather toward a convergence of knowledge which is most strongly connected with the baseline norms we all agree upon.

Of course, some motivated folks will try very hard to game this system, creating their own cluster of convenient mis-assertions. If the system is implemented carefully, this approach will struggle to produce results for the attacker, because their island of mis-assertions will show a very low degree and strength of connectivity versus those clusters of non-adversarially entered assertions, whose connections to the broader knowledge graph are much deeper, stronger, and better argued.

Simply put: although ignorant and malicious actors may successfully create inaccurate assertions within the knowledge graph, there nonetheless exist significant and powerful structural forces within the public body of knowledge that would cause the connection strength of accurate assertions to be vastly higher.
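
The connectivity argument above can be sketched in a few lines of Python. This is a deliberately naive illustration, not a proposed implementation: the function names, the node labels, and the scoring rule (a raw count of direct links from a cluster to baseline concepts) are all assumptions made for the example. A carefully implemented system would need something stronger than a raw count, since an attacker can also add links.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def clusters_outside_baseline(graph, baseline):
    """Group non-baseline nodes that are connected without passing through baseline nodes."""
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in baseline or start in seen:
            continue
        cluster, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(n for n in graph[node]
                         if n not in baseline and n not in cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


def baseline_links(graph, cluster, baseline):
    """Count edges connecting a cluster's nodes directly to baseline concepts."""
    return sum(len(graph[node] & baseline) for node in cluster)


# Hypothetical data: three widely agreed concepts and two clusters of claims.
baseline = {"dog", "Canis", "mammal"}
edges = [
    ("claim:dogs-are-canids", "dog"),
    ("claim:dogs-are-canids", "Canis"),
    ("claim:canids-are-mammals", "Canis"),
    ("claim:canids-are-mammals", "mammal"),
    ("claim:dogs-are-canids", "claim:canids-are-mammals"),
    # An "island": claims that mostly reference each other.
    ("claim:x1", "claim:x2"),
    ("claim:x2", "claim:x3"),
    ("claim:x3", "dog"),
]

graph = build_graph(edges)
clusters = clusters_outside_baseline(graph, baseline)
ranked = sorted(clusters,
                key=lambda c: baseline_links(graph, c, baseline),
                reverse=True)
for cluster in ranked:
    print(baseline_links(graph, cluster, baseline), sorted(cluster))
```

In this toy data the well-grounded cluster has four direct links to baseline concepts while the island has one, which is the kind of structural gap the argument relies on.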

In time, much more elegant and sophisticated data-entry and rendering techniques could be developed, beginning with very simple interfaces based on present technology and ultimately culminating, prudently or otherwise, with high-fidelity BCI (Brain-Computer Interface). Machine learning could readily be employed to develop novel insights from the overall corpus of knowledge, especially cross-disciplinary knowledge! Perhaps this is how we could begin to realize the promise of the Memex.

Suffice it to say, while quite complex, this topic is very important to me, and frankly I think important to the protection of the modern epistemological order. I intend to write much more, and act to the greatest extent of my ability to further such an endeavor, starting with a draft concept I am calling, and potentially in concert with For now, here (pictured) are some ideations on an organization which could potentially be fundable, hopefully bringing vast numbers of people together in what might otherwise be an increasingly angry and confused world, and maybe, just maybe, addressing a civilization-scale threat in the process.

Daniel Norman