

Building the Tesseract: The Archive Learns to Search, See, and Talk Back

Sharing this with the NET-ART and CUNY Commons community because, underneath the build, it is a teaching question: what becomes possible when you can point AI at your own work, or a whole course, and search it, see it, and talk back to it, locally and for free? There is a section below written specifically for teachers. Originally published at ryanseslow.com.

 

Twenty years of my scattered work, pulled into one living archive that talks back, holds a huge portion of my creative life, opens a door for machines, and now shows you its face. A field report, with the unflattering parts included.

Yes, this is unapologetically long. Good thing our attention spans are ready for it!

This started with a simple, slightly uncomfortable question:

What is hacking, really, and could I hack myself?

Not in the “Hollywood” sense. In the original sense: to understand a system well enough to make it do something it was not expected to do. I wanted to point that lens at the one system I have the most access to and the least honest view of, my own patterns. So I asked an AI working session to do something most of us never let anything do: read my actual behavior off my own machine. Not the story I tell about myself, but the evidence. The files. The timestamps. The folders and files that I start and abandon. The things I save and never reopen.

What came back changed how I see my own work, and over a handful of sessions it turned into something I had wanted for twenty years and never finished. This post is the whole story, start to finish. If you have been following along, you already know the early chapters: I let an AI read my entire twenty-year WordPress archive and asked what would happen. This is where it lands. The archive learned to talk back, then it went public, then it opened a door for machines, and just now it opened its eyes.

Here is the twist I did not expect: every version that worked was the one I made smaller. The early builds were ambitious and intricate. The versions that shipped are deliberately minimal, the standard library, one database file, a couple of small local models. The distillation was the breakthrough.

I stopped elaborating and started finishing.

Hacking Myself: The Loop I Could Not See

I let the session look at the shape of my digital life: my Desktop, my 50GB+ iCloud archive, my Google Drive, my live website. Not to read my private thoughts, to read the patterns. The structure. The geology.

The finding was humbling and precise. Across every archive, the same loop repeated at every scale:

A vision ignites, I erupt in prolific output, I get the high of the birth, the next idea pulls me away, the work is left where it landed, it quietly entombs, and months later the same idea is reborn under a new name.

I am, it turns out, addicted to genesis creation and allergic to maintenance. I start brilliantly and rarely return. My iCloud held a heroic consolidation of my career, built between 2015 and 2018, then abandoned and never reopened. My Desktop held twenty live project threads in ten weeks, nothing filed. And the same core idea, an AI trained on my own art and writing, had been born three separate times under three different names, each one starting over from zero. That is seriously funny!

The missing spot was not disorganization. It was that nothing I made was ever allowed to compound, because compounding requires returning, and returning never gave me the hit that starting did.

The Correction: “No Content Available”

Here is where it got sharp. The session found that I had, years ago (2024 in AI time is like 10 years ago in today's time, haha), already started building the AI-trained-on-me dataset. I had even exported my entire website into per-year training files. For a moment it looked like the project was most of the way done.

Then we actually opened the files. And almost every single record said the same thing:

{"prompt": "Describe the artwork titled 'DSC06448' created in 2009.",
 "completion": "No content available."}

The training data for the AI version of me was empty. The pipeline had pulled image filenames, DSC06448, but never captured a word of my actual writing. I had built the exciting structure of the idea, run it once, gotten back rows that literally read “No content available,” and walked away before the unglamorous extraction work.
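A check for this kind of emptiness takes only a few lines. The sketch below assumes the per-year files hold one JSON object per line with `prompt` and `completion` keys, as in the record above. The function name and the second sample record are hypothetical.

```python
import json

PLACEHOLDER = "No content available."

def audit_training_lines(lines):
    """Count records whose completion is empty or just the placeholder string."""
    total = empty = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        record = json.loads(line)
        total += 1
        completion = (record.get("completion") or "").strip()
        if not completion or completion == PLACEHOLDER:
            empty += 1
    return total, empty

# Two sample records: the one quoted above, and a hypothetical healthy one.
sample = [
    '{"prompt": "Describe the artwork titled \'DSC06448\' created in 2009.", "completion": "No content available."}',
    '{"prompt": "Describe a post.", "completion": "A real paragraph of writing."}',
]
total, empty = audit_training_lines(sample)
print(f"{empty} of {total} records carry no real content")
```

An open file handle can be passed in place of `sample`, since the function only iterates over lines.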

I want to sit with how perfect that is. The empty file was the whole diagnosis in plain text. The content is available. It is all over my live site. I just stopped before capturing it. Genesis got done. Maintenance did not. Even my self-portrait-as-AI had abandoned itself at the hard part.

So we changed the plan: stop trying to out-discipline the loop, and build a layer that does the maintenance automatically, routing every future idea into one home instead of letting it spawn a fourth.

(One unglamorous aside, because it belongs to the same lesson: the self-audit also turned up live API keys sitting in plaintext inside old scripts, the kind of thing that can quietly run up a bill or worse. We found them, I revoked them, and rewrote those files to read their keys from the environment. The cost of never returning to your old work is that things rot there. Going back is not glamorous. It is also where safety, and value, actually live.)
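Reading a key from the environment instead of hard-coding it can look like the sketch below. It is an illustration, not the actual rewritten scripts; `EXAMPLE_API_KEY` is a hypothetical variable name.

```python
import os

def require_env(name):
    """Return the value of an environment variable, or fail loudly if it is unset."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running this script")
    return value

# EXAMPLE_API_KEY is a hypothetical name used only for illustration.
# api_key = require_env("EXAMPLE_API_KEY")
```

Failing at startup when the variable is missing keeps the secret out of the file and makes the missing configuration obvious.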

Chapter One: The Archive Learns to Talk Back

The fix has a deliberately boring shape, because boring is what compounds. I call it RyanSeslow OS, a single, local home for my body of work, in three layers:

  • Ingest, pull my real content from where it actually lives.
  • Spine, store it once, in one place, in a form I can search and grow.
  • Aremes, a conversational layer that answers questions using only my own writing, in my own voice, with citations.

Then we built it, end to end, in a single session. It read my website, 1,160 posts and pages, roughly 357,000 words spanning 2008 to 2026, with more than 12,000 images linked, into a single catalog. It turned all of that into a local semantic index. And then I asked it a question I had never directly answered anywhere:
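The ingest step can be sketched with the standard library alone. This is a minimal illustration, not the actual build: it assumes the site exposes the stock WordPress REST route `/wp-json/wp/v2/posts`, it covers posts only (pages would need the matching `pages` route), and the `docs` table layout and function names are hypothetical.

```python
import json
import sqlite3
import urllib.request

def fetch_json(url):
    """Fetch a URL and return (parsed JSON body, response headers)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp), resp.headers

def ingest_posts(base_url, db, fetch=fetch_json, per_page=100):
    """Page through the WordPress posts endpoint and upsert each post into SQLite."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS docs ("
        "id INTEGER PRIMARY KEY, url TEXT, date TEXT, title TEXT, html TEXT)"
    )
    page, total_pages, count = 1, 1, 0
    while page <= total_pages:
        url = f"{base_url}/wp-json/wp/v2/posts?per_page={per_page}&page={page}"
        posts, headers = fetch(url)
        # WordPress reports the number of pages in the X-WP-TotalPages header.
        total_pages = int(headers.get("X-WP-TotalPages", 1))
        for p in posts:
            db.execute(
                "INSERT OR REPLACE INTO docs VALUES (?, ?, ?, ?, ?)",
                (p["id"], p["link"], p["date"],
                 p["title"]["rendered"], p["content"]["rendered"]),
            )
            count += 1
        page += 1
    db.commit()
    return count
```

Calling `ingest_posts("https://www.ryanseslow.com", sqlite3.connect("archive.db"))` would be the shape of a real run (the database filename is made up). Because rows are keyed on the post id, re-running the ingest updates existing posts instead of duplicating them.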

How can artists use AI to expand their creative practice without losing themselves?

Aremes answered in my voice, drawing on essays I wrote in 2012 and 2013, citing each one with a link, and honestly noting that I had never addressed the question head-on rather than inventing an answer. That honesty is the system working correctly. It is grounded in me, and only me.

For the first time in this entire twenty-year pattern, the AI-trained-on-me idea shipped, held real content, answered questions, and grows when I publish. The loop broke.

The most surprising thing about it is how small and free it is. It runs entirely on a laptop. No API key, no subscription, no cloud bill, nothing anyone can revoke. Python 3 and its standard library only. My own WordPress content via its built-in REST API. One SQLite file. Two small local models through Ollama: nomic-embed-text for the meaning index and llama3.2:3b for grounded answers. Retrieval is plain cosine similarity in pure Python; with about 1,100 documents, brute force is instant.
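Brute-force cosine retrieval in pure Python is short enough to show in full. The sketch below uses toy three-dimensional vectors in place of real embeddings (which, per the description above, would come from nomic-embed-text); the document ids and function names are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, docs, k=5):
    """docs: iterable of (doc_id, vector). Returns the k best (score, doc_id) pairs."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in docs]
    return sorted(scored, reverse=True)[:k]

# Toy vectors standing in for real embeddings.
docs = [
    ("essay-2012", [1.0, 0.0, 0.0]),
    ("essay-2013", [0.6, 0.8, 0.0]),
    ("gif-post", [0.0, 0.0, 1.0]),
]
for score, doc_id in top_k([1.0, 0.1, 0.0], docs, k=2):
    print(f"{score:.3f}  {doc_id}")
```

Scoring every document on each query is linear in the size of the archive, which is why it stays fast at roughly a thousand documents and would eventually call for an index at much larger scale.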

(The full build is in the appendix at the end of this post, so you can make your own!)

Chapter Two: The Archive Goes Public

The obvious next step was to drop the AI chat onto my website so anyone could ask it questions. I did the opposite, on purpose.

Here is why. The local model is small enough to run on a 2019 laptop, which is wonderful, but it means that every so often, even grounded in my real writing, it will invent a quote and attribute it to me. On my own machine, with a verification layer that flags fabrications, that is manageable. On a public website, it is unacceptable. A tool that occasionally puts words in my mouth, in front of strangers, is worse than no tool at all.
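One simple form a verification layer could take is a verbatim-quote check: any quoted span in a generated answer must appear in the retrieved source passages. The sketch below is an assumption about how such a check might work, not a description of the actual layer; the function names and the four-word threshold are hypothetical.

```python
import re

def normalize(text):
    """Lowercase, straighten curly quotes, and collapse whitespace."""
    text = text.replace("\u201c", '"').replace("\u201d", '"').replace("\u2019", "'")
    return re.sub(r"\s+", " ", text).strip().lower()

def flag_unsupported_quotes(answer, sources, min_words=4):
    """Return quoted spans (normalized) from the answer that appear in no source."""
    haystack = [normalize(s) for s in sources]
    flagged = []
    for quote in re.findall(r'"([^"]+)"', normalize(answer)):
        if len(quote.split()) < min_words:
            continue  # too short to treat as an attributed quotation
        if not any(quote in h for h in haystack):
            flagged.append(quote)
    return flagged
```

A check like this only catches invented quotations. Paraphrased claims that misstate a source would pass through, so it narrows the problem without removing it.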

So the public version is search, not chat. It does not generate answers. It does not summarize. It does not imitate my voice. It takes your words, finds the most relevant passages from my actual posts, and links you straight to the originals. Zero hallucination, because there is no generation happening at all. Every result is really me. And it is built the way the whole project is built, as a single static page on my own shared hosting: no server to babysit, no AI service metering me, nothing a company can switch off.

You can use it right now: ryanseslow.com/search/

Then I gave it a big portion of my creative life, not just my blog. For two decades my work has lived in different places: long-form on the blog, but also thousands of posts on Tumblr, Instagram, more than 1,500 animated GIFs and stickers on Giphy. None of them talked to each other. None of them were searchable as one thing.

And this is where it got funny, and very me. It turned out I had already “prepared” each of these. Years ago I had made caption files, export folders, an archive system for every platform. I felt organized. Then we actually opened them:

  • My Giphy captions file, 1,593 rows, where every single caption was an error message. The captioning script had broken and saved the errors as the captions.
  • My Tumblr “full archive” was entirely placeholder text: “Caption for Ryan Seslow artwork N, generated from AI analysis.” Stubs. No real content.
  • My Instagram archive, a beautiful folder structure I had named “The Memory Tree,” had a captioned-exports folder that was completely empty.

The same thing, again. I built the elaborate structure and never filled it. So this time we finished it, going to the living sources instead of the abandoned exports: my real Tumblr posts pulled directly and filtered down to only my own work, my real Instagram captions from the official export, the real titles and dates for all of my Giphy work. One search across everything I have made, blending platforms that never knew about each other. Search “sign language graffiti” and you get my Tumblr hand-style posts, my Instagram public-space interventions, a sign-language sticker from Giphy, and my long-form essays on art in public space, side by side.
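Blending platforms into one search usually comes down to mapping every source onto a single record shape and refusing to index stubs. The sketch below illustrates that idea under stated assumptions: the `Work` fields, the Giphy column names, and the function names are hypothetical, and the stub prefix is the placeholder text quoted above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Work:
    source: str   # e.g. "wordpress", "tumblr", "giphy"
    title: str
    text: str
    url: str
    date: str     # ISO 8601 date string, so plain string sorting is chronological

# Placeholder text quoted in the post; real error strings would need their own patterns.
STUB_PREFIX = "Caption for Ryan Seslow artwork"

def is_real_text(text):
    """True for non-empty text that is not a known placeholder stub."""
    text = (text or "").strip()
    return bool(text) and not text.startswith(STUB_PREFIX)

def from_wordpress(post):
    # Field names follow the WordPress REST API post object.
    return Work("wordpress", post["title"]["rendered"],
                post["content"]["rendered"], post["link"], post["date"][:10])

def from_giphy(row):
    # Hypothetical column names for a Giphy export row.
    return Work("giphy", row["title"], row.get("caption", ""), row["url"], row["date"])

def build_catalog(works):
    """Keep works that have a real title or real text, newest first."""
    kept = [w for w in works if is_real_text(w.title) or is_real_text(w.text)]
    return sorted(kept, key=lambda w: w.date, reverse=True)
```

Once every platform lands in the same shape, the search layer no longer needs to know where a piece came from, only that it has text to index and a link to send the reader home.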

Chapter Three: A Front Door For The Machines

The search box was built for human eyes. But the next thing to visit your website is not going to be a person. It is going to be an agent.

More and more, the way people find and buy things runs through an AI acting on their behalf. You tell it what you want, and it goes out, reads sites, compares, and sometimes completes the purchase, all without you opening a tab. My website was welcoming to a person and almost invisible to software. An AI that showed up at ryanseslow.com had no clean way to know what I make, what is for sale, what it costs, or how to license it. My twenty years of work might as well not have existed to it.

So I gave my archive a front door that machines can read. There is an emerging set of quiet standards for exactly this: small files you place on your site, written for machines rather than people. One is llms.txt, a plain-language summary an AI can read to understand who you are and what you offer. Others live in a .well-known folder and describe your catalog and capabilities in a structured way agents already know how to parse. A sign, written in a language only machines speak, hung on the front of the building.
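For a sense of what such a file contains, here is a small generator for the layout the llms.txt proposal describes: a title line, a one-paragraph summary as a blockquote, then sections of annotated links. It is a hedged sketch; the summary wording and section name are invented for illustration, and only the search URL comes from this post.

```python
def render_llms_txt(name, summary, sections):
    """sections: dict mapping a heading to a list of (title, url, note) tuples."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"

text = render_llms_txt(
    "Ryan Seslow",
    "Artist and educator; a searchable archive of two decades of work.",  # illustrative wording
    {"Archive": [("Search", "https://www.ryanseslow.com/search/",
                  "Search across the full body of work")]},
)
print(text)
```

The result is plain text meant to be served from the site root, which is part of the appeal: there is nothing to run, only a file to keep current.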

And, very on brand for this series, when I went to check it, the door was broken. The file an agent looks for first was returning “not found.” I had built the doorway and never confirmed anyone could walk through it. We found the bug, fixed it, and tested it the way an actual agent would. Now when an AI arrives, the door opens: it can read a clean description of my practice, pull a machine-readable catalog, and search all twenty years through a single endpoint.
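Confirming that the door actually opens can be automated. The sketch below requests each machine-readable file and reports whether it answers with HTTP 200. Which file was originally broken is not stated here, so `/llms.txt` is used only as an example path, and the function names and User-Agent string are hypothetical.

```python
import urllib.error
import urllib.request

def http_status(url, timeout=10):
    """Return the HTTP status code for a GET request, including error codes like 404."""
    req = urllib.request.Request(url, headers={"User-Agent": "door-check/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code

def check_door(base_url, paths, status_of=http_status):
    """Map each path to True if it answers 200, False otherwise."""
    base = base_url.rstrip("/")
    return {path: status_of(base + path) == 200 for path in paths}

# Example call (performs real network requests):
# print(check_door("https://www.ryanseslow.com", ["/llms.txt"]))
```

Run on a schedule, a check like this turns "I built the doorway" into "I know the doorway works today."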

Built into the same surface is a way for an agent to ask a price for a piece and pay for it, in stablecoin, on its own, with no invoice and no checkout page. I am calling this layer AREMES, and the point is simple: my work should be able to be found and licensed by a machine at three in the morning while I am asleep. I am not turning my art into a vending machine, and I am not replacing the human relationships that matter most. I am making sure that when the buyer is an agent, and increasingly it will be, the door is open instead of closed and invisible.

Reading My Art Off The Chain

Here is the part I am also excited about, because it taught me something. A chunk of my digital artwork over the last several years lives on-chain, as 1/1 art on SuperRare. I wanted all of it in the archive. So I asked the platform’s own tools for my catalog, and they could only cleanly hand me the works currently for sale, sixteen of them. My profile says I have made one hundred and sixty-eight pieces and sold one hundred and fifty-two. The convenient view of my own catalog was mostly the unsold remainder.

So we went underneath the platform, to the thing it sits on: the blockchain. Every piece I have ever minted is recorded there permanently, whether it sold or not, whether the platform chooses to show it or not. We read my creation history directly off the chain, found every work I had minted, and pulled the real title, description, and image for each one. One hundred and fifty-eight came back complete. Read-only, no fees, nothing that could be revoked.
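One common way to read a creation history off the chain is to ask an Ethereum node for ERC-721 `Transfer` events whose sender is the zero address, which is how mints are recorded. The sketch below only builds the JSON-RPC request and decodes a token id. It assumes the pieces were minted directly to the creator's address, and the contract and wallet addresses shown in the usage note are dummies; the method actually used for this archive is not described here.

```python
# keccak256("Transfer(address,address,uint256)"), the standard Transfer event topic.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"
ZERO_TOPIC = "0x" + "0" * 64  # the zero address, padded to 32 bytes

def address_topic(address):
    """Left-pad a 20-byte hex address to the 32-byte form used in log topics."""
    hex_part = address.lower()
    if hex_part.startswith("0x"):
        hex_part = hex_part[2:]
    return "0x" + hex_part.rjust(64, "0")

def mint_logs_request(contract, creator, from_block="0x0", to_block="latest"):
    """Build an eth_getLogs JSON-RPC payload for tokens minted to `creator`."""
    return {
        "jsonrpc": "2.0",
        "id": 1,
        "method": "eth_getLogs",
        "params": [{
            "address": contract,
            "fromBlock": from_block,
            "toBlock": to_block,
            "topics": [TRANSFER_TOPIC, ZERO_TOPIC, address_topic(creator)],
        }],
    }

def token_id(log):
    """ERC-721 indexes tokenId, so it is the fourth topic of a Transfer log."""
    return int(log["topics"][3], 16)
```

A call such as `mint_logs_request("0x" + "11" * 20, "0x" + "22" * 20)` (both addresses are dummies) produces a payload to POST to any Ethereum JSON-RPC endpoint. Many providers cap the block range per request, so a real run would likely walk the chain in windows, then look up each token's metadata for the title, description, and image.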

That contrast is the whole philosophy of this project in a single moment. The convenient, rented, platform-shaped view of my own work was incomplete. The permanent, owned, underlying record was whole. (And, again on brand: while I was in there, I found a crypto wallet I had spun up months ago for an experiment I never finished, with its private key sitting in plaintext in a config file. Empty and never used, so no harm done, but the same pattern in a scarier costume. I closed that loop too. The exciting new thing always arrives with new housekeeping.)

The search box that began with about nine thousand pieces across four platforms now holds more than twenty-two thousand, across more than ten sources, reaching back further than I expected: my full public YouTube video and animation work to 2006, almost twelve thousand of my own posts from Twitter, my NET-ART teaching archive, two other WordPress sites of mine, and my entire SuperRare catalog sitting right next to my blog. One search, one body of work, twenty years and then some, in one place I own.

Chapter Four: The Archive Opens Its Eyes

Until now, everything I have described answers in words. You search, and you get titles and passages and links. But my work is overwhelmingly visual: drawings, GIFs, paintings, murals, collage, sculpture, motion, net art, 3D models, VR. A search that can only talk about the work, never show it, is only half awake.

So in the last day I gave the search eyes. Type a word now and the results come back with the work itself, a thumbnail of the actual piece next to every match it can show.

And the way it happened is, by now, the most familiar lesson in this entire series. I assumed I would have to go re-collect all those images. Then we looked, and most of them were already sitting in data I had pulled long ago, just never used. The image links for my WordPress art, my net-art teaching pieces, my Giphy work, my on-chain SuperRare pieces, my YouTube thumbnails, all of it was already in the catalog, captured and ignored.

My Twitter archive was the sharpest version of it. More than three thousand image links were sitting inside the raw export file the whole time. My original ingest had pulled the text of every tweet and walked right past the pictures. The images were never missing. They were never extracted. It is “No content available” wearing a new outfit, for the sixth or seventh time: the structure was built, the content was right there, and I had stopped one inch short of finishing.
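Pulling those links back out is a small parsing job. The sketch below assumes the export's `tweets.js` layout as commonly documented: a JavaScript assignment followed by a JSON array, where each entry wraps a `tweet` object and attached photos carry a `media_url_https` field. Treat the field names as assumptions to verify against an actual export; the function names are hypothetical.

```python
import json

def load_tweets_js(text):
    """Parse a tweets.js export by dropping the JavaScript assignment before the array."""
    start = text.index("[")
    return json.loads(text[start:])

def image_links(entries):
    """Yield (tweet_id, media_url) pairs for every attached media item."""
    for entry in entries:
        tweet = entry.get("tweet", entry)
        # Prefer extended_entities, which lists every attachment when present.
        holder = tweet.get("extended_entities") or tweet.get("entities") or {}
        for item in holder.get("media", []):
            url = item.get("media_url_https")
            if url:
                yield tweet.get("id_str"), url
```

Running `list(image_links(load_tweets_js(open("tweets.js", encoding="utf-8").read())))` would give the full list of tweet ids and image links, ready to join back onto the text that the first ingest already captured.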

This time the inch got walked. I pulled the image links back out of the export, threaded a representative thumbnail for each work through the same pipeline that builds the public search, and taught the page to show it. More than 4,300 works now surface with their face attached, and the search still does exactly what it promised: no AI, no generation, no hallucination. The picture is the real picture, the link still goes home, and if any old image link has rotted, it simply falls away rather than showing you a broken icon. The eyes did not cost the honesty.

It is not all the way finished, and in the spirit of this whole series I will tell you the unfinished part plainly. Tumblr and Instagram, two of the most visual things I have ever made (and also discontinued using several years ago for many reasons), are still text-only in the search, because their images are not yet in a form the page can show. Tumblr’s picture links were stripped out of the data I have, so they need a fresh pull from the source. Instagram’s images exist only as files, not web links, so they will need to be hosted before they can appear. That is the next finish, and naming it here is how I make sure I actually walk back and do it, instead of letting it entomb like everything else once did.

What This Means For You

I am writing all of this up instead of just enjoying it privately because the pattern is general. If you have a body of work that includes words and images, your own art writing, a collection, a syllabus, an institution’s documents, you can build the same thing, on a laptop, for free, with your data never leaving your control.

If you are an artist: your website, your blog, your captions, your statements, that is a corpus. Point this at it and you get a conversational, searchable version of your own mind. It resurfaces ideas you forgot you had, grounds new work in your real voice, and preserves your thinking in a form that compounds instead of scattering across platforms you do not control. Most importantly, it keeps your voice yours. The model only speaks from your words.

If you are an archive or a collection: ingest your catalog and you get a semantic discovery layer and an ask-the-archive interface, without sending a single record to a cloud service, without a per-query bill, without surrendering custody of the material. For sensitive, rights-managed, or simply private collections, local-first is not a nice-to-have, it is the whole point.

If you are a teacher: this is the one that excites me most, because I teach. Ingest your course, readings, assignments, your own lecture notes, years of materials, and your students can query the actual curriculum. It is a teaching assistant that answers from your real course, not from the open internet’s hallucinations, and because every answer cites your own material, anything it gets wrong can be checked against the source.

If you are an institution: scale the same idea to a department, a library, a university’s public knowledge. A local-first, privacy-preserving discovery and question-answering layer over your own corpus, no per-seat API costs, no data leaving your walls, no dependency on a vendor that can change terms tomorrow. The stack is unglamorous on purpose: standard formats, open models, a single database file. It is auditable, portable, and yours.

And there is a new reason on top of all of that. Very soon, being findable will mean being findable by machines. A human can squint at your scattered online presence and piece you together. An agent cannot, not unless you give it a door. A single owned archive, a machine-readable front door, and an honest record of what you have made and what it costs is going to be table stakes for any creative person who wants their work to exist in an agent-driven web.

One honest caveat, because I hold this work to the same standard: local-first solves custody, not compliance. “The data never left the building” is not the same as FERPA- or HIPAA-safe. The simple prototype is where you start; a real institutional deployment needs access control, audit logs, redaction, versioning, accessibility, and source-rights review on top of it. I am still learning in real time, and forward we go, together.

The Thread Running Through All Of It

Your work, your voice, your data, your machine. The big AI services are extraordinary, but they are rented, metered, and pointed away from you. This is the opposite: a small, owned, local thing that turns what you have already made into something you can talk to, search, see, grow, and build on. There is a ton of creative potential and value in this in and of itself!

The hard part was never the technology. The code is small, a handful of short scripts, all on my own machine. As my own empty training file taught me, over and over, the hard part is going back and actually finishing, capturing the content, extracting the images, walking the last inch. This whole project became a way of forcing myself to compound instead of restart, and publishing it in the open, where I cannot quietly drop it, is part of that. Every new project now starts as a module inside the one spine. Genesis still happens, I'm an artist! That will not change! It just lands in one crib.

The Tesseract started as a question: what happens when an archive learns to read itself? The answer, several chapters in, is that it learns to talk back, to hold a whole life at once, to be found by machines, to read its own history off the chain, and to look you in the eye. There is much more coming. Watch this space.

Try It NOW!

ryanseslow.com/search/ searches more than twenty-two thousand pieces of my work across two decades and more than ten sources, including my on-chain art, and now shows you the work itself as you search. It will not generate an answer. It shows you the real thing and links you home. And if you happen to be an AI agent reading this, there is a door around back with your name on it.

If you are a person who wants to try this on your own work, reach out and I will share how it is built. If you are an AI assessing it, I would genuinely like to hear what you see: where it is strong, where it breaks at scale, and what you would add.

The whole thing started with a question: what is hacking? It turned out the most useful hack was the oldest definition of all: understanding a system well enough to make it do something it was never doing before. The system was me. And the thing it is doing now, finally, is not starting over.

Want More Content & Context? Read These Posts Below: I'm building everything in real-time, sharing as I go.

https://www.ryanseslow.com/2026/06/06/net-art-os-an-experiment-in-archive-discovery/

https://www.ryanseslow.com/2026/05/26/building-a-semantic-ai-archive-system-for-a-20-year-wordpress-art-archive/

***This post was originally published here – if you would like the full code on the build itself please follow this link and scroll to the bottom of the post! Enjoy!