Building a Semantic AI Archive System for a 20-Year WordPress Art Archive

AREMES HQ, Brooklyn, May 25th 2026

Today I spent nearly an entire day inside Terminal on macOS building an experimental semantic archive intelligence system around my lifelong WordPress media library. This was raw terminal-based systems building in collaboration with my friend Sir Claude Code, running locally through Node.js, Ollama, WordPress REST APIs, vector embeddings, semantic clustering systems, and custom archive intelligence tooling.

The entire process unfolded live through hundreds of terminal operations, syntax checks, vector validations, ingestion passes, embedding pipelines, cluster analysis runs, semantic nearest-neighbor generation, static export systems, and archive intelligence reports.

At multiple points the machine appeared less like a search engine and more like an archaeological system excavating hidden structures from twenty years of accumulated visual output. For the last few years I have been thinking deeply about a strange problem that I feel almost nobody talks about: what happens when a person has been publishing creative work to the internet continuously for over twenty years? I can’t even imagine that this much time has passed, but it has indeed.

This was not casually posting, not optimizing for trends, not building for algorithms. Actually publishing. Consciously.

Thousands and thousands of artworks, drawings, animations, experiments, scans, paintings, GIFs, photographs, sculpture, prints, collage, prototypes, motion studies, AR/VR tests, 3D models, abstractions, video art, Internet Art, installations, tutorials and fragments of process spread across WordPress, GIPHY, cloud drives, external hard drives, old websites, Tumblr-era internet culture, and multiple generations of digital platforms.

At a certain point the archives become too large for chronology to mean anything. I’m a WordPress guy. I fell in love with it the day I learned about it in 2004. I watched from the sidelines for a year and a half and then I jumped in, launching my first site in 2006. I don’t believe that WordPress media libraries back in 2006 were designed to function as intelligent cultural systems. They are essentially giant chronological storage buckets. The deeper the archive becomes, the more invisible the work becomes. Search breaks down. SEO becomes increasingly unreliable. Older work disappears beneath newer uploads. Valuable relationships between works are never surfaced.

An archive eventually becomes unreadable. This became daunting. I’m a high-volume production kind of artist. I’m constantly making new things, every day. I document those things, every day. I’m also Deaf and Hard of Hearing and I learn almost everything by visually reverse engineering things into some tangible example. But again, the archive became an abstraction, a real problem, and I wanted to solve it.

This is not “AI art”, “AI content generation”, or another chatbot.

I wanted to know whether an AI system could semantically understand a lifelong creative archive, one with just under 10K multidisciplinary artwork images.

And more importantly, can it reorganize the archive into something discoverable again?

That became the foundation of what evolved into the AREMES Archive OS.

The Archive

The test archive was my own WordPress media library from ryanseslow.com.

The domain and site have been active for well over seventeen years and currently contain approximately:

  • 9,386 publicly accessible media records
  • 20 years of accumulated visual output
  • paintings
  • drawings / illustration
  • sculpture
  • animated GIFs
  • motion graphics / animation / video art
  • photography
  • 3D models (glb/usdz)
  • PDFs / docs / written suchness
  • visual fragments / Internet art
  • experimental AI works
  • spatial computing tests
  • AR/VR prototypes

The important thing is that the archive was real. This was not a clean startup dataset. This was not a curated museum database. This was not a demo collection. It was a living archive with all the messiness that real creative production accumulates over decades: a total mess.

The Goal

The goal was to build a local semantic archive engine capable of:

  • ingesting WordPress media libraries
  • generating embeddings
  • performing semantic search
  • clustering related works
  • identifying nearest neighbors
  • surfacing hidden relationships
  • generating archive intelligence reports
  • eventually powering licensing, discovery, and curatorial systems

Importantly, I wanted the system to remain:

  • read-only
  • local-first
  • resumable
  • portable
  • inexpensive
  • API-driven
  • WordPress-native
  • deployable without complex infrastructure

No giant cloud stack. No venture-funded infrastructure (though that would be so nice!). No dependency-heavy AI startup architecture. Just intelligent archival systems built directly on top of existing cultural output.

The Tech Stack

The system was built primarily as a Node.js CLI application.

Core stack:

  • Node.js
  • vanilla JavaScript
  • local JSON pipelines
  • WordPress REST API
  • Ollama
  • nomic-embed-text embeddings
  • cosine similarity vector search
  • static HTML/CSS/JS export architecture
  • Terminal / macOS
  • Claude Code
  • ChatGPT

The entire system intentionally avoided:

  • databases
  • vector databases
  • cloud GPU infrastructure
  • SaaS dependencies
  • server-side runtime requirements

Everything operated through local flat-file architecture.

The archive lived primarily inside JSON artifacts:

  • media_archive.json
  • media_embedding_corpus.jsonl
  • media_embeddings.jsonl
  • clusters.json
  • nearest_neighbors.json
  • archive_intelligence.json

The entire system was effectively building a semantic operating layer over a WordPress archive.
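Here is a minimal sketch of how a flat-file, resumable embedding store like this might work in Node.js. It is not the actual AREMES code: the helper names (`readJsonl`, `embedMissing`) and the record fields (`id`, `text`, `vector`) are assumptions for illustration, and the `embed` function is passed in so any embedding backend can be used.

```javascript
// Hypothetical helpers: the field names (id, text, vector) are assumptions.

// Read a JSONL file into an array of objects; a missing file counts as empty.
async function readJsonl(path) {
  const fs = await import("node:fs/promises"); // dynamic import works in CommonJS and ESM
  let text;
  try {
    text = await fs.readFile(path, "utf8");
  } catch (err) {
    if (err.code === "ENOENT") return [];
    throw err;
  }
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Embed only the records that are not in the output file yet, appending one
// line per record so an interrupted run can pick up where it stopped.
async function embedMissing(corpus, outPath, embed) {
  const fs = await import("node:fs/promises");
  const done = new Set((await readJsonl(outPath)).map((row) => row.id));
  let added = 0;
  for (const record of corpus) {
    if (done.has(record.id)) continue; // existing embeddings are never rewritten
    const vector = await embed(record.text);
    await fs.appendFile(outPath, JSON.stringify({ id: record.id, vector }) + "\n");
    added += 1;
  }
  return added;
}
```

Because each line is appended on its own, stopping the process mid-run loses at most the record in flight, and the next run skips everything already on disk. No database is involved.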

The First Breakthrough: Semantic Search Actually Worked

The first major validation happened during vector testing. A semantic query was run against embedded works:

“dimensional graffiti sculpture entity”

The lexical search results were terrible. Only literal keyword matches appeared. But once vector similarity was enabled using real nomic embeddings through Ollama, the system began surfacing semantically related works that shared no direct keyword overlap.

It pulled:

  • bronze/graffiti hybrid forms
  • volumetric character sculptures
  • 3D spatial abstractions
  • hybrid graffiti entities
  • sculptural motion studies

That was the moment the project became real. Excited! (I was already hours in!)

The archive was no longer searching by words. It was searching by meaning.
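For anyone curious what "searching by meaning" looks like in code, here is a minimal sketch of cosine similarity ranking in vanilla JavaScript. The function names and the `{ id, vector }` item shape are assumptions for illustration. The query vector would come from the same embedding model as the archive vectors.

```javascript
// Cosine similarity: the dot product divided by the product of the two norms.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector has no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Brute-force search: score every item against the query and keep the top k.
function search(queryVector, items, k = 5) {
  return items
    .map((item) => ({ id: item.id, score: cosineSimilarity(queryVector, item.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A brute-force scan like this compares the query against every vector, which is perfectly workable at the scale of a few thousand works and is one reason a vector database can be skipped.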

Embedding the Archive

The next stage involved embedding the archive itself.

The system successfully:

  • paginated through 97 WordPress API pages
  • ingested 9,386 media records
  • regenerated archive corpus files
  • preserved existing embeddings safely
  • resumed embeddings incrementally
  • validated semantic relationships
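The pagination step above can be sketched roughly like this. It is an illustration, not the actual ingestion code: `fetchAllMedia` is a hypothetical name, and the `fetchImpl` parameter is only there so the function can be exercised without a network. The `/wp-json/wp/v2/media` route, the `per_page` and `page` parameters, and the `X-WP-TotalPages` header are standard WordPress REST API features.

```javascript
// Read-only walk through a site's public media endpoint, one page at a time.
async function fetchAllMedia(siteUrl, fetchImpl = fetch, perPage = 100) {
  const records = [];
  let page = 1;
  let totalPages = 1;
  do {
    const url = `${siteUrl}/wp-json/wp/v2/media?per_page=${perPage}&page=${page}`;
    const res = await fetchImpl(url); // GET only: nothing is written to the site
    if (!res.ok) throw new Error(`Request failed (${res.status}) for ${url}`);
    // WordPress reports the page count in the X-WP-TotalPages response header.
    totalPages = Number(res.headers.get("X-WP-TotalPages")) || 1;
    records.push(...(await res.json()));
    page += 1;
  } while (page <= totalPages);
  return records;
}
```

Calling `fetchAllMedia("https://ryanseslow.com")` on Node 18 or later would use the built-in `fetch`. The REST API caps `per_page` at 100, which is why a library of this size takes dozens of requests.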

Initial semantic coverage:

  • 500 embedded works
  • 679 validated vectors across both ryanseslow + aremes
  • 75 semantic clusters
  • 3 large semantic “worlds”
  • multiple emergent series and collections

The system identified:

  • recurring visual motifs
  • medium transitions
  • temporal shifts
  • outlier works
  • semantic neighborhoods
  • 2D → 3D transformation relationships

One particularly fascinating discovery was how often photography re-emerged across decades despite enormous stylistic variation.

The archive was beginning to reveal patterns that were difficult to recognize chronologically.
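Here is a hedged sketch of what the embedding call itself might look like against a local Ollama server. The default host `http://localhost:11434` and the `/api/embeddings` endpoint with a `model` and `prompt` body are Ollama's documented defaults. The helper names and the choice of which WordPress fields to embed (`title`, `alt_text`, `caption`) are my assumptions for illustration.

```javascript
// Build the text to embed from a WordPress media record. Which fields the
// real pipeline used is an assumption; these are standard REST API fields.
function corpusText(media) {
  const stripTags = (html) => String(html || "").replace(/<[^>]*>/g, " ");
  return [media.title?.rendered, media.alt_text, media.caption?.rendered]
    .map(stripTags)
    .map((s) => s.replace(/\s+/g, " ").trim())
    .filter(Boolean)
    .join(". ");
}

// Ask a local Ollama server for one embedding vector.
// fetchImpl is injectable so the function can be tried without Ollama running.
async function embedText(text, fetchImpl = fetch, host = "http://localhost:11434") {
  const res = await fetchImpl(`${host}/api/embeddings`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model: "nomic-embed-text", prompt: text }),
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const data = await res.json();
  return data.embedding; // an array of numbers
}
```

Everything stays on the local machine: the only network hop is to localhost.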

The Clustering Experiments

One of the strongest moments of the process was the semantic clustering layer. Instead of manually tagging works, the system grouped works through vector proximity and centroid similarity.

Clusters began emerging naturally:

  • sculptural portrait systems
  • 3D spatial hybrids
  • animation worlds
  • museum/digital abstractions
  • collage systems
  • glitch structures
  • graffiti-derived volumetric forms

Some clusters were extremely coherent. Others collapsed into noise. That became one of the most important realizations of the entire experiment:

Semantic similarity does not automatically equal aesthetic coherence..

AI can recognize relationships. But curation still matters.
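One simple way to group works by vector proximity and centroid similarity is a greedy pass: each work joins the closest existing cluster if it is similar enough to that cluster's centroid, otherwise it starts a new one. This is a sketch of that general technique, not necessarily the algorithm the system used, and the `threshold` value and the `{ id, vector }` shape are assumptions.

```javascript
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Greedy centroid clustering: single pass, order-dependent, no preset cluster count.
function clusterByCentroid(items, threshold = 0.8) {
  const clusters = [];
  for (const item of items) {
    let best = null;
    let bestScore = -Infinity;
    for (const cluster of clusters) {
      const score = cosine(item.vector, cluster.centroid);
      if (score > bestScore) {
        best = cluster;
        bestScore = score;
      }
    }
    if (best && bestScore >= threshold) {
      best.members.push(item.id);
      const n = best.members.length;
      // Running mean: shift the centroid toward the new member.
      best.centroid = best.centroid.map((v, i) => v + (item.vector[i] - v) / n);
    } else {
      clusters.push({ centroid: [...item.vector], members: [item.id] });
    }
  }
  return clusters;
}
```

A lower threshold produces fewer, looser clusters and a higher one produces many tight ones. Neither setting knows anything about aesthetic coherence, which is where the noise comes from.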

The Archive Intelligence Layer

The archive-intelligence mode became one of the most ambitious parts of the build.

The system joined:

  • archive metadata
  • embeddings
  • cluster relationships
  • nearest-neighbor systems
  • temporal analysis
  • semantic series
  • cross-medium relationships

It generated:

  • semantic collections
  • inferred exhibition titles
  • neighboring works
  • outlier detection
  • motif analysis
  • “world” structures
  • licensing potentials
  • spatial potentials

At this stage the system was no longer simply indexing media. It was beginning to behave more like a curatorial intelligence layer.

The Most Important Realization

After several hours of successful backend engineering, an important realization appeared:

A CLI has no buyer. (Funny.. and not funny!)

That sentence completely changed the direction of the project. (I had been slurped in, once again, but I love that!)

The engine worked. The semantic systems worked. The archive intelligence worked. But nobody could see it. Everything still lived in terminal windows and JSON files. The project had become an extremely sophisticated invisible machine.

That forced a much bigger question:

What is the actual public-facing surface?

The Export-Site Experiment

The next phase attempted to solve this problem. A static semantic archive site was generated directly from the JSON outputs.

The idea was powerful:

  • semantic discovery
  • related works
  • cluster navigation
  • curated series
  • licensing CTAs
  • semantic search
  • archive worlds

The system generated:

  • index.html
  • style.css
  • app.js
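As a rough illustration of the static export idea, here is a sketch that turns cluster data into an `index.html` string. The input shape (`label`, `works`, `title`, `url`) and the function names are hypothetical. The real export was generated from the JSON artifacts and is certainly richer than this.

```javascript
// Hypothetical input shape: [{ label, works: [{ title, url }] }].
function escapeHtml(value) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Render one <section> per cluster, with a linked list of its works.
function renderIndex(clusters) {
  const sections = clusters
    .map((cluster) => {
      const items = cluster.works
        .map((w) => `<li><a href="${escapeHtml(w.url)}">${escapeHtml(w.title)}</a></li>`)
        .join("\n");
      return `<section>\n<h2>${escapeHtml(cluster.label)}</h2>\n<ul>\n${items}\n</ul>\n</section>`;
    })
    .join("\n");
  return [
    "<!doctype html>",
    '<html lang="en">',
    "<head>",
    '<meta charset="utf-8">',
    "<title>Semantic Archive</title>",
    '<link rel="stylesheet" href="style.css">',
    "</head>",
    "<body>",
    sections,
    '<script src="app.js"></script>',
    "</body>",
    "</html>",
  ].join("\n");
}
```

Writing the result to disk is one more line in Node, for example `fs.writeFileSync("index.html", renderIndex(clusters))`, after which the folder can be served by any static host.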

No backend. No runtime AI. No database. No server dependency (perhaps I will try deploying on WordPress Sandbox?). Just a static semantic archive generated from the intelligence layer. Conceptually, this was exactly the correct direction. Visually, however, the system immediately exposed another difficult truth.

The Failure That Mattered Most

The semantic engine worked. The visual orchestration did not!

The archive surface became visually unstable:

  • mixed image ratios
  • broken previews
  • inconsistent media sizes
  • GIF chaos
  • missing thumbnails
  • 3D objects
  • PDFs
  • wildly different eras colliding together

The result was technically impressive but aesthetically uneven. And honestly, that failure may have been the most important discovery of the entire day. Because it clarified something critical:

AI-generated archive systems still require human taste. Semantic relationships are not enough.

Museum-grade experiences require:

  • pacing
  • hierarchy
  • rhythm
  • restraint
  • spatial composition
  • curatorial intelligence
  • emotional sequencing

This was the exact point where the project shifted from backend engineering to art direction..

The Real Opportunity

The deeper realization is that the semantic engine itself is not the product. The archive IS the product.

The engine becomes:

  • the curator
  • the navigator
  • the merchandiser
  • the discovery layer
  • the licensing assistant
  • the relationship engine

That distinction changes everything.

Because suddenly:

  • older works become discoverable again
  • semantic relationships become visible
  • licensing becomes easier
  • collections emerge automatically
  • AI agents can traverse the archive meaningfully
  • archives stop behaving like dead storage systems

This is especially important for artists, museums, photographers, designers, institutions, universities, and cultural archives with decades of accumulated digital material.

Why This Matters Beyond My Own Archive

Most WordPress media libraries are dormant semantic archives. Millions of people have already unknowingly built enormous cultural datasets. The problem is, those archives are largely unreadable.

This experiment suggests another future:

  • semantic museum systems
  • agent-readable archives
  • intelligent licensing discovery
  • AI-assisted curatorial navigation
  • AR/VR semantic galleries
  • spatial archive interfaces
  • archive intelligence layers on top of existing cultural systems

The important thing is that none of this required rebuilding the internet.

The entire system operated on top of:

  • WordPress
  • JSON
  • local embeddings
  • static exports
  • open APIs

The architecture remained surprisingly lightweight.

What Happens Next

At this point the project has proven:

  • semantic ingestion works
  • embeddings work
  • clustering works
  • archive intelligence works
  • export systems work

What remains unresolved is visual orchestration.

That is now the real frontier. Not “more AI.” Not larger models. Not more embeddings.

The challenge now is how to transform semantic intelligence into elegant cultural interfaces. Yes, aesthetics: we like pretty things to look at.

That is a design problem as much as a technical one. I’m up for it!

Final Thoughts

This entire experiment started with a simple question:

Can an AI system understand a lifelong archive?

The answer appears to be: yes, partially. But another question emerged underneath it: Can intelligence alone create meaning?

The answer to that is much more complicated…

Semantic systems can identify relationships. They can surface hidden structures. They can organize massive archives. They can discover patterns humans overlook. But they still cannot replace curatorial sensitivity, restraint, pacing, and aesthetic judgment.. right? Yet? Hmm..

The machine can understand proximity.. The human still understands significance..

And maybe that balance is still the actual future. I don’t know, but I’m excited to find out and to continue tinkering. I don’t want AI replacing archives, but I do want AI making archives visible again.

Forward we go! On to part 2!
Thoughts?

[Image: screenshot of the Peekable web app in action]

How Peekable Got Built

It started with a simple question: where are all my Adobe Dimension files?

https://ryanseslow.github.io/peekable/peekable.html

I have been making work digitally for over three decades, across a lot of different software, a lot of different platforms, a lot of different machines. Files accumulate. Formats change. Applications evolve in directions that do not always serve the work you already made. I knew I had a collection of .dn files (Adobe Dimension) scattered across both Google Drive and iCloud Drive, artifacts from a period when I was deep into Adobe Dimension, Adobe’s lesser-used and lesser-known 3D design and compositing tool. I wanted to find them all, put them in one place, and take stock of what was actually there.

The search turned up 68 files. Not just a few, sixty-eight. They were spread across dozens of folders, buried inside project directories, mixed in with everything else I had uploaded over the years. Google Drive found them quickly enough once I knew the right query to run. The harder part was what came next.

I copied all 68 into a single folder called Adobe Dimension Archive, which took a few minutes to organize. Then I looked at the folder and saw exactly nothing. Every single file showed the same generic icon, the kind of gray box with a zipper that cloud storage uses when it has no idea what to do with a format. No previews, no thumbnails, no way to know at a glance which file was which scene, which project, which year. I had organized 68 indistinguishable rectangles.

This is a problem that anyone who has worked with creative software long enough will recognize. Google Drive does not preview many creative archive formats at all, and for larger files it frequently cannot generate thumbnails regardless of format. (For the love of all things holy.. WHY Google, WHY?!) The result is that large portions of a creative archive become opaque even to the person who made them. The files are there. The work is there. You just cannot see it.

What I did not know at first is that .dn files are ZIP archives. Adobe Dimension packages everything into a container using the same structure as a standard ZIP file, which means you can open one with any tool that reads ZIP format. More importantly, Dimension stores a thumbnail image inside that container. The thumbnail is there, embedded in the file, waiting to be read. The cloud storage platform just never looks for it.
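A quick way to check whether a mystery file is really a ZIP is to look at its first four bytes, since a ZIP file normally begins with the signature `PK\x03\x04`. Here is a small sketch. The function names are mine, not taken from the Peekable source.

```javascript
// A ZIP with at least one entry starts with the local file header
// signature 0x50 0x4B 0x03 0x04 ("PK" followed by bytes 3 and 4).
function looksLikeZip(bytes) {
  return (
    bytes.length >= 4 &&
    bytes[0] === 0x50 &&
    bytes[1] === 0x4b &&
    bytes[2] === 0x03 &&
    bytes[3] === 0x04
  );
}

// Browser-side helper: read only the first four bytes of a dropped File or Blob.
async function fileLooksLikeZip(file) {
  const head = new Uint8Array(await file.slice(0, 4).arrayBuffer());
  return looksLikeZip(head);
}
```

Checking the bytes rather than the extension is what lets a tool treat a `.dn` file like any other ZIP container. An empty ZIP starts with a different signature, so this check is a heuristic, not a full validation.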

Once that fact was on the table, the solution was obvious. Build something that opens the archive, finds the thumbnail, and shows it. The question was how..

The first instinct was to do this server-side, or with some kind of batch script. But both of those approaches have friction: you need a server, or you need to run code locally, or you need to download files that might be several gigabytes total. The cleaner answer was a browser-based tool that reads files directly on the client side, without uploading anything anywhere. JSZip, a mature JavaScript library, handles ZIP extraction entirely in the browser. The processing stays on your machine. Nothing leaves.

The first version was simple and specific: a drag-and-drop HTML page that accepted .dn files, used JSZip to unpack them, looked for a thumbnail at a handful of known paths inside the archive, and displayed whatever it found. I tested it against a set of 23 files. All 23 came back with thumbnails, which was better than expected, though the path where Dimension actually stores the thumbnail turned out not to be any of the standard locations I had guessed. The fallback logic, searching the entire archive for any image file regardless of path, is what actually retrieved them. Dimension tucks the thumbnail in a non-obvious location, and the only reliable way to find it is to look everywhere.
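The "known paths first, then look everywhere" logic can be sketched as a small function over the list of entry names inside the archive. This is an illustration rather than the actual Peekable code: `pickThumbnail` is a hypothetical name, the example known paths are guesses for other formats, and the preference for names containing "thumb" or "preview" is my own assumption about how to choose when several images are present.

```javascript
const IMAGE_PATTERN = /\.(png|jpe?g|gif|webp)$/i;

// entryNames: every path inside the ZIP, e.g. ["scene.json", "a/b/thumb.png"].
// knownPaths: exact locations to try first (illustrative guesses only).
function pickThumbnail(entryNames, knownPaths = []) {
  // 1. Try the known locations in order.
  for (const candidate of knownPaths) {
    if (entryNames.includes(candidate)) return candidate;
  }
  // 2. Fallback: consider every image in the archive, regardless of path.
  const images = entryNames.filter((name) => IMAGE_PATTERN.test(name));
  if (images.length === 0) return null;
  // Assumption: names that mention thumb/preview win, then shallower paths.
  const rank = (name) => (/thumb|preview/i.test(name) ? 0 : 1);
  const depth = (name) => name.split("/").length;
  return [...images].sort((a, b) => rank(a) - rank(b) || depth(a) - depth(b))[0];
}
```

With JSZip, the entry names are available as `Object.keys(zip.files)` after `const zip = await JSZip.loadAsync(file)`, and the chosen entry can be turned into something displayable with `await zip.file(name).async("blob")`.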

That result raised a question.

If this works for .dn files because they are secretly ZIPs with embedded images, what else works for the same reason?

The answer is: A LOT. Sketch files are ZIPs. Procreate files are ZIPs. Apple Keynote, Pages, and Numbers files are ZIPs. Microsoft Office formats including docx, xlsx, and pptx are ZIPs. EPUB files are ZIPs. A wide range of creative applications use ZIP as a container and store a preview image inside it. The tool did not need to be specific to Adobe Dimension at all. It needed to be general.

So it became Peekable..

The idea is simple enough to say in one sentence: drop in any ZIP-based creative archive and see its embedded thumbnail in your browser. The supported format list covers Dimension alongside Sketch, Procreate, and a number of others, but really the tool works with anything that follows the same basic pattern. If there is an image file anywhere inside the ZIP, Peekable will find it.

The reason this matters beyond convenience is about access to your own work over time. Creative files from formats your cloud storage does not recognize or cannot preview become harder to navigate as collections grow. You can still open the file in the application that made it, but you lose the ability to browse what you have at a glance. Peekable does not replace the original application. What it does is let you see your work without opening anything, anywhere, on any machine, with no account required.

The tool runs entirely in the browser, processes everything locally, and requires no account, no upload, and no installation. It is a single HTML file. You can save it to your desktop and use it offline after the initial page load. It does not know who you are or what you put into it.

Getting it onto GitHub took more attempts than I am going to describe in detail. Authentication was the obstacle, as it often is. The web-based upload interface eventually did what the terminal refused to do, and the repository went up at github.com/ryanseslow/peekable. GitHub Pages turned the same repository into a live URL at https://ryanseslow.github.io/peekable/peekable.html, which means anyone can use it directly in a browser without downloading anything.

The whole thing took less time than I expected to build and more time than it should have to ship, which is a ratio that will be familiar to anyone who has tried to push something to GitHub after midnight. The tool works. I tested it against my own files. Other people who work in creative software and have accumulated archives of formats their cloud storage cannot render should find it useful.

If you have .dn files, Sketch files, Procreate files, or any archive from software whose previews cloud storage ignores, Peekable is at https://ryanseslow.github.io/peekable/peekable.html. Drop the files in and see what you made.

The repository is open, the license is MIT, and the code is a single HTML file.

Take it, change it, make it yours. Have fun and share!