Querki: what I'm trying to accomplish
As described in my last post, I'm thinking about seriously diving into the Querki project, probably starting part-time after Pennsic, then maybe ramping up to full-time in October if it looks like it's a business idea worth pursuing. And as I do, I'm likely to be looking for interest and assistance of many kinds.

The project is, frankly, scary as hell. In part, that's because the idea isn't as unique as it was when I came up with ProWiki ten years ago -- both XWiki and TWiki have gone down somewhat similar paths, and have a serious foothold in the enterprise market.

But the thing is, I'm not *going* for the enterprise market. There's a huge market out there that is currently poorly-served: people who just want to keep track of *stuff*.

This shows up in a thousand places -- the infinite little websites that get built for special purposes, each its own little special snowflake. Hell, just within Carolingia in the past year we've built at least two of these: the new Carolingian Site DB, and the Cooks Guild Recipe DB. Both are functional, but both took more work to assemble than they should have, and both are kind of limited. And I find myself going, "We could do *so* much better than this".

So the notion is to focus on that market: the many people who just want an easy way to build little specialty sites for simple small databases. Whereas XWiki focuses on power, Querki focuses on ease of use. It's not about building huge enterprise databases, it's about making it Really Really Easy to build little databases of hundreds or thousands of Things. It's an online database for the consumer market, for the people who wouldn't normally even think about building a "database". (Indeed, I may deliberately avoid that word in public.)

And yes, I know that there are lots of cloud-based DB systems out there. Suffice it to say, I'm trying something quite different in some critical details: a prototype-styled OO DB instead of a conventional relational one. All my experience says that that is *way* easier for many real-world problems, so long as you don't care about scalability, and it fits nicely in a loosely-structured wiki environment.
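For concreteness, here's a toy sketch of what prototype-style delegation looks like (illustrative Python only, with hypothetical names -- not Querki's actual design): an instance overrides only the properties that differ, and everything else falls through to its prototype.

```python
class Thing:
    """A toy prototype-styled object: unset properties delegate to the prototype."""
    def __init__(self, prototype=None, **props):
        self.prototype = prototype
        self.props = props

    def get(self, name):
        if name in self.props:
            return self.props[name]
        if self.prototype is not None:
            return self.prototype.get(name)
        raise KeyError(name)

# A "model" Thing acts as the prototype; instances override only what differs.
recipe = Thing(servings=4, course="main")
chili = Thing(prototype=recipe, name="Chili", servings=8)

print(chili.get("name"))    # "Chili" -- set locally
print(chili.get("course"))  # "main" -- inherited from the prototype
```

The appeal for loosely-structured data is that you never have to declare a full schema up front: add a property to one Thing, or to the prototype, whenever the need appears.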

So please forgive the burblage to come. This could prove to be a brief phase -- a week's enthusiasm that then burns out -- but it doesn't feel like it. I think I'm onto something here, and step one is going to be proving it to my friends.

Specifically: once it is at least basically up and running, I'm going to be looking to put projects onto it. I'm going to ask y'all to think about projects -- those things that you've built little sites for, or hacked in a third-party tool, and would like to do better. In some cases, I may ask if I can try replicating an existing site, and I won't kid you: my agenda is going to be to demonstrate that I can build something that is both *better* and *easier* than what you already have. I'm going to ask for honest criticism about any shortcomings you find, especially about anything existing systems can do that Querki can't. My hope is that I can prove that Querki is just plain better for 80% of the online-data problems you need to solve.

I'm also going to be looking for technical input. This time around, I'm going to try to avoid the go-it-alone approach I took with CommYou (one of the dumber mistakes I've ever made), and instead go for radical transparency, with a fully open-source project. That's a tad scary, but enough systems have demonstrated that you can build a good cloud-based solution that is completely open source that I'm inclined to give it a try. So if you're interested in participating in a really deep technical project (all the way down to language design), comment here and we can all talk about how we'd like to set it up. (In the long run, of course, I want to run the project via Querki itself, but for the first few months we're going to need some third-party project tools to communicate.) I would dearly love to get a couple dozen technically-inclined friends involved in the discussion. Those who want to actually get their hands dirty in the code would be more than welcome, but I'd also like folks who just want to muse on the architecture, the use cases, the usability and so on.

I think it's time to change the world, just a little bit. With some help, I think we just might be able to do so...

you know i can't say no to things like this :P

Is good.

I should note, BTW, that I'm actively using the OP as a Use Case. I don't currently intend to really build it out fully (since we already have a system that looks pretty good), but it makes a *great* example of what a really complicated data model looks like. It's more complex than what I'm primarily targeting here, but I want to make sure the system is *able* to deal with something like that: not very many records, but massively normalized and complex.

So I'll likely try to mock it up at some point, as a sanity-check. If Querki can do at least a passable job with the OP, then it's probably ready for prime time in terms of data model.

the Cooks Guild Recipe DB. Both are functional, but both took more work to assemble than they should have, and both are kind of limited. And I find myself going, "We could do *so* much better than this".

Oh, yes, please. I know what I want vs. what we currently have. Let me know how I can help.

Oh, good. This was my second Use Case (right after the LARP writing that started this project in the first place), since it is a *great* fit for the project. I'd love to chat with you in the next couple of months about what you'd like to see, and I can feed that into the plan...

I really want to move my recipes wiki ( http://randomstring.org/prowiki/tao/wiki.cgi?FrontPage ) to something else. I just don't want to cut-and-paste or rewrite each entry.

Well, data import/export is going to have to be a fairly early project -- honestly, I wouldn't be willing to use it myself if it didn't have both.

I have some vague hand-wavy notions about how this would work, and my own LARP stuff will be an early use case for that, but we should chat about your wiki and how we might migrate it -- it'll provide me with one useful example.

(And in the medium term, we might want to explore lifting out a general Recipes Schema, given that both you and the Cooks Guild will probably be early users. I am a *great* believer in factoring out the common bits.)

Count me in. Probably not as good for actual code as others, but reasonable at architecture and use case (and the other so on bits).

Great -- would love to have you on-board...

I've got a few things I'd use a tool like this for.

Cool -- I'd love to have you come kick the tires, once it's ready for that. (And will want to talk about your use cases during planning, to make sure they're properly supported...)

This could be useful for comic-book indexing. This is an area that has a lot of existing fan-organized data on the web already, but most of it is kinda hard to use.

My most recent interesting use-case was trying to figure out the publication order of the B.P.R.D. comics. Looking for lists of "Written by Mike Mignola" doesn't cover all the cases, nor is there any single character who appears in all issues. And what gets printed in which collections, and where/when did those stories originally appear?

This could be useful for comic-book indexing. This is an area that has a lot of existing fan-organized data on the web already, but most of it is kinda hard to use.

Yeah, I've been pondering that as well. I have my own system at home, based on Rails and MySQL, and I'm slightly loath to transition because I have *so* damned much data in it. But I suspect that, if Querki gets anywhere near where I want it, it's just going to be much easier to use. So I've just added that to my Use Cases: it's an interesting medium-complexity example with some intriguing data-entry problems. (E.g., I want to be able to say "issues 34-92" as a single input, and have it generate all the skeleton records automatically, for me to annotate later or not.)
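That range-input idea is simple to sketch: parse the text, expand the range, and emit one stub record per issue. (A hypothetical illustration -- the record fields and function name are invented for the example.)

```python
import re

def skeleton_issues(range_text, series):
    """Expand an input like 'issues 34-92' into stub records to annotate later."""
    m = re.fullmatch(r"issues?\s+(\d+)\s*-\s*(\d+)", range_text.strip(), re.IGNORECASE)
    if not m:
        raise ValueError(f"unrecognized range: {range_text!r}")
    lo, hi = int(m.group(1)), int(m.group(2))
    # One skeleton record per issue, flagged as not-yet-annotated.
    return [{"series": series, "issue": n, "annotated": False} for n in range(lo, hi + 1)]

stubs = skeleton_issues("issues 34-92", "B.P.R.D.")
print(len(stubs))   # 59 skeleton records, ready to fill in (or not)
```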

And yeah, the BPRD and Hellboy are an interesting one. I'd have to think about what properties I'd need to even figure that one out. But it does nicely illustrate the benefits of the flexible prototype-based model: I could subclass "Issue" with an extra "StoryOrder" field for these books, so the system could at least have some hope of tracking it. (I don't know of any existing system, including my own, capable of tracking both the "BPRD: Trail of the FooBar #3" and the little "Issue #46 of BPRD" in the indicia.)

And that, in turn, points up a feature I need to think about: reparenting. I'll occasionally want to take an existing set of Foos and turn them into SubFoos quickly and easily, as I muck with my data. Fun...

I'd be interested in participating in some capacity, at least at the discussion level.

I've been seeing stuff lately about MongoDB, but I haven't had a chance to look at it in depth. From a cursory look, it seems to have some overlap in the "unstructured, non-relational" aspect, but it emphasizes performance and scalability rather than ease of use. Might be worth a look, though.

Mongo is interesting -- indeed, there's a non-zero chance I might use it under the hood. But it's really a database engine, still intended mainly for programmers. What I'm doing here is more *like* a wiki, especially in the way it aims to make things easy for end users, but it's a wiki aimed at semi-structured data...

This might be a bit massive (and maybe not quite what you're looking for) for a use case, but I've been twirling the idea of fixing the signet "database" and making it something you can actually query. E.g. I have three weeks before this Maunche goes out. Tell me who can do a scroll in less than three weeks, who requested to do a scroll for that individual, who's not working on anything at the moment. Bonus for all three, etc. We have all that information, but it's in the most atrocious spreadsheet that gets emailed out to the interested parties once a week.

Actually, it's not a bad use case -- maybe a bit large and complex, but likely not more so than the Order of Precedence, which is something like 15 tables. (Which I'm planning on at least part-implementing just to prove I can do it.) And if the pain is there, that makes this more interesting: I'd rather do projects that people are actually likely to use.

It sounds like this might drive some interesting features, along the "people management" lines that Tibor and I have been discussing elsethread -- it's a good collaborative example, coordinating a bunch of people working together, and I'm always fond of problems like that.

So let's talk about this further over the next couple of months, and figure out the design and requirements. It sounds like it probably won't be one of the *first* projects, but it might make a good semi-advanced one once the underpinnings are all in place. (Ie, sometime in the fall/winter.)

(Deleted comment)
Yeah, the formatting is the part that I've never really wrestled with. I suspect we can improve on what I've got in ProWiki, but it'll be a longer-term project to provide really strong formatting capabilities. (And I suspect will involve exporting to something that is more focused in that direction.)

Back when I was directing On the Mark I used a box of index cards for repertoire management. For any given song I tracked key, who did what (not necessarily 1:1, e.g. guitar + harmony vocal), and different configurations (we can do this with X on flute and that means Y has to play guitar, or instead X can play keyboard and Y plays recorder, or...). If I'd had a database instead of a set of cards I'd also have tracked when we last rehearsed this and an assessment of its quality level at that time.

The group doesn't exist any more, but if that use case is interesting I'd be happy to talk with you more about it. I imagine it could apply to other "team + configurable tasks" cases, like, say, household chore management.

In a different vein, perhaps there is some way I can help you think through your configuration language. I have a bit of a track record beating up on API designs, for what that's worth.

Sounds useful, in both cases. The Use Case should be interesting, at least for seeing if it drives any additional features. And I would love to get some sanity-checking on the query language. (Although I do think that the visual editor will be necessary for non-technical users, regardless...)

I could chime in on security if you need.

Mmm -- yeah, as we get deeper into the details, I'd love to have your input there. While I'm decently knowledgeable about security, and unlikely to make the usual rookie mistakes, I don't have anywhere near your depth on the subject...

So say the site user is being allowed to edit one of their posts. The app presents a list of the user's posts with the associated article index numbers. There are three; the user clicks on one to get an edit page.
An attacker sees this and looks under the hood. He notices the article number gets sent to the server. Is that used directly in an SQL query, he wonders, and goes and tries various things, such as an unlisted article number, or a backtick, or ' OR 1=1, or what have you.

A smart programmer has instead coded the page to return 'selected item = 3' and the application to both know that only three choices were presented, and what the associated article id is for 3, and to reject any input except plain digits. And a few other things.

Now, if the data persistence framework had a call to say 'give me a list of available choices' and 'act thusly on choice 3', we'd live in a world with a lot fewer SQL injection attacks.
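To make the indirection concrete, here's a minimal sketch of the pattern being described (toy Python, all names hypothetical): the server remembers which choices it offered, the client sends back only a small index, and anything else is rejected before a database query is ever built.

```python
def present_choices(session, article_ids):
    # Remember exactly what was offered; the page only ever shows indexes 0, 1, 2...
    session["offered"] = list(article_ids)
    return list(range(len(article_ids)))

def resolve_choice(session, raw_input):
    # Reject any input except plain digits, before anything else happens.
    if not raw_input.isdigit():
        raise ValueError("rejecting non-digit input")
    index = int(raw_input)
    offered = session.get("offered", [])
    if index >= len(offered):
        raise ValueError("choice was never offered")
    # The real article id comes from the server's own list, never from the client.
    return offered[index]

session = {}
present_choices(session, [1017, 2203, 3314])
print(resolve_choice(session, "2"))   # the server-side id for choice 2
```

The client-supplied string never touches SQL at all; the worst a hostile input can do is raise a validation error.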

An attacker sees this and looks under the hood. He notices the article number gets sent to the server. Is that used directly in an SQL query, he wonders, and goes and tries various things, such as an unlisted article number, or a backtick, or ' OR 1=1, or what have you.

Sure -- I generally assume that the user is malicious, and is expert in not only the web traffic but the code. Remember that my job for the past three years has involved sensitive bank data, at a company that built its reputation on detecting employee fraud. I assume that even *trusted* users -- anybody below the superuser level -- are trying to attack. Certainly the poorly-paid analysts who are using my systems are.

And since I'm expecting Querki to be open-source, I have to assume that an attacker is working with full knowledge about how the system works. So I'm not concerned about him inspecting the traffic -- I'm concerned about him reading the source code and finding a hole.

Anyway -- SQL injection is a relatively minor concern here, since the system mostly operates in-memory. I'm actually much less concerned about people trying to attack that way from queries, and more from the actual submitted page data. At this point, I'm not anticipating any user-level queries that translate directly into SQL statements. It's worth paying attention to, of course, but at this point I don't think it's likely to be a front-and-center worry.

(OTOH, this system is going to be ripe for XSS attacks, so *those* I have to be really careful about.)
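The XSS concern boils down to one rule: any user-entered value has to be escaped (or deliberately sanitized) before it is rendered into a page. A minimal illustration with Python's standard library:

```python
import html

# A hostile "property value" a user might submit:
unsafe = '<script>alert("pwned")</script>'

# Escaped, it renders as inert text rather than executable markup.
safe = html.escape(unsafe)
print(safe)   # &lt;script&gt;alert(&quot;pwned&quot;)&lt;/script&gt;
```

The hard part in a wiki-like system is that users legitimately author markup, so the real design question is which constructs the text-formatting layer allows through.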

In practice, for the case you're describing, I believe that it usually isn't going to be a query operation so much as a RESTful fetch. Of course, we need to make sure that we deal properly with badly-formed URLs, but that's a necessity for many reasons...

Ah. Will your site-developer users be able to write code in your language, or are you just exposing a REST API?

They'll be able to write "code", but exactly what that means is a bit unclear yet. The key bits of the code are simply a declarative query language; I haven't yet established how effectful operations will happen.

But in general, they won't be coding anywhere near the SQL level. Persistence is currently designed as primarily a backing store, with operations happening in-memory in the Actor architecture. So operations basically are method invocations on specific objects; persistence is hidden behind that OO layer.

Hence I'm more worried about data than commands. The only *likely* avenue for a SQL attack is in the data you are passing in update parameters, which will usually be updating a page property. So long as we get that properly locked down, there may not be any other ways into the DB. (In general, I want all DB access to be *very* bottlenecked in the code, for this among other reasons.)
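The "bottlenecked" idea can be sketched in a few lines (illustrative Python with sqlite3; the table and function names are invented for the example): every write goes through one function, and that function only ever uses parameterized statements, so user data can never be spliced into SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE props (thing_id INTEGER, name TEXT, value TEXT)")

def update_property(thing_id, name, value):
    # The sole write path: placeholders, never string concatenation.
    conn.execute("INSERT INTO props (thing_id, name, value) VALUES (?, ?, ?)",
                 (thing_id, name, value))

# A hostile property value goes in as plain data; nothing gets executed.
update_property(1, "title", "Robert'); DROP TABLE props;--")
rows = conn.execute("SELECT value FROM props WHERE thing_id = ?", (1,)).fetchall()
print(rows[0][0])
```

With a single choke point like this, auditing the DB layer for injection means reading one function instead of the whole codebase.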

I was just talking with mindways about wanting something like this. Specifically, I was trying to find a way to interlink all sorts of data sources into a wiki with auto-synchronization and auto-page generation.

A specific use case would be to be able to purchase and rate a DVD from Amazon, and have it be added to my wiki's 'DVDs I Own' and 'Movie Ratings' wiki pages, with an auto-generated stub page with plot scraped from IMDb, some basic rating information, etc., and keeping the Amazon rating in sync with Netflix, Hulu, etc.; I'd like to do the same thing with books (goodreads), board/card/euro games (boardgamegeek.com), or pretty much anything else.

Realistically, even just having the wiki gather the data, rather than push updates, would go a long way to my wants and needs.

I know I can hack something together to run on top of any wiki, with enough time and effort, but I'd love a wiki designed around the idea that inter-site communication is part of the data back-end.

... Which is to say, if you're looking for programming, design, QA, etc. help, I'd be interested in chipping in.
