Anyone have a good index of javascript-injection hacks in URLs?
(This one is for the programmers out there, and especially for security geeks.)

As I was doing some updates yesterday, it occurred to me that Querki now allows you to name your Things pretty much anything you want. Including "javascript:...do something malicious...". Since we generate relative URLs to pages (and therefore, the URL is basically this name), this is Bad.

I've fixed the obvious hack by the simple expedient of screening out any URLs that begin "javascript:", but I'm guessing that that isn't enough -- that there are other ways to be malicious with a URL.
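Here is why a plain prefix check is thin. Before a browser looks for a scheme, it strips leading and trailing control characters and spaces, deletes tabs and newlines anywhere in the URL, and compares the scheme case-insensitively. That is the WHATWG URL Standard's behaviour. The Python sketch below is a simplification of that parsing, not a complete URL parser, and `naive_is_dangerous` and `browser_scheme` are hypothetical helpers.

```python
import re
from typing import Optional

# Roughly what "screen out URLs that begin javascript:" amounts to.
def naive_is_dangerous(href: str) -> bool:
    return href.startswith("javascript:")

# U+0000..U+0020: C0 control characters plus space.
_C0_OR_SPACE = "".join(chr(c) for c in range(0x21))
# A scheme is a letter, then letters/digits/"+"/"-"/".", then ":".
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.\-]*:")

def browser_scheme(href: str) -> Optional[str]:
    """Approximate the scheme a browser would see (simplified, hypothetical)."""
    cleaned = href.strip(_C0_OR_SPACE)
    # Tabs and newlines are removed from anywhere in the input.
    for ch in "\t\n\r":
        cleaned = cleaned.replace(ch, "")
    m = _SCHEME.match(cleaned)
    return m.group(0)[:-1].lower() if m else None

for href in ["javascript:alert(1)", "JavaScript:alert(1)",
             " javascript:alert(1)", "java\tscript:alert(1)"]:
    print(repr(href), naive_is_dangerous(href), browser_scheme(href))
```

Only the first of the four is caught by the prefix check, yet all four report the scheme `javascript`. Other schemes such as `data:` or `vbscript:` raise the same question.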

So I'm looking for suggestions. Take it for granted that Querki allows you to specify URLs, and that those URLs can be *fairly* arbitrary relative URLs, so I can't just whitelist a simple legal syntax -- I probably need to think in terms of blacklisting the badness. Do you know a good comprehensive list of the possible syntaxes that could be used for Javascript injection when placed inside an href? (Better yet, do you know an existing regex pattern to detect them?)

(Deleted comment)
Do you want to take special care when referencing something that COULD be suspicious?

Well, I'm not going to spend much time vetting the links -- that's way out of scope. My concern is primarily with closing down any vectors for injecting Javascript into a Querki page. I've closed some already, and know of a few more that need to be dealt with, but I'm not an expert on the subject.

You may want to look at one of my favorite books on the topic

Useful-looking reference -- thanks! I may well Kindle it for the holidays.

(Bobby Tables was part of the design from the outset -- while I'm not a security specialist per se, any architect worth his salt needs to know basics like that. Suffice it to say, all SQL commands are done with prepared statements; so far, they appear to be safe.)
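For readers who haven't met the Bobby Tables problem, this is what parameter binding looks like. It uses Python's built-in `sqlite3` purely for illustration, since Querki's actual database and driver aren't stated here, and the `things` table is hypothetical. The name is passed separately from the SQL text, so it is stored as data and never parsed as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE things (name TEXT)")

name = "Robert'); DROP TABLE things;--"
# The "?" placeholder binds the value; it is never spliced into the SQL string.
conn.execute("INSERT INTO things (name) VALUES (?)", (name,))

rows = conn.execute("SELECT name FROM things").fetchall()
print(rows)
```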

I'm confused why you're talking several times about "relative URLs" here. A URL starting with "javascript:" isn't a valid relative URL, so if you escape things properly to ensure that you're always generating a valid relative URL, you should be fine even if people name their Things maliciously. If a URL starts with a letter, followed by any mix of letters, digits, "+", "." and "-", and then a colon, that prefix is interpreted as a scheme. You can fix that by prepending "./" to your relative URL or (in most cases) by URL-escaping the colon.
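Below is a minimal sketch of that fix. It assumes a Thing name is used directly as the last path segment of a relative link, and `thing_href` is a hypothetical helper. It applies both remedies: it percent-encodes the name, which turns ":" into "%3A", and it prefixes "./", so the first character can never start a scheme.

```python
import re
from urllib.parse import quote, urljoin

# RFC 3986, section 3.1: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
SCHEME_PREFIX = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*:")

def thing_href(name: str) -> str:
    # safe="" also encodes "/", so a name cannot add extra path segments.
    return "./" + quote(name, safe="")

name = "javascript:alert(1)"
print(bool(SCHEME_PREFIX.match(name)))   # the raw name would be read as a scheme
print(thing_href(name))
print(urljoin("https://example.com/space/", thing_href(name)))
```

Either half alone addresses the scheme problem. Percent-encoding also takes care of spaces, "?", "#" and "/" in names, at the cost of slightly less readable URLs.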

Well, keep in mind that I'm allowing relatively arbitrarily-named Things, and that a reference to another Thing in the Space is handled as a relative URL. (Pretty much like a typical wiki in that respect.)

These sorts of URI nuances aren't an area of expertise for me, hence the question. It sounds like the rule about the syntax of schemes is probably the key for me to incorporate -- thanks!

Having tried it out: yeah, that's a nicely general solution, and seems to work well. Thanks!

Yes, that sounds simple and straightforward, and preserves human-readability. Could somebody use "../" to step up out of the usual hierarchy of URLs? Is that a problem?

It seems like it would work, but I can't see any reason why that would be a problem -- pages *are* allowed to refer to any other public pages, including ones in other Spaces. (Which they could do trivially with an absolute "http://" link anyway.)
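One quick way to check that claim is to resolve a few relative references against a page URL with Python's `urllib.parse.urljoin`, which follows RFC 3986 resolution. The base URL below is hypothetical. However many "../" segments a reference has, the scheme and host stay those of the base, and excess "../" segments are simply dropped at the root.

```python
from urllib.parse import urljoin, urlsplit

base = "https://example.com/someuser/somespace/SomeThing"  # hypothetical page URL

for ref in ["../otherspace/OtherThing", "../../../../etc", "./javascript:alert(1)"]:
    resolved = urljoin(base, ref)
    parts = urlsplit(resolved)
    print(f"{ref!r:30} -> {resolved}  (scheme={parts.scheme}, host={parts.netloc})")
```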

The one thing that this *does* raise as a potential problem, though, is that having a dot at the beginning of the name is the signifier of an OID. So I may have to disallow that anyway, to avoid ambiguity...

In general, whitelisting is preferred over blacklisting because there are so many forms of encoding that it is very difficult to make a blacklist sufficiently comprehensive. For example, you need to know the various Unicode expressions of your blacklist entries as well as the ASCII ones.
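Here is one concrete case of the encoding problem. When a URL sits in an HTML attribute, the browser decodes character references before the URL parser ever sees the value, and the URL parser then drops tabs and newlines. A blacklist pattern run on the raw text has to anticipate every such form. The `BLOCKLIST` regex and the `effective_href` helper below are hypothetical, and `effective_href` is a simplification of what the browser does.

```python
import html
import re

# Hypothetical blacklist: case-insensitive, tolerates leading whitespace.
BLOCKLIST = re.compile(r"^\s*javascript:", re.IGNORECASE)

def effective_href(attr_text: str) -> str:
    """What href="attr_text" becomes by the time the URL parser runs (simplified)."""
    decoded = html.unescape(attr_text)        # &#106; -> "j", &colon; -> ":", &Tab; -> tab
    return re.sub(r"[\t\n\r]", "", decoded)   # the URL parser removes tabs and newlines

payloads = [
    "javascript:alert(1)",
    "&#106;avascript:alert(1)",
    "&#x6A;avascript:alert(1)",
    "javascript&colon;alert(1)",
    "java&Tab;script:alert(1)",
]

for raw in payloads:
    print(repr(raw), bool(BLOCKLIST.match(raw)), repr(effective_href(raw)))
```

Only the first payload matches the pattern, but all five end up as `javascript:alert(1)`.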

The OWASP ESAPI might be of use to you:

Whee -- quite a lot there. I know some of the cheat sheets, but hadn't come across the API before. Thanks for the pointer -- there's a lot there for me to absorb, but it looks useful...

This costs real money and only supports Java code, but it is cool like whoah.


Yaas. $200/month is outside my budget for the time being (that's a considerable amount more than we're actually paying to *run* the system), but once we have an income stream that isn't actually too terrible...

Not a real security geek, and don't even play one on TV, but... I think any approach based on blacklisting will demand constant updating, and you'll never really have confidence in it. I would lean instead towards an encoding system that turns any user-specified name into a clean sequence of letters and numbers, no matter what characters were in the original name. Or do these URLs have to be human-memorable?
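Base32 is one way to get such an encoding. This is a sketch of the suggestion, not anything Querki is known to do. Base32 (RFC 4648) from Python's standard library, lowercased and with the "=" padding removed, maps any name to the letters a to z and the digits 2 to 7, and it can be decoded back. The helper names are hypothetical.

```python
import base64

def encode_name(name: str) -> str:
    # Base32 output uses only A-Z, 2-7 and "=" padding; drop the padding and lowercase it.
    return base64.b32encode(name.encode("utf-8")).decode("ascii").rstrip("=").lower()

def decode_name(token: str) -> str:
    # Restore the padding to a multiple of 8 characters before decoding.
    padded = token.upper() + "=" * (-len(token) % 8)
    return base64.b32decode(padded).decode("utf-8")

token = encode_name("javascript:alert(1)")
print(token, decode_name(token))
```

The cost is the one raised in the question: the resulting URL is an opaque string rather than something a person can read or remember.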

It's possible, but certainly not ideal. If Kihou's suggestion above actually works (and it does certainly make sense), it's very clean and easy, so that's preferable. We'll see if it stands up to the ugly realities of the Internet...

I think the important thing is not so much policing the names of things, but correctly enclosing them (the same issue as with little Bobby Tables).

So javascript:whatever isn't an issue as long as, when it's included in a link, it's actually href="html:relative_url", not href="relative_url_or_anything", and it's properly HTML-encoded so it can't break out of the quote jail.

Similarly, it's not an issue in normal text (like the page title) as long as it's encoded so that the literal name is what appears on the page.
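Here is a minimal sketch of that enclosure approach. It assumes links are rendered as strings, and the `render_link` helper is hypothetical. The name is percent-encoded into a "./" relative URL. Both the attribute value and the link text then go through `html.escape`, which replaces `&`, `<`, `>`, `"` and `'`, so nothing in the name can close the quoted attribute or open a tag.

```python
import html
from urllib.parse import quote

def render_link(name: str) -> str:
    href = "./" + quote(name, safe="")
    # quote=True (the default) also escapes " and ', keeping the value inside its quotes.
    return f'<a href="{html.escape(href, quote=True)}">{html.escape(name)}</a>'

print(render_link('"><script>alert(1)</script>'))
```

After percent-encoding, escaping `href` is mostly redundant. It still keeps the pattern safe if the URL is ever built some other way.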

Obviously, you may also want to prohibit a *few* things (specifically ../, because of its dual meaning), but that's still basically an enclosure issue: not putting something ambiguous into the URI.
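The ".." case is worth a concrete check, because percent-encoding doesn't help there. A Thing named exactly ".." (or ".") contains no characters that need encoding, so "./" plus the name still resolves as a dot segment. The WHATWG URL Standard also treats "%2e%2e" as a double-dot segment, so encoding the dots isn't a reliable escape hatch either. A hypothetical name check that rejects those two names closes this particular hole. The leading-dot rule for OIDs mentioned earlier in the thread would cover it as well.

```python
from urllib.parse import quote, urljoin

def thing_href(name: str) -> str:
    return "./" + quote(name, safe="")

# Hypothetical validation rule: names that are pure dot segments are rejected.
RESERVED_NAMES = {".", ".."}

def is_acceptable_name(name: str) -> bool:
    return name not in RESERVED_NAMES

base = "https://example.com/someuser/somespace/SomeThing"  # hypothetical page URL
for name in ["..", "a/../b", "...", "Normal Thing"]:
    print(repr(name), is_acceptable_name(name), urljoin(base, thing_href(name)))
```

A name of ".." resolves to `https://example.com/someuser/`, one level above the Space. Names like "a/../b" stay inside the Space because their "/" is encoded.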
