I don't think `/{year}/{slug}.html` is what people mean when they talk about "ug...

simonw · on Feb 13, 2024

When we implemented URLs for Django our nemesis was Vignette, a popular CMS at the time (~2003) which frequently included commas in long weird URLs.

It's hard to find an example of one of those now, because the kind of sites that tolerated weird comma-infested URLs in 2003 aren't the kind of sites that meticulously maintain those URLs in working order for 20+ years!

ahmedfromtunis · on Feb 13, 2024

Wow, when I woke up this morning I had no clue that THE Simon Wilson would be replying to my comment!

Right now, I’m knee-deep in coding my Django app. I totally dig how the framework kinda "forces" you to write neat URLs ― it’s one of my favorite things about it. This might seem silly, but I actually take immense pride in crafting simple, elegant URLs, even if the majority of the users won't even notice it.

As for the comma-infested URLs, the website of one of the major news outlets in my country manifests such behavior. It always puzzled me as to what tech stack they were using. I'm not saying they still use it today (as Vignette went belly up in 2009), but this could be a heritage from those days.

I've really enjoyed using Django ever since I first got to know it back in the 2.2 days; I’ve used nothing else for my projects, big or small. I’m head over heels for every bit of it and have been recommending it to my friends for years!

Big thanks to you, Simon, for helping create this awesome piece of tech!

throwaway062o · on Feb 13, 2024

My recollection of the "old days" may be a bit hazy, but I think comma-delimited parameters were a workaround for frameworks that did not support multiple values (or for users not knowing how to handle them).

Example of a "correct" url

?value=A&value=B&value=C

Complete frameworks would have a method that returned the values as a list. Some, like PHP, required ugly workarounds where you had to name the parameter using the array syntax: value[]=A&value[]=B&value[]=C

Even if the framework supported multi-values, many preferred the shorter version: value=A,B,C and split the values in code instead
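To make the three styles above concrete, here is a small sketch using only Python's standard library (not any particular framework from the thread). It shows what a generic query-string parser hands back for each form; the variable names are illustrative.

```python
from urllib.parse import parse_qs

# Repeated keys: the parser collects every value into one list.
repeated = parse_qs("value=A&value=B&value=C")

# PHP-style array syntax: to a generic parser the brackets are
# simply part of the key name.
php_style = parse_qs("value[]=A&value[]=B&value[]=C")

# Comma-delimited: the parser sees a single value, so the
# application has to split it itself.
comma = parse_qs("value=A,B,C")
split_in_code = comma["value"][0].split(",")

print(repeated)       # {'value': ['A', 'B', 'C']}
print(php_style)      # {'value[]': ['A', 'B', 'C']}
print(comma)          # {'value': ['A,B,C']}
print(split_in_code)  # ['A', 'B', 'C']
```

The comma form is shorter in the URL, at the cost of pushing the splitting (and any escaping of commas inside values) into application code.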

simonw · on Feb 13, 2024

Django actually has a special mechanism for dealing with ?value=A&value=B

    values = request.GET.getlist("value")
    # values is now ["A", "B"]

We built it that way because we had seen the weird bugs that cropped up with the PHP solution, where passing ?q[]=x to a PHP application that expected ?q=x could result in an array passed to code that expected a string.
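The design idea described here can be sketched without Django. The `Params` class below is hypothetical and is not Django's implementation; it only mimics the split between a getter that always returns a single string and a `getlist` that always returns a list, so the caller's choice of method, not the shape of the URL, decides the type.

```python
from urllib.parse import parse_qs


class Params:
    """Hypothetical stand-in for a request's query parameters."""

    def __init__(self, query_string):
        self._values = parse_qs(query_string, keep_blank_values=True)

    def get(self, key, default=None):
        # Always a single string (the last one supplied) or the default.
        values = self._values.get(key)
        return values[-1] if values else default

    def getlist(self, key):
        # Always a list, possibly empty.
        return list(self._values.get(key, []))


params = Params("value=A&value=B")
print(params.getlist("value"))  # ['A', 'B']
print(params.get("value"))      # 'B'

# A request using ?q[]=x cannot turn "q" into a list behind the
# caller's back: the key "q" is simply absent.
tricky = Params("q[]=x")
print(tricky.get("q"))          # None
print(tricky.getlist("q[]"))    # ['x']
```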

AlienRobot · on Feb 13, 2024

I don't know if it's something from the old days or not, but iirc URLs have a semicolon separator (;) that would go before the ?. I have never seen it being used. I'm betting it has even less support than commas!

toast0 · on Feb 14, 2024

My understanding is you can use ; instead of both ? and &

Handy if anything actually supported it, because then you could plop parameters on the end of urls without looking to see if there was already a ?

I'm sure it's used somewhere, but I can only remember it being used by Yahoo Link Tracking ;_ylt=y64encodedgunk

recursive · on Feb 14, 2024

Semicolon (;) has no special meaning in a URL. You can ascribe it a meaning in your particular routing, but the spec has nothing to say about it.

https://url.spec.whatwg.org/

uranusjr · on Feb 15, 2024

In the OG RFC 2396, each _path segment_ can specify parameters similar to query parameters, but using a semicolon instead of a question mark to separate them from the main segment value. This has effects e.g. when calculating relative URLs. This is now obsolete, but many URL-parsing libraries have an API for that for compatibility.
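Python's standard library is one example of such a compatibility API. In this sketch (the example.com URL is made up), `urlparse` splits semicolon parameters off the last path segment into a separate `params` field, while the newer `urlsplit` leaves them in the path.

```python
from urllib.parse import urlparse, urlsplit

url = "https://example.com/a/b;type=x?q=1"

legacy = urlparse(url)   # older API, knows about ;params
modern = urlsplit(url)   # newer API, treats ; as ordinary path text

print(legacy.path, legacy.params, legacy.query)  # /a/b type=x q=1
print(modern.path, modern.query)                 # /a/b;type=x q=1
```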

felixfbecker · on Feb 14, 2024

The issue is that commas are technically not allowed in the search params without being percent encoded
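Whatever the specs strictly require, some tooling errs on the side of escaping. As an illustration only, Python's standard `urlencode` percent-encodes the comma by default, and parsers decode it back transparently; keeping the comma literal is an explicit opt-in.

```python
from urllib.parse import parse_qs, quote, urlencode

# The default encoder escapes the comma as %2C.
encoded = urlencode({"value": "A,B,C"})
print(encoded)  # value=A%2CB%2CC

# Decoding gives the original value back either way.
print(parse_qs(encoded))        # {'value': ['A,B,C']}
print(parse_qs("value=A,B,C"))  # {'value': ['A,B,C']}

# Leaving the comma unescaped has to be requested via `safe`.
literal = quote("A,B,C", safe=",")
print(literal)  # A,B,C
```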

rvnx · on Feb 13, 2024

value[]=A&value[]=B&value[]=C is an idea that apparently came out of PHP in ~2000, not of standards.

So, for people who learnt programming between 2000 and ~2010, it's quite normal to see commas as the delimiter for multiple parameters.

recursive · on Feb 13, 2024

As far as I know, the "standard" way is value=A&value=B&value=C. This is what comes out of a plain form submission.

cxr · on Feb 13, 2024

It's Willison.

slig · on Feb 13, 2024

Here's one example: https://g1.globo.com/Noticias/Ciencia/0,,MUL347115-5603,00-P...

brlewis · on Feb 13, 2024

From a more popular site:

https://en.m.wikipedia.org/wiki/Girl,_Interrupted_(film)

Symbiote · on Feb 13, 2024

That's not a "weird, comma-infested URL". That's the title of the page.

The only Cool URI failure on Wikipedia is the ".m" which is added to the mobile view.

adamrezich · on Feb 13, 2024

one wonders why this is still the case after all these years...

cqqxo4zV46cp · on Feb 13, 2024

Because Jimmy Wales didn’t get enough money :(

simonw · on Feb 13, 2024

Wikipedia gets a pass from me because the comma is part of the name of the actual film.

brlewis · on Feb 13, 2024

I may have misunderstood your initial comment. Was Vignette a nemesis because letting people migrate to Django from it while preserving URLs involved commas, or was it just a nemesis in general and you're pointing out a flaw in how they did URLs? If the latter then yeah there's no point in me mentioning a mainstream use of commas in URLs.

simonw · on Feb 13, 2024

We just thought that having URLs with obfuscated IDs and multiple commas in them looked really ugly.

kqr · on Feb 13, 2024

I feel like this is something I've seen a lot of in older ASP-based products also.

elzbardico · on Feb 13, 2024

Django's URLs were probably one of the points that made us use it when we finally decided to ditch Vignette around 2007

stordoff · on Feb 13, 2024

I'm reminded of HUDOC, where navigating the site gives you URLs such as:

    https://hudoc.echr.coe.int/#{%22documentcollectionid2%22:[%22GRANDCHAMBER%22,%22CHAMBER%22],%22itemid%22:[%22001-230857%22]}

Fortunately, most pages list a "clean" URL that also works: https://hudoc.echr.coe.int/?i=001-230857
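For the curious, the long form is percent-encoded JSON sitting in the URL fragment. A minimal sketch that decodes it with Python's standard library follows; how the site itself interprets the keys is not something this shows.

```python
import json
from urllib.parse import unquote, urlsplit

url = (
    "https://hudoc.echr.coe.int/#{%22documentcollectionid2%22:"
    "[%22GRANDCHAMBER%22,%22CHAMBER%22],%22itemid%22:[%22001-230857%22]}"
)

# Everything after '#' is the fragment; %22 is an escaped double quote.
fragment = urlsplit(url).fragment
state = json.loads(unquote(fragment))

print(state["itemid"])  # ['001-230857']
```

The `itemid` value is the same identifier that appears in the short `?i=` form.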

spcebar · on Feb 13, 2024

I think it's a specific reference to one of the tenets of Cool URIs Don't Change, which was that you should drop the file extension from URIs. So, indeed, not that ugly, but also, not cool, according to the good people of the W3C, back in the day.

flgstnd · on Feb 13, 2024

microsoft teams is a good example of ugly urls. it could be just a couple of letters that are mapped in a backend database but the urls feel like there is a whole javascript file encoded in there

thih9 · on Feb 13, 2024

The “.html” is a bit ugly though - it exposes the internals, tying the url to something that might change.

It’s not much harder to hide it; E.g. for static files, create a directory and put an index.html there.
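A rough sketch of that directory trick, assuming a static site whose server returns `index.html` for directory requests (a common default, but an assumption here). The function names and sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def extensionless_layout(source_path):
    """Map 'a/b.html' to 'a/b/index.html'; leave other files alone."""
    path = PurePosixPath(source_path)
    if path.suffix != ".html" or path.name == "index.html":
        return str(path)
    return str(path.with_suffix("") / "index.html")


def public_url(source_path):
    """URL a visitor would use once the file is laid out as above."""
    target = PurePosixPath(extensionless_layout(source_path))
    if target.name == "index.html":
        parent = str(target.parent)
        return "/" if parent == "." else f"/{parent}/"
    return f"/{target}"


print(extensionless_layout("2024/cool-uris.html"))  # 2024/cool-uris/index.html
print(public_url("2024/cool-uris.html"))            # /2024/cool-uris/
print(public_url("style.css"))                      # /style.css
```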

gpvos · on Feb 13, 2024

The item itself is an HTML page. That is extremely unlikely to change, very unlike an extension like .php or .asp .

account42 · on Feb 13, 2024

Unlikely to change over what timeframe? Image formats on the web have moved from .gif to .jpeg/.png to .webp to .avif. Video and audio formats have always been a mess. For a time it seemed things would move to .xhtml.

That the page is sent to your browser as HTML is not a defining attribute and could very well depend on HTTP content negotiation.
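As a rough illustration of content negotiation, the sketch below picks a representation from an `Accept` header. It is deliberately simplified (it ranks by q-value only and ignores the specificity rules real servers apply), and the function names are made up for this example.

```python
def parse_accept(header):
    """Return (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default weight when no q parameter is given
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal weights keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def negotiate(accept_header, available):
    """Pick the first available type the client accepts, else None."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for candidate in available:
            if media_type in ("*/*", candidate):
                return candidate
            if media_type.endswith("/*") and candidate.startswith(media_type[:-1]):
                return candidate
    return None


offered = ["text/html", "text/markdown"]
print(negotiate("text/markdown, text/html;q=0.8", offered))  # text/markdown
print(negotiate("text/html,*/*;q=0.8", offered))             # text/html
```

With this kind of dispatch, the same URL can yield different formats depending on the request, which is the situation where an extension in the URL stops describing the response.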

plagiarist · on Feb 13, 2024

Given the endurance of legacy code, my opinion is that any PHP page is more likely to remain a .php than the HTML remain an .html.

MobiusHorizons · on Feb 13, 2024

I think the point being made is that the contents of the file will be HTML whether it’s a static file on disk or dynamically generated using PHP. This may be more obvious when thinking about dynamically generated SVG or PDF. PHP, Node, or Python would be implementation details. HTML is the content type, and that is not likely to change.

plagiarist · on Feb 13, 2024

I agree with the extension from that perspective.

Symbiote · on Feb 13, 2024

That might be true now, but was not the case when PHP was fresh, new and exciting.

hot_gril · on Feb 13, 2024

This is an aspirational abstraction. HTML will probably outlast most websites, and those .gifs are probably /foo.gif on every site too. Even if that somehow changes, it won't break the existing URLs. Less confusing to just call it what it is for the time being.

gpvos · on Feb 13, 2024

That is a very formal way of looking at it. Moreover, this is rather simple hypertext, not an image. HTML, or a remarkably similar and compatible descendant of it, is likely to remain in use for centuries.

plagiarist · on Feb 13, 2024

That's an implementation detail that doesn't make sense in the addressing scheme. Like adding "brick house" to the end of every mailing address when the destination is made of bricks.

MrVandemar · on Feb 13, 2024

What about if an mp3 is at the end of a URL? Is that an implementation detail that doesn't make sense? Just take off the .mp3 extension?

wolrah · on Feb 13, 2024

> What about if an mp3 is at the end of a URL? Is that an implementation detail that doesn't make sense? Just take of the .mp3 extension?

Yes, why not? Just because file extensions matter to certain systems doesn't mean they do for others, and nothing about a URL to a file is required to match its DOS/Windows friendly file name.

> GET /<artistname>/<albumname>/<songname>/download HTTP/1.1

> Host: fakemusicstore.example

< HTTP/1.1 200 OK

< Content-Type: audio/mpeg

< Content-Disposition: attachment; filename="<artistname> - <songname>.mp3"
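The exchange above could be produced by something like the following sketch. The helper name is hypothetical, and it skips the escaping a real server would need for quotes or non-ASCII characters in the filename.

```python
def download_response_headers(artist, song):
    """Headers for an extension-free /download URL serving an MP3."""
    filename = f"{artist} - {song}.mp3"
    return {
        # The media type tells the client what the bytes are.
        "Content-Type": "audio/mpeg",
        # The suggested filename carries the extension, not the URL.
        "Content-Disposition": f'attachment; filename="{filename}"',
    }


headers = download_response_headers("Some Artist", "Some Song")
for name, value in headers.items():
    print(f"{name}: {value}")
```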

hot_gril · on Feb 13, 2024

It's nice in browser history to see foo.mp3 and know it's an mp3.

wolrah · on Feb 19, 2024

> It's nice in browser history to see foo.mp3 and know it's an mp3.

TBH I agree, I personally do my best to ensure the extension in the URL matches the document type on sites I run, but my point was that it's not in any way required and it's actually somewhat common for it to not be the case where the person I was replying to seemed to think it mattered.

MrVandemar · on Feb 14, 2024

Not seeing any advantage, and serious disadvantage.

The hiding of the index file name in a folder is kind of a quirk, it's automatic behavior that is being taken advantage of to make "nice" URIs, but it's actually hiding useful information.

File extensions, while a DOS/Windows thing, I've found to be an extremely useful convention on unix, and linux, and just about any other system I've used (though I can't remember what we did on the VAX box we used to use in the 90s).

plagiarist · on Feb 13, 2024

If the extension is there because that's what the file is on the server, that's wrong. If the extension is there because the endpoint will return that type of content, I'm fine with it.

apitman · on Feb 13, 2024

What if they wanted to start offering raw markdown with content negotiation instead of HTML?

KptMarchewa · on Feb 14, 2024

Serve it as .html endpoint too. It's pure syntax

patmorgan23 · on Feb 13, 2024

Is it though? What if the website owner decides they want to make the page more dynamic and switches to PHP?

Wicher · on Feb 13, 2024

Then the page that PHP outputs is still HTML.

cqqxo4zV46cp · on Feb 13, 2024

Content negotiation!

mcny · on Feb 13, 2024

I agree with you but Facebook disagrees with us.

       https://m.facebook.com/story.php/?id={redacted}&story_fbid={redacted}

tithe · on Feb 13, 2024

+1 though for not having "cgi-bin/".

remram · on Feb 13, 2024

I've put blobs of JSON in a URL before. It was dirty but I thought it was better than having pages with no direct URLs or breaking the browser's history.

u320 · on Feb 13, 2024

And that's before Google analytics throws a bit of its own trash on there as well.