A verbal command line for the world (swombat.com)
47 points by ThomPete on Oct 26, 2011 | hide | past | favorite | 41 comments


If Siri can become a CLI for the world, then that is brilliant. But the most powerful feature of the CLI is the pipe - "passing the output of one program to the next, to make magic happen" - and I remain skeptical that something resembling it can be made to work with a voice interface. Basically, I want to be able to say "Find a bus 19 that allows me to be at work at least 15 minutes before my first meeting each day, add it to my calendar and update every day at 9 pm", the equivalent of which is pretty simple in a Unix CLI. But, not having the command line to review and edit, I also need to be able to say this without first carefully considering the exact structure of the sentence.
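(The bus and calendar services in that sentence don't exist as commands, but the pipe mechanic itself is simple. A self-contained sketch with real tools, counting the most common login shell in some made-up passwd-style data:)

```shell
# Each "|" hands one program's output to the next as its input.
printf 'alice:/bin/bash\nbob:/bin/zsh\ncarol:/bin/bash\n' \
  | cut -d: -f2 \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -3
# /bin/bash comes out on top with a count of 2.
```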

Being able to control and input data to an increasing number of applications reliably is undoubtedly very good, but the many, many failed attempts to replace text entry for describing anything mildly complex suggest that this isn't going anywhere.


I agree that piping will be the next big step, and that's not there yet. But it's on the radar, and Siri does seem to be built with interactions in mind, thanks to its context tracking.

So for example, a stepping stone for piping might be:

"Find me a bus 19 that leaves from home to arrive at work before 9am every day for the next week."

"Ok, here are the buses that match your requirements."

"Add those to my calendar."

"Ok, I've added them."

"Also remember to perform this task again every Sunday evening."

"Ok, I'll remember."

That's not exactly what you suggested, but it's close enough and not that far from what Siri is already capable of offering in theory, if it were plugged into the right services.


But, not having the command line to review and edit, I also need to be able to say this without first carefully considering the exact structure of the sentence.

That's no doubt difficult, but you just seem to be implying it can't be done, without giving any reasons why.

Being able to control and input data to an increasing number of applications reliably is undoubtedly very good, but the many, many failed attempts to replace text entry for describing anything mildly complex suggest that this isn't going anywhere.

No it doesn't. Simply that a lot of people have tried something and failed to achieve it doesn't mean anything. Lots of people failed to make flying machines, until someone did.

You may be able to give particular reasons why it is difficult and why you think it not likely to be achieved, but the fact, by itself, that no one has been able to do it so far, doesn't demonstrate anything.


"Simply that a lot of people have tried something and failed to achieve it doesn't mean anything."

Yes it does. It doesn't prove anything, but it does mean something.

I'd point out that we've surrounded command lines with a lot of accoutrement over the years, like history, conditionals, and various ways of outputting things such that we can debug our system. The utility of the command line would plunge if we didn't have any of those, and it is completely unclear how to add those to Siri without turning it into simply another programming environment, a voice-activated interface to Yahoo Pipes.
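(For concreteness, those accoutrements look like this in shell terms. A minimal sketch; the log file and its contents are made up and created inline:)

```shell
# Make a throwaway log to work against.
log=$(mktemp)
printf 'INFO start\nERROR disk full\nINFO done\n' > "$log"

# Conditionals: act only when a pattern is present.
grep -q ERROR "$log" && echo "errors found"

# Debuggable intermediate output: tee saves a pipeline stage for inspection.
grep ERROR "$log" | tee "$log.err" | wc -l

# History (interactive only): Up-arrow or !! replays earlier commands.

rm -f "$log" "$log.err"
```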

I'd submit that there is a certain irreducible complexity in the command line, and however fuzzy and wonderful it may make you feel when using it, wrapping a voice interface around that complexity makes things worse, not better - or we'd already be doing it. And if this amounts to the mathematically impossible task of reducing a task's complexity below the number of bits it actually contains, then yes, it is impossible, too.


"It doesn't prove anything, but it does mean something."

No it doesn't. The history of science and technology is filled with things that people couldn't achieve, that were considered impossible, and then were done.

There is no way you can simply look at the fact that it hasn't been done and is considered hard and conclude anything about whether it can be done or not.

To hopefully avoid a potential misunderstanding: you can give specific reasons why X seems hard to do and not likely to be achieved, and these can mean something. But it's these reasons that mean something, not the fact that it hasn't been achieved so far.


I'd point out that we've surrounded command lines with a lot of accoutrement over the years

I would counter your claims with the simple fact that the earliest command-line interfaces had none of these. A process change only has to offer an incremental improvement over what came before to drive adoption. Those of us who used the earliest shells and the job-control languages that preceded them know that the bar for usefulness can be set quite low and the tool will still be used and improved upon.


"The utility of the command line would plunge if we didn't have any of those"

If I'd meant "been eliminated", I would have said that. Nevertheless, I cite the "your mom" argument. "People" are terrified of the nice command line, and we think that they're going to switch to a vastly, vastly more user-hostile one, no matter how friendly and chipper it may seem at first glance? If "people" wanted command lines that much they'd already be using them.


> The voice recognition is ok. The natural language processing is passable. But all those are things that can be improved...

These AI fields have been notoriously slow to improve. Apple is great at applying new technology, e.g. the integration mentioned. They may be able to adapt somewhat with all the data gathered (pronunciation, word-usage patterns, typical requests), but I wouldn't expect significant improvements.


I expect them to improve because there are obvious, straightforward, practical improvements they can make already. As an obvious example, the NLP is currently limited in terms of the variability of ways to add things to your shopping list. Adding more variants of that would be easy.

The huge amounts of data they're gathering about what people actually use this for is indeed a huge advantage. With that kind of data, they can build a centralised dictionary of common patterns for asking just about anything. Because of Siri's practical orientation, there is no need to build a "generic NLP engine" (which is what you're, quite rightly, suggesting is slow to improve). They merely need to improve the effectiveness of the NLP engine they already have, by making it understand more common patterns. That's achievable through expert-system-like functionality that's existed for 30 years.

In terms of the voice recognition, that has been progressing steadily for 30 years too, and I imagine Siri will benefit from improvements to the technology base, as others will.


There's a tension: the great advantage Siri has is a constrained domain (as you note); adding patterns, services, "ways to add things", etc. expands this domain. (And adding new patterns, if they amount to grammar rules, can increase the domain dramatically.) It's a tradeoff between expressiveness and error.

My information is that the progress in voice recognition has been terribly slow, and in the last 10 years or so has been made mostly by limiting domains.


"In terms of the voice recognition, that has been progressing steadily for 30 years too, and I imagine Siri will benefit from improvements to the technology base, as others will"

Will it progress to a point where ambiguity is reduced to nil? The strength of the command line abstraction is its completely unambiguous interface - do what I say.

The do what I mean interface of Siri could be limited to actions with insignificant consequences - the human brain incorporates the best speech recognition available and still makes mistakes. Any speech AI performing significant actions would need to outperform the brain.

edit: That is not to suggest AI speech recognition outperforming the brain is not possible. There are probably metrics and methods for disambiguating common sources of confusion which could be performed in the blink of an eye rather than the minutes, hours or weeks later you find yourself thinking "Oh, THAT'S what he said!"


The do what I mean interface of Siri could be limited to actions with insignificant consequences - the human brain incorporates the best speech recognition available and still makes mistakes. Any speech AI performing significant actions would need to outperform the brain.

Or actions which can be undone.

I'd be very concerned if the US army decided to use Siri to control its nuclear missiles. Less so if someone uses Siri to change their thermostat or query their fridge contents.


"I'd be very concerned if the US army decided to use Siri to control its nuclear missiles. Less so if someone uses Siri to change their thermostat or query their fridge contents"

Indeed... A command line for non-critical parts of the world.

For launching missiles? Nothing less than a bash script! ;)


God willing we will someday have a computer which, when you tell it "Launch the nuclear missiles," is smart enough to say "No."


I just wanted to add that Apple can do magic - but not that kind of magic!

Apple's magic is what we should all do:

  (1). pick a market subset (a niche) that will be really delighted
       by our product, both because of what they value (due to their
       psychology and the problems of their circumstances), and aspects
       of their circumstances that we can use to really make our product
       rock.
  (2). make the product do that.
This is "marketing"; not in the sense of advertising what we have, nor in the sense of choosing to make what people want; but in the sense of choosing the people who will want what we want to make (though truth be told, Apple also excels at advertising, spending a fortune, winning awards etc).

This is all awesome, and to be celebrated and emulated. But Apple doesn't do technology research - they beg, buy or borrow technology, and then commercialize it (e.g. desktop, touch, siri).

Technology research is magic of a different order... AI research the darkest of all.


Agree 100%. Voice recognition doesn't seem much better than Dragon Dictate was in the 90s. I'm sure it's improving slowly, but it's still a far cry from natural or efficient. I see no indication that this will ever improve drastically but maybe I just lack vision.


The only time I'd want a spoken command line is when I can't type; that is basically only while driving (or while dodging being shot/bombed). Admittedly that's a big use case (the driving part), so siri makes some sense there. Otherwise I'll stick to typing -- I prefer textual communication even with another human in the same room, so maybe I'm an outlier.


I've been finding myself using Siri for texting even while walking or doing less dangerous activities like driving. I find that I can "type" much faster with my voice, and Siri is damn accurate.


The problem is that Siri has the drawbacks of the CLI, without many of its features. Don't get me wrong: as I said previously, Siri gets me excited like no other iPhone feature does.

But the reality is that, like CLIs, Siri has an unforgivably precise syntax, but unlike CLIs it lacks contextual help (tab completion), easy ways to string commands together and encapsulate them as one (pipes and scripts), job control, etc.
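(What "encapsulating them as one" buys you, in shell terms - a small self-contained sketch with made-up input:)

```shell
# Encapsulation: a pipeline, once it works, becomes a single named command.
top_words() {
  tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]' \
    | sort | uniq -c | sort -rn | head -"$1"
}

# The most common word here is "the", with a count of 3.
echo "the quick fox and the lazy dog and the cat" | top_words 1
```

There is no obvious spoken equivalent of defining `top_words` once and reusing it forever.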

Right now, Siri is a poor voice-based CLI, yes. But I think in the future, when natural language processing improves, these types of applications will be less and less like a CLI, becoming their own UI paradigm.


If Siri is really the equivalent of a CLI for phones, then it's destined to be a niche technology.

The main reason why most users don't use a command line isn't, as the article says, because "most people have been satisfied enough with that [the GUI], so they haven't ever bothered learning to use a command line". It's because command lines are only better for very advanced users and/or people with an exceptional memory.

Fact is, with a command line you don't "talk" to a computer. You actually program it, with a kind of interpreted language and a plethora of libraries (executables). This means that you actually need to be VERY exact in what you write, you have to remember the "magic incantation" in every detail, or it will not work - or, likely enough, disaster will follow.
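(As a concrete taste of that exactness, a safe, self-contained sketch using find(1), where one character of the incantation decides everything:)

```shell
# Set up two throwaway files to search over.
dir=$(mktemp -d)
touch "$dir/report.log" "$dir/report.txt"

find "$dir" -type f -name '*.log'   # finds report.log
find "$dir" -type f -name '*.lo'    # finds nothing: no trailing wildcard
find "$dir" -type f -name '*.*'     # finds both files

rm -r "$dir"
```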

Vocal commands can be useful for the mass for very simple tasks, if the incantation is very easy to remember. To be able to actually talk to computers, on the other hand, we need actual AI, of the Turing test kind. We aren't anywhere near that for now.


The main reason why most users don't use a command line isn't, as the article says, because "most people have been satisfied enough with that [the GUI], so they haven't ever bothered learning to use a command line".

They've been satisfied enough with the GUI because the alternative, the command line, was several orders of magnitude harder to learn than the GUI. Siri is comparable, or possibly easier, in terms of learning difficulty.

Fact is, with a command line you don't "talk" to a computer. You actually program it

But with Siri, you talk to it. That's where the NLP piece comes in. "Wake me up at 7am" works, so does "Can you set an alarm for 7am?", etc. The syntax is deliberately loose and not programming-language-like. So you don't need to be exact at all.


That's precisely the reason why the CLI parallel falls over, in my opinion.

A CLI is an interface that lets the user communicate using the computer's language.

Siri is an interface that attempts to let the user communicate using their own language.

It's a fundamental difference I think. Indeed the only thing that Siri and CLIs have in common is that they are non-graphical.


It's a fundamental difference I think. Indeed the only thing that Siri and CLIs have in common is that they are non-graphical.

This is actually the similarity that matters. In a GUI, you have to page through "commands" to find what you are looking for. In CLIs and Siri, you simply go straight to the command by typing or speaking.

Programmers love CLIs because, after the initial cost of learning the commands, you can do things much faster than with a GUI. But speed of executing commands is less important on the desktop than on a phone, which is why I suspect most normal users don't bother to learn a CLI.

But when you're on a phone trying to tell your boss you're running late while coming down a flight of stairs, Siri is going to come in real handy.


That's good for usability, but then it's not like the command line ;) My point is that to have both things you would really need an "expert system".


It may not be exactly "like" a command line (whatever that means), but it will effectively function like one, as far as most people are concerned. To a large extent, you could argue that the command line is already an expert system - after all, it only understands a few specific commands and needs very exact syntax to work. Siri is already one step ahead of that, by having looser syntax.


This probably explains why we have very different points of view: My idea of an expert system isn't at all similar to a Unix shell. No problem though :)


I loved this quote

>("Add buy ketchup to my shopping reminders" failed, but "Add ketchup to my shopping list" worked)

This is exactly the way I would expect it to work. Would you say to a personal assistant: "Add buy ketchup to my shopping reminders"? It isn't natural. The second phrase is much more natural and was understood as one would expect.


The funny thing is that he is clearly thinking like a computer programmer in his first sentence. I wonder if naive users would generally have better results on first contact with Siri, compared to technical users who try to over-structure their commands?


You're very possibly right. That said, the "Add ketchup to my shopping list" version only sounds good for shopping lists.

If I wanted a list of work reminders, I might have: "Add send contract to john to my work list." which would be less natural... also, in my mind, what I was adding to my reminders was the task to buy ketchup, not "ketchup" by itself.

Anyway, I'm sure they'll add more supported syntaxes, and in parallel people will learn and adapt to the syntaxes that are actually supported.


"When is the next bus 19 coming?"

This might be where the problems will start to surface. I suspect at the moment Siri knows that some keywords likely refer to the calendar, and acts accordingly. If you add bus schedules, you create a big problem - "when" will make Siri search your calendar, which will not be the right database for bus schedules.

Then again this might be easily fixed by simply searching all connected services for "bus 19", but this too might reach a limit of feasibility with a growing number of databases.

Maybe Siri can learn that "bus" most likely refers to the public transport database. It's probably not completely stupid, but it still seems unclear how much it can really do.
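(A crude sketch of that kind of keyword routing - hypothetical, not how Siri actually dispatches. Note how pattern order silently decides the ambiguous "when ... bus" case:)

```shell
# Route a query to a backend service by keyword; first matching arm wins.
route() {
  q=$(echo "$1" | tr '[:upper:]' '[:lower:]')
  case "$q" in
    *bus*|*train*|*tram*)  echo transit ;;
    *meeting*|*calendar*)  echo calendar ;;
    *when*|*what\ time*)   echo calendar ;;   # "when" alone is ambiguous
    *)                     echo web-search ;;
  esac
}

route "When is the next bus 19 coming?"   # prints: transit ("bus" outranks "when")
route "When is my next meeting?"          # prints: calendar
```

Every new service added means re-litigating that ordering, which is the feasibility limit being described.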

Of course having such interfaces in the future would be cool, it is just not clear that Siri is the first step in that direction.


I wonder how third party apps would interact with Siri.

"What is the music that is playing now?" Shazam and SoundHound would both want to be the service provider for this kind of query. How would Siri select which app to use?


Same as Android does when you trigger a task that more than one app can handle: it asks you (and you can, of course, tell it to remember your choice so it doesn't bother you again in the future).


All this excitement about Siri makes me wonder, would people really talk to their phones in public places? Sometimes social barriers are much harder to overcome than technological ones.


Especially if I install loudspeakers in my classroom that whisper 'rm -R ~/' over and over like a mantra...

Back on topic, some years ago if you saw someone walking along having an animated conversation with no visible participant, you would have assumed a form of mental illness. Now, you just assume they are using their hands free set. I think it will catch on.


I think it will, and Apple have made a brilliant design choice in allowing Siri to fire up when you bring the phone to your ear (and you're not in a phone call). People are already comfortable talking on the phone in public, and it's not immediately obvious to onlookers that the "person" being talked to is the phone itself.


I have actually seen people talk on the phone while just holding it in front of them, so there's already precedent for that as well. I don't think social barriers will last long.


At least in my area, people already talk to their phones in public. You just have to think of Siri as the "person" on the other end of the line and it isn't strange at all.


I suspect that it will become the norm if it is popular enough.


Siri on Mac desktops would be great. Organize your work and surf the web while eating your lunch, hands off the keyboard.


Talking with your mouth full?


For some reason, people have missed out on one of the biggest avenues of potential in R&D-originated environments like Self and Smalltalk -- every object basically has a CLI.

Yes, I'm serious. If you poke around in the Smalltalk environments, you will find not only that everything is an object, but also that every single object, from complex libraries to the humble character, has something like a CLI that you can write little scripts against.

Squeak Smalltalk not only has something like a CLI for directly manipulating objects on a low level, there is also a framework called Morphic which lets you directly manipulate every individual object with a little GUI. It's as if every object also had its own lightweight IDE attached to it.



