I’m now a Web Developer at Amazon

This past Monday I started my new position as a web developer at Amazon! So far I’ve mostly been training, but it looks like I will be able to do some actual productive work this coming week, which is pretty exciting. The culture shock of going from a very small company to an enormous one is definitely real, but not as bad as I expected. In fact, it reminds me a lot of college — working directly with many of the same people every day, but always meeting new smart people as well.

(Needless to say, nothing I have said or will say on this website is anything other than my own opinion, not Amazon’s…)

Anyway, I am really feeling great about this move!

JavaScript character utility CharFunk 1.1.0 released

CharFunk is a little library I wrote a few years ago to make it easier to do things with Unicode text. I revisited it recently to clean up and improve the code, and add tests and a few features. The API is pretty simple:

  • CharFunk.getDirectionality(ch) – Used to find the directionality of the character
  • CharFunk.getMatches(string,callback) – Returns an array of contiguous matching strings for which the callback returns true, similar to String.match()
  • CharFunk.isAllLettersOrDigits(string) – Returns true if the string argument is composed of all letters and digits
  • CharFunk.isDigit(ch) – Returns true if provided a length 1 string that is a digit
  • CharFunk.isLetter(ch) – Returns true if provided a length 1 string that is a letter
  • CharFunk.isLetterNumber(ch) – Returns true if provided a length 1 string that is in the Unicode “Nl” category
  • CharFunk.isLetterOrDigit(ch) – Returns true if provided a length 1 string that is a letter or a digit
  • CharFunk.isLowerCase(ch) – Returns true if provided a length 1 string that is lowercase
  • CharFunk.isMirrored(ch) – Returns true if provided a length 1 string that is a mirrored character
  • CharFunk.isUpperCase(ch) – Returns true if provided a length 1 string that is uppercase
  • CharFunk.isValidFirstForName(ch) – Returns true if provided a length 1 string that is a valid leading character for a JavaScript identifier
  • CharFunk.isValidMidForName(ch) – Returns true if provided a length 1 string that is a valid non-leading character for an ECMAScript identifier
  • CharFunk.isValidName(string,checkReserved) – Returns true if the string is a valid ECMAScript identifier
  • CharFunk.isWhitespace(ch) – Returns true if provided a length 1 string that is a whitespace character
  • CharFunk.indexOf(string,callback) – Returns the first index at which a character causes the callback to return true, or -1 if no match
  • CharFunk.lastIndexOf(string,callback) – Returns the last index at which a character causes the callback to return true, or -1 if no match
  • CharFunk.matchesAll(string,callback) – Returns true if all characters in the provided string result in a true return from the callback
  • CharFunk.replaceMatches(string,callback,ch) – Returns a new string with all matched characters replaced, similar to String.replace()
  • CharFunk.splitOnMatches(string,callback) – Splits the string on all matches, similar to String.split()

This allows you to do some things you would have a hard time doing in JavaScript otherwise. JavaScript RegExps are notoriously useless for dealing with non-ASCII data. For example, imagine you wanted to do something simple like replace all non-word characters with an underscore. This is easy:

"The United States of America".replace(/[^\w]/g,"_");
    //returns "The_United_States_of_America"

Unless of course, you are dealing with non-ASCII letters:

"Российская Федерация".replace(/[^\w]/g,"_"); 
    //returns "___________________" 
"جمهورية مصر العربية".replace(/[^\w]/g,"_"); 
   //returns "____________________"

That’s not what we want.

Fortunately, CharFunk can handle this using replaceMatches:

function notLetterOrDigit(ch) {
    return !CharFunk.isLetterOrDigit(ch);
}

CharFunk.replaceMatches("جمهورية مصر العربية",notLetterOrDigit,"_"); 
    // returns "جمهورية_مصر_العربية"

CharFunk.replaceMatches("Российская Федерация",notLetterOrDigit,"_"); 
   //returns "Российская_Федерация"

This is just one small example of what CharFunk can do. I hope that web developers working on international projects — which is pretty much any web app these days — will find this useful!
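
For the curious, the idea behind a callback-driven replace can be sketched in a few lines. This is only an illustration, not CharFunk’s actual code: replaceMatchesSketch and isLetterOrDigitSketch are made-up names, and the stand-in predicate leans on ES2018 Unicode property escapes, which are newer than this release and assume a reasonably modern engine.

```javascript
// Stand-in predicate (an assumption, not CharFunk's code): true for a single
// character in a Unicode letter (L*) or decimal digit (Nd) category.
function isLetterOrDigitSketch(ch) {
    return /^[\p{L}\p{Nd}]$/u.test(ch);
}

// Hypothetical re-creation of the replaceMatches idea: walk the string and
// swap in the replacement wherever the callback returns true.
// Works per UTF-16 code unit, so it only handles BMP characters correctly.
function replaceMatchesSketch(string, callback, replacement) {
    var out = "";
    for (var i = 0; i < string.length; i++) {
        var ch = string.charAt(i);
        out += callback(ch) ? replacement : ch;
    }
    return out;
}

function notLetterOrDigit(ch) {
    return !isLetterOrDigitSketch(ch);
}

replaceMatchesSketch("Российская Федерация", notLetterOrDigit, "_");
    // returns "Российская_Федерация"
```

The real library does not depend on the engine’s regex support, which is the whole point of having it; the sketch just shows why a per-character callback makes this kind of replacement straightforward.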

Cases of User Experience – Seattle Tech Forum

This month’s STF was all about User Experience. It was sponsored by boutique software house Webtellect. Webtellect creates custom software solutions, primarily based on Microsoft technologies.

Our first speaker was Christopher Johnson, a Designer at Google on the Hangouts team in Mountain View. His talk was titled “What the Hell is UX, Anyway?”. He explained how the technology market has shifted towards design-led companies. The obvious example is Apple, but you can also see this at Microsoft, Facebook, Path, Pinterest, Instagram, Google… the list goes on. The iPhone really kicked off this trend in 2007. Mobile has forced software creators to focus on simplicity and design: with a smaller screen, you have to put more care into what you put on it.

This shift took hold at Google when Larry Page became CEO and rallied the company to create “one beautiful, intuitive user interface”. This effort became known as “Project Kennedy” and the results are evident across all of Google’s properties. This new emphasis on design brought Google’s users not only prettier, more consistent screens, but also a better User Experience.

According to Johnson, UX is a story. For a software user there is a problem to solve, and a beginning, a middle and an end. He then gave us five tips for becoming a UX-focussed organization so that we can give the story of our users’ experience a happy ending:

  • Let your designers out of their cage
  • Prototype early and often
  • Treat design as an equal partner with engineering
  • Talk with and watch users
  • Embrace the design sprint

Johnson left us by answering the question in the title of his talk: What the hell is UX? UX is everyone’s responsibility.

Jeremy Foster, Developer Evangelist at Microsoft, gave a talk titled “Ultimate User Experience”. Primarily it was an illustration of design principles using Microsoft’s new Metro UI.

Foster compared good design to the movement of air. Poor design is like air at rest: particles of air moving around in every direction are like a user with no clear indication of what to do. Good design, like a communicative sound pushing waves of air towards your ear, gives the user direction. He made a strong argument for Microsoft’s new edge-to-edge design language, listing principles like “content before chrome” and “minimize distractions”.

Every time I hear a passionate Microsofty explain the ideas behind the new UI and show it off, I feel like they really got it right. Windows 8 is beautiful and perfectly crafted to the consumer computing experience. I’m not yet convinced it will fully succeed in a work environment, but I was impressed by Foster’s talk.

Finally we heard from Charlie Claxton, VP of Creative Strategy at Produxs, with a presentation titled “The Habit of Design”. Claxton also gave an STF presentation on UX last July. He covered concepts such as framing, social proof, loss aversion and operant conditioning.

Maybe the most interesting part of his presentation centered around Eugene Pauly, an amnesiac who suffered from short-term memory loss. Pauly could not remember what he did five minutes earlier or remember the layout of his new home. However, he was still able to learn in a way, by being made to repeat the same experience daily and thereby forming habits. Claxton claims that good design should be aimed at creating and leveraging habits.

This well-attended STF was very educational. Join us on April 17th for our next topic, “Legal Matters for Technology Company”.

If Business had Glass

Not many people are talking about how Google Glass could be used by business and industry. Google seems to be aiming Glass straight at consumers, with a design that is surprisingly sleek and marketing videos full of personal moments. Apple has proven that there are plenty of people who will eagerly spend money on the latest sexy gadget. Still, I think the enterprise might be an even better early adopter than hipsters with money to burn.

There are endless possibilities for Glass to create immediate business value. Almost any employee using a mobile device or tablet could be made more effective if their hands and eyes were left free to do real work.

  • A warehouse manager could use Glass to look at shelves and obtain immediate information about what is in stock, how fast product is moving, and more.
  • A forklift driver in that warehouse could use Glass to navigate the warehouse while keeping their hands on the controls.
  • A doctor could (with your permission and the proper ten pages of paperwork) record their view of an operation for future reference (and for liability protection).
  • A delivery driver could use Glass to safely navigate their route and deal with changes while they are driving.
  • A host on a cruise ship could access useful and timely information about their guests and provide quick answers to guests’ questions, all without breaking eye contact.
  • A repair person could use schematics and instructions as they work on fixing complex machinery, without ever setting down their tools.

These are just a few possibilities in an endless list, which will grow as the device’s capabilities improve. I can imagine even more applications in government, athletics and entertainment.

Beyond the incredible possible applications, there are other reasons business and industry would be a great fit as an early adopter for Glass. Businesses get to skip over the issue of “will people want to wear these sci-fi looking things in public”. Businesses might have fewer qualms with the high early price point provided it creates compelling, measurable value — companies routinely spend upwards of $2k on ruggedized mobile devices (although cheaper Android and iOS solutions are competing here). Businesses can fund the development of useful applications without the need for bootstrapping or venture capital.

I’m still really excited about the consumer possibilities of Glass, including Games. But I hope that Google and Business get together and leverage Glass to bring new possibilities to enterprise computing.

Cases of Digital Marketing – Seattle Tech Forum

This week’s STF was sponsored by tech-talent firm Chameleon Technologies. The topic was Digital Marketing and it was extremely well attended, by a fairly different crowd than we normally see — lots of marketing people of course. This is a topic I am not very familiar with, so it was pretty interesting stuff.

Our first speaker was Content Harmony’s Marketing Director, Kane Jamison, who spoke about “Current Internet Marketing Trends & How They Will Affect Your Organization”. He actually posted most of his content over at his blog, which is well worth checking out. It was full of useful information about digital marketing and how it is changing. I really liked his data-driven format, with lots of interesting data points to hang his presentation from.

One of the most interesting things Jamison mentioned (which he doesn’t seem to cover in his blog post) is how the change to more HTTPS Google searches has affected website owners. Links from Google searches using HTTP include search keyword information, which helps website owners tune their SEO approach (“gee, lots of people are coming here looking for XYZ, we can write more about that”). However, links from searches using HTTPS do not include this data, which is a reasonable privacy and security protection. Now that Google is moving people to use HTTPS more often (due to more authenticated usage to support Google Plus and other services), fewer searches from Google include search keyword information. That is impacting how website owners approach SEO.
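
To make the lost data concrete, here is a minimal sketch of how a site could pull the search phrase out of a referrer URL. It assumes the keywords arrive in a “q” query parameter, which is how plain-HTTP Google result links carried them; the function name and both example URLs are made up for illustration, and in a browser the input would be document.referrer.

```javascript
// Hypothetical helper: returns the search phrase carried in a referrer URL,
// or null when there is none. Assumes the phrase lives in the "q" parameter.
function searchKeywordsFromReferrer(referrer) {
    var url;
    try {
        url = new URL(referrer);
    } catch (e) {
        return null; // empty or malformed referrer
    }
    var q = url.searchParams.get("q");
    return q ? q : null;
}

searchKeywordsFromReferrer("http://www.google.com/search?q=unicode+javascript");
    // returns "unicode javascript"
searchKeywordsFromReferrer("https://www.google.com/");
    // returns null
```

When the referrer arrives without the parameter, as in the second call, there is simply nothing for an analytics script to read.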

Next we heard from Josh Dirks, Founder and CEO of Project Bionic. He started by telling us that he was the grandson of an auctioneer and son of a preacher, and he had a fun speaking style to prove it. His talk was titled “Welcome to Your Social Nervous System”.

Dirks argued that social media is not about marketing at all but rather about customer service and community building. He illustrated this by relating today’s big data and social media revolution back to the days of rural villages, before the industrial revolution. In those days most folks lived in a village of 50 to 300 people and had few secrets. According to Dirks, businesses had to pay close attention to their customers and provide excellent products and services, because there was no escaping their reputation in such a small world. In the industrial revolution this all changed. Huge factories hundreds of miles away mass-produced products, and there was no mechanism for them to receive feedback other than the one-dimensional channel of communication called revenues. Media was all one-way: newspapers, TV and radio. Then the internet and social media came along and reconnected customers and companies. Now word about your business travels fast on Twitter, Facebook, Yelp, and so on. Businesses that listen to and interact with customers online in a dialog are going to succeed in this new era, while those locked in the one-way world of the past will fall behind.

There was also a third speaker scheduled, but they couldn’t make it due to illness. Fortunately, Dirks and Jamison provided us plenty to discuss and think about.

Cases of Big Data – Seattle Tech Forum

On Wednesday I attended STF to learn about Big Data. It was a packed house again, and I nearly didn’t get a seat (thanks Tiger).

First on deck was Avkash Chauhan (who also spoke at the October 2012 STF), a Senior Engineer with Windows Azure and HDInsight at Microsoft, whose talk was titled “Data Visualization: Tools and Techniques”. He demonstrated how visualizations can be used to understand large data sets. He started out with a visualization showing tweets related to the Egyptian Revolution. He explained how the various connections and distribution of nodes could be used to visually understand what was happening on Twitter at that time. I can’t find the exact graphic he used but this video is fairly similar:

Chauhan went on to demonstrate how to use two different open source visualization tools, NodeXL (an Excel plugin) and Gephi, to create similar visualizations. Both tools allow you to import public data sets like YouTube or Twitter, or use your own data. You can then toggle hundreds of controls to create exactly the right kind of visualizations to help you understand your data.

The second speaker, Arpit Gupta, spoke about “Philosophy of Big Data” (which you can watch on YouTube). In many ways this was your typical subject overview talk about Big Data, but by using humour and really great examples Gupta made this far more valuable and interesting than most overview presentations.

In one slide he listed sixteen different industries which involved applications of Big Data. He asked the audience to pick two of them to talk about. He proceeded to give some really interesting examples from the worlds of Fraud & Security and Search Quality, but it seemed like he could easily have talked about the other fourteen topics as well. He also brought an interesting perspective to the table, explaining that Big Data was really a new marketing term for something that had been around a long time. The main change (besides the invention of the term) is that only recently has it been cheap enough to do really good analysis on all that data.

Next we heard “Big Data – 10x Better” from Ying Li, the Chief Scientist and co-founder of Concurix. The “10x Better” refers to Concurix’s main goal: to create an operating system specifically engineered for data centers, one that will deliver at least a 10x price/performance improvement over current Linux and Windows servers. They are doing this by focusing on improving the usage of multiple cores. According to Li, current operating systems and software platforms are not very well suited to leverage multi-core, with net performance actually getting worse beyond about 8 cores. Concurix believes they can radically improve that situation.

To that end, Li has been doing research on various machine and OS configurations by benchmarking calculations of the Mandelbrot Set. She gave some very detailed information and visualizations showing the behavior of a multi-core system in terms of core utilization, garbage collection, etc. During Q&A I had to ask whether the Mandelbrot Set was a good place to start, given that it is such an embarrassingly parallel problem, and whether that meant Concurix did not care about improving multi-core use for less parallel problems (which is a much more difficult problem). She responded that the Mandelbrot Set was just a starting point and that they definitely did intend to work on improvements for less parallel types of software problems.
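
A quick sketch shows why the Mandelbrot Set counts as embarrassingly parallel. This is the textbook escape-time calculation, not Concurix’s benchmark code, and the function names are made up for illustration. Every point is computed from its own coordinates alone, so rows (or individual points) could be handed to different cores in any order with no coordination between them.

```javascript
// Escape-time iteration for one point c = (cx, cy): counts how many
// iterations of z = z*z + c run before |z| exceeds 2, capped at maxIter.
function escapeTime(cx, cy, maxIter) {
    var x = 0, y = 0, i = 0;
    while (i < maxIter && x * x + y * y <= 4) {
        var xNext = x * x - y * y + cx;
        y = 2 * x * y + cy;
        x = xNext;
        i++;
    }
    return i;
}

// Computes one row of the image. Nothing here reads another row's results,
// which is what makes the work trivially divisible. Assumes width >= 2.
function mandelbrotRow(cy, xMin, xMax, width, maxIter) {
    var row = [];
    for (var px = 0; px < width; px++) {
        var cx = xMin + (xMax - xMin) * px / (width - 1);
        row.push(escapeTime(cx, cy, maxIter));
    }
    return row;
}

mandelbrotRow(0, -2, 1, 4, 50);
    // returns [50, 50, 50, 3]
```

Less parallel problems are harder precisely because their units of work do need to read each other’s results.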

Finally Jim Caputo, Engineering Manager for BigQuery at Google, gave his talk “Big Data for the Masses: How We Opened Up the Doors to Google’s Dremel”. He opened with some great stats regarding Big Data at Google, such as the fact that YouTube currently has 72 hours of video uploaded every minute. He went on to talk about how BigQuery provides an SQL-like ad-hoc query interface over these kinds of very large data sets. As an example he ran a query against one of the sample BigQuery data sets (which is apparently not publicly available yet). This query, across 14+ billion rows and 1TB of data in 12 tables, returned its results in just 30 seconds.

Caputo went on to explain the technology behind BigQuery, called Dremel, and how it differed from BigTable and MapReduce. Dremel achieves a much lower latency by using a completely column oriented storage approach and a totally diskless data flow. This means that queries involving just a few columns need only touch storage on machines where those columns are stored. It also means that machines involved later on in processing the query won’t be doing any disk I/O.

Data can be uploaded directly and quickly into BigQuery. An online console can be used to query this data immediately. Developers can also create their own interfaces using the BigQuery API directly or via one of the many available client libraries.

Last of all, we heard from a representative of this month’s sponsor, ComputeNext. He described ComputeNext as “Expedia for Cloud services”. They compare IaaS providers on performance, pricing, availability, and other metrics, and provide ways of easily accessing those providers. This could be a very useful service for companies requiring a varying range of services, especially if spread across many countries where available offerings might differ.

Cases of Data Storage & Data Management – Seattle Tech Forum

Last Wednesday I attended the “Cases of Data Storage & Data Management” session of the Seattle Tech Forum. We had some great speakers and a lot more attendance than recent STFs.

NuoDB sponsored the entire session, and also provided our first speaker, Barry Morris, CEO and Co-Founder. His talk was titled “Establishing A Successful Relational Database Strategy for the 21st Century”. He started by talking about how SQL and relational databases in general were the biggest inventions in database technology in the 20th century. He outlined all the ways that a set-level data store beats a record-level/document oriented store. The SQL/RDB paradigm has brought us benefits such as ACID, and has built up a huge amount of value in existing data, tools, and large numbers of people trained to do useful things with SQL.

He went on to explain that since SQL is a 20th century invention, it is not well suited to deal with 21st century problems such as:

  • Commodity data centers (lower cost, low management requirements)
  • Big data
  • Modern workloads
  • 24×7 operation
  • Geo-distribution
  • Developer empowerment

According to Morris, this has led to a database crisis, generating many bad ideas like sharding, master/slave replication, and complicated caching schemes.

His solution is NuoDB, a database designed to supply all the powerful benefits of an RDB (such as ACID), while bringing all the flexibility and low overhead of a NoSQL database. NuoDB’s organization and ability to scale arises from emergent properties resulting from simple, deterministic behaviors of each machine, just like natural systems. With NuoDB, there is apparently no central control, no master data, no supervisory role for any machine in the database. Machines come and go as they please, and once added to a database can quickly become a useful part of that database — automatically, without configuration.

Apparently NuoDB has already been in trials with customers, and goes into beta in January.

Next we heard a presentation titled “Emerging Trends in BI and Bigdata Warehousing” by Amol Shanbhag, Senior Data Warehousing Engineer at Expedia. He started with the question “how big is Big Data?”, supplying these interesting statistics:

  • The Library of Congress adds 5TB a month
  • The internet will move 18 exabytes per month in 2013 (I may have written this down wrong since Google is telling me we’re already at 21EB per month)
  • One zettabyte is twice as big as today’s Internet

After covering some more Big Data facts and trends, Shanbhag moved on to talk about the use of NoSQL in Big Data. He emphasized his belief that “NoSQL” should really be thought of as “Not only SQL”, as there is a need for both. He claimed that, especially for analytics, one of the benefits of NoSQL is a faster time to insights about your data. He also explained the difference between OLTP (MongoDB, Couch, Azure, etc) and OLAP (Hadoop, etc) NoSQL systems.

He then examined Hadoop in some depth, explaining its benefits and use cases for Expedia and in general.

Finally we heard from Mike Miller, Chief Scientist and Co-Founder at Cloudant; Affiliate Professor of Particle Physics, UW. Miller had spoken at the November session as well. This time his talk was titled “Moving Beyond the No/New/SQL Debate: Introducing the Application Data Layer”.

He gave a great overview of Cloudant, which he said you can think of as the “Akamai of Data Content”. Cloudant provides a scalable, managed CouchDB/BigCouch database as an enterprise grade service. The goal of Cloudant is to allow you to focus on your application, not data operations.

At one point Miller also said a goal is “to get a data center on every cell tower”, which was (I think?) tongue in cheek, but speaks to their desire for high performance and ubiquity, and the strength of the NoSQL model. Since CouchDB is a NoSQL database with no guarantees of immediate consistency, such radical decentralization is actually a very realistic possibility.

Miller touted the auto sharding behavior of CouchDB. He contrasted that with the pain of app level sharding. Apparently it took two whole years for Google to shard its F1 ad network data. He gave another example of Hothead Games, who were unable to scale their game’s MySQL database when the game went viral, until they moved to Cloudant.

In the Q&A we had some great questions (which I took poor notes on). At one point it seemed like there might be a debate brewing between Morris and Miller on the pros and cons of NoSQL, but sadly it never materialized.

I would like to compliment both Morris and Miller on their presentations, which focussed mostly on their own companies but somehow did not come off as a sales pitch; most speakers can’t pull that off. I also enjoyed the technical depth Shanbhag was able to explore. It was a very interesting and educational evening.

Christmas for Classrooms – a new DonorsChoose.org front-end

This weekend I had a bolt of inspiration and built Christmas for Classrooms, which I launched last night.

Just like DonorsChooseGeographic, which I built as part of the Hacking Education competition, Christmas for Classrooms uses the DonorsChoose API. Using this API it lists proposals posted by teachers across the USA. These proposals are for things like:

People can then click on a project and view details, and hopefully choose to make a gift!

The site also boasts a fun HTML5 canvas animated snow-fall background, which I might write about if I have time.

I’ve put up a Facebook page and Twitter account as well where I’m trying to “market” certain projects.

Anyway, I’m hoping this drums up a little funding for our nation’s classrooms — Go take a look and make a donation!

We need a standard for namespacing localStorage keys

There is a trap hidden within the promise of widespread use of HTML5’s localStorage feature. The trap is the fact that the localStorage for a particular subdomain is shared by all scripts that run on that subdomain. This includes both application code and library code.

If multiple scripts running on a subdomain use localStorage, it is easy to imagine them conflicting with each other. For example, if two or more scripts use some of the same localStorage item keys, they will trample on each other’s data, possibly causing each other to choke on unexpected data or to be fooled by plausible but incorrect data set in place by another script. Or, if one of the scripts uses localStorage.clear() when it decides its cache has gone stale, it will clear out every other script’s data as well. This might cause unnecessary repeated downloads of data that was in localStorage only moments before.

It might be reasonable to expect that all the application code (by which I mean non-library code) on a single subdomain should be coordinated in their usage of localStorage (though if multiple teams are involved even this might be a stretch). However it’s definitely not reasonable to expect that different libraries will be so coordinated. At least not right now.

It would have been nice if localStorage had had namespacing built into its API. Perhaps the localStorage API can be expanded to include this in the future.

In the meantime, another approach would be to introduce a library which would provide namespacing for localStorage. Squirrel.js is one such library, and appears to be well thought out (though I haven’t used it). However, I doubt that most library authors would want to add this to their list of required libraries, just as web designers probably don’t want one more script they have to include in all their projects.

So what can we do about it?

The solution I’m advocating is that the web dev community settle on a convention for namespacing use of localStorage keys, plus a few rules of thumb to avoid conflicts. For example, we might have a few simple rules:

  • For libraries, keys should be prefixed with the subdomain of the script’s primary “home” on the web, followed by a colon. For example, “github.com/theAuthor/theScript.js:” or “scriptName.somesite.com:”.
  • For application code, keys should be prefixed with the root subpath within the website that represents the application, followed by a colon. For example, “/stocks:”, “/admin:”, or just “:” if this is code for the whole site.
  • All libraries should supply a way for a different namespace to be used. For example, “SomeScript.setLsNamespace(…)” or “new SomeScript({ lsns: ‘…’})”.
  • Libraries should avoid using localStorage.clear().
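
As a rough illustration of how little code the rules above would ask of a library, here is a minimal sketch. NamespacedStorage and createMemoryStorage are hypothetical names invented for this example, not part of any existing library or of the proposal itself. The wrapper takes any object with the Web Storage interface (window.localStorage in a browser) plus a configurable namespace, and its clear() removes only its own keys.

```javascript
// Hypothetical wrapper illustrating the convention. "namespace" is the
// agreed prefix without the trailing colon, and is supplied by the caller
// so that a site can override a library's default.
function NamespacedStorage(storage, namespace) {
    this.storage = storage;
    this.prefix = namespace + ":";
}

NamespacedStorage.prototype.setItem = function (key, value) {
    this.storage.setItem(this.prefix + key, value);
};

NamespacedStorage.prototype.getItem = function (key) {
    return this.storage.getItem(this.prefix + key);
};

NamespacedStorage.prototype.removeItem = function (key) {
    this.storage.removeItem(this.prefix + key);
};

// Removes only this namespace's keys instead of calling storage.clear().
NamespacedStorage.prototype.clear = function () {
    var doomed = [];
    for (var i = 0; i < this.storage.length; i++) {
        var key = this.storage.key(i);
        if (key.indexOf(this.prefix) === 0) {
            doomed.push(key);
        }
    }
    for (var j = 0; j < doomed.length; j++) {
        this.storage.removeItem(doomed[j]);
    }
};

// In-memory stand-in for localStorage so the sketch runs outside a browser.
function createMemoryStorage() {
    var data = {};
    return {
        get length() { return Object.keys(data).length; },
        key: function (i) {
            var k = Object.keys(data)[i];
            return k === undefined ? null : k;
        },
        getItem: function (k) {
            return Object.prototype.hasOwnProperty.call(data, k) ? data[k] : null;
        },
        setItem: function (k, v) { data[k] = String(v); },
        removeItem: function (k) { delete data[k]; },
        clear: function () { data = {}; }
    };
}

var shared = createMemoryStorage();
var stocks = new NamespacedStorage(shared, "/stocks");
var lib = new NamespacedStorage(shared, "github.com/theAuthor/theScript.js");
stocks.setItem("cache", "AMZN");
lib.setItem("cache", "v1");
stocks.clear();
lib.getItem("cache");    // returns "v1"
stocks.getItem("cache"); // returns null
```

Both scripts use the key “cache”, yet neither sees nor destroys the other’s data, which is the whole point of the convention.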

This convention/standard needs a name so that libraries can note that they have “XYZ compliant usage of localStorage”.

So my question for you is: what’s the best place to flesh out such a convention and launch it into the world? Will you help me sort it out and spread the word?

Update: I’ve created a GitHub repository to try to facilitate more involvement. Come join in!

Soundslice – a totally new approach to learning guitar music

Given the fact I’ve complained about existing guitar music websites, I figure I just have to comment on this. Adrian Holovaty and PJ Macklin have created a really excellent tool for guitar players to learn songs. Soundslice gives you a realtime tablature that plays at the same speed as the actual song, a bit like Songsterr. But along with that tab, Soundslice also shows a YouTube video of someone playing the song, synchronized precisely with the tab!

You can even play the video at half speed (but the same pitch!) to work out the more difficult parts of the song. Soundslice also, of course, lets people input their own tabs tied to videos.

Soundslice will be great for people who want to replicate the original work as nearly as possible. That’s the opposite end of the spectrum from the tool I created, OnePageChords, which is more targeted at those of us just trying to capture the essence of the song and less fussed with (or able to pull off!) the details of the original. Still, I know where I’ll be looking when I want to learn a song perfectly. Nice job, Soundslice!