flowchart TD
%% Style definitions
classDef development fill:#F8F4FF,stroke:#6B46C1,stroke-width:2px,color:#1F2937
classDef evaluation fill:#F0FDF4,stroke:#16A34A,stroke-width:2px,color:#1F2937
classDef ci fill:#FEF2F2,stroke:#DC2626,stroke-width:2px,color:#1F2937
classDef deployment fill:#F0F9FF,stroke:#2563EB,stroke-width:2px,color:#1F2937
classDef monitoring fill:#FFFBEB,stroke:#D97706,stroke-width:2px,color:#1F2937
classDef analysis fill:#FAFAF9,stroke:#78716C,stroke-width:2px,color:#1F2937
classDef improvement fill:#FFF7ED,stroke:#EA580C,stroke-width:2px,color:#1F2937
- Use your application extensively to build intuition about failure modes
- Define 3-4 dimensions based on observed or anticipated failures
- Create structured tuples covering your priority failure scenarios
- Generate natural language queries from each tuple using a separate LLM call
- Scale to more examples across your most important failure hypotheses (we suggest roughly 100 or more)
- Test and iterate on the most critical failure modes first, and keep generating examples until you reach theoretical saturation, the point where new examples stop revealing new failure modes
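
The tuple and query steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the dimension names and values, the prompt wording, and the `call_llm` parameter are all hypothetical placeholders, and `call_llm` stands in for whatever client you use to make the separate LLM call.

```python
from itertools import product

# Hypothetical dimensions for an illustrative support assistant; replace them
# with the dimensions you derived from observed or anticipated failures.
DIMENSIONS = {
    "feature": ["order tracking", "refund request", "account login"],
    "persona": ["first-time user", "frustrated repeat customer"],
    "scenario": ["ambiguous request", "missing information", "out-of-scope request"],
}


def build_tuples(dimensions):
    """Return every combination of dimension values as a dict."""
    names = list(dimensions)
    return [dict(zip(names, values)) for values in product(*dimensions.values())]


def query_prompt(structured_tuple):
    """Build the prompt for the separate LLM call that writes one query."""
    details = "\n".join(f"- {name}: {value}" for name, value in structured_tuple.items())
    return (
        "Write one realistic user message for the situation below.\n"
        f"{details}\n"
        "Return only the message text."
    )


def generate_queries(tuples, call_llm):
    """Pair each tuple with a query produced by call_llm (a prompt -> str callable)."""
    return [{"tuple": t, "query": call_llm(query_prompt(t))} for t in tuples]


tuples = build_tuples(DIMENSIONS)
```

Enumerating the full cross product is only a starting point. In practice you would likely filter or sample it so that your priority failure scenarios are covered first, then pass the remaining tuples to `generate_queries` with your own LLM client.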
ARG ALPINE_VERSION=3.12.0
FROM hexpm/elixir:1.11.0-erlang-23.1.1-alpine-3.12.0 as builder
ARG APP_VSN="1.0.0"
# Replace `your_app` with your otp application name.
ENV APP_NAME=your_app \
    APP_VSN=${APP_VSN} \
    MIX_ENV=prod
I was at Amazon for about six and a half years, and now I've been at Google for that long. One thing that struck me immediately about the two companies -- an impression that has been reinforced almost daily -- is that Amazon does everything wrong, and Google does everything right. Sure, it's a sweeping generalization, but a surprisingly accurate one. It's pretty crazy. There are probably a hundred or even two hundred different ways you can compare the two companies, and Google is superior in all but three of them, if I recall correctly. I actually did a spreadsheet at one point but Legal wouldn't let me show it to anyone, even though recruiting loved it.
I mean, just to give you a very brief taste: Amazon's recruiting process is fundamentally flawed by having teams hire for themselves, so their hiring bar is incredibly inconsistent across teams, despite various efforts they've made to level it out. And their operations are a mess; they don't real