👋 Hey there, I’m Dheeraj Choudhary, an AI/ML educator, cloud enthusiast, and content creator on a mission to simplify tech for the world.
After years of building on YouTube and LinkedIn, I’ve finally launched TechInsight Neuron, a no-fluff, insight-packed newsletter where I break down the latest in AI, Machine Learning, DevOps, and Cloud.
What to expect: actionable tutorials, tool breakdowns, industry trends, and career insights, all crafted for engineers, builders, and the curious.
If you're someone who learns by doing and wants to stay ahead in the tech game, you're in the right place.
Introduction
Everything in Docker starts with a Dockerfile. You can understand containers, pull images from Docker Hub, and run them all day, but the moment you need to package your own application, you need to know how to write one. A Dockerfile is a plain text file containing a sequence of instructions that tells Docker exactly how to build your image, layer by layer, step by step.
The instructions themselves are not complicated. What trips people up is understanding why certain instructions exist, what the difference is between similar-sounding ones like CMD and ENTRYPOINT or ENV and ARG, and how the order of instructions affects build performance. This guide covers all of that: every essential instruction, the two best practices that will immediately make your Dockerfiles faster and leaner, and a complete production-ready example that ties everything together.
What Is a Dockerfile?
A Dockerfile is a plain text file named exactly Dockerfile, with no file extension. It lives in your project directory and contains a series of instructions written in a simple, declarative syntax. When you run docker build, Docker reads that file top to bottom, executes each instruction in order, and produces an image.

Each instruction creates a new read-only layer on top of the previous one. The final stack of layers is your image. That layered structure is what makes Docker images efficient: layers are cached, shared between images, and only rebuilt when something changes.
A minimal example:
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y python3
CMD ["python3"]

Three instructions, three layers, a working Python environment. That's the basic shape of every Dockerfile.
Two properties worth knowing from the start: Dockerfiles are reproducible (the same file always builds the same image, assuming the base image and dependencies don't change), and they're auditable (every decision about what goes into an image is written down in a file you can version-control alongside your code).

FROM: Choosing Your Base Image
Every Dockerfile must start with FROM. It defines the base image your new image builds on top of. Everything else you add sits on top of this foundation.
FROM node:18

This pulls the official Node.js 18 image from Docker Hub and uses it as the starting point. Your image inherits everything that image contains: the OS, the Node runtime, npm, and all their dependencies.
The tag matters. FROM node:18 and FROM node:18-alpine produce very different results. The standard node:18 image is based on Debian and weighs in at over 900MB. The node:18-alpine variant is based on Alpine Linux and comes in around 170MB. For production images, Alpine-based variants are almost always the better choice: smaller images pull faster, have a smaller attack surface, and take up less storage.
# Full Debian-based: ~900MB
FROM node:18
# Alpine-based: ~170MB (preferred for production)
FROM node:18-alpine

Always pin a specific version tag rather than using latest. The latest tag moves with the image maintainer's releases. If you build your image today on node:latest and rebuild it six months from now, you might get a completely different Node version, potentially breaking your application.
# Risky in production
FROM node:latest
# Predictable and safe
FROM node:18.19.0-alpine3.19

WORKDIR: Setting the Working Directory
WORKDIR sets the working directory inside the container for all subsequent instructions. Any RUN, COPY, ADD, CMD, and ENTRYPOINT instructions that follow will execute relative to this directory.
WORKDIR /app

If the directory doesn't exist, Docker creates it. You can use WORKDIR multiple times in a Dockerfile to change directories mid-build, though in practice most Dockerfiles set it once early and leave it there.
Using WORKDIR is preferable to RUN cd /app because it's explicit, persistent, and doesn't depend on shell state. Without it, commands run from the root of the container filesystem, which creates messy images where application files end up scattered in the root directory.
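A quick sketch of the difference. Each RUN instruction starts a fresh shell, so a cd in one RUN has no effect on the next; WORKDIR, by contrast, persists (filenames here are illustrative):

```dockerfile
FROM alpine:3.19

# cd only lasts for the duration of this single RUN's shell
RUN cd /tmp && touch from-cd.txt     # creates /tmp/from-cd.txt
RUN touch after-cd.txt               # runs from /, creates /after-cd.txt

# WORKDIR persists across all following instructions
WORKDIR /app
RUN touch from-workdir.txt           # creates /app/from-workdir.txt
```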
COPY and ADD: Getting Files Into the Image
COPY transfers files and directories from your build context (the directory where you run docker build) into the image filesystem.
# Copy a single file
COPY package.json /app/
# Copy all files from current directory into /app
COPY . /app/
# Using WORKDIR, relative paths work cleanly
WORKDIR /app
COPY package.json .
COPY . .

ADD does everything COPY does, plus two additional behaviors: it can fetch files from URLs, and it automatically extracts tar archives into the destination directory.
# ADD can extract archives
ADD app.tar.gz /app/
# ADD can fetch from URLs (though this is generally discouraged)
ADD https://example.com/config.json /app/config.json

In practice, use COPY for the vast majority of cases. It's explicit and predictable. Use ADD only when you specifically need the tar extraction behavior. Fetching from URLs in ADD is considered poor practice because it makes builds dependent on external network availability and the URL content can change, making builds non-reproducible.
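If you do need a remote file at build time, the usual alternative to ADD with a URL is an explicit RUN with a download tool, which makes failures visible at the step that caused them. A sketch, assuming curl is available in the base image (the URL is the placeholder from above):

```dockerfile
# -f fails the build on HTTP errors instead of saving an error page
RUN curl -fsSL https://example.com/config.json -o /app/config.json
```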
Also use a .dockerignore file in your project root to prevent COPY . . from pulling in files you don't want in the image:
node_modules
.git
.env
*.log
Dockerfile
.dockerignore

RUN: Executing Commands at Build Time
RUN executes a shell command during the image build process and commits the result as a new layer. It's how you install packages, compile code, create directories, set permissions, and do anything else that needs to happen before your container runs.
RUN npm install
RUN apt-get update && apt-get install -y curl

RUN has two forms: shell form and exec form.
Shell form runs the command through /bin/sh -c:
RUN apt-get update && apt-get install -y curl

Exec form runs the command directly without a shell:
RUN ["apt-get", "install", "-y", "curl"]

Shell form is more common for RUN because it allows shell features like &&, pipes, variable expansion, and line continuation with \. Exec form is rarely used with RUN.
The critical thing to understand about RUN is that every instruction creates a new layer. This has direct implications for image size and build performance, which is covered in the best practices section below.
ENV and ARG: Variables at Runtime vs Build Time
These two instructions both define variables, but they operate at different points in the container lifecycle and have different visibility.
ENV
ENV sets environment variables that are available both during the build process and in the running container. They persist into containers.
ENV NODE_ENV=production
ENV PORT=3000
ENV DB_HOST=localhost

Inside the running container, these are accessible as normal environment variables. Your application code can read process.env.NODE_ENV in Node.js, os.environ['PORT'] in Python, etc.
One important nuance from the official docs: each ENV instruction creates a new layer, just like RUN. Even if you unset an ENV variable in a later layer, its value persists in the earlier layer and can still be extracted from the image. For sensitive values that should only exist during the build, use ARG instead or set and unset within a single RUN command.
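The set-and-unset-in-one-RUN pattern mentioned above looks like this; the token value is a placeholder:

```dockerfile
# TOKEN exists only inside this single RUN's shell; it is never
# committed to a layer or to the image's environment metadata
RUN TOKEN=placeholder-value && \
    echo "using $TOKEN for a build-time step" && \
    unset TOKEN
```

One caveat: the RUN command text itself is still visible in docker history, so this protects against ENV persistence, not against a literal secret typed into the Dockerfile.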
ARG
ARG defines a variable that only exists during the image build process. Once the image is built, ARG values are gone. They're not available in the running container.
ARG VERSION=1.0
ARG BUILD_DATE
RUN echo "Building version $VERSION on $BUILD_DATE"

ARG values can be passed in at build time:
docker build --build-arg VERSION=2.5 --build-arg BUILD_DATE=$(date) .

ARG is useful for things like version numbers and build metadata. Never put secrets in ENV, since they're visible in the image metadata via docker inspect. Be careful with secrets in ARG too: build arguments used in RUN instructions can still surface in docker history. For credentials needed only during the build (like a token to pull a private package), BuildKit secret mounts are the safer mechanism.
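For a credential that a build genuinely needs, BuildKit secret mounts keep the value out of layers, image metadata, and docker history: the secret file exists only while its RUN executes. A hedged sketch; the npm_token id, the token file, and the way the token is consumed are all placeholders:

```dockerfile
# syntax=docker/dockerfile:1
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
# Secret is mounted at /run/secrets/npm_token only for this RUN
RUN --mount=type=secret,id=npm_token \
    TOKEN="$(cat /run/secrets/npm_token)" npm install
```

Built with something like docker build --secret id=npm_token,src=./token.txt .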
Quick reference
| | ENV | ARG |
|---|---|---|
| Available during build | Yes | Yes |
| Available at runtime | Yes | No |
| Visible in image metadata | Yes | No (after build) |
| Can be overridden at runtime | Yes (via docker run -e) | No |
| Use for | App config, runtime settings | Build params, version numbers |

EXPOSE: Documenting Ports
EXPOSE tells Docker that the container listens on a specific network port at runtime.
EXPOSE 3000
EXPOSE 8080

There's an important distinction to understand here: EXPOSE does not actually publish the port or make it accessible from outside the container. It's documentation. It tells other developers (and Docker tooling like Docker Compose) which ports this container intends to use.
The actual port publishing happens when you run the container with -p:
# This is what actually makes the port accessible
docker run -p 8080:3000 myapp

That said, EXPOSE is still worth including: it documents intent clearly, it's what docker run -P uses to decide which ports to auto-publish, it integrates with expose: in Docker Compose files, and it shows up in docker inspect output.
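One place EXPOSE does have a functional effect: docker run -P (capital P) publishes every exposed port to a random high port on the host. A quick sketch using the image name from above (the container name is illustrative):

```shell
# Publish all EXPOSEd ports to random host ports
docker run -d -P --name myapp-auto myapp

# See which host port was assigned to each exposed port
docker port myapp-auto
```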
CMD and ENTRYPOINT: Defining What Runs
This is the most misunderstood area in Dockerfiles. Both CMD and ENTRYPOINT define what runs when a container starts, but they behave differently and serve different purposes.
CMD
CMD sets the default command and arguments that run when a container starts. It can be overridden entirely by passing arguments to docker run.
CMD ["node", "server.js"]If someone runs docker run myapp, the container executes node server.js. If they run docker run myapp npm test, the npm test overrides the CMD entirely and that runs instead.
CMD should almost always use exec form (the JSON array syntax), not shell form. The exec form runs the process directly as PID 1 inside the container, which means it receives signals like SIGTERM correctly when Docker stops the container. Shell form wraps the command in /bin/sh -c, so the actual process is a child of the shell and may not receive stop signals properly, leading to containers that don't shut down cleanly.
# Shell form: node is a child of /bin/sh, may miss SIGTERM
CMD node server.js
# Exec form: node is PID 1, receives signals correctly
CMD ["node", "server.js"]

ENTRYPOINT
ENTRYPOINT defines the executable that always runs when the container starts. Unlike CMD, it cannot be overridden by passing arguments to docker run. Arguments passed to docker run are appended to the ENTRYPOINT command instead.
ENTRYPOINT ["node"]
CMD ["server.js"]With this setup, docker run myapp runs node server.js. docker run myapp app.js runs node app.js. The entrypoint (node) stays fixed; only the argument changes.
ENTRYPOINT is best when you want to treat the container as an executable for a specific tool or application. Use CMD alone when you want a default command that can be fully replaced. Use ENTRYPOINT plus CMD together when you want a fixed executable with overridable default arguments.
# Container always runs node, argument can be changed
ENTRYPOINT ["node"]
CMD ["server.js"]To override ENTRYPOINT at runtime, use the --entrypoint flag:
docker run --entrypoint bash myapp
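A common pattern that combines the two is an entrypoint wrapper script: it does one-time setup, then hands off to whatever CMD (or docker run arguments) supplied, using exec so the final process still becomes PID 1 and receives signals. A hedged sketch; the script name and setup step are placeholders:

```shell
#!/bin/sh
# docker-entrypoint.sh: run setup, then become the real command
set -e
echo "running one-time setup..."   # e.g. migrations, config templating
# exec replaces this shell with "$@", so signals go straight to the app
exec "$@"
```

Wired up in the Dockerfile with ENTRYPOINT ["./docker-entrypoint.sh"] and CMD ["node", "server.js"].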
Best Practice: Layer Optimization
Every RUN instruction in a Dockerfile creates a new image layer. Layers add up fast, and they never shrink. If you install packages in one RUN layer and then clean up the package manager cache in a separate RUN layer, the cleanup layer doesn't reduce the image size. The files still exist in the earlier layer, and Docker stores all layers.
# Bad: three separate RUN instructions = three layers
RUN apt-get update
RUN apt-get install -y curl
RUN apt-get install -y git
# Good: chained into one RUN = one layer
RUN apt-get update && \
    apt-get install -y curl git && \
    rm -rf /var/lib/apt/lists/*

The rm -rf /var/lib/apt/lists/* at the end removes the package manager's cache files in the same layer they were created. This is what actually reduces image size. If that cleanup were in a separate RUN, it would do nothing for size.
Chain related commands with && and use \ for line continuation to keep the Dockerfile readable. Sort packages alphabetically within a single install command to make diffs easier to read and avoid accidental duplicates:
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
    curl \
    git \
    vim \
    && rm -rf /var/lib/apt/lists/*

The --no-install-recommends flag tells apt not to install recommended packages that aren't strictly required. This alone can significantly reduce image size.
Best Practice: Instruction Order and Build Cache
Docker caches the result of each instruction. When you rebuild an image, Docker checks each instruction from top to bottom. The moment it finds an instruction whose inputs have changed, it invalidates the cache for that instruction and all instructions below it, rebuilding from that point forward.
This cache invalidation behavior is what makes instruction order critical. Put instructions that change frequently at the bottom. Put instructions that rarely change at the top.
The classic example is dependency installation in a Node.js project:
# Bad order: every code change reinstalls all dependencies
FROM node:18-alpine
WORKDIR /app
COPY . . # Copies everything including your app code
RUN npm install  # Cache busted every time any file changes

When you change a single line in server.js, the COPY . . instruction detects a change, invalidates the cache, and npm install runs again from scratch. On a project with hundreds of dependencies, that's a lot of wasted time.
# Good order: dependencies only reinstall when package.json changes
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./ # Copy only the dependency manifest
RUN npm install # Only runs if package.json changed
COPY . .              # Copy app code last

Now npm install only re-runs when package.json or package-lock.json changes. Everyday code edits skip straight to COPY . . and use cached dependency layers. This turns a 2-minute build into a 10-second one.
The rule is simple: stable things go at the top, frequently changing things go at the bottom.
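The same caching pattern applies to other ecosystems. A sketch of the identical idea for a Python project (requirements.txt and app.py are assumed filenames):

```dockerfile
FROM python:3.12-alpine
WORKDIR /app
# Copy the dependency manifest first so the install layer is cached
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# Copy application code last; edits here don't bust the pip layer
COPY . .
CMD ["python", "app.py"]
```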

A Complete Production-Ready Dockerfile
Here's a Dockerfile for a Node.js application that applies every principle covered in this guide:
# Use Alpine for a minimal base image, pin a specific version
FROM node:18-alpine
# Set working directory for all subsequent instructions
WORKDIR /app
# Copy dependency manifests first to leverage build cache
COPY package*.json ./
# Install only production dependencies; npm ci is faster
# and more reliable than npm install for CI/CD environments
# (newer npm versions spell this flag --omit=dev)
RUN npm ci --only=production
# Copy application source code after dependencies are installed
COPY . .
# Set runtime environment variable
ENV NODE_ENV=production
# Document the port the application listens on
EXPOSE 3000
# Use exec form so node runs as PID 1 and receives signals correctly
CMD ["node", "server.js"]

Let's walk through every decision made here:
- FROM node:18-alpine uses Alpine for a small image footprint.
- WORKDIR /app keeps the filesystem organized.
- COPY package*.json ./ copies both package.json and package-lock.json (the * glob matches both) before running install, so the dependency install layer is cached independently from application code.
- RUN npm ci is used instead of npm install because npm ci installs exactly what's in package-lock.json with no version resolution, making builds deterministic and faster. The --only=production flag excludes devDependencies, keeping the image lean.
- COPY . . comes after the install step so application code changes don't bust the dependency cache.
- ENV NODE_ENV=production is set at the image level rather than relying on the caller to pass it at runtime.
- EXPOSE 3000 documents the port for other developers and tooling.
- CMD ["node", "server.js"] uses exec form so the process receives OS signals correctly.

Building and Running Your Image
With your Dockerfile written, two commands take you from file to running container.
Building the image
# Build from the current directory, tag as myapp version 1.0
docker build -t myapp:1.0 .
# Build with a build argument
docker build --build-arg VERSION=2.0 -t myapp:2.0 .
# Force rebuild without cache
docker build --no-cache -t myapp:1.0 .

The . at the end is the build context: the directory Docker sends to the daemon as the source for COPY instructions. Usually this is your project root. The -t flag sets the image name and tag.

During the build, Docker prints each instruction step. You'll see CACHED next to steps it pulled from cache and actual execution output for steps it ran fresh.
Running the container
# Run the container, map port 8080 on host to 3000 in container
docker run -d -p 8080:3000 --name myapp-container myapp:1.0
# Override the default CMD
docker run -d -p 8080:3000 myapp:1.0 node other-server.js
# Pass environment variables at runtime (overrides ENV from Dockerfile)
docker run -d -p 8080:3000 -e NODE_ENV=staging myapp:1.0

Key Takeaways
- A Dockerfile is a plain text file of instructions that Docker executes top to bottom to build an image, with each instruction creating a new immutable layer
- FROM must be the first instruction. Pin specific version tags, and prefer Alpine variants for smaller images
- WORKDIR sets the working directory for all subsequent instructions. Always use it instead of RUN cd
- Use COPY for files; use ADD only when you need tar extraction. Always use a .dockerignore file
- RUN executes commands at build time. Chain related commands with && in a single RUN to reduce layers, and keep cleanup in the same layer as installation
- ENV variables persist into running containers; ARG variables exist only during the build. Never store secrets in ENV
- EXPOSE documents the port but does not publish it. Actual port publishing happens with -p in docker run
- CMD sets a default command that can be fully overridden at runtime. Use exec form (JSON array) so the process runs as PID 1 and receives signals correctly
- ENTRYPOINT sets a fixed executable that cannot be replaced (only appended to) by docker run arguments. Use ENTRYPOINT plus CMD together for a fixed binary with overridable default arguments
- Put stable instructions at the top, frequently changing ones at the bottom. Copy dependency manifests before source code so dependency install layers get cached independently
- Use npm ci instead of npm install in Dockerfiles for deterministic, reproducible builds
Conclusion
A well-written Dockerfile is one of the most important artifacts in a containerized project. It's the single source of truth for what goes into your image, it's version-controlled alongside your code, and it determines how fast your CI/CD pipeline builds and how lean your production images are.
The instructions themselves are straightforward once you understand what each one does and when it runs. The bigger gains come from the two practices covered here: combining RUN commands to minimize layers, and ordering instructions so that slow, stable steps like dependency installation get cached and only run when they actually need to.

From here, the natural next step is multi-stage builds, where you use one stage to compile or build your application and a second, minimal stage for the final image, leaving all build tools and intermediate files behind. That pattern takes these same principles further and is how production images for compiled languages like Go and Java stay small.
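To make the multi-stage idea concrete, here is a minimal sketch for a Node.js app, assuming a build script that outputs to dist/ (the script name and output path are assumptions):

```dockerfile
# Stage 1: build with devDependencies available
FROM node:18-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: the final image gets only what production needs
FROM node:18-alpine
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY --from=build /app/dist ./dist
EXPOSE 3000
CMD ["node", "dist/server.js"]
```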
🔗Let’s Stay Connected
📱 Join Our WhatsApp Community
Get early access to AI/ML, Cloud & DevOps resources, behind-the-scenes updates, and connect with like-minded learners.
➡️ Join the WhatsApp Group
✅ Follow Me for Daily Tech Insights
➡️ LinkedIn
➡️ YouTube
➡️ X (Twitter)
➡️ Website

