Question
In C#, how can a string be converted to a byte[] in .NET without manually choosing an encoding?
I am planning to encrypt the string. Although encryption APIs may sometimes work with text directly, I want to understand why encoding is involved when converting text to bytes.
Why does encoding matter here at all? Can't I just retrieve the raw bytes that the string is already stored as internally? Why is converting a string to bytes dependent on character encoding?
Short Answer
By the end of this page, you will understand that a C# string is text, not an arbitrary byte sequence, and converting it to byte[] always requires an encoding such as UTF-8 or UTF-16. You will learn how .NET stores strings internally, why there is no single universal byte representation for text, how to convert strings to bytes consistently, and what to use when preparing text for encryption, storage, or network transfer.
Concept
A string in C# represents characters, not raw bytes.
When you want a byte[], .NET must answer a question:
- Which bytes should represent those characters?
That question is exactly what character encoding solves.
Why encoding is required
The same text can have different byte representations depending on the encoding.
For example, the text Hello has a different byte sequence in each of these encodings:
- UTF-8
- UTF-16
- ASCII
- UTF-32
Each encoding turns characters into bytes differently.
Example
The character A might be:
- UTF-8: 41
- UTF-16 little-endian: 41 00
- UTF-32 little-endian: 41 00 00 00
So when you ask for bytes, .NET cannot guess which representation you want unless you specify one.
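You can verify this yourself. Here is a minimal sketch (the class name Demo is illustrative):
using System;
using System.Text;
class Demo
{
    static void Main()
    {
        string text = "A";
        // Each encoding produces a different byte sequence for the same character.
        Console.WriteLine(BitConverter.ToString(Encoding.UTF8.GetBytes(text)));    // 41
        Console.WriteLine(BitConverter.ToString(Encoding.Unicode.GetBytes(text))); // 41-00
        Console.WriteLine(BitConverter.ToString(Encoding.UTF32.GetBytes(text)));   // 41-00-00-00
    }
}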
How .NET stores strings internally
In .NET, a string is stored in memory as a sequence of UTF-16 code units. That is an implementation detail of how the runtime keeps text in memory.
But that internal representation is an implementation detail, not a portable text encoding choice. The moment text needs to leave the runtime, as file contents, a network payload, or encryption input, you must still pick an explicit encoding such as UTF-8.
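A small sketch that peeks at those UTF-16 code units (uses top-level statements, so it assumes C# 9 or later):
using System;
string s = "Aé";
// Each char in a .NET string is one UTF-16 code unit.
foreach (char c in s)
    Console.WriteLine($"'{c}' -> 0x{(int)c:X4}");
// 'A' -> 0x0041
// 'é' -> 0x00E9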
Mental Model
Think of a string as a sentence written on a whiteboard.
The sentence is the meaning: the characters and words.
An encoding is the rulebook for copying that sentence into a sequence of numbered boxes.
Different rulebooks produce different box contents:
- UTF-8 uses one set of rules
- UTF-16 uses another
- ASCII uses a limited older set
The whiteboard text is not "already a file format." It is just text in memory.
So asking for the bytes of a string without an encoding is like asking:
"Write this sentence into boxes, but don't tell me which writing system to use."
The runtime needs that rulebook.
Another useful mental model:
- string = human-readable text
- byte[] = raw machine data
- encoding = translator between the two
Without the translator, the conversion is ambiguous.
Syntax and Examples
Core syntax
using System.Text;
string text = "Hello";
byte[] bytes = Encoding.UTF8.GetBytes(text);
string decoded = Encoding.UTF8.GetString(bytes);
Example: converting text to bytes
using System;
using System.Text;
class Program
{
    static void Main()
    {
        string text = "Hello, café";

        byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
        byte[] unicodeBytes = Encoding.Unicode.GetBytes(text); // UTF-16 LE in .NET

        Console.WriteLine("UTF-8 bytes: " + BitConverter.ToString(utf8Bytes));
        Console.WriteLine("UTF-16 bytes: " + BitConverter.ToString(unicodeBytes));
    }
}
This prints different byte sequences for the same string because different encodings are being used.
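For Hello, café, the output is:
UTF-8 bytes: 48-65-6C-6C-6F-2C-20-63-61-66-C3-A9
UTF-16 bytes: 48-00-65-00-6C-00-6C-00-6F-00-2C-00-20-00-63-00-61-00-66-00-E9-00
Note how é becomes C3-A9 in UTF-8 but E9-00 in UTF-16, so the arrays differ in both length and content.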
Example: preparing text for encryption
using System.Text;
string plaintext = "Secret message";
byte[] plaintextBytes = Encoding.UTF8.GetBytes(plaintext);
// Encrypt plaintextBytes here
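To make the flow concrete, here is a hedged sketch of that encryption step using AES (assumes .NET 6 or later for EncryptCbc; real code would manage the key and IV deliberately rather than generating them inline):
using System;
using System.Security.Cryptography;
using System.Text;

string plaintext = "Secret message";

// Text -> bytes: the encoding choice happens here.
byte[] plaintextBytes = Encoding.UTF8.GetBytes(plaintext);

using Aes aes = Aes.Create(); // random key and IV, for the sketch only
byte[] ciphertext = aes.EncryptCbc(plaintextBytes, aes.IV);

// Base64 turns binary ciphertext into a string-safe form.
Console.WriteLine(Convert.ToBase64String(ciphertext));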
Step by Step Execution
Consider this code:
using System;
using System.Text;
string text = "Aé";
byte[] bytes = Encoding.UTF8.GetBytes(text);
Console.WriteLine(BitConverter.ToString(bytes));
Step by step
1. string text = "Aé";
A string containing two characters is created:
Aé
These are characters, not output bytes yet.
2. Encoding.UTF8.GetBytes(text);
.NET converts each character into UTF-8 bytes.
- A becomes 41
- é becomes C3-A9
So the final byte array becomes:
41-C3-A9
3. BitConverter.ToString(bytes)
This formats the byte array as a readable hexadecimal string, so the program prints 41-C3-A9.
Real World Use Cases
Encoding-aware string-to-byte conversion is used everywhere in real programs.
1. Encrypting user input
Before encrypting a password hint, note, or token payload, text is converted to bytes:
byte[] data = Encoding.UTF8.GetBytes(userMessage);
2. Writing text to files
A file is just bytes. If you save text, you must choose how characters become bytes.
File.WriteAllText("notes.txt", text, Encoding.UTF8);
3. Sending JSON over HTTP
HTTP bodies are bytes, even if the content is text.
var content = new StringContent(json, Encoding.UTF8, "application/json");
4. Storing text in databases or caches
Data may move across systems with different defaults. Explicit encoding avoids corruption.
5. Hashing text consistently
If you hash text, the bytes must be consistent or the hash changes.
byte[] input = Encoding.UTF8.GetBytes(text);
Different encodings produce different hashes for the same visible text.
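A minimal sketch demonstrating this (assumes .NET 5 or later for SHA256.HashData and Convert.ToHexString):
using System;
using System.Security.Cryptography;
using System.Text;

string text = "café";

// Same visible text, different encodings, different hashes.
byte[] utf8Hash = SHA256.HashData(Encoding.UTF8.GetBytes(text));
byte[] utf16Hash = SHA256.HashData(Encoding.Unicode.GetBytes(text));

Console.WriteLine(Convert.ToHexString(utf8Hash));
Console.WriteLine(Convert.ToHexString(utf16Hash)); // differs from the line above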
Real Codebase Usage
In real projects, developers usually do not avoid choosing an encoding. Instead, they make the choice explicit and consistent.
Common patterns
Use UTF-8 as the default text encoding
UTF-8 is the most common choice for APIs, config files, JSON, logs, and cross-platform systems.
byte[] bytes = Encoding.UTF8.GetBytes(value);
Keep encode/decode paired
If one part writes with UTF-8, the reading part should also use UTF-8.
byte[] bytes = Encoding.UTF8.GetBytes(text);
string textAgain = Encoding.UTF8.GetString(bytes);
Validate input before conversion
Guard clauses are common when handling nullable or empty values.
if (text == null)
    throw new ArgumentNullException(nameof(text));

byte[] bytes = Encoding.UTF8.GetBytes(text);
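On .NET 6 and later, the same guard is often written with the built-in helper:
ArgumentNullException.ThrowIfNull(text);
byte[] bytes = Encoding.UTF8.GetBytes(text);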
Centralize encoding decisions
Larger codebases often define a shared encoding policy.
using System.Text;

public static class TextEncoding
{
    public static readonly Encoding Default = Encoding.UTF8;
}
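Call sites then use the shared policy instead of naming an encoding directly:
byte[] bytes = TextEncoding.Default.GetBytes(value);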
Common Mistakes
1. Assuming a string already has one obvious byte representation
A string is text, not a fixed byte array.
Mistake
// Conceptually wrong: there is no encoding-free text-to-bytes conversion
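A frequently suggested workaround illustrates the trap. Copying the string's in-memory UTF-16 code units "works", but only because UTF-16 is itself an encoding choice, here made implicitly and fragilely:
byte[] bytes = new byte[text.Length * sizeof(char)];
Buffer.BlockCopy(text.ToCharArray(), 0, bytes, 0, bytes.Length);
// Whoever reads these bytes must somehow know they are UTF-16 code units.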
Fix
Choose an encoding explicitly:
byte[] bytes = Encoding.UTF8.GetBytes(text);
2. Encoding with UTF-8 and decoding with UTF-16
Using different encodings in each direction corrupts data.
Broken code
byte[] bytes = Encoding.UTF8.GetBytes("café");
string text = Encoding.Unicode.GetString(bytes);
Fix
Use the same encoding both ways:
byte[] bytes = Encoding.UTF8.GetBytes("café");
string text = Encoding.UTF8.GetString(bytes);
3. Using ASCII for non-ASCII text
ASCII cannot represent many characters like é, 中, or emoji.
Broken code
byte[] bytes = Encoding.ASCII.GetBytes("café"); // é is silently replaced with '?' (0x3F)
Fix
Use an encoding that can represent the characters:
byte[] bytes = Encoding.UTF8.GetBytes("café");
Comparisons
Common encoding choices in C#
| Encoding | Bytes for A | Supports Unicode well? | Common use | Notes |
|---|---|---|---|---|
| ASCII | 41 | No | Legacy text | Only supports 0–127 reliably |
| UTF-8 | 41 | Yes | Web, APIs, files, JSON | Most common modern choice |
| UTF-16 (Encoding.Unicode) | 41-00 | Yes | In-memory .NET string model, some Windows APIs | Two bytes per code unit; some characters need two code units |
| UTF-32 | 41-00-00-00 | Yes | Rare, specialized text processing | Four bytes per code point; largest output |
Cheat Sheet
Quick reference
Convert string to bytes
byte[] bytes = Encoding.UTF8.GetBytes(text);
Convert bytes to string
string text = Encoding.UTF8.GetString(bytes);
Get UTF-16 bytes
byte[] bytes = Encoding.Unicode.GetBytes(text);
Convert encrypted bytes to safe string form
string base64 = Convert.ToBase64String(encryptedBytes);
Convert Base64 string back to bytes
byte[] bytes = Convert.FromBase64String(base64);
Rules to remember
- A C# string is text, not raw binary data.
- Converting text to bytes always needs an encoding.
- Use the same encoding for encoding and decoding.
- UTF-8 is usually the best default.
- Do not treat arbitrary binary data as text.
- Use Base64 when binary data must be stored in a string.
Common choices
Encoding.UTF8 -> best default for most applications
FAQ
Why can't I convert a C# string to bytes without an encoding?
Because a string represents characters, not a unique byte sequence. An encoding defines how those characters are turned into bytes.
Does .NET store strings internally as UTF-16?
Yes, .NET strings use UTF-16 code units internally. But that internal storage format is not the same as choosing a portable external byte format.
Should I use UTF-8 or UTF-16 in C#?
Usually use UTF-8 for files, APIs, encryption input, and network communication. Use UTF-16 only when a specific API or format requires it.
If .NET uses UTF-16 internally, can I just use Encoding.Unicode?
You can, but that is still an explicit encoding choice. It is not the same as avoiding encoding. It is often less portable than UTF-8.
Why does encryption require bytes instead of strings?
Encryption algorithms operate on binary data. If your original data is text, you must first encode it into bytes.
Can I store encrypted bytes in a string?
Not safely as regular text. Use Base64 to convert encrypted bytes into a string-safe representation.
Will the same text always produce the same bytes?
Only if you use the same encoding. Different encodings produce different byte sequences for the same text.
Mini Project
Description
Build a small C# console program that shows how the same string becomes different byte arrays depending on the encoding. This project helps reinforce that text and bytes are not the same thing, which is especially important before hashing, encryption, file writing, or sending data over a network.
Goal
Create a program that converts one string into UTF-8 and UTF-16 byte arrays, prints both results, and decodes the bytes back into the original text.
Requirements
- Create a string containing at least one non-ASCII character such as é or 中.
- Convert the string to bytes using both Encoding.UTF8 and Encoding.Unicode.
- Print the byte arrays in hexadecimal form.
- Decode both byte arrays back into strings using the matching encoding.
- Show that the original text is preserved when the correct encoding is used.