Leetcode #393: UTF-8 Validation
In this guide, we solve Leetcode #393 UTF-8 Validation in Python and focus on the core idea that makes the solution efficient.
You will see the intuition, the step-by-step method, and a clean Python implementation you can use in interviews.

Problem Statement
Given an integer array data representing the data, return whether it is a valid UTF-8 encoding (i.e. it translates to a sequence of valid UTF-8 encoded characters).
Quick Facts
- Difficulty: Medium
- Premium: No
- Tags: Bit Manipulation, Array
Intuition
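UTF-8 encodes each character in one to four bytes, and the high bits of every byte say what role it plays: a leading byte announces how many continuation bytes follow, and every continuation byte starts with 10. The only state needed while scanning is a counter of continuation bytes still expected.
Right shifts expose those prefix bits in constant time, so the check needs no extra memory.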
Approach
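Scan data once while keeping a counter cnt of continuation bytes still owed by the current character. Use right shifts to compare each byte's high bits against the UTF-8 prefixes.
Steps:
- If cnt > 0, the byte must start with 10 (v >> 6 == 0b10). Decrement cnt on a match, otherwise return False.
- If cnt == 0, treat the byte as a leading byte: prefix 0 is a 1-byte character, 110 expects 1 continuation byte, 1110 expects 2, and 11110 expects 3. Any other prefix is invalid.
- After the loop, return cnt == 0 so that a truncated final character is rejected.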
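As an illustration of classifying a byte by its high bits, the same test can be written with masks instead of shifts. This is a minimal sketch, not the solution used in this guide: the helper name expected_continuations is hypothetical, and it assumes every value fits in 8 bits.

```python
def expected_continuations(byte: int) -> int:
    """Return how many continuation bytes a leading byte announces.

    Returns -1 if the byte cannot start a character.
    Assumption: 0 <= byte <= 255.
    """
    if byte & 0b10000000 == 0b00000000:  # 0xxxxxxx
        return 0
    if byte & 0b11100000 == 0b11000000:  # 110xxxxx
        return 1
    if byte & 0b11110000 == 0b11100000:  # 1110xxxx
        return 2
    if byte & 0b11111000 == 0b11110000:  # 11110xxx
        return 3
    return -1  # 10xxxxxx (continuation byte) or 11111xxx (invalid prefix)


print(expected_continuations(0b11000101))  # 1
print(expected_continuations(0b10000010))  # -1
```

Masking keeps the full byte width visible in each comparison, while shifting (as in the solution) keeps the constants shorter. Both read the same prefix bits.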
Example
Number of Bytes | UTF-8 Octet Sequence (binary)
----------------+-------------------------------------
1               | 0xxxxxxx
2               | 110xxxxx 10xxxxxx
3               | 1110xxxx 10xxxxxx 10xxxxxx
4               | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
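To see the table in action, it can help to print sample inputs in binary. The sketch below uses two inputs chosen for illustration, [197, 130, 1] and [235, 140, 4]; the helper name to_bits is hypothetical.

```python
from typing import List


def to_bits(data: List[int]) -> List[str]:
    """Format each value as an 8-bit binary string (zero-padded)."""
    return [format(v, "08b") for v in data]


print(to_bits([197, 130, 1]))  # ['11000101', '10000010', '00000001']
print(to_bits([235, 140, 4]))  # ['11101011', '10001100', '00000100']
```

In the first input, 11000101 announces a 2-byte character and 10000010 completes it, then 00000001 is a 1-byte character, so the data is valid. In the second, 11101011 announces a 3-byte character and 10001100 is a valid continuation byte, but 00000100 does not start with 10, so the data is invalid.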
Python Solution
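from typing import List


class Solution:
    def validUtf8(self, data: List[int]) -> bool:
        cnt = 0  # continuation bytes still expected for the current character
        for v in data:
            if cnt > 0:
                # Inside a multi-byte character: the byte must look like 10xxxxxx.
                if v >> 6 != 0b10:
                    return False
                cnt -= 1
            elif v >> 7 == 0:
                cnt = 0  # 0xxxxxxx: single-byte character
            elif v >> 5 == 0b110:
                cnt = 1  # 110xxxxx: one continuation byte follows
            elif v >> 4 == 0b1110:
                cnt = 2  # 1110xxxx: two continuation bytes follow
            elif v >> 3 == 0b11110:
                cnt = 3  # 11110xxx: three continuation bytes follow
            else:
                # Stray continuation byte (10xxxxxx) or invalid prefix (11111xxx).
                return False
        # A leftover count means the last character was truncated.
        return cnt == 0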
Complexity
The time complexity is $O(n)$, where $n$ is the length of the array data. The space complexity is $O(1)$.
Edge Cases and Pitfalls
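Cases worth checking by hand: a continuation byte (10xxxxxx) with no leading byte before it, a byte with five or more leading ones (11111xxx), and input that ends in the middle of a multi-byte character. The last case is why the function returns cnt == 0 rather than True. The code above assumes every value is already in the range 0 to 255; if larger values can appear, mask each one with v & 0xFF before classifying it.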
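The cases above can be checked with a few assertions. This is a sketch for quick experiments: valid_utf8 is a hypothetical standalone copy of the counter-based check, and the byte values in cases are illustrative inputs, not LeetCode's test data.

```python
from typing import List


def valid_utf8(data: List[int]) -> bool:
    """Standalone copy of the counter-based check (assumes values in 0..255)."""
    cnt = 0  # continuation bytes still expected
    for v in data:
        if cnt > 0:
            if v >> 6 != 0b10:
                return False
            cnt -= 1
        elif v >> 7 == 0:
            cnt = 0
        elif v >> 5 == 0b110:
            cnt = 1
        elif v >> 4 == 0b1110:
            cnt = 2
        elif v >> 3 == 0b11110:
            cnt = 3
        else:
            return False
    return cnt == 0


# Each entry maps a description to (input, expected result).
cases = {
    "empty input": ([], True),
    "single ASCII byte": ([0b01111111], True),
    "stray continuation byte": ([0b10000000], False),
    "truncated 2-byte character": ([0b11000010], False),
    "five leading ones": ([0b11111000], False),
    "complete 4-byte character": (
        [0b11110000, 0b10010000, 0b10000000, 0b10000000],
        True,
    ),
}
for name, (data, expected) in cases.items():
    assert valid_utf8(data) == expected, name
print("all edge cases behave as expected")
```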
Summary
The solution is a single pass with one counter: shifts classify each byte by its prefix, the counter tracks how many continuation bytes are still owed, and the final cnt == 0 check rejects truncated input.