- How does the analyzer detect file type without trusting the extension?
- It reads 'magic bytes' — the first few bytes of the file that identify the format. JPEG files start with FF D8 FF, PNG with 89 50 4E 47, PDF with 25 50 44 46. A file named photo.txt that starts with FF D8 FF is actually a JPEG regardless of its name.
- What is the difference between file size and disk usage?
- File size is the actual number of bytes of content. Disk usage is how much storage the file occupies. It is usually equal to or greater than file size because filesystems allocate space in fixed-size blocks: a 1-byte file on a filesystem with 4 KB blocks uses 4 KB of disk. The exceptions are sparse or transparently compressed files, which can occupy less space than their file size.
- What does 'encoding' mean for text files?
- Character encoding defines how characters are represented as bytes. UTF-8 uses 1-4 bytes per character. ASCII defines 128 characters, each stored in 1 byte. Windows-1252 extends ASCII with Western European characters. Decoding bytes with the wrong encoding produces garbled text, so detect the encoding before processing cross-platform files.
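
The magic-byte check from the first answer can be sketched in a few lines of Python. This is a minimal illustration, not the analyzer's actual implementation: the names `MAGIC_SIGNATURES`, `sniff_type`, and `sniff_file` are hypothetical, and the table holds only the three signatures quoted above.

```python
from typing import Optional

# Signatures quoted in the answer above. The table is illustrative and
# far from complete; a real detector would know many more formats.
MAGIC_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",      # FF D8 FF
    b"\x89\x50\x4e\x47": "png",   # 89 50 4E 47
    b"\x25\x50\x44\x46": "pdf",   # 25 50 44 46 ("%PDF")
}


def sniff_type(data: bytes) -> Optional[str]:
    """Return a format name if the leading bytes match a known signature."""
    for signature, name in MAGIC_SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read only the first few bytes of a file and classify them."""
    with open(path, "rb") as f:
        return sniff_type(f.read(8))
```

Because `sniff_file` never looks at the file name, a file called `photo.txt` whose content begins with `FF D8 FF` is still reported as `jpeg`.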
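
The block-allocation arithmetic behind the file size versus disk usage answer can be written out as follows. This is a sketch under stated assumptions: `allocated_bytes` is a hypothetical helper that models simple rounding up to whole blocks, and `size_and_usage` relies on `os.stat().st_blocks`, which Python exposes on Unix-like systems only and which counts 512-byte units.

```python
import os
from typing import Tuple


def allocated_bytes(file_size: int, block_size: int = 4096) -> int:
    """Round file_size up to a whole number of blocks (simple model)."""
    blocks = (file_size + block_size - 1) // block_size
    return blocks * block_size


def size_and_usage(path: str) -> Tuple[int, int]:
    """Return (file size, disk usage) in bytes for an existing file.

    Assumes a Unix-like platform: st_blocks is not available on Windows.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

With the default 4 KB block size, `allocated_bytes(1)` gives 4096, matching the 1-byte example above. The value reported by `size_and_usage` can differ from this simple model, since real filesystems may store small files inline, leave holes in sparse files, or compress data.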
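
The encoding answer can be made concrete with a short Python sketch. The helper names `utf8_width`, `guess_encoding`, and the `CANDIDATES` tuple are hypothetical, and the trial-decoding approach is only a rough heuristic assumed here for illustration: it reports the first candidate that decodes without error, which is not proof that the guess is right.

```python
from typing import Optional

# Candidate encodings tried in order, strictest first (an assumption for
# this sketch; a real detector would weigh many more encodings).
CANDIDATES = ("ascii", "utf-8", "cp1252")


def utf8_width(char: str) -> int:
    """Number of bytes UTF-8 needs to encode a single character."""
    return len(char.encode("utf-8"))


def guess_encoding(data: bytes) -> Optional[str]:
    """Return the first candidate encoding that decodes data cleanly."""
    for encoding in CANDIDATES:
        try:
            data.decode(encoding)
        except UnicodeDecodeError:
            continue
        return encoding
    return None


def mojibake(text: str) -> str:
    """Show what UTF-8 bytes look like when misread as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")
```

For example, `utf8_width("A")` is 1 and `utf8_width("é")` is 2, while `mojibake("é")` returns `"Ã©"`, the kind of garbled text a mismatched encoding produces.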