Data Compression for Machine Learning

Graduate-Level Linear Programming Final Project

Technologies Used

PythonNumPyCVXPYLinear ProgrammingMachine Learning

Project Overview

How much can you compress data before a machine learning model can no longer make accurate predictions? I worked with a team of 3 graduate students to explore this question using mathematical optimization techniques from a graduate-level Linear Programming course (EC ENGR 236A) at UCLA.

The goal was to classify clothing images (T-shirts, dresses, and coats) from the Fashion-MNIST dataset while using as few bits of information as possible. Since there are 3 clothing types, the information-theoretical minimum needed to distinguish between them is 2 bits of information. Our centralized model achieved 87% accuracy using 160 bits and still maintained 67% accuracy with just 4 bits!

Key Results

87% accuracy with 160 bits (centralized)
67% accuracy with 4 bits (centralized)
97.5% reduction in data size
99.93% reduction in data size

The Problem

We worked with 1000 28×28 black-and-white pixel images of a piece of clothing — that's 784 numbers representing brightness values. Normally, a machine learning model would look at all 784 8-bit values to decide if it's a T-shirt, a dress, or a coat. That's 6272 bits of data per image!

Example images from Fashion-MNIST showing T-shirts, dresses, and coats
Example images from the 3-class Fashion-MNIST subset: T-shirt/top, dress, and coat

But the model doesn't actually need all of this data! We can do with much less per image. This is the feature compression problem, and it matters for real-world applications like:

  • Sending data over limited bandwidth networks
  • Scaling models to run on huge datasets
  • Running machine learning on devices with limited memory
  • Privacy-preserving machine learning (sending less data = less exposure)
  • Distributed systems where multiple sensors need to share compressed information

For example, suppose a model is responsible for identifying harmful content on social media. For example, Instagram handles around 100 million posts per day. At this scale, reductions in data size per image means huge cost savings.

There are a few standard approaches to feature-compression. For example, Uniform Scalar Quantization reduces the number of bits per pixel. At best, if each pixel is reduced to one bit, we can get down to 784 bits per image.

Unfortunately, the data is already quite compressed, so we can't do much better with simple methods. But we can do much better! Many of the pixels are totally unimportant for classification! Also, we're throwing away information indiscreetly! There might be some critical pixels where we need more than one bit to represent them accurately.

So instead of assigning the same number of bits to each pixel, what if we create another model to optimize the bit allocation? This model would use training data to decide which pixels are most important for classification, and allocate bits accordingly. That's exactly what we did using Integer Linear Programming.

Two Approaches

Centralized

One encoder looks at the entire image and compresses it into a fixed number of bits. This approach achieved the best results because it can see relationships across the whole image.

Best accuracy: 87% accuracy @ 160 bits
Best compression: 67% accuracy @ 4 bits

Decentralized

Four encoders each look at one quadrant of the image independently, then send their compressed data to a central classifier. This is harder because each encoder can't see what the others are doing, but it's more realistic for distributed sensor networks.

79% accuracy @ 160 bits
Extreme compression unviable

Results

The graph below shows how accuracy improves as we allow more bits for compression. Even with very few bits, the centralized model maintains reasonable accuracy, and approaches the uncompressed baseline (89.8%) with just 160 bits per image. That's a 97.5% reduction in data size!

Centralized vs decentralized accuracy comparison
Centralized vs. decentralized compression: the centralized approach consistently outperforms decentralized at the same bit budget.

Technical Approach

The project used linear programming and integer linear programming, which are mathematical optimization techniques, to find the best way to compress the data. Instead of using neural networks (which learn patterns through trial and error), we formulated the compression problem as a mathematical equation and solved it directly using CVXPY. Unlike neural networks, this approach guarantees the optimal solution given the constraints.

1. Linear Classifier Training

Train a classifier on the full 784 features using LP with regularization. This makes most feature weights zero or near-zero, identifying which pixels actually matter for classification.

2. Bit Allocation (ILP)

Use integer linear programming to decide how many bits to assign each feature. Important features get more bits; unimportant features get zero bits and are discarded entirely.

3. Quantization & Retraining

Compress feature values to their allocated bit depths, then retrain the classifier on the quantized data. The model learns to work with the compressed representation.

What I Learned

  • How to apply graduate-level optimization theory to practical machine learning problems
  • Trade-offs between compression ratio and prediction accuracy
  • Centralized vs. decentralized system architectures and their real-world implications
  • Using CVXPY for convex and integer optimization problems in Python
  • Collaborating with graduate students on a research-level engineering problem

Course Context

This was the final project for EC ENGR 236A: Linear Programming, a graduate-level course at UCLA completed in December 2025. The course covered both mathematical theory and engineering applications of linear programming, rigorously bridging the gap between my two majors (Computer Engineering and Applied Mathematics).

I worked on this project with a team of 3 graduate students. Taking graduate coursework as an undergraduate and collaborating with grad students helped me develop skills in mathematical modeling, optimization algorithms, and translating theory into working code.

David Kahdian

Builder. Problem solver.

Connect

© 2026 David Kahdian. Built with SvelteKit and Tailwind CSS.