Working with Protocol Buffers in Python

This tutorial provides a basic Python programmer's introduction to working with protocol buffers. By walking through creating a simple example application, it shows you how to:
  • Define message formats in a .proto file.
  • Use the protocol buffer compiler.
  • Use the Python protocol buffer API to write and read messages.


This isn't a comprehensive guide to using protocol buffers in Python. For more detailed reference information, see the Protocol Buffer Language Guide.
. . .

Defining Your Protocol Format

To create your address book application, you'll need to start with a .proto file. The definitions in a .proto file are simple: you add a message for each data structure you want to serialize, then specify a name and a type for each field in the message. In our example, the .proto file that defines the messages is addressbook.proto.

The .proto file starts with a syntax declaration and a package declaration; the package helps to prevent naming conflicts between different projects:
syntax = "proto3";
package tutorial;

import "google/protobuf/timestamp.proto";
Next, you have your message definitions. A message is just an aggregate containing a set of typed fields. Many standard simple data types are available as field types, including bool, int32, float, double, and string. You can also add further structure to your messages by using other message types as field types.

syntax = "proto3";
package tutorial;

import "google/protobuf/timestamp.proto";

message Person {
  string name = 1;
  int32 id = 2;  // Unique ID number for this person.
  string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }

  repeated PhoneNumber phones = 4;

  google.protobuf.Timestamp last_updated = 5;
}

// Our address book file is just one of these.
message AddressBook {
  repeated Person people = 1;
}
In the above example, the Person message contains PhoneNumber messages, while the AddressBook message contains Person messages. You can even define message types nested inside other messages: as you can see, the PhoneNumber type is defined inside Person. You can also define enum types if you want one of your fields to have one of a predefined list of values. Here, PhoneType specifies that a phone number can be one of MOBILE, HOME, or WORK.

You'll find a complete guide to writing .proto files, including all the possible field types, in the Protocol Buffer Language Guide. Don't go looking for facilities similar to class inheritance, though: protocol buffers don't do that.
. . .

Compiling Your Protocol Buffers

Now that you have a .proto, the next thing you need to do is generate the classes you'll need to read and write AddressBook (and hence Person and PhoneNumber) messages. To do this, you need to run the protocol buffer compiler protoc on your .proto.
If you haven't installed the compiler yet, the simplest option on Debian or Ubuntu is to install the version packaged by your distribution:
apt-get install libprotobuf-dev
apt-get install protobuf-compiler
Otherwise, if you want to build from source, download a release tarball first and substitute its version for 2.6.0 in the commands below. Note that the proto3 syntax used in addressbook.proto requires protoc 3.0 or later.
apt-get install build-essential
tar xvfz protobuf-2.6.0.tar.gz
cd protobuf-2.6.0
./configure && make install

Run the following command to install the Python protocol buffers runtime library:
pip install protobuf

Now run the compiler, specifying the source directory (where your application's source code lives; the current directory is used if you don't provide a value), the destination directory (where you want the generated code to go; often the same as the source directory), and the path to your .proto. In this case, with everything in the current directory, you would invoke:
protoc -I=. --python_out=. ./addressbook.proto
Because you want Python classes, you use the --python_out option – similar options are provided for other supported languages.

This generates addressbook_pb2.py in your specified destination directory.
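
If you'd rather trigger the compiler from a Python build script than from the shell, a minimal sketch might look like the following. The helper names build_protoc_command and compile_proto are made up for this illustration and are not part of protobuf. The sketch assumes protoc is already on your PATH and uses only the -I and --python_out flags shown above.

```python
import os
import shlex
import subprocess


def build_protoc_command(proto_file, src_dir=".", dst_dir="."):
    """Build the protoc argument list (hypothetical helper, not a protobuf API)."""
    return [
        "protoc",
        f"-I={src_dir}",            # where protoc looks for .proto files
        f"--python_out={dst_dir}",  # where the generated *_pb2.py is written
        os.path.join(src_dir, proto_file),
    ]


def compile_proto(proto_file, src_dir=".", dst_dir="."):
    """Run protoc; raises CalledProcessError if the compiler reports an error."""
    subprocess.run(build_protoc_command(proto_file, src_dir, dst_dir), check=True)


command = build_protoc_command("addressbook.proto")
# Show the equivalent shell command without actually running the compiler.
print(shlex.join(command))
```

Calling compile_proto("addressbook.proto") would then run the same command as the shell invocation. Keeping the argument list in its own function makes it easy to inspect or log before anything is executed.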
. . .

The Protocol Buffer API

Unlike when you generate Java and C++ protocol buffer code, the Python protocol buffer compiler doesn't generate your data access code for you directly.

Instead (as you'll see if you look at addressbook_pb2.py) it generates special descriptors for all your messages, enums, and fields, and some mysteriously empty classes, one for each message type:
class Person(message.Message):
  __metaclass__ = reflection.GeneratedProtocolMessageType

  class PhoneNumber(message.Message):
    __metaclass__ = reflection.GeneratedProtocolMessageType
    DESCRIPTOR = _PERSON_PHONENUMBER
  DESCRIPTOR = _PERSON

class AddressBook(message.Message):
  __metaclass__ = reflection.GeneratedProtocolMessageType
  DESCRIPTOR = _ADDRESSBOOK

The important line in each class is __metaclass__ = reflection.GeneratedProtocolMessageType. While the details of how Python metaclasses work are beyond the scope of this tutorial, you can think of a metaclass as a template for creating classes.

At load time, the GeneratedProtocolMessageType metaclass uses the specified descriptors to create all the Python methods you need to work with each message type and adds them to the relevant classes. You can then use the fully-populated classes in your code.
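
To get a feel for the idea, here is a toy metaclass that does something similar in spirit: it reads a DESCRIPTOR and adds type-checked properties to an otherwise empty class. This is only a sketch and not how protobuf is implemented. GeneratedType, the dictionary-style DESCRIPTOR, and the _values storage slot are all invented for the illustration, and it uses the Python 3 metaclass= syntax.

```python
class GeneratedType(type):
    """Toy metaclass: turns a DESCRIPTOR mapping into type-checked properties."""

    def __new__(mcls, name, bases, namespace):
        descriptor = namespace.get("DESCRIPTOR", {})
        for field_name, field_type in descriptor.items():
            namespace[field_name] = mcls._make_property(field_name, field_type)
        # A single storage slot and no __dict__, so undeclared attributes
        # cannot be assigned on instances.
        namespace["__slots__"] = ("_values",)
        return super().__new__(mcls, name, bases, namespace)

    @staticmethod
    def _make_property(field_name, field_type):
        def getter(self):
            # Unset fields fall back to the type's zero value (0, "", ...).
            return self._values.get(field_name, field_type())

        def setter(self, value):
            if not isinstance(value, field_type):
                raise TypeError(f"{field_name} expects {field_type.__name__}")
            self._values[field_name] = value

        return property(getter, setter)


class Person(metaclass=GeneratedType):
    # Stand-in for the descriptor that protoc would generate.
    DESCRIPTOR = {"name": str, "id": int, "email": str}

    def __init__(self):
        self._values = {}


person = Person()
person.id = 1234
print(person.id, repr(person.name))  # prints: 1234 ''
```

The class body declares no fields of its own, yet instances end up with name, id, and email attributes, because the metaclass added them when the class was created.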

The end effect of all this is that you can use the Person class as if it defined each field of the Message base class as a regular field. For example, you could write:
import addressbook_pb2

person = addressbook_pb2.Person()
person.id = 1234
person.name = "John Doe"
person.email = "jdoe@example.com"
phone = person.phones.add()
phone.number = "555-4321"
phone.type = addressbook_pb2.Person.HOME

Note that these assignments are not just adding arbitrary new fields to a generic Python object. If you try to assign a field that isn't defined in the .proto file, an AttributeError is raised. If you assign a value of the wrong type to a field, a TypeError is raised. Also, reading the value of a field before it has been set returns the default value.
person.no_such_field = 1  # raises AttributeError
person.id = "1234"        # raises TypeError

For more information on exactly what members the protocol compiler generates for any particular field definition, see the Python generated code reference.

Enums

Enums are expanded by the metaclass into a set of symbolic constants with integer values. So, for example, the constant addressbook_pb2.Person.PhoneType.WORK has the value 2.
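
As a loose analogy only, the standard library's enum.IntEnum behaves in a comparable way: each member is a symbolic name that compares equal to its integer value. The generated code uses protobuf's own enum wrapper rather than IntEnum (it offers Name() and Value() lookups for the same purpose), so treat this as a picture of the concept rather than of the generated class.

```python
from enum import IntEnum


class PhoneType(IntEnum):
    """Stand-in for the PhoneType enum from addressbook.proto."""
    MOBILE = 0
    HOME = 1
    WORK = 2


# Symbolic constants compare equal to their integer values.
print(PhoneType.WORK == 2)  # prints: True
# An integer can be mapped back to its symbolic name.
print(PhoneType(1).name)    # prints: HOME
```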

Standard Message Methods

Each message class also contains a number of other methods that let you check or manipulate the entire message, including:
  • IsInitialized(): checks if all the required fields have been set.
  • __str__(): returns a human-readable representation of the message, particularly useful for debugging. (Usually invoked as str(message) or print(message).)
  • CopyFrom(other_msg): overwrites the message with the given message's values.
  • Clear(): clears all the elements back to the empty state.

These methods implement the Message interface. For more information, see the complete API documentation for Message.
. . .

Parsing and Serialization

Finally, each protocol buffer class has methods for writing and reading messages of your chosen type using the protocol buffer binary format. These include:
  • SerializeToString(): serializes the message and returns it as a byte string (bytes in Python 3, str in Python 2). Note that the contents are binary, not text.
  • ParseFromString(data): parses a message from the given byte string.

These are just a couple of the options provided for parsing and serialization. Again, see the Message API reference for a complete list.
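
To make the "binary, not text" point concrete, here is a small standard-library-only sketch that walks the top-level fields of a serialized message by hand. It is not the protobuf library's parser: read_varint and iter_fields are names invented for this example, only two wire types (varint and length-delimited) are handled, and the sample bytes were written by hand to represent a Person with name "John Doe" and id 1001.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the payload
        if not byte & 0x80:               # high bit clear: last byte
            return result, pos
        shift += 7


def iter_fields(buf):
    """Yield (field_number, wire_type, value) for each top-level field."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type == 0:    # varint, used for int32, bool, enum, ...
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: strings, bytes, sub-messages
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} is not handled in this sketch")
        yield field_number, wire_type, value


# Hand-written bytes: field 1 (name) = "John Doe", field 2 (id) = 1001.
sample = b"\n\x08John Doe\x10\xe9\x07"
fields = list(iter_fields(sample))
print(fields)  # prints: [(1, 2, b'John Doe'), (2, 0, 1001)]
```

Each field is stored as a key (the field number from the .proto combined with a wire type) followed by its value, which is why the field names themselves never appear in the serialized data.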

Now let's try using your protocol buffer classes. The script below, saved as main.py, creates two Person objects, serializes the address book, and then parses it back:

#!/usr/bin/env python3
import addressbook_pb2

address_book = addressbook_pb2.AddressBook()

# Person 1
person1 = address_book.people.add()
person1.id = 1001
person1.name = "John Doe"
person1.email = "jdoe@example.com"
phone_number1 = person1.phones.add()
phone_number1.number = "12345-67890"
phone_number1.type = addressbook_pb2.Person.PhoneType.WORK

# Person 2
person2 = address_book.people.add()
person2.id = 1002
person2.name = "Alex"
person2.email = "alex@example.com"
phone_number2 = person2.phones.add()
phone_number2.number = "100122-5889"
phone_number2.type = addressbook_pb2.Person.PhoneType.WORK

# Serialize the address book to bytes so that it can be
# transferred between services.
data = address_book.SerializeToString()

# Print the raw serialized bytes.
print("Raw data:", data)

# Go the other way: parse the raw bytes back into a fresh
# AddressBook that we can read and modify.
address_book = addressbook_pb2.AddressBook()
address_book.ParseFromString(data)

for person in address_book.people:
    print("===========================")
    print("Person ID:", person.id)
    print("Name:", person.name)
    print("E-mail:", person.email)

    for phone_number in person.phones:
        if phone_number.type == addressbook_pb2.Person.PhoneType.MOBILE:
            print("Mobile phone #:", end=" ")
        elif phone_number.type == addressbook_pb2.Person.PhoneType.HOME:
            print("Home phone #:", end=" ")
        elif phone_number.type == addressbook_pb2.Person.PhoneType.WORK:
            print("Work phone #:", end=" ")
        print(phone_number.number)

Now let's run main.py:
python3 main.py
Raw data: b'\n0\n\x08John Doe\x10\xe9\x07\x1a\x10jdoe@example.com"\x0f\n\x0b12345-67890\x10\x02\n,\n\x04Alex\x10\xea\x07\x1a\x10alex@example.com"\x0f\n\x0b100122-5889\x10\x02'
===========================
Person ID: 1001
Name: John Doe
E-mail: jdoe@example.com
Work phone #: 12345-67890
===========================
Person ID: 1002
Name: Alex
E-mail: alex@example.com
Work phone #: 100122-5889

Congratulations! You are now using protocol buffers from Python.
. . .

Conclusion

So, in this tutorial, we had a good look at how you can get up and running with the protocol buffer data format within your own Python-based applications. Hopefully, you found this tutorial useful. If you have any further questions or comments, please feel free to let me know in the comments section below!
Final code for this tutorial can be found here: https://github.com/gufranmirza/python-pb
