Working with Protocol Buffers in Python
This tutorial provides a basic Python programmer's introduction to working with protocol buffers. By walking through creating a simple example application, it shows you how to:
Define message formats in a .proto file.
Use the protocol buffer compiler.
Use the Python protocol buffer API to write and read messages.
Defining Your Protocol Format
To create your address book application, you'll need to start with a .proto file. The definitions in a .proto file are simple: you add a message for each data structure you want to serialize, then specify a name and a type for each field in the message. In our example, the .proto file that defines the messages is addressbook.proto. The .proto file starts with a syntax line and a package declaration; the package helps to prevent naming conflicts between different projects:
syntax = "proto3";
package tutorial;
import "google/protobuf/timestamp.proto";
Next, you have your message definitions. A message is just an aggregate containing a set of typed fields. Many standard simple data types are available as field types, including bool, int32, float, double, and string. You can also add further structure to your messages by using other message types as field types.
syntax = "proto3";
package tutorial;
import "google/protobuf/timestamp.proto";
message Person {
  string name = 1;
  int32 id = 2;  // Unique ID number for this person.
  string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    string number = 1;
    PhoneType type = 2;
  }

  repeated PhoneNumber phones = 4;
  google.protobuf.Timestamp last_updated = 5;
}

// Our address book file is just one of these.
message AddressBook {
  repeated Person people = 1;
}
In the above example, the Person message contains PhoneNumber messages, while the AddressBook message contains Person messages. You can even define message types nested inside other messages: as you can see, the PhoneNumber type is defined inside Person. You can also define enum types if you want one of your fields to have one of a predefined list of values. Here, you want to specify that a phone number can be one of MOBILE, HOME, or WORK.
You'll find a complete guide to writing .proto files – including all the possible field types – in the Protocol Buffer Language Guide. Don't go looking for facilities similar to class inheritance, though – protocol buffers don't do that.
Compiling Your Protocol Buffers
Now that you have a .proto, the next thing you need to do is generate the classes you'll need to read and write AddressBook (and hence Person and PhoneNumber) messages. To do this, you need to run the protocol buffer compiler protoc on your .proto.
If you haven't installed the compiler yet and want the latest stable version from your package manager, simply run:
apt-get install libprotobuf-dev
apt-get install protobuf-compiler
Otherwise, if you want to build from source, download the source archive (remember to substitute the URL and version of the latest protocol buffers release for the one shown here), then run:
apt-get install build-essential
tar xvfz protobuf-2.6.0.tar.gz
cd protobuf-2.6.0
./configure && make install
Run the following command to install the Python protocol buffers plugin:
python install
Now run the compiler, specifying the source directory (where your application's source code lives – the current directory is used if you don't provide a value), the destination directory (where you want the generated code to go; often the same as $SRC_DIR), and the path to your .proto. In this case, you would invoke:
protoc -I=. --python_out=. ./addressbook.proto
Because you want Python classes, you use the --python_out option – similar options are provided for other supported languages.
This generates addressbook_pb2.py in your specified destination directory.
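If you would rather drive the compiler from a build script, the same invocation can be wrapped in a few lines of Python. This is only a minimal sketch, not part of the protobuf tooling: the helper name build_protoc_command is hypothetical, and the sketch assumes that protoc is on your PATH and that addressbook.proto sits in the current directory.

```python
import os
import shutil
import subprocess


def build_protoc_command(proto_file, src_dir=".", out_dir="."):
    """Return the protoc argument list for generating Python code.

    Hypothetical helper: it only assembles the flags used in this tutorial.
    """
    return [
        "protoc",
        "-I=" + src_dir,            # where protoc looks for .proto files
        "--python_out=" + out_dir,  # where the *_pb2.py module is written
        proto_file,
    ]


command = build_protoc_command("./addressbook.proto")
print(" ".join(command))  # protoc -I=. --python_out=. ./addressbook.proto

# Only invoke the compiler when it is installed and the input file exists.
if shutil.which("protoc") and os.path.exists("./addressbook.proto"):
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems, and check=True makes a failed compile raise an exception instead of passing silently.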
Unlike when you generate Java and C++ protocol buffer code, the Python protocol buffer compiler doesn't generate your data access code for you directly.
Instead (as you'll see if you look at addressbook_pb2.py) it generates special descriptors for all your messages, enums, and fields, and some mysteriously empty classes, one for each message type:
class Person(message.Message):
  __metaclass__ = reflection.GeneratedProtocolMessageType

  class PhoneNumber(message.Message):
    __metaclass__ = reflection.GeneratedProtocolMessageType
    DESCRIPTOR = _PERSON_PHONENUMBER
  DESCRIPTOR = _PERSON

class AddressBook(message.Message):
  __metaclass__ = reflection.GeneratedProtocolMessageType
  DESCRIPTOR = _ADDRESSBOOK
The important line in each class is __metaclass__ = reflection.GeneratedProtocolMessageType. While the details of how Python metaclasses work are beyond the scope of this tutorial, you can think of them as a template for creating classes.
At load time, the GeneratedProtocolMessageType metaclass uses the specified descriptors to create all the Python methods you need to work with each message type and adds them to the relevant classes. You can then use the fully-populated classes in your code.
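To make the metaclass idea more concrete, here is a toy sketch in plain Python 3. It is not how the protobuf library is implemented: FieldCheckingMeta, FIELDS, and ToyPerson are hypothetical names, and a simple dict of field names to types stands in for the real descriptors.

```python
class FieldCheckingMeta(type):
    """Toy metaclass: turns a FIELDS dict into type-checked properties."""

    def __new__(mcs, name, bases, namespace):
        fields = namespace.get("FIELDS", {})
        for field_name, field_type in fields.items():
            namespace[field_name] = mcs._make_property(field_name, field_type)
        # __slots__ rejects attributes that are not declared in FIELDS.
        namespace["__slots__"] = tuple("_" + f for f in fields)
        return super().__new__(mcs, name, bases, namespace)

    @staticmethod
    def _make_property(field_name, field_type):
        storage = "_" + field_name

        def getter(self):
            # Unset fields fall back to the type's default value (0, "").
            return getattr(self, storage, field_type())

        def setter(self, value):
            if not isinstance(value, field_type):
                raise TypeError(
                    "%s expects %s" % (field_name, field_type.__name__))
            setattr(self, storage, value)

        return property(getter, setter)


class ToyPerson(metaclass=FieldCheckingMeta):
    # Stand-in for a descriptor: field name -> Python type.
    FIELDS = {"name": str, "id": int}


person = ToyPerson()
person.id = 1234
print(person.id)          # 1234
print(repr(person.name))  # '' (default for an unset string field)

try:
    person.id = "1234"
except TypeError as error:
    print("TypeError:", error)

try:
    person.no_such_field = 1
except AttributeError as error:
    print("AttributeError:", error)
```

The class body of ToyPerson contains nothing but the field table; every accessor is created when the class is built, which is the same division of labor the generated classes rely on.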
The end effect of all this is that you can use the Person class as if it defined each field of the Message base class as a regular field. For example, you could write:
import addressbook_pb2
person = addressbook_pb2.Person()
person.id = 1234
person.name = "John Doe"
person.email = "jdoe@example.com"
phone = person.phones.add()
phone.number = "555-4321"
phone.type = addressbook_pb2.Person.HOME
Note that these assignments are not just adding arbitrary new fields to a generic Python object. If you try to assign a field that isn't defined in the .proto file, an AttributeError is raised. If you assign a value of the wrong type to a field, a TypeError is raised. Also, reading the value of a field before it has been set returns the default value.
person.no_such_field = 1 # raises AttributeError
person.id = "1234" # raises TypeError
Enums are expanded by the metaclass into a set of symbolic constants with integer values. So, for example, the constant addressbook_pb2.Person.PhoneType.WORK has the value 2.
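For intuition, the mapping between names and numbers can be sketched with the standard library's enum.IntEnum. This is only a stand-in: the PhoneType class below is a hypothetical mirror of the enum in addressbook.proto, not generated code, and the constants in the generated module are plain integers rather than IntEnum members.

```python
import enum


class PhoneType(enum.IntEnum):
    """Stand-in mirroring the PhoneType enum from addressbook.proto."""
    MOBILE = 0
    HOME = 1
    WORK = 2


# The symbolic name compares equal to its integer value.
print(PhoneType.WORK == 2)   # True

# Going the other way: recover the name from a number read off the wire.
print(PhoneType(1).name)     # HOME
```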
Each message class also contains a number of other methods that let you check or manipulate the entire message, including:
IsInitialized(): checks if all the required fields have been set.
__str__(): returns a human-readable representation of the message, particularly useful for debugging. (Usually invoked as str(message) or print(message).)
CopyFrom(other_msg): overwrites the message with the given message's values.
Clear(): clears all the elements back to the empty state.
Parsing and Serialization
Finally, each protocol buffer class has methods for writing and reading messages of your chosen type using the protocol buffer binary format. These include:
SerializeToString(): serializes the message and returns it as a string. Note that the bytes are binary, not text; in Python 2 the str type is used only as a convenient container, and in Python 3 the result is a bytes object.
ParseFromString(data): parses a message from the given string.
These are just a couple of the options provided for parsing and serialization. Again, see the Message API reference for a complete list.
Now let's try using your protocol buffer classes. We will create two Person objects, serialize the address book, and then parse it back:
#!/usr/bin/python
from __future__ import print_function  # print() behaves the same on Python 2 and 3

import addressbook_pb2

address_book = addressbook_pb2.AddressBook()

# Person 1
person1 = address_book.people.add()
person1.id = 1001
person1.name = "John Doe"
person1.email = "jdoe@example.com"
phone_number1 = person1.phones.add()
phone_number1.number = "12345-67890"
phone_number1.type = addressbook_pb2.Person.PhoneType.WORK

# Person 2
person2 = address_book.people.add()
person2.id = 1002
person2.name = "Alex"
person2.email = "alex@example.com"
phone_number2 = person2.phones.add()
phone_number2.number = "100122-5889"
phone_number2.type = addressbook_pb2.Person.PhoneType.WORK

# Serialize the address book to bytes so that
# the data can be transferred across services.
data = address_book.SerializeToString()

# Print the raw serialized bytes (Python 3 shows them as a b'...' literal).
print("Raw data: ", data)

# Now go the other way: parse the raw bytes back into
# a fresh AddressBook object that we can read and modify.
address_book = addressbook_pb2.AddressBook()
address_book.ParseFromString(data)

for person in address_book.people:
    print("===========================")
    print("Person ID:", person.id)
    print("Name:", person.name)
    print("E-mail:", person.email)

    for phone_number in person.phones:
        # Print a label for the phone type, then the number on the same line.
        if phone_number.type == addressbook_pb2.Person.PhoneType.MOBILE:
            print("Mobile phone #: ", end="")
        elif phone_number.type == addressbook_pb2.Person.PhoneType.HOME:
            print("Home phone #: ", end="")
        elif phone_number.type == addressbook_pb2.Person.PhoneType.WORK:
            print("Work phone #: ", end="")
        print(phone_number.number)
Running the script, saved here as main.py, produces output like the following:
python main.py
Raw data:
0
John Doe�jdoe@example.com"
12345-67890
,
Alex�alex@example.com"
100122-5889
===========================
Person ID: 1001
Name: John Doe
E-mail: jdoe@example.com
Work phone #: 12345-67890
===========================
Person ID: 1002
Name: Alex
E-mail: alex@example.com
Work phone #: 100122-5889
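The raw data looks garbled because it is the compact binary wire format rather than text. As a rough illustration, the sketch below hand-encodes the first person's name and id fields in plain Python. The helper names encode_varint and encode_tag are hypothetical and this is not the library's implementation; it relies only on the documented encoding rules: each field starts with a tag computed as (field_number << 3) | wire_type, integers are stored as base-128 varints, and strings are prefixed with their length.

```python
VARINT = 0            # wire type for int32, enum, bool, ...
LENGTH_DELIMITED = 2  # wire type for string, bytes, nested messages


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_tag(field_number, wire_type):
    """Combine a field number and wire type into the tag that precedes a field."""
    return encode_varint((field_number << 3) | wire_type)


# Person.name is field 1 (string), Person.id is field 2 (int32).
name = "John Doe".encode("utf-8")
encoded_name = encode_tag(1, LENGTH_DELIMITED) + encode_varint(len(name)) + name
encoded_id = encode_tag(2, VARINT) + encode_varint(1001)

print(encoded_name)      # b'\n\x08John Doe'
print(encoded_id.hex())  # 10e907
```

The same mechanism explains the start of the raw output: the leading line break is byte 0x0A, the tag for field 1 of AddressBook (people), and the 0 that follows is byte 0x30, the 48-byte length of the first Person message. The unprintable character after each name is the varint-encoded id.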
Congratulations! You are now using protocol buffers from Python.
So, in this tutorial, we had a good look at how you can get up and running with the protocol buffer data format within your own Python-based applications.
Hopefully you found this tutorial useful. If you have any further questions or comments, please feel free to let me know in the comments section below!