BigARTM Developer’s Guide

This document describes the development process of the BigARTM library.

You do not need to follow this guide if you are using a pre-built BigARTM library via the command-line interface or from a Python environment. In that case, refer to the Basic BigARTM tutorial for Windows users or the Basic BigARTM tutorial for Linux and Mac OS-X users, depending on your operating system.

Downloads (Windows)

Download and install the following tools:

All explicit links are given just for convenience if you are setting up a new environment. You are free to choose other versions or tools, and most likely they will work just fine for BigARTM. Remember to match the following:

  • The Visual Studio version must match the Boost binaries version, unless you build Boost yourself.
  • Use the same configuration (32 bit or 64 bit) for your Python and BigARTM binaries.
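
To check which configuration an installed Python interpreter uses, a minimal sketch such as the following may help. The function name python_bitness is hypothetical and is not part of BigARTM; the snippet relies only on the standard library and should run on both Python 2.7 and Python 3.

```python
import struct
import sys


def python_bitness():
    """Return 32 or 64 depending on the pointer size of this interpreter."""
    # struct.calcsize("P") is the size of a C pointer in bytes.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    version = sys.version.split()[0]
    print("Python %s is a %d-bit build" % (version, python_bitness()))
```

If the reported value is 32, pair this interpreter with the Win32 configuration of BigARTM; if it is 64, use the x64 configuration.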

Source code

BigARTM is hosted in a public GitHub repository:

https://github.com/bigartm/bigartm

To contribute a fix, fork the repository, code your fix and submit a pull request. BigARTM maintainers regularly review pull requests and merge accepted ones into BigARTM’s master branch. Please keep monitoring the status of your pull request on Travis, the continuous integration system used by the BigARTM project.

Build C++ code on Windows

The following steps describe the procedure to build BigARTM’s C++ code on Windows.

  • Download and install GitHub for Windows.

  • Clone the https://github.com/bigartm/bigartm/ repository to any location on your computer. This location is further referred to as $(BIGARTM_ROOT).

  • Download and install Visual Studio 2012 or any newer version. BigARTM will compile just fine with any edition, including any Visual Studio Express edition (available at www.visualstudio.com).

  • Install CMake (tested with cmake-3.0.1, Win32 Installer).

    Make sure that the CMake executable is added to the PATH environment variable. To achieve this, either select the option “Add CMake to the system PATH for all users” during installation of CMake, or add it to the PATH manually.

  • Download and install Boost 1.55 or any newer version.

    We suggest using the Prebuilt Windows Binaries. Make sure to select the version that matches your version of Visual Studio. You may choose to work with either the x64 or the Win32 configuration; both are supported.

  • Configure system variables BOOST_ROOT and BOOST_LIBRARYDIR.

    If you have installed Boost from the link above and used the default location, then the settings should look similar to this:

    setx BOOST_ROOT C:\local\boost_1_56_0
    setx BOOST_LIBRARYDIR C:\local\boost_1_56_0\lib32-msvc-12.0
    

    For all further details please refer to the documentation of the FindBoost module. We also encourage new CMake users to step through the CMake tutorial.

  • Install Python 2.7 (tested with Python 2.7.6).

    You may choose to work with either the x64 or the Win32 version of Python, but make sure this matches the configuration of BigARTM you chose earlier. The x64 installation of Python is incompatible with 32 bit BigARTM, and vice versa.

  • Use CMake to generate Visual Studio projects and solution files. To do so, open a command prompt, change working directory to $(BIGARTM_ROOT) and execute the following commands:

    mkdir build
    cd build
    cmake ..
    

    You might have to explicitly specify the CMake generator, especially if you are working with the x64 configuration. To do so, use the following syntax:

    cmake .. -G"Visual Studio 12 Win64"
    

    CMake will generate the Visual Studio project and solution files under $(BIGARTM_ROOT)/build/.

  • Open the generated solution in Visual Studio and build it as you would any other Visual Studio solution. You may also use MSBuild from the Visual Studio command prompt.

    The build writes its output to the following folders:

    • $(BIGARTM_ROOT)/build/bin/[Debug|Release] — binaries (.dll and .exe)
    • $(BIGARTM_ROOT)/build/lib/[Debug|Release] — static libraries
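
If CMake fails to locate Boost, it can help to confirm that the two Boost variables from the steps above are visible to the current process and point to existing directories. Note that setx only affects command prompts opened after it was run. The following is a minimal sketch; check_boost_env is a hypothetical helper, not part of BigARTM or CMake.

```python
import os


def check_boost_env(env):
    """Return a list of problems found in the Boost variables of `env`."""
    problems = []
    for name in ("BOOST_ROOT", "BOOST_LIBRARYDIR"):
        value = env.get(name)
        if not value:
            problems.append("%s is not set" % name)
        elif not os.path.isdir(value):
            problems.append("%s points to a missing directory: %s" % (name, value))
    return problems


if __name__ == "__main__":
    # An empty result means both variables are set and both directories exist.
    for problem in check_boost_env(os.environ):
        print(problem)
```

The sketch only verifies that the directories exist; it does not verify that the Boost binaries match your Visual Studio version.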

At this point you should be able to run BigARTM tests, located here: $(BIGARTM_ROOT)/build/bin/*/artm_tests.exe.
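
Because the test executable is produced once per build configuration, a small helper can list the copies that actually exist. This is a sketch under the assumption that the layout matches the path pattern given above; find_test_binaries is a hypothetical name.

```python
import glob
import os


def find_test_binaries(bigartm_root):
    """Return artm_tests.exe paths under build/bin/<config>/, sorted."""
    pattern = os.path.join(bigartm_root, "build", "bin", "*", "artm_tests.exe")
    return sorted(glob.glob(pattern))
```

Each returned path can be run directly from a command prompt, or passed to subprocess.call if you prefer to drive the tests from a script.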

Python code on Windows

  • Install Python 2.7 (this step is already done if you are following the instructions above).

  • Add Python to the PATH environment variable:

    http://stackoverflow.com/questions/6318156/adding-python-path-on-windows-7

  • Follow the instructions in the README file in the directory $(BIGARTM_ROOT)/3rdparty/protobuf/python/. In brief, these instructions ask you to run the following commands:

    python setup.py build
    python setup.py test
    python setup.py install
    

    On the second step you will see two failing tests:

    Ran 216 tests in 1.252s
    FAILED (failures=2)
    

    These two failures are OK to ignore.

At this point you should be able to run BigARTM tests for Python, located here: $(BIGARTM_ROOT)/src/python_tests/python_tests.py.

  • [Optional] Download Python Tools 2.0 and add it to Visual Studio. All necessary instructions can be found at https://pytools.codeplex.com/. This will allow you to debug your Python scripts using Visual Studio. You may start with the following solution: $(BIGARTM_ROOT)/src/artm_vs2012.sln.
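
Before running the Python tests, it may be worth confirming that the protobuf package installed by setup.py is importable from the interpreter on your PATH. The sketch below assumes the package is exposed as google.protobuf; module_available is a hypothetical helper.

```python
import importlib


def module_available(name):
    """Return True if `name` can be imported by this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if module_available("google.protobuf"):
        print("protobuf Python package found")
    else:
        print("protobuf Python package is missing; re-run 'python setup.py install'")
```

If the package is reported as missing even though the install step succeeded, a likely cause is that a different Python installation comes first on the PATH.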

Build C++ code on Linux

Simply run CMake from the root of the project.

The following script has been tested on Ubuntu.

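# Install the build tools and Boost
sudo apt-get install git make cmake build-essential libboost-all-dev -q -y
# Clone into the home directory so that the paths below resolve
git clone https://github.com/bigartm/bigartm ~/bigartm
cd ~/bigartm
mkdir build
cd build
cmake ..
make -j8

# Run the C++ tests
~/bigartm/build/src/artm_tests/artm_tests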

It is also possible to use BigARTM from Python on Linux. Just make sure to set up the protobuf library as described in $(BIGARTM_ROOT)/3rdparty/protobuf/python/README, and then you can simply run Python scripts under $(BIGARTM_ROOT)/python_tests/ or $(BIGARTM_ROOT)/python_client/.
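
The build script above runs make with a fixed number of eight parallel jobs. If you drive the build from Python, one option is to derive the job count from the number of available cores instead. This is only a sketch; make_command is a hypothetical helper and the choice of one job per core is an assumption, not a BigARTM recommendation.

```python
import multiprocessing


def make_command(jobs=None):
    """Build a 'make -jN' argument list, defaulting N to the CPU count."""
    if jobs is None:
        jobs = multiprocessing.cpu_count()
    return ["make", "-j%d" % jobs]
```

The resulting list can be passed to subprocess.check_call with cwd set to the build directory.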

Compiling .proto files on Windows

  1. Open a new command prompt

  2. Copy the following file into $(BIGARTM_ROOT)/src/

    • $(BIGARTM_ROOT)/build/bin/CONFIG/protoc.exe

    Here CONFIG can be either Debug or Release (both options will work equally well).

  3. Change working directory to $(BIGARTM_ROOT)/src/

  4. Run the following commands

    .\protoc.exe --cpp_out=. --python_out=. .\artm\messages.proto
    .\protoc.exe --cpp_out=. .\artm\core\internals.proto
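
The steps above can also be scripted. The following sketch builds the same two protoc invocations as argument lists and runs them from the source directory. The helper names protoc_commands and compile_protos are hypothetical, and the sketch assumes protoc.exe has already been copied into $(BIGARTM_ROOT)/src/ as described in step 2.

```python
import os
import subprocess


def protoc_commands(protoc):
    """Build the two protoc invocations from step 4 as argument lists."""
    messages = os.path.join(".", "artm", "messages.proto")
    internals = os.path.join(".", "artm", "core", "internals.proto")
    return [
        [protoc, "--cpp_out=.", "--python_out=.", messages],
        [protoc, "--cpp_out=.", internals],
    ]


def compile_protos(src_dir):
    """Run both commands with `src_dir` as the working directory."""
    # Assumption: protoc.exe was already copied into src_dir (step 2).
    # An absolute path avoids depending on how relative executables resolve.
    protoc = os.path.join(os.path.abspath(src_dir), "protoc.exe")
    for command in protoc_commands(protoc):
        subprocess.check_call(command, cwd=src_dir)
```

Calling compile_protos with the path to $(BIGARTM_ROOT)/src/ raises subprocess.CalledProcessError if either invocation fails, so a broken .proto file is not silently ignored.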
    

Code style

In the code we follow the Google code style with the following changes:

  • Exceptions are allowed
  • Indentation must be 2 spaces. Tabs are not allowed.
  • No lines should exceed 100 characters.

All .h and .cpp files under $(BIGARTM_ROOT)/src/artm/ must be verified for code style with the cpplint.py script. Files generated by the protobuf compiler are the only exception to this rule.

To run the script you need some version of Python installed on your machine. Then execute the script like this:

python cpplint.py --linelength=100 <filename>

On Windows you may run this master-script to check all required files:

$(BIGARTM_ROOT)/utils/cpplint_all.bat
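
On other platforms, the set of files to check can be collected with a short script. The sketch below is not a reproduction of cpplint_all.bat, whose contents are not shown here. It assumes that protobuf-generated files keep the default protoc names (*.pb.h and *.pb.cc); files_to_lint and cpplint_command are hypothetical helpers.

```python
import os


def files_to_lint(artm_dir):
    """Return sorted .h and .cpp paths under `artm_dir`, skipping protobuf output."""
    selected = []
    for folder, _subdirs, names in os.walk(artm_dir):
        for name in names:
            # Assumption: protoc --cpp_out output is named *.pb.h / *.pb.cc.
            if name.endswith((".pb.h", ".pb.cc")):
                continue
            if name.endswith((".h", ".cpp")):
                selected.append(os.path.join(folder, name))
    return sorted(selected)


def cpplint_command(path):
    """Build the cpplint invocation shown above for one file."""
    return ["python", "cpplint.py", "--linelength=100", path]
```

Each command returned by cpplint_command can be passed to subprocess.call from the directory that contains cpplint.py.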

Intel Math Kernel Library

BigARTM can use the Intel Math Kernel Library (MKL) to achieve better performance. This only applies when ModelConfig.use_sparse_bow is false; in the sparse version BigARTM has a better built-in algorithm that does not use Intel MKL.

To enable MKL usage on Windows, add the path to the MKL library to your PATH system variable:

set PATH=%PATH%;"C:\Program Files (x86)\Intel\Composer XE 2013 SP1\redist\intel64\mkl"

To enable MKL usage on Linux, create a new environment variable MKL_PATH and set it as follows:

export MKL_PATH="/opt/intel/mkl/lib/intel64/"