BigARTM Developer’s Guide

This document describes the development process of BigARTM library.

You should not follow this guide if you are using pre-built BigARTM library via command-line interface or from Python environment. (refer to to Basic BigARTM tutorial for Windows users or Basic BigARTM tutorial for Linux and Mac OS-X users depending on your operating system).

Downloads (Windows)

Download and install the following tools:

All explicit links are given just for convenience if you are setting up new environment. You are free to choose other versions or tools, and most likely they will work just fine for BigARTM. Remember to match the following: * Visual Studio version must match Boost binaries version, unless you build Boost yourself * Use the same configuration (32 bit or 64 bit) for your Python and BigARTM binaries

Source code

BigARTM is hosted in public GitHub repository:

https://github.com/bigartm/bigartm

We maintain two branches: master and stable. master branch is the latest source code, potentially including some unfinished features. stable branch will be lagging behind master, and moved forward to master as soon as mainteiners decide that it is ready. Typically this should happen at the end of each month. At the same point we will introduce a new tag (something like v0.7.3 ) and produce a new release for Windows. In addition, stable branch also might receive small urgent fixes in between releases, typically to address critical issues reported by our users. Such fixes will be also included in master branch.

To contribute a fix you should fork the repository, code your fix and submit a pull request. against master branch. All pull requests are regularly monitored by BigARTM maintainers and will be soon merged. Please, keep monitoring the status of your pull request on travis, which is a continuous integration system used by BigARTM project.

Build C++ code on Windows

The following steps describe the procedure to build BigARTM’s C++ code on Windows.

  • Download and install GitHub for Windows.

  • Clone https://github.com/bigartm/bigartm/ repository to any location on your computer. This location is further refered to as $(BIGARTM_ROOT).

  • Download and install Visual Studio 2012 or any newer version. BigARTM will compile just fine with any edition, including any Visual Studio Express edition (available at www.visualstudio.com).

  • Install CMake (tested with cmake-3.0.1, Win32 Installer).

    Make sure that CMake executable is added to the PATH environmental variable. To achieve this either select the option “Add CMake to the system PATH for all users” during installation of CMake, or add it to the PATH manually.

  • Download and install Boost 1.55 or any newer version.

    We suggest to use the Prebuilt Windows Binaries. Make sure to select version that match your version of Visual Studio. You may choose to work with either x64 or Win32 configuration, both of them are supported.

  • Configure system variables BOOST_ROOT and Boost_LIBRARY_DIR.

    If you have installed boost from the link above, and used the default location, then the setting should look similar to this:

    setx BOOST_ROOT C:\local\boost_1_56_0
    setx BOOST_LIBRARYDIR C:\local\boost_1_56_0\lib32-msvc-12.0
    

    For all future details please refer to the documentation of FindBoost module. We also encourage new CMake users to step through CMake tutorial.

  • Install Python 2.7 (tested with Python 2.7.6).

    You may choose to work with either x64 or Win32 version of the Python, but make sure this matches the configuration of BigARTM you have choosed earlier. The x64 installation of python will be incompatible with 32 bit BigARTM, and virse versus.

  • Use CMake to generate Visual Studio projects and solution files. To do so, open a command prompt, change working directory to $(BIGARTM_ROOT) and execute the following commands:

    mkdir build
    cd build
    cmake ..
    

    You might have to explicitly specify the cmake generator, especially if you are working with x64 configuration. To do so, use the following syntax:

    cmake .. -G"Visual Studio 12 Win64"
    

    CMake will generate Visual Studio under $(BIGARTM_ROOT)/build/.

  • Open generated solution in Visual Studio and build it as you would usually build any other Visual Studio solution. You may also use MSBuild from Visual Studio command prompt.

    The build will output result into the following folders:

    • $(BIGARTM_ROOT)/build/bin/[Debug|Release] — binaries (.dll and .exe)
    • $(BIGARTM_ROOT)/build/lib/[Debug|Release] — static libraries

At this point you should be able to run BigARTM tests, located here: $(BIGARTM_ROOT)/build/bin/*/artm_tests.exe.

Python code on Windows

  • Install Python 2.7 (this step is already done if you are following the instructions above),

  • Add Python to the PATH environmental variable

    http://stackoverflow.com/questions/6318156/adding-python-path-on-windows-7

  • Follow the instructions in README file in directory $(BIGARTM_ROOT)/3rdparty/protobuf/python/. In brief, this instructions ask you to run the following commands:

    python setup.py build
    python setup.py test
    python setup.py install
    

    On second step you fill see two failing tests:

    Ran 216 tests in 1.252s
    FAILED (failures=2)
    

    This 2 failures are OK to ignore.

At this point you should be able to run BigARTM tests for Python, located under $(BIGARTM_ROOT)/python/tests/.

  • [Optional] Download and add to MSVS Python Tools 2.0. All necessary instructions can be found at https://pytools.codeplex.com/. This will allow you debug you Python scripts using Visual Studio. You may start with the following solution: $(BIGARTM_ROOT)/src/artm_vs2012.sln.

Build C++ code on Linux

Refer to Basic BigARTM tutorial for Linux and Mac OS-X users.

Working with iPython notebooks remotely

It turned out to be common scenario to run BigARTM on a Linux server (for example on Amazon EC2), while connecting to it from Windows through putty. Here is a convenient way to use ipython notebook in this scenario:

  1. Connect to the Linux machine via putty. Putty needs to be configured with dynamic tunnel for port 8888 as describe here on this page (port 8888 is a default port for ipython notebook). The same page describes how to configure internet properties:

    Clicking on Settings in Internet Explorer, or Proxy Settings in Google Chrome, should open this dialogue. Navigate through to the Advanced Proxy section and add localhost:9090 as a SOCKS Proxy.

  2. Start ipython notebook in your putty terminal.

  3. Open your favourite browser on Windows, and go to http://localhost:8888. Enjoy your notebook while the engine runs on remotely :)

Working with iPython notebooks remotely

Compiling .proto files on Windows

  1. Open a new command prompt

  2. Copy the following file into $(BIGARTM_ROOT)/src/

    • $(BIGARTM_ROOT)/build/bin/CONFIG/protoc.exe

    Here CONFIG can be either Debug or Release (both options will work equally well).

  3. Change working directory to $(BIGARTM_ROOT)/src/

  4. Run the following commands

    .\protoc.exe --cpp_out=. --python_out=. .\artm\messages.proto
    .\protoc.exe --cpp_out=. .\artm\core\internals.proto
    

Code style

In the code we follow google code style with the following changes:

  • Exceptions are allowed
  • Indentation must be 2 spaces. Tabs are not allowed.
  • No lines should exceed 120 characters.

All .h and .cpp files under $(BIGARTM_ROOT)/src/artm/ must be verified for code style with cpplint.py script. Files, generated by protobuf compiler, are the only exceptions from this rule.

To run the script you need some version of Python installed on your machine. Then execute the script like this:

python cpplint.py --linelength=120 <filename>

On Windows you may run this master-script to check all required files:

$(BIGARTM_ROOT/utils/cpplint_all.bat.