YOUR CONTRIBUTION
Cooperative development with SVN
Unitex is developed using an SVN server that
facilitates cooperative development. If you are interested in contributing to Unitex, you should
read these explanations.
Linguistic contribution
You can contribute to the development of the RELEX resources by sending
your DELAF dictionaries, grammars or any valuable document
to unitex@univ-mlv.fr. Dictionaries should be
Unicode text files. Grammars should be Unitex GRF files.
Computing contribution
As Unitex is free software, you can modify the source code to develop your own
functionality. If you develop a new function that may be useful to
Unitex users, please don't hesitate to contact us at
unitex@univ-mlv.fr,
in case it might be suitable for integration into our standard version of
Unitex. In that case, we will give you write access to the SVN server
that hosts the sources.
Golden rules
Here is a list of rules that Unitex contributors need to respect in order to have their
contributions integrated into the official Unitex version distributed on this web site.
Of course, the LGPL license allows anyone to modify and
distribute Unitex, but if you want to contribute to the official version maintained
at the IGM, we ask that you respect these rules.
You will find here general programming guidelines
and instructions regarding Unitex code and resources. Thank you
for respecting these rules, so that the system can grow without
becoming a mess.
Programming rules
- Before you commit, run an update to check whether your modifications generate conflicts.
- Before committing your changes, make sure that you accept the
LGPL license.
- Always explain your modifications in the commit comment.
- Unitex is and must stay cross-platform. Make sure that your code can compile and run correctly on
Windows and Linux/MacOS by respecting the following rules:
C++
- If you work under Windows, please use the Dev-C++
compiler. We chose this software because it is GPL-licensed
and therefore accessible to anyone.
- Do not forget to send the Makefile if it needs to be modified. Make sure that the Makefile
remains functional on both Windows and Linux/MacOS.
- Use only the Unicode functions that are in the Unitex
Unicode.cpp library. THIS IS VERY IMPORTANT, since
standard functions like fwprintf do not have the same behavior on little-endian and
big-endian systems. If a function you need is missing, please write it using the primitive functions of this
library.
- Do not use extended (wide) string literals like
L"This is a unicode string".
- Do not use
\n and \r.
Always use the Windows line break sequence, that is to say 0D 0A
in hexadecimal (in little-endian: 0D 00 0A 00).
- If you must use path delimiters, use the macro
#ifdef _NOT_UNDER_WINDOWS
in order to use \ or /, depending on
the system.
- If you need to use a library that depends on the system, use the macro
#ifdef _NOT_UNDER_WINDOWS.
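As a minimal sketch of the path delimiter and line break rules above, the following standalone C++ fragment selects the delimiter with _NOT_UNDER_WINDOWS and builds the line break byte by byte. PATH_SEPARATOR, join_path and append_utf16le_line_break are hypothetical names used only for illustration. They are not part of Unitex, and real Unitex code should rely on the functions of the Unicode.cpp library.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Path delimiter chosen at compile time with the macro named in the rules.
#ifdef _NOT_UNDER_WINDOWS
static const char PATH_SEPARATOR = '/';
#else
static const char PATH_SEPARATOR = '\\';
#endif

// Hypothetical helper: joins a directory and a file name with the
// delimiter of the current system.
std::string join_path(const std::string& dir, const std::string& name) {
    assert(!name.empty());  // a file name is expected
    if (dir.empty()) {
        return name;
    }
    if (dir[dir.size() - 1] == PATH_SEPARATOR) {
        return dir + name;  // the directory already ends with the delimiter
    }
    return dir + PATH_SEPARATOR + name;
}

// Hypothetical helper: appends the Windows line break (0D 0A) encoded as
// UTF-16 little-endian, that is to say the bytes 0D 00 0A 00. The bytes
// are written explicitly, so the result does not depend on the system.
void append_utf16le_line_break(std::vector<unsigned char>& out) {
    out.push_back(0x0D);
    out.push_back(0x00);
    out.push_back(0x0A);
    out.push_back(0x00);
}
```

Compiling with -D_NOT_UNDER_WINDOWS switches the delimiter to / without any other change in the code.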
Java
- Always use
File.separatorChar or
File.separator as path delimiter in file names.
- If you add a program named
Foo to Unitex that will be launched from the graphical interface, create and use
a class named FooCommand that inherits from fr.umlv.unitex.process.CommandBuilder.
Add FooCommand.class to the commands array in the class
fr.umlv.unitex.process.CommandMenuFactory.
- Only use the Unicode functions that are in the
fr.umlv.unitex.io.UnicodeIO
class. This is very important, since IO functions do not have the same behavior on little-endian and
big-endian systems. If a function you need is missing, please write it using the primitive functions of this
library.
- After any modification on C++ files, rebuild all programs with
make clean
and make, in order to verify that all programs still compile correctly.
- Every .cpp, .h and .java file must begin with the LGPL disclaimer.
- Every C++ program launched with no parameter must show the
COPYRIGHT string
(located in Copyright.cpp), followed by its synopsis. It must return the value 0.
- Every C++ program must return 0 when terminated correctly or invoked with no parameter, and a non-zero
value otherwise.
- Any substantial modification (such as introducing new constants, variables or functions) must
contain a comment that explains it and tells who the author is. By default, the code is from Sébastien
Paumier.
- Do not hardcode information in programs. Use configuration files instead.
- Do not put non-trivial numeric constants in programs. Use constant definitions instead, with
#define directives in C++ and
static final variables in Java.
- Keep in mind that Unitex is and must stay multi-language. Do not make language-specific modifications
when they can be generalized for other languages.
- Comment your code. Java files must contain javadoc comments.
- NEVER USE
printf OR scanf:
use u_printf and u_scanf instead.
- Error messages must be written with the functions of the
Error.cpp library in C++ and
System.err.println(...) in Java.
Other messages must be written in the standard output with
u_printf(...) in C++ and
System.out.println(...) in Java.
- If you need to give Unicode arguments to a program (for instance
Reg2Grf),
put the arguments in a Unicode text file and give the file name as an argument to the program instead.
- If you modify a program, preserve compatibility as far as possible. Keep in mind that some users
call Unitex programs in scripts and do not want to modify them without a very good reason.
- If you create a program or modify the behavior of an existing one, send a detailed description so that
the Unitex manual can be updated. Do the same if you introduce a new configuration or data file.
- Only use 7-bit ASCII characters in file names.
- Beware of letter case in file names, since inconsistencies will lead to errors under Linux-like systems.
- String literals in C++ programs are assumed to contain only 7-bit ASCII characters. If you use
accented letters, some compilers may interpret them as part of UTF-8 character definitions.
- The very first instruction of the
main function of a C++ program
must be a call to the setBufferMode() function, located in
the IOBuffer.cpp library. If this is not
respected, the Java frame that displays the standard output of programs may
display it asynchronously.
- All C++ programs must conform to the following naming rules. Let us take a program named
Foo. There must be the following 3 files:
- Main_Foo.cpp: contains the call to setBufferMode(),
followed by a call to the main_Foo() function, and NOTHING MORE.
- Foo.h: declares the main_Foo() function.
- Foo.cpp: defines the main_Foo() function.
The underlying idea is to exclude all the Main_Foo.cpp files when compiling Unitex
as a library.
- All C++ programs must parse their parameters with the
UnitexGetOpt.cpp library included in
Unitex. DO NOT USE #include <getopt.h>, since this library is not ported
under Windows. Use #include "UnitexGetOpt.h" instead.
- Every call to
malloc, calloc,
realloc and strdup
must be followed by a NULL test.
If the test fails, the fatal_alloc_error function must be used to
report the problem.
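The rules above on return values, named constants, allocation tests and the Foo naming scheme can be sketched together as follows. This is only an illustration under stated assumptions: fatal_alloc_error_stub is a local stand-in for the Unitex fatal_alloc_error function (whose real signature is not reproduced here), duplicate_checked and MAX_NAME_LENGTH are hypothetical, and the (argc, argv) signature of main_Foo is assumed. A real program would also print the COPYRIGHT string and its synopsis with u_printf.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>

// Named constant instead of a bare number in the code (illustrative value).
#define MAX_NAME_LENGTH 1024

// Local stand-in for Unitex's fatal_alloc_error: reports the failing
// function on the error output and stops the program.
static void fatal_alloc_error_stub(const char* function_name) {
    std::perror(function_name);
    std::exit(EXIT_FAILURE);
}

// Hypothetical helper: copies a string, testing the malloc result
// against NULL before using it.
char* duplicate_checked(const char* s) {
    assert(s != NULL);
    char* copy = (char*)std::malloc(std::strlen(s) + 1);
    if (copy == NULL) {
        fatal_alloc_error_stub("duplicate_checked");
    }
    std::strcpy(copy, s);
    return copy;
}

// Would be defined in Foo.cpp and declared in the matching header.
int main_Foo(int argc, char* argv[]) {
    if (argc == 1) {
        // No parameter: a real program shows COPYRIGHT and the synopsis here.
        return 0;
    }
    size_t length = std::strlen(argv[1]);
    if (length == 0 || length > MAX_NAME_LENGTH) {
        return 1;  // error: any non-zero value
    }
    char* name = duplicate_checked(argv[1]);
    // ... the actual processing of 'name' would go here ...
    std::free(name);
    return 0;  // correct termination
}

// Main_Foo.cpp would then contain nothing more than:
//   int main(int argc, char* argv[]) {
//       setBufferMode();
//       return main_Foo(argc, argv);
//   }
```

Keeping main() out of Foo.cpp is what allows the Main_Foo.cpp files to be left out when Unitex is built as a library.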
Linguistic resources rules
- If you send linguistic resources, make sure that you accept the LGPLLR license.
- If you send a dictionary, send the .dic version and a .txt file that describes its content
(description of codes used in it, number of entries, authors, ...).
- If you send graphs, send the .grf versions. Put information about the authors
in a comment box.
- Only use 7-bit ASCII characters in file names.
- Beware of letter case in file names, since inconsistencies will lead to errors under Linux-like systems.
- If you want to add a new language to Unitex, you need to send the language directory,
observing the following rules:
- Send a complete directory structure, even if some directories are empty (the best way to do it is
to copy an existing language directory and to delete all the files it contains).
- The name of the language directory must be the capitalized English name of the language. If the language
is a variant of an existing language, put a description of the variant in parentheses. For instance:
Portuguese (Brazil).
- You must provide an alphabet file.
- You must provide a sample Unicode text file. Beware of the copyright of this text. To avoid
problems, we recommend providing a classic novel that is in the public domain.
- You must provide a sample dictionary that should contain at least the words of the sample text.
- You should provide a sentence delimitation graph, even if it is trivial.
- Any extra resource is welcome.
Configuring Eclipse
Unitex is developed with free tools. In this section, we explain how to install everything you need to
reproduce the environment used by Unitex's main developer.
Step 0: installing g++
First, you need to install g++.
Under Linux/MacOS, you should know how to do it. Under Windows, you should use the
Dev-C++ package. If you install it in the default location
(C:\Dev-Cpp), everything will work perfectly when importing
Unitex C/C++ sources from SVN. You just have to add C:\Dev-Cpp\bin
to your PATH environment variable.
Step 1: installing Eclipse
At the time of writing, the latest version of Eclipse is called Galileo. It can be found at
http://www.eclipse.org/downloads/. First, download the
Eclipse IDE for C/C++ Developers for your operating system.
Then, just unzip it wherever you want.
Step 2: choosing your workspace
Go into the eclipse directory and launch it (eclipse.exe). Then, choose a workspace directory wherever you want.
Step 3: installing Subclipse
Now, you need to install the SVN client Subclipse. Go to "Help > Install New Software...".
Click on the "Add..." button to see the "Add site" frame. Fill in the "Name" frame with
Subclipse and the "Location" frame with
http://subclipse.tigris.org/update_1.6.x, and click on the
"OK" button.
Then, in the "Work with" frame, select the Subclipse line and check all three boxes.
Click on "Next", and accept the licenses when asked. After a while, the plugin will be installed,
and you will be asked to restart Eclipse. Accept. Then, go to "Window > Preferences". In the "Team > SVN" tab,
select SVNKit in the "SVN interface" combo box. Then, click on the "Apply" button.
Step 4: importing Unitex C/C++ sources
Click on "File > Import...". Then, in the SVN tab, click on "Checkout Projects from SVN". Then, use the
following repository address:
https://svnigm.univ-mlv.fr/svn/unitex
When asked, enter your login and password. Anonymous checkout is permitted with
login=password=anonsvn. Then, select the Unitex-C++ folder.
If you work under Windows and if you have installed Dev-C++ as explained in Step 0, sources should be automatically
recompiled immediately after the checkout. Otherwise, you have to right-click on the "C++" folder, then on the
"Properties" tab, and then on "C/C++ Make Project". If you do not work under Windows, you should then replace the existing
build command make SYSTEM=windows by just make. Do not
forget to click on the "Apply" button.
Step 5: importing Unitex Java sources
You must first configure Eclipse for
Java. Return to "Help > Install New Software..." and select the "Galileo" line in the "Work with" combo box. Then, in the
"Programming Languages" tab, select "Eclipse Java Development", install the plugin and restart Eclipse.
Then, repeat the SVN import procedure, but select the Unitex-Java folder.
Your suggestions
Please don't hesitate to mail us any comment, suggestion, bug report or
question. Free software like Unitex grows with the participation
of its users, so any contribution is welcome.
Last modification on this page: February 09, 2011