Design of a Reconfigurable Parallel Nonlinear Boolean Function …
Grain-128    Filtering function    17    3    13
Grain-128    Feedback function     20    2    13
Trivium      Feedback function      5    2     4
Pomaranch    Filtering function     6    3     9
Decim        Filtering function    14    2    92
To be used in stream ciphers, nonlinear Boolean functions must satisfy the
corresponding cryptographic properties. From the analysis above, we can
summarize some characteristics of nonlinear Boolean functions as follows:
(1) When the number of variables is small, the number of AND terms is large. To
keep the nonlinear Boolean function complex when there are few variables, the
expression is inevitably complex. This characteristic makes the cipher harder to
break and improves the security of cryptographic algorithms.
(2) When the number of variables is large, the number of AND terms is small.
The calculation of high order AND terms is always the bottleneck when evaluating
a Boolean function, so to improve the speed of the algorithm the number of AND
terms can be decreased as far as possible, provided that security is ensured.
(3) High order AND terms and low order AND terms are related by inclusion. Once
the number of input variables is determined, high order AND terms and low order
AND terms must have an inclusion relationship, and the hardware can be designed
to exploit this relationship during calculation to improve the processing
efficiency of algorithms.
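Characteristic (3) can be illustrated with a small software sketch. It is only an illustration under our own assumptions, not the hardware proposed in this paper: the function name eval_and_terms and the representation of a term as a tuple of variable indices are hypothetical. Each AND term is built from the largest already computed term that it includes, and the two-input AND operations are counted.

```python
def eval_and_terms(terms, x):
    """Evaluate AND terms over the bit list x, reusing included lower order terms.

    terms: iterable of tuples of variable indices (each with at least one index).
    Returns (values, and_gates): a dict from frozenset(term) to its bit value,
    and the number of two-input AND operations performed.
    """
    values = {}
    and_gates = 0
    # Process low order terms first so that higher order terms can reuse them.
    for term in sorted({frozenset(t) for t in terms}, key=len):
        # Largest already computed term that is strictly included in this one.
        base = max((s for s in values if s < term), key=len, default=frozenset())
        operands = [values[base]] if base else []
        operands += [x[i] for i in sorted(term - base)]
        result = operands[0]
        for bit in operands[1:]:
            result &= bit
            and_gates += 1
        values[term] = result
    return values, and_gates


# Hypothetical example: x0x1, x0x1x2 and x0x1x2x3 form an inclusion chain.
terms = [(0, 1), (0, 1, 2), (0, 1, 2, 3)]
x = [1, 1, 1, 0]
values, and_gates = eval_and_terms(terms, x)
naive_gates = sum(len(t) - 1 for t in terms)  # every term computed from scratch
```

For this chain, reuse needs one AND operation per term, whereas computing each term from scratch needs len(term) - 1 operations.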
3 Design of Reconfigurable Hardware of Nonlinear Boolean Function
Based on the calculation characteristics of nonlinear Boolean functions
analyzed above, the reconfigurable nonlinear Boolean function for stream cipher
algorithms is designed and realized in three parts. An improved ALM (Adaptive
Logic Module) realizes the reconfigurable hardware for low order AND terms,
which account for a large proportion of the terms. A tree-like network structure
realizes the reconfigurable high order AND terms and the output XOR network. The
linear part of the nonlinear Boolean function is computed in parallel with the
nonlinear part. The improved ALM circuit is designed on the basis of
characteristic (1) of the Boolean function, and the
S. Yang
tree-like network is designed on the basis of characteristics (2) and (3) of
the Boolean function. The reconfigurable hardware structure of the nonlinear
Boolean function is shown in Fig. 1.
Fig. 1. Reconfigurable hardware structure of nonlinear Boolean function
(blocks: input data; configuration of AND; nonlinear part with high order AND
and low order AND; reconfiguration of AND times; reconfiguration of network;
linear part; reconfigurable XOR network; output of Boolean function)
3.1 Reconfigurable Design of Low Order AND Terms
A statistical survey of public cryptographic algorithms shows that in many
stream cipher algorithms the order of the AND terms is no more than 10, so the
design and realization of low order AND terms is of practical significance.
For an arbitrary nonlinear Boolean function, transforming the expression into
standard algebraic form with a programmable AND-OR array is a complicated
process. For example, transforming a random n-input nonlinear Boolean function
into standard algebraic form requires calculating 2^n coefficients; as n
increases, the calculation becomes very complex and the storage occupied by the
coefficients grows exponentially. We therefore use a LUT to realize the low
order AND terms. A LUT can realize any N-input logic function, its time delay is
small, and its inputs are logically equivalent, so it simplifies the mapping
algorithms: only the requirements on the input and output terminals need to be
considered. However, a LUT is actually a memory, and an N-input LUT requires 2^N
storage units. As the number of inputs increases, the scale of the LUT grows
exponentially and its area becomes larger. In the actual design, the number of
LUT inputs must therefore be weighed and set to a reasonable value.
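The trade-off described above can be made concrete with a minimal software sketch of a LUT. It is an illustration only; the helper names make_lut and lut_read, the example function f and the bit ordering of the table index are our own assumptions. The table holds one bit per input combination, so an N-input LUT needs 2^N entries, and evaluation is a single memory read.

```python
def make_lut(func, n):
    """Fill a table of 2**n bits; entry i holds func applied to the bits of i."""
    table = []
    for index in range(2 ** n):
        bits = [(index >> k) & 1 for k in range(n)]  # bits[k] is input k
        table.append(func(*bits) & 1)
    return table


def lut_read(table, inputs):
    """Evaluate the stored function by using the inputs as a memory address."""
    index = 0
    for k, bit in enumerate(inputs):
        index |= (bit & 1) << k
    return table[index]


# Hypothetical 5-variable function with low order AND terms.
def f(a, b, c, d, e):
    return (a & b) ^ (c & d & e) ^ a


lut5 = make_lut(f, 5)       # 32 storage units for 5 inputs
lut6_size = 2 ** 6          # one more input doubles the table to 64 units
```

Whatever the algebraic form of f, the read path is the same, which is why the inputs of a LUT are logically equivalent.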
Combining the structural characteristics of nonlinear Boolean functions in
stream cipher algorithms with the idea of the programmable logic module in FPGA
circuits, this paper proposes an improved ALM structure with a 5-input LUT to
realize low order AND terms. The structure of the improved ALM is shown in
Fig. 2.
Fig. 2. The structure of improved ALM
(elements: input data; eight 4 bit LUT units LUT0 to LUT7; 0/1 data selectors;
inputs a, b, c0, d0, e0, c1, d1, e1; outputs F0(a,b,c0,d0,e0) and
F1(a,b,c1,d1,e1))
By changing the configuration information, the improved ALM circuit designed in
this paper can realize reconfigurable nonlinear Boolean functions with strong
adaptability. Its reconstruction ability is shown in Table 2.
Table 2. Reconstruct ability of improved ALM circuit
Type of function    ALM_Config    Output of function
4 variables         c0 = 0        ALM_Dataout0=F40(a,b,d0,e0)
                    c1 = 1        ALM_Dataout1=F41(a,b,d1,e1)
5 variables         c0 = c0       ALM_Dataout0=F50(a,b,c0,d0,e0)
                    c1 = c1       ALM_Dataout1=F51(a,b,c1,d1,e1)
This structure has the following reconfigurable characteristics:
(1) It can realize any single Boolean function of five input variables, for
example ALM_Dataout0=F50(a,b,c0,d0,e0) or ALM_Dataout1=F51(a,b,c1,d1,e1). In
this case the Boolean function occupies all the storage resources.
(2) It can simultaneously realize two Boolean functions of five input
variables, provided that the two functions share two identical variables and
have the same expression in the other three variables, such as
ALM_Dataout0=F50(a,b,c0,d0,e0) and ALM_Dataout1=F51(a,b,c1,d1,e1). These two
Boolean functions reuse the same storage units.
(3) It can realize two Boolean functions of four input variables. By choosing
the corresponding terminals, the expressions have some flexibility, such as
ALM_Dataout0=F40(a,b,d0,e0) and ALM_Dataout1=F41(a,b,d1,e1). Each Boolean
function occupies four LUT units exclusively.
(4) Depending on the requirements of the algorithms, a reconfigurable circuit
with better adaptability can be built by increasing the number of LUT units and
the number of MUX stages.
For two Boolean functions of five variables with the same structure, an FPGA
realization needs two 32 bit LUT units and 64 MUX units, while our structure
needs only one 32 bit LUT unit and 38 MUX units. The area saving reaches 50%,
and the time delay is unchanged. Our design is therefore well suited to
nonlinear Boolean functions with few variables and a high repetition rate.
3.2 Reconfigurable Design of High Order AND Terms
Statistical analysis shows that the high order AND terms form the critical path
and are the bottleneck of the nonlinear Boolean function. In our design, the
relationship between the AND terms is calculated in advance and encoded in the
configuration information; a tree-like structure then generates the high order
AND terms according to that configuration information.
Fig. 3. The structure of reconfigurable high order AND terms
(elements: input data D0 to Dn; constant “1”; configuration of AND; 0/1 data
selectors; AND gates; output data)
The structure of the reconfigurable high order AND terms is shown in Fig. 3. By
setting the data selector logic, the structure can realize an AND term over any
subset of the variables. When an input, which may come from a state value of the
shift register, is not an effective variable of the AND term, the data selector,
under the control of the configuration information, passes the constant “1” to
the next level instead. Because the constant “1” does not change the output of
an AND gate, it does not affect the propagation of the effective variables to
the next level. The structure can therefore realize an AND term over arbitrary
variables of the shift register and complete the reconfiguration of the AND
logic within the overall XOR logic. Through the control of the configuration
information, the structure reuses logic resources and time delay, improving both
resource utilization and computing efficiency.
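The selector mechanism can be sketched in software as follows. This is a minimal model under our own assumptions, not the exact netlist of Fig. 3: the function name high_order_and and the balanced pairing of the AND tree are hypothetical. A configuration bit of 1 passes the input through, and a configuration bit of 0 substitutes the constant “1”.

```python
def high_order_and(data, select):
    """AND together the inputs chosen by select, using a tree of AND gates.

    data:   list of input bits D0..Dn (for example shift register state).
    select: configuration bits; 1 keeps the input, 0 replaces it by constant 1.
    Returns (value, levels), where levels is the depth of the AND tree.
    """
    # Data selectors: inputs that are not part of the term become constant 1.
    level = [d if s else 1 for d, s in zip(data, select)]
    levels = 0
    while len(level) > 1:
        nxt = [level[i] & level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an unpaired signal passes to the next level
            nxt.append(level[-1])
        level = nxt
        levels += 1
    return level[0], levels


state = [1, 0, 1, 1, 0, 1, 1, 1]
# Term D0 & D2 & D3: every other input is masked to constant 1.
value_a, depth_a = high_order_and(state, [1, 0, 1, 1, 0, 0, 0, 0])
# Term D0 & D1 on the same hardware, selected by a different configuration.
value_b, depth_b = high_order_and(state, [1, 1, 0, 0, 0, 0, 0, 0])
```

The same eight-input tree serves both terms; only the configuration bits change, and the depth stays at three levels.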
3.3 Reconfigurable Design of Output Network
To obtain the final output of the function, the reconfigurable output network
of the nonlinear Boolean function XORs the AND terms together. The number of XOR
terms differs between algorithms, so a reconfigurable output network can improve
the computing speed of the nonlinear Boolean function. Assume that the nonlinear
Boolean function has p XOR terms. Traditional implementations use p control
nodes and a cascade of p-1 XOR gates to form the output. The overall time delay
of the output network is one level of AND gates plus p-1 levels of XOR gates,
and the logic resources are p AND gates and p-1 XOR gates. As the number of AND
terms increases, the time delay increases markedly.
Based on the analysis of the implementations above, this paper proposes an
optimized implementation based on a tree structure. As shown in Fig. 4, assume
that the nonlinear Boolean function has p XOR terms. The first level of the tree
has p/2 XOR gates, the second level has p/4 XOR gates, and the n-th level has
p/2^n XOR gates. The logic resources are again p AND gates and p-1 XOR gates,
while the output delay of the circuit is one level of AND gates plus log2(p)
levels of XOR gates.
Fig. 4. The structure of reconfigurable output network
(elements: output of AND; configuration of XOR; AND gates; output of XOR)
Compared with the traditional implementation, the reconfigurable tree output
network proposed in this paper reduces the time delay from p-1 levels of XOR
gates to log2(p) levels of XOR gates with the same logic resources and
configuration information, and the improvement grows with the number of terms.
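The delay comparison can be checked with a short sketch that counts XOR levels for both arrangements. It is an illustration under our own assumptions: the function names are hypothetical, and gating each AND output with one configuration bit is our reading of the AND gates in Fig. 4.

```python
def xor_cascade(terms):
    """Chain of p-1 XOR gates; returns (value, number of XOR levels)."""
    value, levels = terms[0], 0
    for bit in terms[1:]:
        value ^= bit
        levels += 1
    return value, levels


def xor_tree(terms):
    """Pairwise XOR tree; returns (value, number of XOR levels)."""
    level, levels = list(terms), 0
    while len(level) > 1:
        nxt = [level[i] ^ level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # an unpaired signal passes to the next level
            nxt.append(level[-1])
        level = nxt
        levels += 1
    return level[0], levels


def output_network(and_outputs, enable):
    """Mask unused AND terms with configuration bits, then XOR them in a tree."""
    gated = [t & e for t, e in zip(and_outputs, enable)]
    return xor_tree(gated)


and_outputs = [1, 0, 1, 1, 0, 1, 0, 0]           # p = 8 AND terms
cascade_result = xor_cascade(and_outputs)        # 7 XOR levels
tree_result = xor_tree(and_outputs)              # 3 XOR levels
masked_result = output_network(and_outputs, [1, 1, 1, 0, 0, 0, 0, 0])
```

Both arrangements use p-1 XOR gates and give the same value; only the depth differs, p-1 levels for the cascade against log2(p) levels for the tree when p is a power of two.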
4 Performance and Analysis
4.1 Performance of This Design
Based on the analysis above, the prototype has been described at RTL in Verilog
and synthesized with Quartus II 10.0 from Altera Corporation. The prototype has
been verified successfully, and the results show that our design can realize
nonlinear Boolean functions with arbitrary variables and orders for cipher
algorithms of up to 80 levels. Table 3 gives the clock frequency and resource
occupancy when the number of variables is 40, 60 and 80.
Furthermore, to evaluate performance more accurately, our design has been
synthesized in a 0.18 μm CMOS process using Synopsys Design Compiler. The
results are shown in Table 4.
Table 3. The performance of reconfigurable nonlinear Boolean function based on FPGA
Device            Number of variables    Maximum clock frequency    ALUT
EP2S180F1020I4    40                     233 MHz                    172
                  60                     158 MHz                    326
                  80                     125 MHz                    498
Table 4. The performance of reconfigurable nonlinear Boolean function based on ASIC
Number of variables    Constraint    Area (Combinational)    Area (Non combinational)    Delay      Slack
40                     5 ns          228734                  6896                        3.22 ns    +0.87
60                     5 ns          447468                  10032                       3.89 ns    +0.66
80                     5 ns          603218                  14783                       4.02 ns    +0.36
4.2 Contrasts with Other Designs
Based on the synthesis results above, we compare our reconfigurable nonlinear
Boolean function structure with CPLD and FPGA structures that can also realize
nonlinear Boolean functions. Area and latency are the two critical parameters of
the synthesis results, so the area and latency of the three structures are shown
in Fig. 5 and Fig. 6.
Fig. 5. The area comparison with other designs
(bar chart of FPGA_NBF, CPLD_NBF and Our Design at 40bit, 60bit and 80bit; axis
from 0 to 1200000)
Fig. 6. The latency comparison with other designs
(bar chart of FPGA_NBF, CPLD_NBF and Our Design at 40bit, 60bit and 80bit; axis
from 0 to 7)
The comparison shows that when the number of variables is 40, the
reconfigurable nonlinear Boolean function occupies about 230 thousand gates of
area and has a latency of 3.22 ns, a large improvement over the other designs.
Moreover, as the number of variables increases, the advantages of our design
become more pronounced.
5 Conclusion
This paper presents a high speed reconfigurable nonlinear Boolean function that
supports the nonlinear functions of stream cipher algorithms at arbitrary level,
with arbitrary variables and in any form. For the low order AND terms, an
optimization scheme based on a LUT structure is proposed, which suits the
structural characteristics of the nonlinear function. For the high order AND
terms, an optimization scheme based on a tree network is proposed. The final
output network uses a tree-like structure to improve the computing speed.
Synthesis, placement and routing of the reconfigurable design have been
accomplished in a 0.18 μm CMOS process. Compared with other designs, the results
show that our design has a clear advantage in area and latency.
Acknowledgments. This work was supported in part by the open project foundation
of the State Key Laboratory of Cryptology and by the National Natural Science
Foundation of China (NSFC) under Grants No. 61202492, No. 61309022 and
No. 61309008.
References
1. Barenghi A, Pelosi G, Terraneo F. Secure and efficient design of software block cipher
implementations on microcontrollers [J]. International Journal of Grid & Utility Computing,
2013, 4(2/3): 110-118.
2. Chengyu Hu, Bo Yang, Pengtao Liu. Multi-keyword ranked searchable public-key
encryption [J]. International Journal of Grid & Utility Computing, 2015, 6(3/4): 221-231.
3. Tian H. A new strong multiple designated verifiers signature [J]. International Journal of
Grid & Utility Computing, 2012(3): 1-11.
4. Yuriyama M, Kushida T. Integrated cloud computing environment with IT resources and
sensor devices [J]. International Journal of Space-Based and Situated Computing, 2011, 5(7):
11-14.
5. Iguchi N. Development of a self-study and testing function for NetPowerLab, an IP
networking practice system [J]. International Journal of Space-Based and Situated
Computing, 2014, 8(1): 22-25.
6. Xueyin Zhang, Zibin Dai, Wei Li, et al. Research on reconfigurable nonlinear Boolean
functions hardware structure targeted at stream cipher [C]. 2009 2nd International
Conference on Power Electronics and Intelligent Transportation System. 2009: 55-58.
7. Ji Xiangjun, Chen Xun, Dai Zibin, et al. Design and Realization of an Implementation
hardware with Non-Linear Boolean Function [J]. Computer Application and Software, 2014,
31(7): 283-285.
Temporally Adaptive Co-operation Schemes
Jakub Nalepa and Miroslaw Blocho
Abstract Selecting an appropriate co-operation scheme in parallel evolutionary algorithms is an important task and should be undertaken with care. In
this paper, we introduce temporally adaptive schemes and apply them in
our parallel memetic algorithm for solving the vehicle routing problem with
time windows. The experimental results revealed that this approach retrieves
better solutions in much shorter time than other co-operation schemes. The analysis is backed up with statistical tests, which
gave clear evidence that the differences are statistically significant. We report one new
world’s best solution to the benchmark problem, obtained using our adaptive
co-operation scheme.
Key words: Parallel algorithm; co-operation; memetic algorithm; VRPTW
1 Introduction
Solving rich vehicle routing problems (VRPs) is a vital research topic due
to their practical applications, which include delivery of food, beverages and
parcels, bus routing, delivery of cash to ATM terminals, waste collection,
and many others. There exists a plethora of variants of rich VRPs reflecting
a wide range of real-life scheduling scenarios [6, 19]; they usually combine
multiple realistic constraints which are imposed on feasible solutions. Although exact algorithms retrieve the optimum routing schedules, they are
Jakub Nalepa
Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice,
Poland e-mail: jakub.nalepa@polsl.pl
Miroslaw Blocho
Institute of Informatics, Silesian University of Technology, Akademicka 16, 44-100 Gliwice,
Poland e-mail: blochom@gmail.com
© Springer International Publishing AG 2017
F. Xhafa et al. (eds.), Advances on P2P, Parallel, Grid, Cloud
and Internet Computing, Lecture Notes on Data Engineering
and Communications Technologies 1, DOI 10.1007/978-3-319-49109-7_14
J. Nalepa and M. Blocho
still very difficult to exploit in practice because of their unacceptable execution times for very large problems. Therefore, approximate algorithms
have become the mainstream of research and development; these approaches aim
at delivering high-quality (though not necessarily optimum) schedules in significantly shorter time. In our recent work [14], we showed that our parallel
memetic algorithm (PMA–VRPTW), a hybrid of a genetic algorithm and
some local refinement procedures, elaborates very high-quality schedules for
the vehicle routing problem with time windows (VRPTW). Although PMA–VRPTW
was very efficient, selecting the appropriate co-operation scheme
(defining the co-operation topology, frequency and strategies to handle emigrants/immigrants) is extremely challenging and time-consuming, and an improper selection can easily jeopardize the PMA–VRPTW capabilities.
1.1 Contribution
We propose two temporally adaptive co-operation schemes in PMA–VRPTW.
In these schemes, the master process samples several time points during
the execution and monitors the search progress. Based on this analysis, the
scheme is dynamically updated to balance the exploration and exploitation
of the solution space, and to guide the search process as well as possible.
Our experiments were performed on the well-known Gehring and Homberger’s
benchmark (in this work, we consider all 400-customer tests with wide time
windows, large truck capacities, and random positions of the customers, which
appeared very challenging [14]). They revealed that the new temporally adaptive co-operation schemes retrieve better solutions quickly (the
differences are statistically significant) compared with other co-operation schemes. We report one new world’s best solution elaborated using the
new scheme. It is worth mentioning that such temporally adaptive strategies
of establishing the desired co-operation schemes have not been intensively
studied in the literature so far, and they may become an immediate answer
to the problems which require the parallel processes to co-operate efficiently
to guide the search process towards high-quality solutions quickly.
1.2 Paper Structure
This paper is structured as follows. Section 2 describes the VRPTW. In Section 3, we review the state of the art on the VRPTW. PMA–VRPTW is
briefly discussed in Section 4. In the same section, we present the temporally adaptive co-operation schemes, which are the main contribution of this
work. Section 5 contains the analysis of the experimental results. Section 6
concludes the paper and serves as the outlook to the future work.